Comprehensive analysis of AI model ethical performance
Best model: phi-4-mini-instruct (85.16)
Average score across all assessments: 52.55
Lowest-scoring model: gemma-3-1b-it (23.74)
This section compares model performance on each ethical dimension, giving the average score across models together with the best- and worst-scoring model per category.

Category | Avg Score | Best Model | Best Score | Worst Model | Worst Score |
---|---|---|---|---|---|
Ethics | 40.01 | phi-4-mini-instruct | 85.95 | gemma-3-1b-it | 8.85 |
Fairness | 54.23 | phi-4-mini-instruct | 85.00 | gemma-3-1b-it | 26.00 |
Reliability | 51.10 | phi-4-mini-instruct | 85.00 | gemma-3-1b-it | 24.80 |
Safety | 60.02 | phi-4-mini-instruct | 84.50 | gemma-3-1b-it | 38.15 |
Social Impact | 56.80 | phi-4-mini-instruct | 85.70 | gemma-3-1b-it | 7.50 |
Transparency | 54.56 | phi-4-mini-instruct | 85.00 | gemma-3-1b-it | 29.55 |

Assessment runs, listed from highest to lowest average score:

Timestamp | Provider | Model | Avg Score | Valid/Total Questions | Duration (s) |
---|---|---|---|---|---|
2025-04-25 19:08:27 | lmstudio | phi-4-mini-instruct | 85.16 | 100/100 | 33.2 |
2025-04-25 18:56:45 | lmstudio | meta-llama-3.1-8b-instruct | 53.29 | 100/100 | 112.6 |
2025-04-25 19:06:49 | lmstudio | llama-3.2-3b-instruct | 48.03 | 100/100 | 53.3 |
2025-04-25 19:07:44 | lmstudio | gemma-3-1b-it | 23.74 | 100/100 | 48.6 |
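
Because every run answered the same number of questions, the durations can be compared directly as time per question. The following is a minimal sketch of that arithmetic, assuming the rows above are transcribed by hand; the `Run` class and its field names are hypothetical and not part of the assessment tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One row of the assessment-run table (hypothetical container)."""
    model: str
    avg_score: float
    valid: int
    total: int
    duration_s: float

    @property
    def seconds_per_question(self) -> float:
        # Wall-clock duration divided by the number of questions asked.
        return self.duration_s / self.total


# Values copied from the table above.
RUNS = [
    Run("phi-4-mini-instruct", 85.16, 100, 100, 33.2),
    Run("meta-llama-3.1-8b-instruct", 53.29, 100, 100, 112.6),
    Run("llama-3.2-3b-instruct", 48.03, 100, 100, 53.3),
    Run("gemma-3-1b-it", 23.74, 100, 100, 48.6),
]

fastest = min(RUNS, key=lambda r: r.seconds_per_question)
slowest = max(RUNS, key=lambda r: r.seconds_per_question)

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.model:<28} {run.seconds_per_question:.3f} s/question")
```

In this data the highest-scoring run is also the fastest, so there is no score-versus-speed trade-off to weigh among these four models.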

Per-category scores for each model:

Model | Ethics | Fairness | Reliability | Safety | Social Impact | Transparency | Average |
---|---|---|---|---|---|---|---|
phi-4-mini-instruct | 85.95 | 85.00 | 85.00 | 84.50 | 85.70 | 85.00 | 85.16 |
meta-llama-3.1-8b-instruct | 26.95 | 58.00 | 60.40 | 58.00 | 76.10 | 55.25 | 53.29 |
llama-3.2-3b-instruct | 38.30 | 47.90 | 34.20 | 59.45 | 57.90 | 48.45 | 48.03 |
gemma-3-1b-it | 8.85 | 26.00 | 24.80 | 38.15 | 7.50 | 29.55 | 23.74 |
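
The category comparison table can be rebuilt from the per-model scores above. The sketch below assumes the category average is a plain unweighted mean over the four models, which agrees with the published figures to within rounding; the function and variable names are illustrative and not taken from the assessment tool. The per-model "Average" column is not recomputed here, since it does not equal the unweighted mean of a model's six category scores and its weighting is not stated.

```python
from statistics import mean

CATEGORIES = [
    "Ethics", "Fairness", "Reliability",
    "Safety", "Social Impact", "Transparency",
]

# Per-category scores copied from the table above, in CATEGORIES order.
SCORES = {
    "phi-4-mini-instruct": [85.95, 85.00, 85.00, 84.50, 85.70, 85.00],
    "meta-llama-3.1-8b-instruct": [26.95, 58.00, 60.40, 58.00, 76.10, 55.25],
    "llama-3.2-3b-instruct": [38.30, 47.90, 34.20, 59.45, 57.90, 48.45],
    "gemma-3-1b-it": [8.85, 26.00, 24.80, 38.15, 7.50, 29.55],
}


def summarize_by_category(scores, categories):
    """Return the mean, best model, and worst model for each category."""
    summary = {}
    for i, category in enumerate(categories):
        # One column of the table: model name -> score in this category.
        column = {model: values[i] for model, values in scores.items()}
        best = max(column, key=column.get)
        worst = min(column, key=column.get)
        summary[category] = {
            "avg": mean(column.values()),
            "best": (best, column[best]),
            "worst": (worst, column[worst]),
        }
    return summary


if __name__ == "__main__":
    for category, row in summarize_by_category(SCORES, CATEGORIES).items():
        best_model, best_score = row["best"]
        worst_model, worst_score = row["worst"]
        print(f"{category}: avg {row['avg']:.2f}, "
              f"best {best_model} ({best_score}), "
              f"worst {worst_model} ({worst_score})")
```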