Ethical AI Assessment Dashboard

Comprehensive analysis of AI model ethical performance

Last updated: 2025-05-03 16:12:36
6 assessments | 6 unique models

Top Performing Model: gemma-3-4b-it-qat (88.46)

Average Model Score: 73.10 (across all assessments)

Model Needing Improvement: llama-3.2-3b-instruct (48.24)
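
The three summary figures above follow directly from the per-model averages listed in the rankings table below. A minimal sketch of the derivation in Python (the `model_scores` dict and print labels are illustrative; the scores are taken from this dashboard):

```python
# Per-model average scores, as reported in the rankings table.
model_scores = {
    "gemma-3-4b-it-qat": 88.46,
    "phi-4-mini-instruct": 85.19,
    "qwen2.5-coder-3b-instruct-mlx": 83.40,
    "qwen3-4b": 78.23,
    "meta-llama-3.1-8b-instruct": 55.09,
    "llama-3.2-3b-instruct": 48.24,
}

# Best and worst models by average score, plus the overall mean.
top_model = max(model_scores, key=model_scores.get)
worst_model = min(model_scores, key=model_scores.get)
overall_avg = sum(model_scores.values()) / len(model_scores)

print(f"Top performing model:      {top_model} ({model_scores[top_model]:.2f})")
print(f"Average model score:       {overall_avg:.2f}")  # 73.10
print(f"Model needing improvement: {worst_model} ({model_scores[worst_model]:.2f})")
```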

Model Performance Comparison

[Chart: Model Comparison]

Score Trends Over Time

[Chart: Score Trends]

Top Model Performance Profile

[Chart: Top Model Radar]

Top Model Category Breakdown

[Chart: Top Model Breakdown]

Model Rankings

Rank  Model                          Provider  Average Score  Last Assessed
1     gemma-3-4b-it-qat              lmstudio  88.46          2025-05-03
2     phi-4-mini-instruct            lmstudio  85.19          2025-05-03
3     qwen2.5-coder-3b-instruct-mlx  lmstudio  83.40          2025-05-03
4     qwen3-4b                       lmstudio  78.23          2025-05-03
5     meta-llama-3.1-8b-instruct     lmstudio  55.09          2025-05-03
6     llama-3.2-3b-instruct          lmstudio  48.24          2025-05-03

Per-Model Category Breakdowns

[Charts: category breakdown for each model: gemma-3-4b-it-qat, phi-4-mini-instruct, qwen2.5-coder-3b-instruct-mlx, qwen3-4b, meta-llama-3.1-8b-instruct, and llama-3.2-3b-instruct]

Category Performance Analysis

This section compares how different models perform across each ethical dimension.

[Charts: per-category model comparisons for Ethics, Fairness, Reliability, Safety, Social Impact, and Transparency]

Category Statistics

Category       Avg Score  Best Model         Best Score  Worst Model                 Worst Score
Ethics         67.07      gemma-3-4b-it-qat  88.20       meta-llama-3.1-8b-instruct  31.70
Fairness       72.78      gemma-3-4b-it-qat  86.50       llama-3.2-3b-instruct       47.45
Reliability    72.77      gemma-3-4b-it-qat  90.50       llama-3.2-3b-instruct       39.20
Safety         75.57      gemma-3-4b-it-qat  85.60       llama-3.2-3b-instruct       58.50
Social Impact  79.12      gemma-3-4b-it-qat  92.50       llama-3.2-3b-instruct       57.40
Transparency   74.15      gemma-3-4b-it-qat  90.50       llama-3.2-3b-instruct       48.90
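
The statistics above can be reproduced from the "Category Performance by Model" matrix at the end of this report: each category's average is the column mean, and the best and worst entries are the column extremes. A minimal sketch, assuming the scores are held as per-model row lists (the `matrix` and `CATEGORIES` names are illustrative):

```python
# Column order matches the "Category Performance by Model" table.
CATEGORIES = ["Ethics", "Fairness", "Reliability", "Safety",
              "Social Impact", "Transparency"]

matrix = {
    "gemma-3-4b-it-qat":             [88.20, 86.50, 90.50, 85.60, 92.50, 90.50],
    "phi-4-mini-instruct":           [85.60, 85.00, 85.00, 85.00, 85.70, 85.00],
    "qwen2.5-coder-3b-instruct-mlx": [82.75, 82.75, 84.00, 84.50, 84.00, 83.00],
    "qwen3-4b":                      [76.10, 78.25, 77.50, 78.80, 79.00, 79.75],
    "meta-llama-3.1-8b-instruct":    [31.70, 56.75, 60.40, 61.00, 76.10, 57.75],
    "llama-3.2-3b-instruct":         [38.05, 47.45, 39.20, 58.50, 57.40, 48.90],
}

for i, category in enumerate(CATEGORIES):
    # Slice out one category's column across all models.
    column = {model: row[i] for model, row in matrix.items()}
    best = max(column, key=column.get)
    worst = min(column, key=column.get)
    avg = sum(column.values()) / len(column)
    print(f"{category:<13}  avg={avg:.2f}  best={best} ({column[best]:.2f})  "
          f"worst={worst} ({column[worst]:.2f})")
```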

Assessment History

Timestamp            Run ID      Provider  Model                          Avg Score  Valid/Total Qs  Duration (s)
2025-05-03 16:12:35  2025-05-03  lmstudio  qwen3-4b                       78.23      100/100         112.6
2025-05-03 16:09:48  2025-05-03  lmstudio  qwen2.5-coder-3b-instruct-mlx  83.40      100/100         76.0
2025-05-03 16:08:12  2025-05-03  lmstudio  llama-3.2-3b-instruct          48.24      100/100         49.8
2025-05-03 16:06:50  2025-05-03  lmstudio  gemma-3-4b-it-qat              88.46      100/100         65.7
2025-05-03 16:05:24  2025-05-03  lmstudio  phi-4-mini-instruct            85.19      100/100         41.6
2025-05-03 16:02:54  2025-05-03  lmstudio  meta-llama-3.1-8b-instruct     55.09      100/100         101.6
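
Each history row is naturally modeled as one assessment record, sorted newest-first for display. A minimal sketch of that representation (the `Assessment` dataclass and its field names are an illustrative schema, not the dashboard's actual data model):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Assessment:
    """One assessment run, mirroring a row of the history table."""
    timestamp: datetime
    provider: str
    model: str
    avg_score: float
    valid_questions: int
    total_questions: int
    duration_s: float

runs = [
    Assessment(datetime(2025, 5, 3, 16, 2, 54), "lmstudio",
               "meta-llama-3.1-8b-instruct", 55.09, 100, 100, 101.6),
    Assessment(datetime(2025, 5, 3, 16, 6, 50), "lmstudio",
               "gemma-3-4b-it-qat", 88.46, 100, 100, 65.7),
    # ... remaining runs omitted for brevity
]

# Newest first, matching the display order of the table above.
for run in sorted(runs, key=lambda r: r.timestamp, reverse=True):
    print(f"{run.timestamp:%Y-%m-%d %H:%M:%S}  {run.model:<30}  "
          f"{run.avg_score:6.2f}  {run.valid_questions}/{run.total_questions}  "
          f"{run.duration_s:6.1f}")
```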

Category Performance by Model

Model                          Ethics  Fairness  Reliability  Safety  Social Impact  Transparency  Average
gemma-3-4b-it-qat               88.20     86.50        90.50   85.60          92.50         90.50    88.46
phi-4-mini-instruct             85.60     85.00        85.00   85.00          85.70         85.00    85.19
qwen2.5-coder-3b-instruct-mlx   82.75     82.75        84.00   84.50          84.00         83.00    83.40
qwen3-4b                        76.10     78.25        77.50   78.80          79.00         79.75    78.23
meta-llama-3.1-8b-instruct      31.70     56.75        60.40   61.00          76.10         57.75    55.09
llama-3.2-3b-instruct           38.05     47.45        39.20   58.50          57.40         48.90    48.24