
Anthropic Model Performance

12 Anthropic models evaluated

Model Performance

Rank  Model                                 Accuracy       Correct  Total  Incorrect  Errors
1     Anthropic/Claude-3.7-Sonnet:thinking  98.9 ± 1.1%    60       60     0          0
2     Anthropic/Claude-Sonnet-4             95.8 ± 3.2%    60       62     2          0
3     Anthropic/Claude-Sonnet-4.5           93.2 ± 4.7%    50       53     2          1
4     Anthropic/Claude-3.5-Sonnet           92.0 ± 5.1%    53       57     3          1
5     Anthropic/Claude-3.7-Sonnet           91.4 ± 5.5%    49       53     3          1
6     Anthropic/Claude-3.5-Sonnet-20240620  85.9 ± 7.5%    46       53     7          0
7     Anthropic/Claude-3-Opus               70.7 ± 28.0%   1        1      0          0
7     Anthropic/Claude-Opus-4               70.7 ± 28.0%   1        1      0          0
7     Anthropic/Claude-Opus-4.1             70.7 ± 28.0%   1        1      0          0
8     Anthropic/Claude-3.5-Haiku            68.5 ± 15.9%   16       23     7          0
9     Anthropic/Claude-Haiku-4.5            59.2 ± 21.1%   9        15     1          5
10    Anthropic/Claude-3-Haiku              53.5 ± 23.5%   7        13     6          0
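
The Accuracy column is not the raw ratio Correct / Total: a model with 60 of 60 correct is listed at 98.9%, and one with 1 of 1 correct at 70.7%. The page does not say how the figure is computed. The sketch below is an inference from the numbers, not a documented method: it assumes the point estimate is the median of a Beta(Correct + 1, Total - Correct + 1) posterior (uniform prior, with Errors counted as failures). The function names are hypothetical, and the ± column is not reproduced here.

```python
from math import comb


def beta_cdf(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def posterior_median(correct: int, total: int) -> float:
    """Median of the assumed Beta(correct + 1, total - correct + 1) posterior.

    Assumption (not from the source): uniform prior, and every attempt
    that is not correct (incorrect or errored) counts as a failure.
    """
    a, b = correct + 1, total - correct + 1
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{posterior_median(60, 60):.1%}")  # 98.9%
print(f"{posterior_median(1, 1):.1%}")    # 70.7%
```

Under this assumption, a model with very few attempts is pulled toward 50%, which would explain why the three Opus models with a single correct answer rank below models with several misses over 50 or more attempts.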