12 Anthropic models evaluated
| Rank | Model | Accuracy | Correct | Total | Incorrect | Errors |
|---|---|---|---|---|---|---|
| 1 | Anthropic/Claude-3.7-Sonnet:thinking | 98.9 ± 1.1% | 60 | 60 | 0 | 0 |
| 2 | Anthropic/Claude-Sonnet-4 | 95.8 ± 3.2% | 60 | 62 | 2 | 0 |
| 3 | Anthropic/Claude-Sonnet-4.5 | 93.2 ± 4.7% | 50 | 53 | 2 | 1 |
| 4 | Anthropic/Claude-3.5-Sonnet | 92.0 ± 5.1% | 53 | 57 | 3 | 1 |
| 5 | Anthropic/Claude-3.7-Sonnet | 91.4 ± 5.5% | 49 | 53 | 3 | 1 |
| 6 | Anthropic/Claude-3.5-Sonnet-20240620 | 85.9 ± 7.5% | 46 | 53 | 7 | 0 |
| 7 | Anthropic/Claude-3-Opus | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 7 | Anthropic/Claude-Opus-4 | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 7 | Anthropic/Claude-Opus-4.1 | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 8 | Anthropic/Claude-3.5-Haiku | 68.5 ± 15.9% | 16 | 23 | 7 | 0 |
| 9 | Anthropic/Claude-Haiku-4.5 | 59.2 ± 21.1% | 9 | 15 | 1 | 5 |
| 10 | Anthropic/Claude-3-Haiku | 53.5 ± 23.5% | 7 | 13 | 6 | 0 |
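The rows can be checked for internal consistency. The sketch below is a minimal illustration that re-enters the table as Python data. The `Row` type and the `dense_ranks` and `raw_accuracy` helpers are hypothetical names, not part of any evaluation harness. It confirms two properties of the table:

- Correct + Incorrect + Errors equals Total in every row.
- The Rank column is a dense ranking of the reported Accuracy, so tied models share a rank and the next model takes the following integer.

It also prints the raw Correct/Total ratio, which differs from the reported Accuracy (60/60 appears as 98.9%). The table does not state how its accuracy estimate and ± range are derived, so the sketch makes no attempt to reproduce them.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard row, transcribed from the table (hypothetical type)."""
    rank: int
    model: str
    accuracy: float  # reported Accuracy point estimate, in percent
    correct: int
    total: int
    incorrect: int
    errors: int


ROWS = [
    Row(1, "Anthropic/Claude-3.7-Sonnet:thinking", 98.9, 60, 60, 0, 0),
    Row(2, "Anthropic/Claude-Sonnet-4", 95.8, 60, 62, 2, 0),
    Row(3, "Anthropic/Claude-Sonnet-4.5", 93.2, 50, 53, 2, 1),
    Row(4, "Anthropic/Claude-3.5-Sonnet", 92.0, 53, 57, 3, 1),
    Row(5, "Anthropic/Claude-3.7-Sonnet", 91.4, 49, 53, 3, 1),
    Row(6, "Anthropic/Claude-3.5-Sonnet-20240620", 85.9, 46, 53, 7, 0),
    Row(7, "Anthropic/Claude-3-Opus", 70.7, 1, 1, 0, 0),
    Row(7, "Anthropic/Claude-Opus-4", 70.7, 1, 1, 0, 0),
    Row(7, "Anthropic/Claude-Opus-4.1", 70.7, 1, 1, 0, 0),
    Row(8, "Anthropic/Claude-3.5-Haiku", 68.5, 16, 23, 7, 0),
    Row(9, "Anthropic/Claude-Haiku-4.5", 59.2, 9, 15, 1, 5),
    Row(10, "Anthropic/Claude-3-Haiku", 53.5, 7, 13, 6, 0),
]


def dense_ranks(scores):
    """Dense ranking: equal scores share a rank; the next distinct score gets rank + 1."""
    distinct = sorted(set(scores), reverse=True)
    position = {score: i + 1 for i, score in enumerate(distinct)}
    return [position[s] for s in scores]


def raw_accuracy(row):
    """Unadjusted Correct / Total, in percent (not the table's reported estimate)."""
    return 100.0 * row.correct / row.total


if __name__ == "__main__":
    # Every attempt is counted exactly once as correct, incorrect, or error.
    for r in ROWS:
        assert r.correct + r.incorrect + r.errors == r.total, r.model
    # The published ranks match a dense ranking of the reported accuracy.
    assert dense_ranks([r.accuracy for r in ROWS]) == [r.rank for r in ROWS]
    for r in ROWS:
        print(f"{r.model:40s} reported {r.accuracy:5.1f}%  raw {raw_accuracy(r):5.1f}%")
```

Dense ranking explains why the three Opus models, each with a single correct answer out of one attempt, all sit at rank 7 while Claude-3.5-Haiku follows at rank 8 rather than 10.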