11 Deepseek models evaluated
| Rank | Model | Accuracy | Correct | Total | Incorrect | Errors |
|---|---|---|---|---|---|---|
| 1 | Deepseek/Deepseek-R1 | 95.0 ± 3.8% | 50 | 52 | 1 | 1 |
| 2 | Deepseek/Deepseek-Chat | 93.8 ± 4.3% | 55 | 58 | 3 | 0 |
| 3 | Deepseek/Deepseek-R1-0528 | 93.5 ± 4.9% | 38 | 40 | 1 | 1 |
| 4 | Deepseek/Deepseek-Prover-V2 | 90.4 ± 5.7% | 53 | 58 | 4 | 1 |
| 5 | Deepseek/Deepseek-Chat-V3-0324 | 90.3 ± 5.8% | 52 | 57 | 3 | 2 |
| 6 | Deepseek/Deepseek-V3.1-Terminus | 88.8 ± 6.3% | 52 | 58 | 5 | 1 |
| 7 | Deepseek/Deepseek-Chat-V3.1 | 85.6 ± 7.6% | 45 | 52 | 6 | 1 |
| 8 | Deepseek/Deepseek-V3.2-Exp | 81.4 ± 9.7% | 33 | 40 | 6 | 1 |
| 9 | Deepseek/Deepseek-R1-Distill-Qwen-32b | 80.7 ± 11.3% | 23 | 28 | 4 | 1 |
| 10 | Deepseek/Deepseek-R1-0528-Qwen3-8b | 75.6 ± 13.3% | 20 | 26 | 4 | 2 |
| 11 | Deepseek/Deepseek-R1-Distill-Qwen-14b | 71.9 ± 14.3% | 19 | 26 | 5 | 2 |
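The count columns are internally consistent: in every row, Correct + Incorrect + Errors equals Total. The short Python sketch below checks that invariant and computes the raw proportion Correct / Total for comparison. The raw proportion is slightly higher than the reported Accuracy (for example 50/52 ≈ 96.2% for Deepseek/Deepseek-R1 versus the reported 95.0%), so the Accuracy and ± values appear to come from an estimator the table does not specify; this sketch does not attempt to reproduce them. The names `ROWS` and `check_rows` are hypothetical, chosen only for illustration.

```python
# Rows transcribed from the table: (model, correct, total, incorrect, errors).
ROWS = [
    ("Deepseek/Deepseek-R1", 50, 52, 1, 1),
    ("Deepseek/Deepseek-Chat", 55, 58, 3, 0),
    ("Deepseek/Deepseek-R1-0528", 38, 40, 1, 1),
    ("Deepseek/Deepseek-Prover-V2", 53, 58, 4, 1),
    ("Deepseek/Deepseek-Chat-V3-0324", 52, 57, 3, 2),
    ("Deepseek/Deepseek-V3.1-Terminus", 52, 58, 5, 1),
    ("Deepseek/Deepseek-Chat-V3.1", 45, 52, 6, 1),
    ("Deepseek/Deepseek-V3.2-Exp", 33, 40, 6, 1),
    ("Deepseek/Deepseek-R1-Distill-Qwen-32b", 23, 28, 4, 1),
    ("Deepseek/Deepseek-R1-0528-Qwen3-8b", 20, 26, 4, 2),
    ("Deepseek/Deepseek-R1-Distill-Qwen-14b", 19, 26, 5, 2),
]


def check_rows(rows):
    """Assert Correct + Incorrect + Errors == Total for each row and
    return (model, raw accuracy in percent) pairs, where raw accuracy
    is simply Correct / Total (not the table's reported Accuracy)."""
    out = []
    for model, correct, total, incorrect, errors in rows:
        assert correct + incorrect + errors == total, f"inconsistent row: {model}"
        out.append((model, 100.0 * correct / total))
    return out


if __name__ == "__main__":
    for model, raw_pct in check_rows(ROWS):
        print(f"{model:40s} {raw_pct:5.1f}%")
```

Printing the raw proportions side by side with the reported column makes the gap easy to see; it is roughly one to one and a half percentage points for every model.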