18 Google models evaluated
| Rank | Model | Accuracy | Correct | Total | Incorrect | Errors |
|---|---|---|---|---|---|---|
| 1 | Google/Gemini-2.5-Flash | 98.9 ± 1.1% | 62 | 62 | 0 | 0 |
| 2 | Google/Gemini-2.5-Pro | 98.9 ± 1.1% | 61 | 61 | 0 | 0 |
| 2 | Google/Gemini-2.5-Pro-Preview | 98.9 ± 1.1% | 61 | 61 | 0 | 0 |
| 3 | Google/Gemini-2.5-Flash-Image | 98.9 ± 1.1% | 60 | 60 | 0 | 0 |
| 4 | Google/Gemini-2.5-Pro-Preview-05-06 | 97.2 ± 2.4% | 58 | 59 | 1 | 0 |
| 5 | Google/Gemini-2.5-Flash-Image-Preview | 97.2 ± 2.4% | 57 | 58 | 1 | 0 |
| 6 | Google/Gemini-2.5-Flash-Lite-Preview-06-17 | 95.4 ± 3.5% | 55 | 57 | 1 | 1 |
| 7 | Google/Gemini-2.5-Flash-Preview-09-2025 | 93.8 ± 4.3% | 55 | 58 | 3 | 0 |
| 8 | Google/Gemini-2.5-Flash-Lite-Preview-09-2025 | 92.1 ± 5.1% | 54 | 58 | 3 | 1 |
| 9 | Google/Gemini-2.0-Flash-001 | 90.4 ± 5.7% | 53 | 58 | 5 | 0 |
| 10 | Google/Gemma-3-27b-It | 90.3 ± 5.8% | 52 | 57 | 5 | 0 |
| 11 | Google/Gemini-2.5-Flash-Lite | 78.9 ± 11.0% | 28 | 35 | 7 | 0 |
| 12 | Google/Gemma-3-12b-It | 70.9 ± 14.8% | 18 | 25 | 7 | 0 |
| 13 | Google/Gemini-2.0-Flash-Lite-001 | 69.7 ± 15.3% | 17 | 24 | 7 | 0 |
| 13 | Google/Gemma-3-4b-It | 69.7 ± 15.3% | 17 | 24 | 5 | 2 |
| 14 | Google/Gemma-2-9b-It | 64.1 ± 17.8% | 13 | 20 | 7 | 0 |
| 15 | Google/Gemma-3n-E4b-It | 60.3 ± 19.4% | 11 | 18 | 6 | 1 |
| 16 | Google/Gemma-2-27b-It | 53.5 ± 23.5% | 7 | 13 | 5 | 1 |
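
The listed Accuracy values do not equal the plain ratio Correct / Total (for example, 62 of 62 is listed as 98.9 ± 1.1%), so the column appears to be a smoothed or interval-based estimate whose method is not stated here. As a rough cross-check only, the sketch below computes the raw ratio for a few rows copied from the table and verifies that Correct + Incorrect + Errors adds up to Total. The helper names `raw_accuracy` and `counts_consistent` are hypothetical and are not part of the leaderboard's tooling.

```python
# A few rows copied from the table: (model, correct, total, incorrect, errors).
ROWS = [
    ("Google/Gemini-2.5-Flash", 62, 62, 0, 0),
    ("Google/Gemini-2.5-Pro-Preview-05-06", 58, 59, 1, 0),
    ("Google/Gemini-2.5-Flash-Lite-Preview-06-17", 55, 57, 1, 1),
    ("Google/Gemma-2-27b-It", 7, 13, 5, 1),
]


def raw_accuracy(correct: int, total: int) -> float:
    """Plain ratio correct / total as a percentage.

    Assumption: this is NOT the estimator behind the Accuracy column,
    only a simple reference value to compare against.
    """
    return 100.0 * correct / total


def counts_consistent(correct: int, total: int, incorrect: int, errors: int) -> bool:
    """Check that the three outcome counts add up to the total."""
    return correct + incorrect + errors == total


for model, correct, total, incorrect, errors in ROWS:
    ok = counts_consistent(correct, total, incorrect, errors)
    print(f"{model}: raw {raw_accuracy(correct, total):.1f}%, counts consistent: {ok}")
```

The raw ratios come out slightly higher than the listed figures for the high-scoring models, which is consistent with the column being a shrunken estimate rather than a simple proportion.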