39 Qwen models evaluated

| Rank | Model | Accuracy | Correct | Total | Incorrect | Errors |
|---|---|---|---|---|---|---|
| 1 | Qwen/Qwen3-Max | 98.9 ± 1.1% | 61 | 61 | 0 | 0 |
| 2 | Qwen/Qwen3-Next-80b-A3b-Instruct | 97.3 ± 2.3% | 59 | 60 | 1 | 0 |
| 3 | Qwen/Qwen-Plus | 97.2 ± 2.4% | 57 | 58 | 1 | 0 |
| 3 | Qwen/Qwen3-Vl-30b-A3b-Instruct | 97.2 ± 2.4% | 57 | 58 | 1 | 0 |
| 4 | Qwen/Qwen3-30b-A3b-Thinking-2507 | 97.1 ± 2.5% | 56 | 57 | 1 | 0 |
| 4 | Qwen/Qwen3-Next-80b-A3b-Thinking | 97.1 ± 2.5% | 56 | 57 | 1 | 0 |
| 4 | Qwen/Qwen3-Vl-235b-A22b-Instruct | 97.1 ± 2.5% | 56 | 57 | 1 | 0 |
| 5 | Qwen/Qwen3-235b-A22b-Thinking-2507 | 96.9 ± 2.6% | 52 | 53 | 1 | 0 |
| 6 | Qwen/Qwq-32b | 96.9 ± 2.7% | 51 | 52 | 1 | 0 |
| 7 | Qwen/Qwen-Vl-Max | 95.6 ± 3.4% | 57 | 59 | 2 | 0 |
| 8 | Qwen/Qwen-2.5-Coder-32b-Instruct | 95.5 ± 3.4% | 56 | 58 | 2 | 0 |
| 8 | Qwen/Qwen-Max | 95.5 ± 3.4% | 56 | 58 | 1 | 1 |
| 8 | Qwen/Qwen-Plus-2025-07-28:thinking | 95.5 ± 3.4% | 56 | 58 | 2 | 0 |
| 9 | Qwen/Qwen-Plus-2025-07-28 | 95.4 ± 3.5% | 55 | 57 | 2 | 0 |
| 9 | Qwen/Qwen3-235b-A22b-2507 | 95.4 ± 3.5% | 55 | 57 | 2 | 0 |
| 10 | Qwen/Qwen3-235b-A22b:free | 95.0 ± 3.8% | 50 | 52 | 1 | 1 |
| 11 | Qwen/Qwen3-Vl-30b-A3b-Thinking | 94.1 ± 4.5% | 42 | 44 | 2 | 0 |
| 12 | Qwen/Qwen3-14b | 93.8 ± 4.3% | 55 | 58 | 3 | 0 |
| 12 | Qwen/Qwen3-30b-A3b-Instruct-2507 | 93.8 ± 4.3% | 55 | 58 | 3 | 0 |
| 13 | Qwen/Qwen3-Coder-Plus | 93.6 ± 4.5% | 53 | 56 | 3 | 0 |
| 14 | Qwen/Qwen3-8b | 93.1 ± 4.8% | 49 | 52 | 3 | 0 |
| 15 | Qwen/Qwen3-Vl-8b-Instruct | 92.0 ± 5.1% | 53 | 57 | 4 | 0 |
| 16 | Qwen/Qwen3-Vl-8b-Thinking | 90.3 ± 6.2% | 43 | 47 | 1 | 3 |
| 17 | Qwen/Qwen3-4b:free | 89.1 ± 10.5% | 5 | 5 | 0 | 0 |
| 18 | Qwen/Qwen3-Vl-235b-A22b-Thinking | 86.3 ± 8.2% | 35 | 40 | 2 | 3 |
| 19 | Qwen/Qwen3-Coder | 85.3 ± 7.8% | 44 | 51 | 4 | 3 |
| 20 | Qwen/Qwen2.5-Vl-32b-Instruct | 81.1 ± 10.4% | 28 | 34 | 4 | 2 |
| 21 | Qwen/Qwen-Vl-Plus | 78.9 ± 11.0% | 28 | 35 | 6 | 1 |
| 21 | Qwen/Qwen2.5-Vl-72b-Instruct | 78.9 ± 11.0% | 28 | 35 | 6 | 1 |
| 21 | Qwen/Qwen3-30b-A3b | 78.9 ± 11.0% | 28 | 35 | 6 | 1 |
| 22 | Qwen/Qwen3-235b-A22b | 77.3 ± 12.4% | 22 | 28 | 2 | 4 |
| 23 | Qwen/Qwen-2.5-72b-Instruct | 76.3 ± 12.3% | 24 | 31 | 6 | 1 |
| 24 | Qwen/Qwen3-32b | 74.7 ± 13.8% | 19 | 25 | 6 | 0 |
| 25 | Qwen/Qwen3-Coder-Flash | 71.9 ± 14.3% | 19 | 26 | 6 | 1 |
| 26 | Qwen/Qwen-2.5-7b-Instruct | 69.7 ± 15.3% | 17 | 24 | 5 | 2 |
| 26 | Qwen/Qwen-Turbo | 69.7 ± 15.3% | 17 | 24 | 5 | 2 |
| 26 | Qwen/Qwen3-Coder-30b-A3b-Instruct | 69.7 ± 15.3% | 17 | 24 | 6 | 1 |
| 27 | Qwen/Qwen-2.5-Vl-7b-Instruct | 32.1 ± 33.0% | 2 | 7 | 3 | 2 |
| 28 | Qwen/Qwen2.5-Coder-7b-Instruct | 22.8 ± 35.0% | 1 | 6 | 3 | 2 |