Qwen Model Performance

39 Qwen models evaluated

Model Performance

Rank	Model	Accuracy	Correct	Total	Incorrect	Errors
1	`Qwen/Qwen3-Max`	98.9 ± 1.1%	61	61	0	0
2	`Qwen/Qwen3-Next-80b-A3b-Instruct`	97.3 ± 2.3%	59	60	1	0
3	`Qwen/Qwen-Plus`	97.2 ± 2.4%	57	58	1	0
3	`Qwen/Qwen3-Vl-30b-A3b-Instruct`	97.2 ± 2.4%	57	58	1	0
4	`Qwen/Qwen3-30b-A3b-Thinking-2507`	97.1 ± 2.5%	56	57	1	0
4	`Qwen/Qwen3-Next-80b-A3b-Thinking`	97.1 ± 2.5%	56	57	1	0
4	`Qwen/Qwen3-Vl-235b-A22b-Instruct`	97.1 ± 2.5%	56	57	1	0
5	`Qwen/Qwen3-235b-A22b-Thinking-2507`	96.9 ± 2.6%	52	53	1	0
6	`Qwen/Qwq-32b`	96.9 ± 2.7%	51	52	1	0
7	`Qwen/Qwen-Vl-Max`	95.6 ± 3.4%	57	59	2	0
8	`Qwen/Qwen-2.5-Coder-32b-Instruct`	95.5 ± 3.4%	56	58	2	0
8	`Qwen/Qwen-Max`	95.5 ± 3.4%	56	58	1	1
8	`Qwen/Qwen-Plus-2025-07-28:thinking`	95.5 ± 3.4%	56	58	2	0
9	`Qwen/Qwen-Plus-2025-07-28`	95.4 ± 3.5%	55	57	2	0
9	`Qwen/Qwen3-235b-A22b-2507`	95.4 ± 3.5%	55	57	2	0
10	`Qwen/Qwen3-235b-A22b:free`	95.0 ± 3.8%	50	52	1	1
11	`Qwen/Qwen3-Vl-30b-A3b-Thinking`	94.1 ± 4.5%	42	44	2	0
12	`Qwen/Qwen3-14b`	93.8 ± 4.3%	55	58	3	0
12	`Qwen/Qwen3-30b-A3b-Instruct-2507`	93.8 ± 4.3%	55	58	3	0
13	`Qwen/Qwen3-Coder-Plus`	93.6 ± 4.5%	53	56	3	0
14	`Qwen/Qwen3-8b`	93.1 ± 4.8%	49	52	3	0
15	`Qwen/Qwen3-Vl-8b-Instruct`	92.0 ± 5.1%	53	57	4	0
16	`Qwen/Qwen3-Vl-8b-Thinking`	90.3 ± 6.2%	43	47	1	3
17	`Qwen/Qwen3-4b:free`	89.1 ± 10.5%	5	5	0	0
18	`Qwen/Qwen3-Vl-235b-A22b-Thinking`	86.3 ± 8.2%	35	40	2	3
19	`Qwen/Qwen3-Coder`	85.3 ± 7.8%	44	51	4	3
20	`Qwen/Qwen2.5-Vl-32b-Instruct`	81.1 ± 10.4%	28	34	4	2
21	`Qwen/Qwen-Vl-Plus`	78.9 ± 11.0%	28	35	6	1
21	`Qwen/Qwen2.5-Vl-72b-Instruct`	78.9 ± 11.0%	28	35	6	1
21	`Qwen/Qwen3-30b-A3b`	78.9 ± 11.0%	28	35	6	1
22	`Qwen/Qwen3-235b-A22b`	77.3 ± 12.4%	22	28	2	4
23	`Qwen/Qwen-2.5-72b-Instruct`	76.3 ± 12.3%	24	31	6	1
24	`Qwen/Qwen3-32b`	74.7 ± 13.8%	19	25	6	0
25	`Qwen/Qwen3-Coder-Flash`	71.9 ± 14.3%	19	26	6	1
26	`Qwen/Qwen-2.5-7b-Instruct`	69.7 ± 15.3%	17	24	5	2
26	`Qwen/Qwen-Turbo`	69.7 ± 15.3%	17	24	5	2
26	`Qwen/Qwen3-Coder-30b-A3b-Instruct`	69.7 ± 15.3%	17	24	6	1
27	`Qwen/Qwen-2.5-Vl-7b-Instruct`	32.1 ± 33.0%	2	7	3	2
28	`Qwen/Qwen2.5-Coder-7b-Instruct`	22.8 ± 35.0%	1	6	3	2
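
Reading the columns: in every row above, Total equals Correct + Incorrect + Errors. The Accuracy column does not appear to be the plain ratio Correct / Total (for example, 61 of 61 is listed as 98.9 ± 1.1%), so it is presumably a smoothed or interval-based estimate; the page does not say which one. The sketch below is a rough illustration only: it re-checks the column arithmetic for a few rows copied from the table and computes the raw ratio alongside a Wilson score interval. The Wilson interval is an assumption chosen for illustration, not the method used to produce the table, and the names `Row`, `raw_accuracy`, and `wilson_interval` are hypothetical helpers.

```python
import math
from typing import NamedTuple, Tuple


class Row(NamedTuple):
    model: str
    correct: int
    total: int
    incorrect: int
    errors: int


# A few rows copied from the table above (counts only).
ROWS = [
    Row("Qwen/Qwen3-Max", 61, 61, 0, 0),
    Row("Qwen/Qwen-Max", 56, 58, 1, 1),
    Row("Qwen/Qwen3-4b:free", 5, 5, 0, 0),
    Row("Qwen/Qwen2.5-Coder-7b-Instruct", 1, 6, 3, 2),
]


def raw_accuracy(row: Row) -> float:
    """Plain ratio of correct answers to all attempts (errors count as misses)."""
    return row.correct / row.total


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: shown only as one common way to attach uncertainty to a
    small-sample accuracy; the table's own estimator is not documented.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


for row in ROWS:
    # Column consistency check: every attempt is correct, incorrect, or an error.
    assert row.correct + row.incorrect + row.errors == row.total
    low, high = wilson_interval(row.correct, row.total)
    print(f"{row.model}: raw {raw_accuracy(row):.1%}, Wilson 95% [{low:.1%}, {high:.1%}]")
```

Rows with very small totals, such as `Qwen/Qwen3-4b:free` with 5 attempts, produce wide intervals under any reasonable estimator, which is worth keeping in mind when comparing ranks near the bottom of the table.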