OpenAI Model Performance

44 OpenAI models evaluated

Model Performance

| Rank | Model | Accuracy | Correct | Total | Incorrect | Errors |
|---|---|---|---|---|---|---|
| 1 | Openai/Gpt-5 | 98.9 ± 1.1% | 61 | 61 | 0 | 0 |
| 2 | Openai/Gpt-5-Codex | 98.9 ± 1.1% | 60 | 60 | 0 | 0 |
| 2 | Openai/Gpt-5-Image-Mini | 98.9 ± 1.1% | 60 | 60 | 0 | 0 |
| 2 | Openai/O3-Mini | 98.9 ± 1.1% | 60 | 60 | 0 | 0 |
| 3 | Openai/Gpt-5-Mini | 97.3 ± 2.3% | 60 | 61 | 1 | 0 |
| 3 | Openai/O3 | 97.3 ± 2.3% | 60 | 61 | 1 | 0 |
| 4 | Openai/Gpt-5-Nano | 97.3 ± 2.3% | 59 | 60 | 1 | 0 |
| 4 | Openai/O3-Mini-High | 97.3 ± 2.3% | 59 | 60 | 1 | 0 |
| 4 | Openai/O4-Mini-Deep-Research | 97.3 ± 2.3% | 59 | 60 | 0 | 1 |
| 5 | Openai/O4-Mini | 97.0 ± 2.5% | 54 | 55 | 1 | 0 |
| 5 | Openai/O4-Mini-High | 97.0 ± 2.5% | 54 | 55 | 1 | 0 |
| 6 | Openai/Gpt-4.1-Mini | 95.5 ± 3.4% | 56 | 58 | 1 | 1 |
| 7 | Openai/Gpt-5-Chat | 93.8 ± 4.3% | 55 | 58 | 1 | 2 |
| 7 | Openai/Gpt-Oss-20b | 93.8 ± 4.3% | 55 | 58 | 3 | 0 |
| 8 | Openai/Gpt-Oss-120b | 93.7 ± 4.4% | 54 | 57 | 3 | 0 |
| 9 | Openai/O1-Mini | 93.6 ± 4.5% | 53 | 56 | 3 | 0 |
| 10 | Openai/Codex-Mini | 91.6 ± 5.4% | 50 | 54 | 3 | 1 |
| 11 | Openai/Gpt-Oss-20b:free | 90.9 ± 5.5% | 56 | 61 | 4 | 1 |
| 12 | Openai/O1-Mini-2024-09-12 | 90.1 ± 5.9% | 51 | 56 | 4 | 1 |
| 13 | Openai/Gpt-4o-Mini-2024-07-18 | 80.5 ± 10.2% | 31 | 38 | 5 | 2 |
| 14 | Openai/Gpt-4.1-Nano | 71.9 ± 14.3% | 19 | 26 | 4 | 3 |
| 14 | Openai/Gpt-4o | 71.9 ± 14.3% | 19 | 26 | 3 | 4 |
| 14 | Openai/Gpt-4o-2024-11-20 | 71.9 ± 14.3% | 19 | 26 | 4 | 3 |
| 15 | Openai/Chatgpt-4o-Latest | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/Gpt-4 | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/Gpt-4-Turbo | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/Gpt-4o-2024-05-13 | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/Gpt-5-Image | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/Gpt-5-Pro | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/O1 | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/O1-Pro | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 15 | Openai/O3-Deep-Research | 70.7 ± 28.0% | 1 | 1 | 0 | 0 |
| 16 | Openai/Gpt-4o-Search-Preview | 69.7 ± 15.3% | 17 | 24 | 6 | 1 |
| 17 | Openai/Gpt-4o-Mini | 64.1 ± 17.8% | 13 | 20 | 6 | 1 |
| 18 | Openai/Gpt-4o-2024-08-06 | 60.3 ± 19.4% | 11 | 18 | 6 | 1 |
| 19 | Openai/Gpt-4o-Mini-Search-Preview | 53.5 ± 23.5% | 7 | 13 | 6 | 0 |
| 20 | Openai/Gpt-3.5-Turbo | 46.0 ± 26.4% | 5 | 11 | 6 | 0 |
| 21 | Openai/Gpt-3.5-Turbo-16k | 32.1 ± 33.0% | 2 | 7 | 5 | 0 |
| 22 | Openai/Gpt-4-0314 | 29.3 ± 54.9% | 0 | 1 | 0 | 1 |
| 22 | Openai/Gpt-4-1106-Preview | 29.3 ± 54.9% | 0 | 1 | 1 | 0 |
| 22 | Openai/Gpt-4-Turbo-Preview | 29.3 ± 54.9% | 0 | 1 | 1 | 0 |
| 22 | Openai/Gpt-4o:extended | 29.3 ± 54.9% | 0 | 1 | 1 | 0 |
| 23 | Openai/Gpt-3.5-Turbo-0613 | 22.8 ± 35.0% | 1 | 6 | 5 | 0 |
| 23 | Openai/Gpt-3.5-Turbo-Instruct | 22.8 ± 35.0% | 1 | 6 | 5 | 0 |
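
Note that the Accuracy column is not the raw ratio Correct / Total: a model with 1 of 1 correct is listed at 70.7%, and one with 0 of 1 at 29.3%. The page does not state how the figure is derived. One reading that reproduces the rows spot-checked here is the posterior median of a Beta(Correct + 1, Total - Correct + 1) distribution, i.e. a uniform prior over the true accuracy. That is an assumption, not something the source confirms. The sketch below computes it with the standard library only; the function names `beta_cdf` and `posterior_median` are hypothetical, and it does not attempt to reproduce the ± values.

```python
from math import comb


def beta_cdf(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def posterior_median(correct: int, total: int) -> float:
    """Median of Beta(correct + 1, total - correct + 1), found by bisection.

    Assumption: a uniform prior on accuracy. The leaderboard does not
    document its method, so this is only one candidate explanation.
    """
    a, b = correct + 1, total - correct + 1
    lo, hi = 0.0, 1.0
    for _ in range(60):  # 60 halvings is ample for double precision
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Spot checks against rows of the table (Correct, Total)
for correct, total in [(61, 61), (1, 1), (0, 1), (1, 6)]:
    print(correct, total, round(100 * posterior_median(correct, total), 1))
```

Under this reading, a model with a single evaluated item is pulled strongly toward 50%, which would explain why the 1-of-1 models rank below models with 19 of 26 correct.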