diff --git a/leaderboard/index.html b/leaderboard/index.html
index 8ee621a..af81722 100644
--- a/leaderboard/index.html
+++ b/leaderboard/index.html
@@ -552,88 +552,88 @@

 SciCode Leaderboard

 Models
 Main Problem Resolve Rate
-Subproblem
+Subproblem
 🥇 OpenAI o1-preview
-7.7
-28.5
+7.7
+28.5
 🥈 Claude3.5-Sonnet
-4.6
-26.0
+4.6
+26.0
 🥉 Claude3.5-Sonnet (new)
-4.6
-25.3
+4.6
+25.3
 Deepseek-Coder-v2
-3.1
-21.2
+3.1
+21.2
 GPT-4o
-1.5
-25.0
+1.5
+25.0
 GPT-4-Turbo
-1.5
-22.9
+1.5
+22.9
 OpenAI o1-mini
-1.5
-22.2
+1.5
+22.2
 Gemini 1.5 Pro
-1.5
-21.9
+1.5
+21.9
 Claude3-Opus
-1.5
-21.5
+1.5
+21.5
 Llama-3.1-405B-Chat
-1.5
-19.8
+1.5
+19.8
 Claude3-Sonnet
-1.5
-17.0
+1.5
+17.0
 Qwen2-72B-Instruct
-1.5
-17.0
+1.5
+17.0
 Llama-3.1-70B-Chat
-0.0
-17.0
+0.0
+17.0
 Mixtral-8x22B-Instruct
-0.0
-16.3
+0.0
+16.3
 Llama-3-70B-Chat
-0.0
-14.6
+0.0
+14.6
-Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.
+Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.
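The tie-breaking rule in the note amounts to a lexicographic sort on (Main Problem resolve rate, Subproblem resolve rate), both descending. A minimal sketch, assuming each row is held in a hypothetical `Entry` record and ranked by a hypothetical `rank_models` helper; neither name comes from the leaderboard page's own code, and the sample rows are taken from the table above.

```python
from __future__ import annotations

from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, not the page's actual source)."""
    model: str
    main_rate: float        # Main Problem resolve rate (%)
    subproblem_rate: float  # Subproblem resolve rate (%)


def rank_models(entries: list[Entry]) -> list[Entry]:
    """Sort by Main Problem rate, breaking ties with the Subproblem rate (both descending).

    sorted() stays stable with reverse=True, so rows tied on both columns
    keep their input order.
    """
    return sorted(entries, key=lambda e: (e.main_rate, e.subproblem_rate), reverse=True)


if __name__ == "__main__":
    rows = [
        Entry("Claude3.5-Sonnet (new)", 4.6, 25.3),
        Entry("OpenAI o1-preview", 7.7, 28.5),
        Entry("Claude3.5-Sonnet", 4.6, 26.0),
        Entry("GPT-4o", 1.5, 25.0),
    ]
    for rank, e in enumerate(rank_models(rows), start=1):
        print(rank, e.model, e.main_rate, e.subproblem_rate)
    # 1 OpenAI o1-preview 7.7 28.5
    # 2 Claude3.5-Sonnet 4.6 26.0
    # 3 Claude3.5-Sonnet (new) 4.6 25.3
    # 4 GPT-4o 1.5 25.0
```

Rows that tie on both columns, such as Claude3-Sonnet and Qwen2-72B-Instruct at 1.5 / 17.0, simply keep their input order in this sketch; the note itself does not say how such full ties are ordered.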