Deployed 88a2a79 with MkDocs version: 1.6.1
github-actions[bot] committed Nov 4, 2024
1 parent e634166 commit b445229
Showing 1 changed file with 32 additions and 32 deletions.
leaderboard/index.html: 64 changes (32 additions, 32 deletions)
@@ -552,88 +552,88 @@ <h1 id="scicode-leaderboard">SciCode Leaderboard</h1>
<tr>
<th>Models</th>
<th>Main Problem Resolve Rate</th>
-<th><span style="background-color:lightgrey">Subproblem</span></th>
+<th><span style="color:grey">Subproblem</span></th>
</tr>
</thead>
<tbody>
<tr>
<td>🥇 OpenAI o1-preview</td>
-<td><div align="center">7.7</div></td>
-<td><div align="center" style="background-color:lightgrey">28.5</div></td>
+<td><div align="center"><strong>7.7</strong></div></td>
+<td><div align="center" style="color:grey">28.5</div></td>
</tr>
<tr>
<td>🥈 Claude3.5-Sonnet</td>
-<td><div align="center">4.6</div></td>
-<td><div align="center" style="background-color:lightgrey">26.0</div></td>
+<td><div align="center"><strong>4.6</strong></div></td>
+<td><div align="center" style="color:grey">26.0</div></td>
</tr>
<tr>
<td>🥉 Claude3.5-Sonnet (new)</td>
-<td><div align="center">4.6</div></td>
-<td><div align="center" style="background-color:lightgrey">25.3</div></td>
+<td><div align="center"><strong>4.6</strong></div></td>
+<td><div align="center" style="color:grey">25.3</div></td>
</tr>
<tr>
<td>Deepseek-Coder-v2</td>
-<td><div align="center">3.1</div></td>
-<td><div align="center" style="background-color:lightgrey">21.2</div></td>
+<td><div align="center"><strong>3.1</strong></div></td>
+<td><div align="center" style="color:grey">21.2</div></td>
</tr>
<tr>
<td>GPT-4o</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">25.0</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">25.0</div></td>
</tr>
<tr>
<td>GPT-4-Turbo</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">22.9</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">22.9</div></td>
</tr>
<tr>
<td>OpenAI o1-mini</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">22.2</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">22.2</div></td>
</tr>
<tr>
<td>Gemini 1.5 Pro</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">21.9</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">21.9</div></td>
</tr>
<tr>
<td>Claude3-Opus</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">21.5</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">21.5</div></td>
</tr>
<tr>
<td>Llama-3.1-405B-Chat</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">19.8</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">19.8</div></td>
</tr>
<tr>
<td>Claude3-Sonnet</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">17.0</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">17.0</div></td>
</tr>
<tr>
<td>Qwen2-72B-Instruct</td>
-<td><div align="center">1.5</div></td>
-<td><div align="center" style="background-color:lightgrey">17.0</div></td>
+<td><div align="center"><strong>1.5</strong></div></td>
+<td><div align="center" style="color:grey">17.0</div></td>
</tr>
<tr>
<td>Llama-3.1-70B-Chat</td>
-<td><div align="center">0.0</div></td>
-<td><div align="center" style="background-color:lightgrey">17.0</div></td>
+<td><div align="center"><strong>0.0</strong></div></td>
+<td><div align="center" style="color:grey">17.0</div></td>
</tr>
<tr>
<td>Mixtral-8x22B-Instruct</td>
-<td><div align="center">0.0</div></td>
-<td><div align="center" style="background-color:lightgrey">16.3</div></td>
+<td><div align="center"><strong>0.0</strong></div></td>
+<td><div align="center" style="color:grey">16.3</div></td>
</tr>
<tr>
<td>Llama-3-70B-Chat</td>
-<td><div align="center">0.0</div></td>
-<td><div align="center" style="background-color:lightgrey">14.6</div></td>
+<td><div align="center"><strong>0.0</strong></div></td>
+<td><div align="center" style="color:grey">14.6</div></td>
</tr>
</tbody>
</table>
-<p>Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.</p>
+<p><strong>Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.</strong></p>
<!-- Once you've added the results to the submission repository,
bring back the table here -->
<!-- include-markdown "leaderboard_table.md" -->
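The note in this leaderboard defines a two-level ordering: models are ranked by Main Problem resolve rate, and the Subproblem rate only decides between models tied on that first score. A minimal Python sketch of that rule follows, using the scores from the updated table. The `SCORES` list and the `rank` helper are hypothetical illustrations, not code from the SciCode repository. The note does not say how models tied on both scores are ordered, so the sketch assumes they keep their input order (Python's `sorted` is stable).

```python
# Hypothetical data: scores copied from the updated leaderboard table as
# (model, main problem resolve rate %, subproblem rate %).
# Deliberately listed out of leaderboard order so the sort does the ranking.
SCORES = [
    ("Claude3-Opus", 1.5, 21.5),
    ("Claude3-Sonnet", 1.5, 17.0),
    ("Claude3.5-Sonnet (new)", 4.6, 25.3),
    ("Claude3.5-Sonnet", 4.6, 26.0),
    ("Deepseek-Coder-v2", 3.1, 21.2),
    ("GPT-4-Turbo", 1.5, 22.9),
    ("GPT-4o", 1.5, 25.0),
    ("Gemini 1.5 Pro", 1.5, 21.9),
    ("Llama-3-70B-Chat", 0.0, 14.6),
    ("Llama-3.1-405B-Chat", 1.5, 19.8),
    ("Llama-3.1-70B-Chat", 0.0, 17.0),
    ("Mixtral-8x22B-Instruct", 0.0, 16.3),
    ("OpenAI o1-mini", 1.5, 22.2),
    ("OpenAI o1-preview", 7.7, 28.5),
    ("Qwen2-72B-Instruct", 1.5, 17.0),
]


def rank(rows):
    """Sort by main problem rate, then subproblem rate, both descending.

    Rows tied on both scores keep their input order because sorted() is
    stable; this full-tie behaviour is an assumption, not a stated rule.
    """
    return sorted(rows, key=lambda row: (-row[1], -row[2]))


if __name__ == "__main__":
    for position, (model, main, sub) in enumerate(rank(SCORES), start=1):
        print(f"{position:2d}. {model:<24} {main:4.1f} {sub:5.1f}")
```

With these inputs the resulting order matches the leaderboard, including Claude3.5-Sonnet ranking ahead of Claude3.5-Sonnet (new) on the Subproblem tie-break (26.0 vs 25.3).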
