
Commit

Bump to 2.1.8
carlosejimenez committed Jan 11, 2025
1 parent 0aac6d5 commit de6ff17
Showing 2 changed files with 13 additions and 1 deletion.
12 changes: 12 additions & 0 deletions README.md
@@ -30,6 +30,7 @@ Code and data for our ICLR 2024 paper <a href="http://swe-bench.github.io/paper.
Please refer to our [website](http://swe-bench.github.io) for the public leaderboard and the [change log](https://github.com/princeton-nlp/SWE-bench/blob/main/CHANGELOG.md) for information on the latest updates to the SWE-bench benchmark.

## 📰 News
* **[Jan. 11, 2025]**: Thanks to [Modal](https://modal.com/), we've added a new evaluation mode that runs evaluations entirely in the cloud! See the 🚀 Set Up section on this page for more details.
* **[Aug. 13, 2024]**: Introducing *SWE-bench Verified*! Part 2 of our collaboration with [OpenAI Preparedness](https://openai.com/preparedness/). A subset of 500 problems that real software engineers have confirmed are solvable. Check out more in the [report](https://openai.com/index/introducing-swe-bench-verified/)!
* **[Jun. 27, 2024]**: We have an exciting update for SWE-bench - with support from [OpenAI's Preparedness](https://openai.com/preparedness/) team: We're moving to a fully containerized evaluation harness using Docker for more reproducible evaluations! Read more in our [report](https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240627_docker/README.md).
* **[Apr. 15, 2024]**: SWE-bench has gone through major improvements to resolve issues with the evaluation harness. Read more in our [report](https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240415_eval_bug/README.md).
@@ -69,6 +70,17 @@ python -m swebench.harness.run_evaluation \
--run_id validate-gold
```

### 🌩️ Evaluation with Modal
You can also run evaluations entirely on the cloud using [Modal](https://modal.com/) to avoid local setup and resource constraints:
```bash
python -m swebench.harness.run_evaluation \
--predictions_path gold \
--run_id validate-gold-modal \
--instance_ids sympy__sympy-20590 \
--modal true
```
This will execute the evaluation harness on Modal's cloud infrastructure, eliminating the need for local Docker setup and resource management.
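As an illustrative follow-up (not part of this commit's diff), evaluating your own predictions on Modal might look like the sketch below; the predictions file name is hypothetical, and `--dataset_name`/`--split` are assumed to be the harness's usual dataset-selection flags:
```bash
# Evaluate a (hypothetical) predictions file on Modal instead of local Docker
python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Lite \
    --split test \
    --predictions_path ./my_predictions.jsonl \
    --run_id my-model-modal \
    --modal true
```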

## 💽 Usage
> [!WARNING]
> Running fast evaluations on SWE-bench can be resource intensive
2 changes: 1 addition & 1 deletion swebench/__init__.py
@@ -1,4 +1,4 @@
__version__ = "2.1.7"
__version__ = "2.1.8"

from swebench.collect.build_dataset import main as build_dataset
from swebench.collect.get_tasks_pipeline import main as get_tasks_pipeline
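As a quick sanity check (not part of the diff), the bump can be confirmed after installing this revision; a minimal sketch:
```bash
# Should print 2.1.8 once this commit is installed
python -c 'import swebench; print(swebench.__version__)'
```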
