From dc7a2bd0f546ea29929faa57b8e618c413c86bb2 Mon Sep 17 00:00:00 2001
From: Álvaro Mateos Labrador
Date: Sun, 20 Aug 2023 23:03:11 +0200
Subject: [PATCH] Add benchmark results for gpt3.5 on 8358b60 (#625)

---
 benchmark/RESULTS.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/benchmark/RESULTS.md b/benchmark/RESULTS.md
index fe14334060..5534f81629 100644
--- a/benchmark/RESULTS.md
+++ b/benchmark/RESULTS.md
@@ -4,6 +4,29 @@
 python scripts/benchmark.py
 ```
 
+## 2023-08-20 (8358b60e1c6dcfc517c47c15708422d9a7d1d83a)
+| Benchmark          | Version       | Ran | Works | Perfect |
+|--------------------|---------------|-----|-------|---------|
+| currency_converter | GPT3.5 default| ✅ | ❌ | ❌ |
+| image_resizer      | GPT3.5 default| ✅ | ✅ | ✅ |
+| pomodoro_timer     | GPT3.5 default| ✅ | ✅ | ❌ |
+| url_shortener      | GPT3.5 default| ❌ | ❌ | ❌ |
+| file_explorer      | GPT3.5 default| ✅ | ✅ | ❌ |
+| markdown_editor    | GPT3.5 default| ✅ | ❌ | ❌ |
+| timer_app          | GPT3.5 default| ✅ | ✅ | ✅ |
+| file_organizer     | GPT3.5 default| ✅ | ✅ | ❌ |
+| password_generator | GPT3.5 default| ✅ | ✅ | ✅ |
+| todo_list          | GPT3.5 default| ✅ | ✅ | ❌ |
+
+### Notes on the errors
+
+#### GPT3.5
+- `pomodoro_timer`: notifications didn't work.
+- `file_explorer`: deletion didn't work.
+- `file_organizer`: only handled a very small set of formats.
+- `todo_list`: tasks couldn't be marked as completed.
+- `url_shortener`: file names were wrong. Nothing could be run.
+
 ## 2023-06-21
 | Benchmark | Ran | Works | Perfect |