Add standardised benchmarking capabilities, changes to lean_agent #878
## Benchmarking

See `benchmark/types` to understand how to implement a `Benchmark`. A rough sketch of the shape such a type might take is shown below.
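As an illustration only (the field names and types here are assumptions, not the actual definitions in `benchmark/types`), a `Benchmark` might look roughly like this:

```python
# Hypothetical sketch; see gpt_engineer/benchmark/types for the real
# definitions, which may differ in names and fields.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    prompt: str       # instruction handed to the agent
    command: str      # shell command used to exercise the generated code
    assertions: dict  # named checks evaluated against the command's output


@dataclass
class Benchmark:
    """A named collection of tasks an agent is evaluated on."""

    name: str
    tasks: list[Task] = field(default_factory=list)
```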
New benchmarks are added as a new folder under `gpt_engineer/benchmark/benchmarks`, together with a new entry in `gpt_engineer/benchmark/benchmarks/load.py` (see the registry sketch below).
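A minimal sketch of what registering a benchmark in `load.py` could look like, assuming each benchmark folder exposes a loader function (the module and function names are assumptions):

```python
# Hypothetical sketch of load.py; the actual registry in the PR may differ.
from gpt_engineer.benchmark.benchmarks.gpteng.load import load_gpteng  # assumed path

BENCHMARKS = {
    "gpteng": load_gpteng,
    # "my_benchmark": load_my_benchmark,  # new benchmarks get an entry here
}


def get_benchmark(name: str):
    """Look up a benchmark by name and build it via its loader."""
    if name not in BENCHMARKS:
        raise ValueError(f"Unknown benchmark: {name}")
    return BENCHMARKS[name]()
```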
To make it possible to benchmark an agent, just create a function `default_config_agent` that returns an instantiated `Agent`, and then pass the path to that function to the benchmark CLI tool (a sketch of such a function follows).
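A minimal sketch, assuming `LeanAgent` provides a `with_default_config` constructor that can be pointed at a working directory (both the import path and the constructor signature are assumptions, not verified API):

```python
# Hypothetical sketch; check gpt_engineer/core/default/lean_agent for the
# actual constructor and its arguments.
import tempfile
from pathlib import Path

from gpt_engineer.core.default.lean_agent import LeanAgent  # assumed import


def default_config_agent():
    # The benchmark CLI imports this function by path and calls it without
    # arguments, so the agent must be fully configured here.
    workspace = Path(tempfile.mkdtemp(prefix="benchmark-"))
    return LeanAgent.with_default_config(workspace)  # assumed signature
```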
How to run e.g. `lean_agent` on the `gpteng` benchmark:

```
python gpt_engineer/benchmark gpt_engineer/core/default/lean_agent gpteng
```
Results below:
Note that we add local caching to the LLM: identical calls will reuse the cache stored in `.langchain.db`. Please delete this file if you want to check how temperature affects the results. A sketch of how such caching is typically enabled is shown below.
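For reference, this is how a local SQLite-backed LLM cache is commonly enabled with LangChain; where exactly the PR sets this up is an assumption:

```python
# Sketch: enable LangChain's SQLite-backed LLM cache. Identical calls
# (same model, same parameters, same prompt) are served from the local
# database instead of hitting the API again.
import langchain
from langchain.cache import SQLiteCache

langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
```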
Also, this is not meant to be merged into `refactor`. We should merge to `main` after `refactor` is merged!