Evals for improve existing code #638

pbharrin · 2023-08-27T03:16:34Z

This work is motivated by a study of the way our code was being evaluated and used in other tools Auto-GPT, Rift, etc.
These projects have a modified fork of our code that can be be evaluated in a "headless" fashion with a report generated.

Since they had a fork of our project, they were not receiving updates to gpt-engineer and the evals were on old branches, which were not being rebased. This work introduces a method of evaluation which does not require human input, or rather the human input is done ahead of time (in a prompt file). This is done for the improve code mode, subsequent work will cover the default or new code generation.

Our output is code, and the code can be tested in one of two ways:

Artifacts in the source code
Performance of the code while being executed.
The first option is more brittle as the AI may have multiple ways of solving a problem. The second option is less brittle but currently is limited to Python. With the inclusion of different libraries we can test other languages as we. Both of these methods are covered in this PR.

…ns after `curr_fn` in the `steps.py` file. # Aider chat conversation: USER: All the functions after `curr_fn` need to have the `**kwargs` added to the function signature. Can you please add that. ASSISTANT: Sure, I will add `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file. Here are the changes: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs): ======= def use_feedback(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs): ======= def improve_existing_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs): ======= def fix_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs): ======= def human_review(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source>

…teps for setting the file list and prompt, and added a new config for evaluating and improving code.

# Aider chat conversation: USER: For all of the functions that have `**kwargs` as an argument, please remove `**kwargs`. ASSISTANT: To remove `**kwargs` from all functions in the `steps.py` file, we need to modify the file `gpt_engineer/steps.py`. Here are the changes needed: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs, **kwargs): ======= def use_feedback(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def set_improve_filelist(ai: AI, dbs: DBs, **kwargs): ======= def set_improve_filelist(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def assert_files_ready(ai: AI, dbs: DBs, **kwargs): ======= def assert_files_ready(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def get_improve_prompt(ai: AI, dbs: DBs, **kwargs): ======= def get_improve_prompt(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs, **kwargs): ======= def improve_existing_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs, **kwargs): ======= def fix_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs, **kwargs): ======= def human_review(ai: AI, dbs: DBs): >>>>>>> updated </source> Please confirm these changes.

…` and does a switch on the key `type` from `tests`, with options for `assert_exists_in_source_code` and `run_code_class_has_property`, and a default option raising an exception. # Aider chat conversation: USER: add a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`. The possible options are: `assert_exists_in_source_code`, and `run_code_class_has_property`, there should also be a default option raising an excpetion. ASSISTANT: To add the requested function, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Define a new function named `run_tests` that takes a dictionary named `tests` as an argument. 2. Inside this function, we will use a switch-like structure using Python's dictionary mapping to handle different `type` keys from the `tests` dictionary. 3. For the `type` key `assert_exists_in_source_code`, we will call the `assert_exists_in_source_code` function. 4. For the `type` key `run_code_class_has_property`, we will call a function `run_code_class_has_property`. This function is not defined yet, so we will add a placeholder function for now. 5. If the `type` key is not recognized, we will raise an exception. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_code_class_has_property(): # Placeholder function pass def run_tests(tests: dict): test_type = tests.get('type') if test_type == 'assert_exists_in_source_code': return assert_exists_in_source_code(tests.get('fpath'), tests.get('existing_code')) elif test_type == 'run_code_class_has_property': return run_code_class_has_property() else: raise Exception(f"Test type '{test_type}' is not recognized.") >>>>>>> updated ``` Please note that the `run_code_class_has_property` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.

… include a switch for the `language` argument, with support for the `python` language and a default case that raises an exception, and a new function `run_python_code` has been added as a placeholder. # Aider chat conversation: USER: The function `run_code_class_has_property` ASSISTANT: The function `run_code_class_has_property` is currently a placeholder function in the `scripts/eval_tools.py` file. It is defined as follows: ```python def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ``` This function is expected to take two arguments: `language` and `source_file`. The `language` argument is a string that represents the programming language of the source code. The `source_file` argument is a `Path` object that represents the path to the source code file. Currently, this function does not perform any operation and always returns `True`. You will need to replace this placeholder function with the actual implementation that checks if a class in the source code has a property. USER: In `run_code_class_has_property` add a switch for the language argument, the only supported lanague will be `python`, also add a defualt with an excpetion. After that add a new function called: run_python_code. ASSISTANT: To add the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Inside the `run_code_class_has_property` function, we will add a switch-like structure using Python's dictionary mapping to handle different `language` values. 2. For the `language` value `python`, we will call a new function `run_python_code`. This function is not defined yet, so we will add a placeholder function for now. 3. If the `language` value is not recognized, we will raise an exception. 4. Define a new function named `run_python_code` that takes a `source_file` as an argument. This function will be a placeholder for now. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.

… `run_python_code`, and `run_code_class_has_property` to take a single dictionary argument `eval`. # Aider chat conversation: USER: Both `assert_exists_in_source_code` and `run_code_class_has_property` and `run_python_code` should only take a single dictionary called `eval` as an argument. ASSISTANT: To make the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Change the function signature of `assert_exists_in_source_code` to take a single dictionary argument `eval`. 2. Change the function signature of `run_code_class_has_property` to take a single dictionary argument `eval`. 3. Change the function signature of `run_python_code` to take a single dictionary argument `eval`. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 def run_python_code(eval: dict) -> bool: # Placeholder function return True def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. USER: InvalidEditBlock: edit failed! scripts/eval_tools.py does not contain the *exact sequence* of HEAD lines you specified. Try again. DO NOT skip blank lines, comments, docstrings, etc! The HEAD block needs to be EXACTLY the same as the lines in scripts/eval_tools.py with nothing missing! scripts/eval_tools.py does not contain these 13 exact lines in a row: ``` def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ``` ASSISTANT: I apologize for the confusion. I see that the functions are not in sequence in the file. I will provide separate edit blocks for each function. Here are the *edit blocks* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_python_code(source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(eval: dict) -> bool: # Placeholder function return True >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ```

Evals

AntonOsika · 2023-08-30T14:42:13Z

Awesome, merging.

Assuming you ran the new improve code configs @pbharrin ?

AntonOsika · 2023-08-30T14:44:06Z

Regarding making it nice:

evals directory is very unclear right now.

I don't understand fully how it is consumed downstream by agi-evals, or how to rerun the eval!

Do you see a way we can make it less confusing to people that open the repository @pbharrin ?

* Existing Code eval sucessfully populates a known project * Refactor main.py to add a new command line option for skipping feedback. * aider: Added `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file. # Aider chat conversation: USER: All the functions after `curr_fn` need to have the `**kwargs` added to the function signature. Can you please add that. ASSISTANT: Sure, I will add `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file. Here are the changes: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs): ======= def use_feedback(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs): ======= def improve_existing_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs): ======= def fix_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs): ======= def human_review(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> * Refactored the improve_existing_code function to include additional steps for setting the file list and prompt, and added a new config for evaluating and improving code. * aider: Removed `**kwargs` from all functions in `steps.py`. # Aider chat conversation: USER: For all of the functions that have `**kwargs` as an argument, please remove `**kwargs`. ASSISTANT: To remove `**kwargs` from all functions in the `steps.py` file, we need to modify the file `gpt_engineer/steps.py`. Here are the changes needed: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs, **kwargs): ======= def use_feedback(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def set_improve_filelist(ai: AI, dbs: DBs, **kwargs): ======= def set_improve_filelist(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def assert_files_ready(ai: AI, dbs: DBs, **kwargs): ======= def assert_files_ready(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def get_improve_prompt(ai: AI, dbs: DBs, **kwargs): ======= def get_improve_prompt(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs, **kwargs): ======= def improve_existing_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs, **kwargs): ======= def fix_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs, **kwargs): ======= def human_review(ai: AI, dbs: DBs): >>>>>>> updated </source> Please confirm these changes. * formatting * aider: Added a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`, with options for `assert_exists_in_source_code` and `run_code_class_has_property`, and a default option raising an exception. # Aider chat conversation: USER: add a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`. The possible options are: `assert_exists_in_source_code`, and `run_code_class_has_property`, there should also be a default option raising an excpetion. ASSISTANT: To add the requested function, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Define a new function named `run_tests` that takes a dictionary named `tests` as an argument. 2. Inside this function, we will use a switch-like structure using Python's dictionary mapping to handle different `type` keys from the `tests` dictionary. 3. For the `type` key `assert_exists_in_source_code`, we will call the `assert_exists_in_source_code` function. 4. For the `type` key `run_code_class_has_property`, we will call a function `run_code_class_has_property`. This function is not defined yet, so we will add a placeholder function for now. 5. If the `type` key is not recognized, we will raise an exception. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_code_class_has_property(): # Placeholder function pass def run_tests(tests: dict): test_type = tests.get('type') if test_type == 'assert_exists_in_source_code': return assert_exists_in_source_code(tests.get('fpath'), tests.get('existing_code')) elif test_type == 'run_code_class_has_property': return run_code_class_has_property() else: raise Exception(f"Test type '{test_type}' is not recognized.") >>>>>>> updated ``` Please note that the `run_code_class_has_property` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. * rename machine generated functions * aider: The function `run_code_class_has_property` has been updated to include a switch for the `language` argument, with support for the `python` language and a default case that raises an exception, and a new function `run_python_code` has been added as a placeholder. # Aider chat conversation: USER: The function `run_code_class_has_property` ASSISTANT: The function `run_code_class_has_property` is currently a placeholder function in the `scripts/eval_tools.py` file. It is defined as follows: ```python def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ``` This function is expected to take two arguments: `language` and `source_file`. The `language` argument is a string that represents the programming language of the source code. The `source_file` argument is a `Path` object that represents the path to the source code file. Currently, this function does not perform any operation and always returns `True`. You will need to replace this placeholder function with the actual implementation that checks if a class in the source code has a property. USER: In `run_code_class_has_property` add a switch for the language argument, the only supported lanague will be `python`, also add a defualt with an excpetion. After that add a new function called: run_python_code. ASSISTANT: To add the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Inside the `run_code_class_has_property` function, we will add a switch-like structure using Python's dictionary mapping to handle different `language` values. 2. For the `language` value `python`, we will call a new function `run_python_code`. This function is not defined yet, so we will add a placeholder function for now. 3. If the `language` value is not recognized, we will raise an exception. 4. Define a new function named `run_python_code` that takes a `source_file` as an argument. This function will be a placeholder for now. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. * aider: Changed function signatures of `assert_exists_in_source_code`, `run_python_code`, and `run_code_class_has_property` to take a single dictionary argument `eval`. # Aider chat conversation: USER: Both `assert_exists_in_source_code` and `run_code_class_has_property` and `run_python_code` should only take a single dictionary called `eval` as an argument. ASSISTANT: To make the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Change the function signature of `assert_exists_in_source_code` to take a single dictionary argument `eval`. 2. Change the function signature of `run_code_class_has_property` to take a single dictionary argument `eval`. 3. Change the function signature of `run_python_code` to take a single dictionary argument `eval`. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 def run_python_code(eval: dict) -> bool: # Placeholder function return True def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. USER: InvalidEditBlock: edit failed! scripts/eval_tools.py does not contain the *exact sequence* of HEAD lines you specified. Try again. DO NOT skip blank lines, comments, docstrings, etc! The HEAD block needs to be EXACTLY the same as the lines in scripts/eval_tools.py with nothing missing! scripts/eval_tools.py does not contain these 13 exact lines in a row: ``` def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ``` ASSISTANT: I apologize for the confusion. I see that the functions are not in sequence in the file. I will provide separate edit blocks for each function. Here are the *edit blocks* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_python_code(source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(eval: dict) -> bool: # Placeholder function return True >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ``` * not unpacking objects early * check evals working, it can test Python code functionally * Report generating code done, ready to submit PR * added existing code eval for web app * added new eval, changed report to append mode * refactored, created evals dir * changed prompt back to the default * clean up for black

pbharrin added 19 commits August 19, 2023 14:15

Existing Code eval sucessfully populates a known project

214106d

Refactor main.py to add a new command line option for skipping feedback.

551f542

Refactored the improve_existing_code function to include additional s…

d3e4dd8

…teps for setting the file list and prompt, and added a new config for evaluating and improving code.

formatting

09ff533

rename machine generated functions

bb4e673

not unpacking objects early

6c28204

check evals working, it can test Python code functionally

030e81e

Report generating code done, ready to submit PR

eeb19d4

added existing code eval for web app

73c1009

added new eval, changed report to append mode

cf29abf

refactored, created evals dir

40d64c6

changed prompt back to the default

3f68e71

Merge pull request #1 from pbharrin/evals

6b9290e

Evals

Merge branch 'main' into evals

cc208fc

pbharrin mentioned this pull request Aug 27, 2023

Evals for improve existing code #637

Closed

clean up for black

10e16c9

pbharrin changed the title ~~Evals wo rebase~~ Evals for improve existing code Aug 27, 2023

AntonOsika merged commit eebbe1b into AntonOsika:main Aug 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evals for improve existing code #638

Evals for improve existing code #638

pbharrin commented Aug 27, 2023

AntonOsika commented Aug 30, 2023

AntonOsika commented Aug 30, 2023

Evals for improve existing code #638

Evals for improve existing code #638

Conversation

pbharrin commented Aug 27, 2023

AntonOsika commented Aug 30, 2023

AntonOsika commented Aug 30, 2023