-
-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evals for improve existing code #638
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ns after `curr_fn` in the `steps.py` file. # Aider chat conversation: USER: All the functions after `curr_fn` need to have the `**kwargs` added to the function signature. Can you please add that. ASSISTANT: Sure, I will add `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file. Here are the changes: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs): ======= def use_feedback(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs): ======= def improve_existing_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs): ======= def fix_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs): ======= def human_review(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source>
…teps for setting the file list and prompt, and added a new config for evaluating and improving code.
# Aider chat conversation: USER: For all of the functions that have `**kwargs` as an argument, please remove `**kwargs`. ASSISTANT: To remove `**kwargs` from all functions in the `steps.py` file, we need to modify the file `gpt_engineer/steps.py`. Here are the changes needed: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs, **kwargs): ======= def use_feedback(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def set_improve_filelist(ai: AI, dbs: DBs, **kwargs): ======= def set_improve_filelist(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def assert_files_ready(ai: AI, dbs: DBs, **kwargs): ======= def assert_files_ready(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def get_improve_prompt(ai: AI, dbs: DBs, **kwargs): ======= def get_improve_prompt(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs, **kwargs): ======= def improve_existing_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs, **kwargs): ======= def fix_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs, **kwargs): ======= def human_review(ai: AI, dbs: DBs): >>>>>>> updated </source> Please confirm these changes.
…` and does a switch on the key `type` from `tests`, with options for `assert_exists_in_source_code` and `run_code_class_has_property`, and a default option raising an exception. # Aider chat conversation: USER: add a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`. The possible options are: `assert_exists_in_source_code`, and `run_code_class_has_property`, there should also be a default option raising an excpetion. ASSISTANT: To add the requested function, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Define a new function named `run_tests` that takes a dictionary named `tests` as an argument. 2. Inside this function, we will use a switch-like structure using Python's dictionary mapping to handle different `type` keys from the `tests` dictionary. 3. For the `type` key `assert_exists_in_source_code`, we will call the `assert_exists_in_source_code` function. 4. For the `type` key `run_code_class_has_property`, we will call a function `run_code_class_has_property`. This function is not defined yet, so we will add a placeholder function for now. 5. If the `type` key is not recognized, we will raise an exception. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_code_class_has_property(): # Placeholder function pass def run_tests(tests: dict): test_type = tests.get('type') if test_type == 'assert_exists_in_source_code': return assert_exists_in_source_code(tests.get('fpath'), tests.get('existing_code')) elif test_type == 'run_code_class_has_property': return run_code_class_has_property() else: raise Exception(f"Test type '{test_type}' is not recognized.") >>>>>>> updated ``` Please note that the `run_code_class_has_property` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.
… include a switch for the `language` argument, with support for the `python` language and a default case that raises an exception, and a new function `run_python_code` has been added as a placeholder. # Aider chat conversation: USER: The function `run_code_class_has_property` ASSISTANT: The function `run_code_class_has_property` is currently a placeholder function in the `scripts/eval_tools.py` file. It is defined as follows: ```python def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ``` This function is expected to take two arguments: `language` and `source_file`. The `language` argument is a string that represents the programming language of the source code. The `source_file` argument is a `Path` object that represents the path to the source code file. Currently, this function does not perform any operation and always returns `True`. You will need to replace this placeholder function with the actual implementation that checks if a class in the source code has a property. USER: In `run_code_class_has_property` add a switch for the language argument, the only supported lanague will be `python`, also add a defualt with an excpetion. After that add a new function called: run_python_code. ASSISTANT: To add the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Inside the `run_code_class_has_property` function, we will add a switch-like structure using Python's dictionary mapping to handle different `language` values. 2. For the `language` value `python`, we will call a new function `run_python_code`. This function is not defined yet, so we will add a placeholder function for now. 3. If the `language` value is not recognized, we will raise an exception. 4. Define a new function named `run_python_code` that takes a `source_file` as an argument. This function will be a placeholder for now. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.
… `run_python_code`, and `run_code_class_has_property` to take a single dictionary argument `eval`. # Aider chat conversation: USER: Both `assert_exists_in_source_code` and `run_code_class_has_property` and `run_python_code` should only take a single dictionary called `eval` as an argument. ASSISTANT: To make the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Change the function signature of `assert_exists_in_source_code` to take a single dictionary argument `eval`. 2. Change the function signature of `run_code_class_has_property` to take a single dictionary argument `eval`. 3. Change the function signature of `run_python_code` to take a single dictionary argument `eval`. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 def run_python_code(eval: dict) -> bool: # Placeholder function return True def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. USER: InvalidEditBlock: edit failed! scripts/eval_tools.py does not contain the *exact sequence* of HEAD lines you specified. Try again. DO NOT skip blank lines, comments, docstrings, etc! The HEAD block needs to be EXACTLY the same as the lines in scripts/eval_tools.py with nothing missing! scripts/eval_tools.py does not contain these 13 exact lines in a row: ``` def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ``` ASSISTANT: I apologize for the confusion. I see that the functions are not in sequence in the file. I will provide separate edit blocks for each function. Here are the *edit blocks* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_python_code(source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(eval: dict) -> bool: # Placeholder function return True >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ```
Awesome, merging. Assuming you ran the new improve code configs @pbharrin ? |
Regarding making it nice: evals directory is very unclear right now. I don't understand fully how it is consumed downstream by agi-evals, or how to rerun the eval! Do you see a way we can make it less confusing to people that open the repository @pbharrin ? |
70ziko
pushed a commit
to 70ziko/gpt-engineer
that referenced
this pull request
Oct 25, 2023
* Existing Code eval sucessfully populates a known project * Refactor main.py to add a new command line option for skipping feedback. * aider: Added `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file. # Aider chat conversation: USER: All the functions after `curr_fn` need to have the `**kwargs` added to the function signature. Can you please add that. ASSISTANT: Sure, I will add `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file. Here are the changes: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs): ======= def use_feedback(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs): ======= def improve_existing_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs): ======= def fix_code(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs): ======= def human_review(ai: AI, dbs: DBs, **kwargs): >>>>>>> updated </source> * Refactored the improve_existing_code function to include additional steps for setting the file list and prompt, and added a new config for evaluating and improving code. * aider: Removed `**kwargs` from all functions in `steps.py`. # Aider chat conversation: USER: For all of the functions that have `**kwargs` as an argument, please remove `**kwargs`. ASSISTANT: To remove `**kwargs` from all functions in the `steps.py` file, we need to modify the file `gpt_engineer/steps.py`. Here are the changes needed: <source>python gpt_engineer/steps.py <<<<<<< HEAD def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def simple_gen(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def clarify(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def gen_spec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]: ======= def respec(ai: AI, dbs: DBs) -> List[Message]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_code(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]: ======= def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]: >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def use_feedback(ai: AI, dbs: DBs, **kwargs): ======= def use_feedback(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def set_improve_filelist(ai: AI, dbs: DBs, **kwargs): ======= def set_improve_filelist(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def assert_files_ready(ai: AI, dbs: DBs, **kwargs): ======= def assert_files_ready(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def get_improve_prompt(ai: AI, dbs: DBs, **kwargs): ======= def get_improve_prompt(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def improve_existing_code(ai: AI, dbs: DBs, **kwargs): ======= def improve_existing_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def fix_code(ai: AI, dbs: DBs, **kwargs): ======= def fix_code(ai: AI, dbs: DBs): >>>>>>> updated </source> <source>python gpt_engineer/steps.py <<<<<<< HEAD def human_review(ai: AI, dbs: DBs, **kwargs): ======= def human_review(ai: AI, dbs: DBs): >>>>>>> updated </source> Please confirm these changes. * formatting * aider: Added a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`, with options for `assert_exists_in_source_code` and `run_code_class_has_property`, and a default option raising an exception. # Aider chat conversation: USER: add a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`. The possible options are: `assert_exists_in_source_code`, and `run_code_class_has_property`, there should also be a default option raising an excpetion. ASSISTANT: To add the requested function, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Define a new function named `run_tests` that takes a dictionary named `tests` as an argument. 2. Inside this function, we will use a switch-like structure using Python's dictionary mapping to handle different `type` keys from the `tests` dictionary. 3. For the `type` key `assert_exists_in_source_code`, we will call the `assert_exists_in_source_code` function. 4. For the `type` key `run_code_class_has_property`, we will call a function `run_code_class_has_property`. This function is not defined yet, so we will add a placeholder function for now. 5. If the `type` key is not recognized, we will raise an exception. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= from pathlib import Path def assert_exists_in_source_code(fpath: Path, existing_code: str): source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_code_class_has_property(): # Placeholder function pass def run_tests(tests: dict): test_type = tests.get('type') if test_type == 'assert_exists_in_source_code': return assert_exists_in_source_code(tests.get('fpath'), tests.get('existing_code')) elif test_type == 'run_code_class_has_property': return run_code_class_has_property() else: raise Exception(f"Test type '{test_type}' is not recognized.") >>>>>>> updated ``` Please note that the `run_code_class_has_property` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. * rename machine generated functions * aider: The function `run_code_class_has_property` has been updated to include a switch for the `language` argument, with support for the `python` language and a default case that raises an exception, and a new function `run_python_code` has been added as a placeholder. # Aider chat conversation: USER: The function `run_code_class_has_property` ASSISTANT: The function `run_code_class_has_property` is currently a placeholder function in the `scripts/eval_tools.py` file. It is defined as follows: ```python def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ``` This function is expected to take two arguments: `language` and `source_file`. The `language` argument is a string that represents the programming language of the source code. The `source_file` argument is a `Path` object that represents the path to the source code file. Currently, this function does not perform any operation and always returns `True`. You will need to replace this placeholder function with the actual implementation that checks if a class in the source code has a property. USER: In `run_code_class_has_property` add a switch for the language argument, the only supported lanague will be `python`, also add a defualt with an excpetion. After that add a new function called: run_python_code. ASSISTANT: To add the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Inside the `run_code_class_has_property` function, we will add a switch-like structure using Python's dictionary mapping to handle different `language` values. 2. For the `language` value `python`, we will call a new function `run_python_code`. This function is not defined yet, so we will add a placeholder function for now. 3. If the `language` value is not recognized, we will raise an exception. 4. Define a new function named `run_python_code` that takes a `source_file` as an argument. This function will be a placeholder for now. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. * aider: Changed function signatures of `assert_exists_in_source_code`, `run_python_code`, and `run_code_class_has_property` to take a single dictionary argument `eval`. # Aider chat conversation: USER: Both `assert_exists_in_source_code` and `run_code_class_has_property` and `run_python_code` should only take a single dictionary called `eval` as an argument. ASSISTANT: To make the requested changes, we will need to modify the `scripts/eval_tools.py` file. Here are the steps we will follow: 1. Change the function signature of `assert_exists_in_source_code` to take a single dictionary argument `eval`. 2. Change the function signature of `run_code_class_has_property` to take a single dictionary argument `eval`. 3. Change the function signature of `run_python_code` to take a single dictionary argument `eval`. Here is the *edit block* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 def run_python_code(eval: dict) -> bool: # Placeholder function return True def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ``` Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation. USER: InvalidEditBlock: edit failed! scripts/eval_tools.py does not contain the *exact sequence* of HEAD lines you specified. Try again. DO NOT skip blank lines, comments, docstrings, etc! The HEAD block needs to be EXACTLY the same as the lines in scripts/eval_tools.py with nothing missing! scripts/eval_tools.py does not contain these 13 exact lines in a row: ``` def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 def run_python_code(source_file: Path) -> bool: # Placeholder function return True def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ``` ASSISTANT: I apologize for the confusion. I see that the functions are not in sequence in the file. I will provide separate edit blocks for each function. Here are the *edit blocks* for the changes: ```python scripts/eval_tools.py <<<<<<< HEAD def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool: source_body = open(fpath).read() return source_body.find(existing_code) > -1 ======= def assert_exists_in_source_code(eval: dict) -> bool: source_body = open(eval['fpath']).read() return source_body.find(eval['existing_code']) > -1 >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_python_code(source_file: Path) -> bool: # Placeholder function return True ======= def run_python_code(eval: dict) -> bool: # Placeholder function return True >>>>>>> updated ``` ```python scripts/eval_tools.py <<<<<<< HEAD def run_code_class_has_property(language: str, source_file: Path) -> bool: if language == 'python': return run_python_code(source_file) else: raise Exception(f"Language '{language}' is not supported.") ======= def run_code_class_has_property(eval: dict) -> bool: if eval['language'] == 'python': return run_python_code(eval) else: raise Exception(f"Language '{eval['language']}' is not supported.") >>>>>>> updated ``` * not unpacking objects early * check evals working, it can test Python code functionally * Report generating code done, ready to submit PR * added existing code eval for web app * added new eval, changed report to append mode * refactored, created evals dir * changed prompt back to the default * clean up for black
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This work is motivated by a study of the way our code was being evaluated and used in other tools Auto-GPT, Rift, etc.
These projects have a modified fork of our code that can be be evaluated in a "headless" fashion with a report generated.
Since they had a fork of our project, they were not receiving updates to gpt-engineer and the evals were on old branches, which were not being rebased. This work introduces a method of evaluation which does not require human input, or rather the human input is done ahead of time (in a prompt file). This is done for the improve code mode, subsequent work will cover the default or new code generation.
Our output is code, and the code can be tested in one of two ways:
Artifacts in the source code
Performance of the code while being executed.
The first option is more brittle as the AI may have multiple ways of solving a problem. The second option is less brittle but currently is limited to Python. With the inclusion of different libraries we can test other languages as we. Both of these methods are covered in this PR.