Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evals for improve existing code #638

Merged
merged 20 commits into from
Aug 30, 2023
Merged

Evals for improve existing code #638

merged 20 commits into from
Aug 30, 2023

Conversation

pbharrin
Copy link
Contributor

This work is motivated by a study of the way our code was being evaluated and used in other tools Auto-GPT, Rift, etc.
These projects have a modified fork of our code that can be be evaluated in a "headless" fashion with a report generated.

Since they had a fork of our project, they were not receiving updates to gpt-engineer and the evals were on old branches, which were not being rebased. This work introduces a method of evaluation which does not require human input, or rather the human input is done ahead of time (in a prompt file). This is done for the improve code mode, subsequent work will cover the default or new code generation.

Our output is code, and the code can be tested in one of two ways:

Artifacts in the source code
Performance of the code while being executed.
The first option is more brittle as the AI may have multiple ways of solving a problem. The second option is less brittle but currently is limited to Python. With the inclusion of different libraries we can test other languages as we. Both of these methods are covered in this PR.

…ns after `curr_fn` in the `steps.py` file.

# Aider chat conversation:

USER: All the functions after `curr_fn` need to have the `**kwargs` added to the function signature.  Can you please add that.

ASSISTANT: Sure, I will add `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file.

Here are the changes:

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def simple_gen(ai: AI, dbs: DBs) -> List[Message]:
=======
def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def clarify(ai: AI, dbs: DBs) -> List[Message]:
=======
def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_spec(ai: AI, dbs: DBs) -> List[Message]:
=======
def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def respec(ai: AI, dbs: DBs) -> List[Message]:
=======
def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_code(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
=======
def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def use_feedback(ai: AI, dbs: DBs):
=======
def use_feedback(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def improve_existing_code(ai: AI, dbs: DBs):
=======
def improve_existing_code(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def fix_code(ai: AI, dbs: DBs):
=======
def fix_code(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def human_review(ai: AI, dbs: DBs):
=======
def human_review(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>
…teps for setting the file list and prompt, and added a new config for evaluating and improving code.
# Aider chat conversation:

USER: For all of the functions that have `**kwargs` as an argument, please remove `**kwargs`.

ASSISTANT: To remove `**kwargs` from all functions in the `steps.py` file, we need to modify the file `gpt_engineer/steps.py`.

Here are the changes needed:

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def simple_gen(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def clarify(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def gen_spec(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def respec(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_code(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def use_feedback(ai: AI, dbs: DBs, **kwargs):
=======
def use_feedback(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def set_improve_filelist(ai: AI, dbs: DBs, **kwargs):
=======
def set_improve_filelist(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def assert_files_ready(ai: AI, dbs: DBs, **kwargs):
=======
def assert_files_ready(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def get_improve_prompt(ai: AI, dbs: DBs, **kwargs):
=======
def get_improve_prompt(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def improve_existing_code(ai: AI, dbs: DBs, **kwargs):
=======
def improve_existing_code(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def fix_code(ai: AI, dbs: DBs, **kwargs):
=======
def fix_code(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def human_review(ai: AI, dbs: DBs, **kwargs):
=======
def human_review(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

Please confirm these changes.
…` and does a switch on the key `type` from `tests`, with options for `assert_exists_in_source_code` and `run_code_class_has_property`, and a default option raising an exception.

# Aider chat conversation:

USER: add a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`.  The possible options are: `assert_exists_in_source_code`, and `run_code_class_has_property`, there should also be a default option raising an excpetion.

ASSISTANT: To add the requested function, we will need to modify the `scripts/eval_tools.py` file.

Here are the steps we will follow:

1. Define a new function named `run_tests` that takes a dictionary named `tests` as an argument.
2. Inside this function, we will use a switch-like structure using Python's dictionary mapping to handle different `type` keys from the `tests` dictionary.
3. For the `type` key `assert_exists_in_source_code`, we will call the `assert_exists_in_source_code` function.
4. For the `type` key `run_code_class_has_property`, we will call a function `run_code_class_has_property`. This function is not defined yet, so we will add a placeholder function for now.
5. If the `type` key is not recognized, we will raise an exception.

Here is the *edit block* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
from pathlib import Path

def assert_exists_in_source_code(fpath: Path, existing_code: str):
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1
=======
from pathlib import Path

def assert_exists_in_source_code(fpath: Path, existing_code: str):
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1

def run_code_class_has_property():
    # Placeholder function
    pass

def run_tests(tests: dict):
    test_type = tests.get('type')
    if test_type == 'assert_exists_in_source_code':
        return assert_exists_in_source_code(tests.get('fpath'), tests.get('existing_code'))
    elif test_type == 'run_code_class_has_property':
        return run_code_class_has_property()
    else:
        raise Exception(f"Test type '{test_type}' is not recognized.")
>>>>>>> updated
```

Please note that the `run_code_class_has_property` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.
… include a switch for the `language` argument, with support for the `python` language and a default case that raises an exception, and a new function `run_python_code` has been added as a placeholder.

# Aider chat conversation:

USER: The function `run_code_class_has_property`

ASSISTANT: The function `run_code_class_has_property` is currently a placeholder function in the `scripts/eval_tools.py` file. It is defined as follows:

```python
def run_code_class_has_property(language: str, source_file: Path) -> bool:
    # Placeholder function
    return True
```

This function is expected to take two arguments: `language` and `source_file`. The `language` argument is a string that represents the programming language of the source code. The `source_file` argument is a `Path` object that represents the path to the source code file.

Currently, this function does not perform any operation and always returns `True`. You will need to replace this placeholder function with the actual implementation that checks if a class in the source code has a property.

USER: In `run_code_class_has_property` add a switch for the language argument, the only supported lanague will be `python`, also add a defualt with an excpetion.  After that add a new function called: run_python_code.

ASSISTANT: To add the requested changes, we will need to modify the `scripts/eval_tools.py` file.

Here are the steps we will follow:

1. Inside the `run_code_class_has_property` function, we will add a switch-like structure using Python's dictionary mapping to handle different `language` values.
2. For the `language` value `python`, we will call a new function `run_python_code`. This function is not defined yet, so we will add a placeholder function for now.
3. If the `language` value is not recognized, we will raise an exception.
4. Define a new function named `run_python_code` that takes a `source_file` as an argument. This function will be a placeholder for now.

Here is the *edit block* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
def run_code_class_has_property(language: str, source_file: Path) -> bool:
    # Placeholder function
    return True
=======
def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
>>>>>>> updated
```

Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.
… `run_python_code`, and `run_code_class_has_property` to take a single dictionary argument `eval`.

# Aider chat conversation:

USER: Both `assert_exists_in_source_code` and `run_code_class_has_property` and `run_python_code` should only take a single dictionary called `eval` as an argument.

ASSISTANT: To make the requested changes, we will need to modify the `scripts/eval_tools.py` file.

Here are the steps we will follow:

1. Change the function signature of `assert_exists_in_source_code` to take a single dictionary argument `eval`.
2. Change the function signature of `run_code_class_has_property` to take a single dictionary argument `eval`.
3. Change the function signature of `run_python_code` to take a single dictionary argument `eval`.

Here is the *edit block* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool:
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1

def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
=======
def assert_exists_in_source_code(eval: dict) -> bool:
    source_body = open(eval['fpath']).read()
    return source_body.find(eval['existing_code']) > -1

def run_python_code(eval: dict) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(eval: dict) -> bool:
    if eval['language'] == 'python':
        return run_python_code(eval)
    else:
        raise Exception(f"Language '{eval['language']}' is not supported.")
>>>>>>> updated
```

Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.

USER: InvalidEditBlock: edit failed!

scripts/eval_tools.py does not contain the *exact sequence* of HEAD lines you specified.
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The HEAD block needs to be EXACTLY the same as the lines in scripts/eval_tools.py with nothing missing!

scripts/eval_tools.py does not contain these 13 exact lines in a row:
```
def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool:
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1

def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
```

ASSISTANT: I apologize for the confusion. I see that the functions are not in sequence in the file. I will provide separate edit blocks for each function.

Here are the *edit blocks* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool:
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1
=======
def assert_exists_in_source_code(eval: dict) -> bool:
    source_body = open(eval['fpath']).read()
    return source_body.find(eval['existing_code']) > -1
>>>>>>> updated
```

```python
scripts/eval_tools.py
<<<<<<< HEAD
def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True
=======
def run_python_code(eval: dict) -> bool:
    # Placeholder function
    return True
>>>>>>> updated
```

```python
scripts/eval_tools.py
<<<<<<< HEAD
def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
=======
def run_code_class_has_property(eval: dict) -> bool:
    if eval['language'] == 'python':
        return run_python_code(eval)
    else:
        raise Exception(f"Language '{eval['language']}' is not supported.")
>>>>>>> updated
```
@pbharrin pbharrin changed the title Evals wo rebase Evals for improve existing code Aug 27, 2023
@AntonOsika
Copy link
Owner

Awesome, merging.

Assuming you ran the new improve code configs @pbharrin ?

@AntonOsika AntonOsika merged commit eebbe1b into AntonOsika:main Aug 30, 2023
@AntonOsika
Copy link
Owner

Regarding making it nice:

evals directory is very unclear right now.

I don't understand fully how it is consumed downstream by agi-evals, or how to rerun the eval!

Do you see a way we can make it less confusing to people that open the repository @pbharrin ?

70ziko pushed a commit to 70ziko/gpt-engineer that referenced this pull request Oct 25, 2023
* Existing Code eval sucessfully populates a known project

* Refactor main.py to add a new command line option for skipping feedback.

* aider: Added `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file.

# Aider chat conversation:

USER: All the functions after `curr_fn` need to have the `**kwargs` added to the function signature.  Can you please add that.

ASSISTANT: Sure, I will add `**kwargs` to the function signatures of all the functions after `curr_fn` in the `steps.py` file.

Here are the changes:

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def simple_gen(ai: AI, dbs: DBs) -> List[Message]:
=======
def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def clarify(ai: AI, dbs: DBs) -> List[Message]:
=======
def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_spec(ai: AI, dbs: DBs) -> List[Message]:
=======
def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def respec(ai: AI, dbs: DBs) -> List[Message]:
=======
def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_code(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
=======
def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
=======
def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def use_feedback(ai: AI, dbs: DBs):
=======
def use_feedback(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def improve_existing_code(ai: AI, dbs: DBs):
=======
def improve_existing_code(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def fix_code(ai: AI, dbs: DBs):
=======
def fix_code(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def human_review(ai: AI, dbs: DBs):
=======
def human_review(ai: AI, dbs: DBs, **kwargs):
>>>>>>> updated
</source>

* Refactored the improve_existing_code function to include additional steps for setting the file list and prompt, and added a new config for evaluating and improving code.

* aider: Removed `**kwargs` from all functions in `steps.py`.

# Aider chat conversation:

USER: For all of the functions that have `**kwargs` as an argument, please remove `**kwargs`.

ASSISTANT: To remove `**kwargs` from all functions in the `steps.py` file, we need to modify the file `gpt_engineer/steps.py`.

Here are the changes needed:

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def simple_gen(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def simple_gen(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def clarify(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def clarify(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_spec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def gen_spec(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def respec(ai: AI, dbs: DBs, **kwargs) -> List[Message]:
=======
def respec(ai: AI, dbs: DBs) -> List[Message]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_unit_tests(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_clarified_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_code(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_code(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def execute_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def gen_entrypoint(ai: AI, dbs: DBs, **kwargs) -> List[dict]:
=======
def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def use_feedback(ai: AI, dbs: DBs, **kwargs):
=======
def use_feedback(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def set_improve_filelist(ai: AI, dbs: DBs, **kwargs):
=======
def set_improve_filelist(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def assert_files_ready(ai: AI, dbs: DBs, **kwargs):
=======
def assert_files_ready(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def get_improve_prompt(ai: AI, dbs: DBs, **kwargs):
=======
def get_improve_prompt(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def improve_existing_code(ai: AI, dbs: DBs, **kwargs):
=======
def improve_existing_code(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def fix_code(ai: AI, dbs: DBs, **kwargs):
=======
def fix_code(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

<source>python
gpt_engineer/steps.py
<<<<<<< HEAD
def human_review(ai: AI, dbs: DBs, **kwargs):
=======
def human_review(ai: AI, dbs: DBs):
>>>>>>> updated
</source>

Please confirm these changes.

* formatting

* aider: Added a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`, with options for `assert_exists_in_source_code` and `run_code_class_has_property`, and a default option raising an exception.

# Aider chat conversation:

USER: add a function to eval tools which takes a dict named `tests` and does a switch on the key `type` from `tests`.  The possible options are: `assert_exists_in_source_code`, and `run_code_class_has_property`, there should also be a default option raising an excpetion.

ASSISTANT: To add the requested function, we will need to modify the `scripts/eval_tools.py` file.

Here are the steps we will follow:

1. Define a new function named `run_tests` that takes a dictionary named `tests` as an argument.
2. Inside this function, we will use a switch-like structure using Python's dictionary mapping to handle different `type` keys from the `tests` dictionary.
3. For the `type` key `assert_exists_in_source_code`, we will call the `assert_exists_in_source_code` function.
4. For the `type` key `run_code_class_has_property`, we will call a function `run_code_class_has_property`. This function is not defined yet, so we will add a placeholder function for now.
5. If the `type` key is not recognized, we will raise an exception.

Here is the *edit block* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
from pathlib import Path

def assert_exists_in_source_code(fpath: Path, existing_code: str):
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1
=======
from pathlib import Path

def assert_exists_in_source_code(fpath: Path, existing_code: str):
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1

def run_code_class_has_property():
    # Placeholder function
    pass

def run_tests(tests: dict):
    test_type = tests.get('type')
    if test_type == 'assert_exists_in_source_code':
        return assert_exists_in_source_code(tests.get('fpath'), tests.get('existing_code'))
    elif test_type == 'run_code_class_has_property':
        return run_code_class_has_property()
    else:
        raise Exception(f"Test type '{test_type}' is not recognized.")
>>>>>>> updated
```

Please note that the `run_code_class_has_property` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.

* rename machine generated functions

* aider: The function `run_code_class_has_property` has been updated to include a switch for the `language` argument, with support for the `python` language and a default case that raises an exception, and a new function `run_python_code` has been added as a placeholder.

# Aider chat conversation:

USER: The function `run_code_class_has_property`

ASSISTANT: The function `run_code_class_has_property` is currently a placeholder function in the `scripts/eval_tools.py` file. It is defined as follows:

```python
def run_code_class_has_property(language: str, source_file: Path) -> bool:
    # Placeholder function
    return True
```

This function is expected to take two arguments: `language` and `source_file`. The `language` argument is a string that represents the programming language of the source code. The `source_file` argument is a `Path` object that represents the path to the source code file.

Currently, this function does not perform any operation and always returns `True`. You will need to replace this placeholder function with the actual implementation that checks if a class in the source code has a property.

USER: In `run_code_class_has_property` add a switch for the language argument, the only supported lanague will be `python`, also add a defualt with an excpetion.  After that add a new function called: run_python_code.

ASSISTANT: To add the requested changes, we will need to modify the `scripts/eval_tools.py` file.

Here are the steps we will follow:

1. Inside the `run_code_class_has_property` function, we will add a switch-like structure using Python's dictionary mapping to handle different `language` values.
2. For the `language` value `python`, we will call a new function `run_python_code`. This function is not defined yet, so we will add a placeholder function for now.
3. If the `language` value is not recognized, we will raise an exception.
4. Define a new function named `run_python_code` that takes a `source_file` as an argument. This function will be a placeholder for now.

Here is the *edit block* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
def run_code_class_has_property(language: str, source_file: Path) -> bool:
    # Placeholder function
    return True
=======
def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
>>>>>>> updated
```

Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.

* aider: Changed function signatures of `assert_exists_in_source_code`, `run_python_code`, and `run_code_class_has_property` to take a single dictionary argument `eval`.

# Aider chat conversation:

USER: Both `assert_exists_in_source_code` and `run_code_class_has_property` and `run_python_code` should only take a single dictionary called `eval` as an argument.

ASSISTANT: To make the requested changes, we will need to modify the `scripts/eval_tools.py` file.

Here are the steps we will follow:

1. Change the function signature of `assert_exists_in_source_code` to take a single dictionary argument `eval`.
2. Change the function signature of `run_code_class_has_property` to take a single dictionary argument `eval`.
3. Change the function signature of `run_python_code` to take a single dictionary argument `eval`.

Here is the *edit block* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool:
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1

def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
=======
def assert_exists_in_source_code(eval: dict) -> bool:
    source_body = open(eval['fpath']).read()
    return source_body.find(eval['existing_code']) > -1

def run_python_code(eval: dict) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(eval: dict) -> bool:
    if eval['language'] == 'python':
        return run_python_code(eval)
    else:
        raise Exception(f"Language '{eval['language']}' is not supported.")
>>>>>>> updated
```

Please note that the `run_python_code` function is a placeholder and does not perform any operation. You will need to replace it with the actual implementation.

USER: InvalidEditBlock: edit failed!

scripts/eval_tools.py does not contain the *exact sequence* of HEAD lines you specified.
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The HEAD block needs to be EXACTLY the same as the lines in scripts/eval_tools.py with nothing missing!

scripts/eval_tools.py does not contain these 13 exact lines in a row:
```
def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool:
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1

def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True

def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
```

ASSISTANT: I apologize for the confusion. I see that the functions are not in sequence in the file. I will provide separate edit blocks for each function.

Here are the *edit blocks* for the changes:

```python
scripts/eval_tools.py
<<<<<<< HEAD
def assert_exists_in_source_code(fpath: Path, existing_code: str) -> bool:
    source_body = open(fpath).read()
    return source_body.find(existing_code) > -1
=======
def assert_exists_in_source_code(eval: dict) -> bool:
    source_body = open(eval['fpath']).read()
    return source_body.find(eval['existing_code']) > -1
>>>>>>> updated
```

```python
scripts/eval_tools.py
<<<<<<< HEAD
def run_python_code(source_file: Path) -> bool:
    # Placeholder function
    return True
=======
def run_python_code(eval: dict) -> bool:
    # Placeholder function
    return True
>>>>>>> updated
```

```python
scripts/eval_tools.py
<<<<<<< HEAD
def run_code_class_has_property(language: str, source_file: Path) -> bool:
    if language == 'python':
        return run_python_code(source_file)
    else:
        raise Exception(f"Language '{language}' is not supported.")
=======
def run_code_class_has_property(eval: dict) -> bool:
    if eval['language'] == 'python':
        return run_python_code(eval)
    else:
        raise Exception(f"Language '{eval['language']}' is not supported.")
>>>>>>> updated
```

* not unpacking objects early

* check evals working, it can test Python code functionally

* Report generating code done, ready to submit PR

* added existing code eval for web app

* added new eval, changed report to append mode

* refactored, created evals dir

* changed prompt back to the default

* clean up for black
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants