"Improve" now edits existing code base #721

Merged: UmerHA merged 22 commits into AntonOsika:main from the improvement-prompt branch on Oct 12, 2023

UmerHA
Collaborator

@UmerHA UmerHA commented Sep 19, 2023

Solves #650.

Improve mode (i.e. the -i flag) now produces code changes and applies them to the existing codebase.

Changes:

  • improve prompt now produces code changes in the git conflict marker format
  • the LLM output is parsed into code changes (called "Edits")
  • Edits are applied to the existing codebase (either creating a new file or updating an existing one)
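The parse step in the list above can be sketched roughly as follows. This is a minimal illustration, assuming each edit is a filename line followed directly by a `<<<<<<< HEAD` / `=======` / `>>>>>>> updated` block, as in the model outputs posted in this thread. The names `Edit`, `EDIT_PATTERN` and `parse_edits` are hypothetical and are not claimed to match the code in this PR.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Edit:
    filename: str
    before: str  # text expected in the existing file (empty for a new file)
    after: str   # text that should replace it


# Hypothetical pattern: a filename line, then one conflict-marker block.
EDIT_PATTERN = re.compile(
    r"^(?P<filename>[^\n]+)\n"
    r"<<<<<<< HEAD\n"
    r"(?P<before>.*?)"
    r"^=======\n"
    r"(?P<after>.*?)"
    r"^>>>>>>> updated",
    re.DOTALL | re.MULTILINE,
)


def _strip_one_newline(text: str) -> str:
    # Drop only the newline that separates the content from the next marker.
    return text[:-1] if text.endswith("\n") else text


def parse_edits(llm_output: str) -> List[Edit]:
    """Collect every filename + conflict-marker block found in the model output."""
    return [
        Edit(
            filename=m.group("filename").strip(),
            before=_strip_one_newline(m.group("before")),
            after=_strip_one_newline(m.group("after")),
        )
        for m in EDIT_PATTERN.finditer(llm_output)
    ]
```

Because the pattern only matches a filename line sitting directly above `<<<<<<< HEAD`, surrounding prose such as a PLANNING section is skipped rather than parsed.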

Note: Unlike Aider (https://github.com/paul-gauthier/aider/blob/main/aider/coders/editblock_coder.py), which uses a lot of code to make sure edits are correctly parsed and applied, I have kept the code very simple. I have found it to work well with an adjusted prompt, but I would encourage everyone to test this code.

@lukaspetersson
Contributor

Did you also run it with 3.5-turbo? Some logic is dependent on LLM output and 3.5 might be worse at following this.

I will try it myself and let you know if I get issues.

@UmerHA
Collaborator Author

UmerHA commented Sep 19, 2023

Did you also run it with 3.5-turbo? Some logic is dependent on LLM output and 3.5 might be worse at following this.

I will try it myself and let you know if I get issues.

No, I didn't. But I also wouldn't use gpt3.5 for code generation.

I think we should assume people use gpt4 or similarly capable models.

Happy to hear @AntonOsika's thoughts

@pbharrin
Contributor

Wow, excellent work. Super excited to see this in action.

@fabhed
Collaborator

fabhed commented Sep 20, 2023

Nice work @UmerHA !

Looking at the code, I think it would make sense to re-use the logic in parse_chat instead of adding similar logic in the parse_all_edits function.

This part of parse_chat would have to be refactored or even removed, as I don't think we should continuously replace the readme.

    # Get all the text before the first ``` block
    readme = chat.split("```")[0]
    files.append(("README.md", readme))

@ATheorell
Collaborator

Running b73f793 for improving someone else's program, I get the error:
KeyError: "File 'src/server/util/API.js' could not be found in '/home/axel/Software/Opensource-Contribution-Leaderboard/workspace'" when it is trying to apply the diff. Is the reason that, so far, it only supports improving gpt-engineer generated software?

Happy to provide the lengthy stack trace if requested.

@UmerHA
Collaborator Author

UmerHA commented Sep 21, 2023

Running b73f793 for improving someone else's program, I get the error:
KeyError: "File 'src/server/util/API.js' could not be found in '/home/axel/Software/Opensource-Contribution-Leaderboard/workspace'" when it is trying to apply the diff. Is the reason that, so far, it only supports improving gpt-engineer generated software?

Without thinking too in-depth about it, I think it should work for non-gpte-generated code as well.

Happy to provide the lengthy stack trace if requested.

Yes please! Would be helpful!

@ATheorell
Collaborator

(venv) axel@axel-ThinkPad-T470s:~/Software/Opensource-Contribution-Leaderboard$ gpt-engineer -i .
How do you want to select the files?

  1. Use File explorer.
  2. Use Command-Line.
  3. Use previous file list (available at /home/axel/Software/Opensource-Contribution-Leaderboard/.gpteng/file_list.txt)

Select option and press Enter (default=3): 2
Opensource-Contribution-Leaderboard/
    ├── admin/
    │ ├── build/
  0. │ │ └── webpack.config.js
    │ ├── dist/
  1. │ │ ├── app.5f898dd63d28f1d54c0b.js
    │ │ ├── assets/
    │ │ │ ├── images/
  2. │ │ │ │ └── loginBg.jpg
    │ │ │ └── layer/
  3. │ │ │ ├── layer.js
    │ │ │ ├── mobile/
  4. │ │ │ │ ├── layer.js
    │ │ │ │ └── need/
  5. │ │ │ │ └── layer.css
    │ │ │ └── theme/
    │ │ │ └── default/
  6. │ │ │ ├── icon-ext.png
  7. │ │ │ ├── icon.png
  8. │ │ │ ├── layer.css
  9. │ │ │ ├── loading-0.gif
  10. │ │ │ ├── loading-1.gif
  11. │ │ │ └── loading-2.gif
  12. │ │ ├── favicon.ico
  13. │ │ ├── index.html
  14. │ │ └── vendor.248856e3852433d0f86e.js
    │ ├── docs/
  15. │ │ └── how-to-make-your-own-scaffold.md
  16. │ ├── favicon.ico
  17. │ ├── index.html
  18. │ ├── LICENSE
    │ ├── node_modules/
  19. │ ├── package-lock.json
  20. │ ├── package.json
  21. │ ├── README.md
    │ ├── src/
    │ │ ├── assets/
    │ │ │ ├── images/
  22. │ │ │ │ └── loginBg.jpg
    │ │ │ └── layer/
  23. │ │ │ ├── layer.js
    │ │ │ ├── mobile/
  24. │ │ │ │ ├── layer.js
    │ │ │ │ └── need/
  25. │ │ │ │ └── layer.css
    │ │ │ └── theme/
    │ │ │ └── default/
  26. │ │ │ ├── icon-ext.png
  27. │ │ │ ├── icon.png
  28. │ │ │ ├── layer.css
  29. │ │ │ ├── loading-0.gif
  30. │ │ │ ├── loading-1.gif
  31. │ │ │ └── loading-2.gif
  32. │ │ ├── index.js
    │ │ └── style/
  33. │ │ ├── noty.css
  34. │ │ └── style.css
  35. │ └── yarn.lock
    ├── archive/
    ├── build/
  36. │ └── webpack.config.js
  37. ├── docker-compose.yml
  38. ├── Dockerfile
    ├── docs/
    │ └── images/
  39. │ ├── demo.png
  40. │ └── logo.png
  41. ├── favicon.ico
  42. ├── index.html
  43. ├── LICENSE
    ├── memory/
    │ └── logs/
    ├── node_modules/
  44. ├── package-lock.json
  45. ├── package.json
  46. ├── prompt
  47. ├── README.md
  48. ├── REST-API.md
    ├── src/
    │ ├── assets/
    │ │ ├── data/
  49. │ │ │ ├── data.json
  50. │ │ │ └── log.json
    │ │ └── images/
  51. │ │ └── rocket-chat.svg
  52. │ ├── index.js
    │ ├── server/
  53. │ │ ├── admindata.json
  54. │ │ ├── app.js
  55. │ │ ├── config-example.json
  56. │ │ ├── config.json
  57. │ │ ├── package.json
  58. │ │ ├── refresh.js
    │ │ ├── tests/
  59. │ │ │ ├── fetch.js
  60. │ │ │ └── index.js
    │ │ └── util/
  61. │ │ ├── API.js
  62. │ │ └── Util.js
    │ └── style/
  63. │ ├── bootstrap.css
  64. │ └── style.css
    ├── tests/
  65. │ ├── access.sh
  66. │ └── api-call-tests.sh
    └── workspace/
        └── src/
            └── server/
                └── util/

Select files by entering the numbers separated by commas/spaces or
specify range with a dash.
Example: 1,2,3-5,7,9,13-15,18,20 (enter 'all' to select everything)

Select files:54,58,61

The following files will be used in the improvement process:
file_list.txt:
/home/axel/Software/Opensource-Contribution-Leaderboard/src/server/app.js
/home/axel/Software/Opensource-Contribution-Leaderboard/src/server/refresh.js
/home/axel/Software/Opensource-Contribution-Leaderboard/src/server/util/API.js

The inserted prompt is the following:
'make the program get the list of contributors directly from github, instead of reading from the local config'

You can change these files in your project before proceeding.

Press enter to proceed with modifications.

PLANNING:
The current implementation reads the list of contributors from a local configuration file. To fetch the list of contributors directly from GitHub, we need to use the GitHub API.

The GitHub API provides an endpoint to fetch contributors for a specific repository. The endpoint is https://api.github.com/repos/{owner}/{repo}/contributors. We need to make a GET request to this endpoint to fetch the list of contributors.

We will create a new function getContributors in the API.js file. This function will take organization and repository as parameters and will return a list of contributors for the given repository.

Then, we will modify the getAllContributorsInfo function in the refresh.js file to use this new getContributors function instead of reading from the local configuration file.

OUTPUT:

src/server/util/API.js
<<<<<<< HEAD
module.exports = {
    getRepositories,
    getContributorAvatar,
    getOpenPRsNumber,
    getMergedPRsNumber,
    getIssuesNumber,
    getContributorInfo,
    checkRateLimit,
    getStats,
    getRanks,
}
=======
async function getContributors(organization, repository) {
    const res = await get(
        APIHOST + `/repos/${organization}/${repository}/contributors`
    )
    if (res !== undefined) {
        return res.data.map((element) => {
            return element['login']
        })
    } else {
        return []
    }
}

module.exports = {
    getRepositories,
    getContributorAvatar,
    getOpenPRsNumber,
    getMergedPRsNumber,
    getIssuesNumber,
    getContributorInfo,
    checkRateLimit,
    getStats,
    getRanks,
    getContributors,
}
>>>>>>> updated
src/server/refresh.js
<<<<<<< HEAD
let Config = jsonfile.readFileSync(configPath)
let organization = Config.organization
let contributors = Config.contributors
let includedRepositories = Config.includedRepositories
=======
let Config = jsonfile.readFileSync(configPath)
let organization = Config.organization
let includedRepositories = Config.includedRepositories
let contributors = []
for (let repository of includedRepositories) {
    let repoContributors = await API.getContributors(organization, repository)
    contributors = [...contributors, ...repoContributors]
}
contributors = [...new Set(contributors)] // remove duplicates
>>>>>>> updated
Traceback (most recent call last):

  File "/home/axel/Software/gpt-engineer/venv/bin/gpt-engineer", line 8, in <module>
    sys.exit(app())

  File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/gpt_engineer/main.py", line 98, in main
    messages = step(ai, dbs)

  File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/gpt_engineer/steps.py", line 366, in improve_existing_code
    overwrite_files_with_edits(messages[-1].content.strip(), dbs)

  File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/gpt_engineer/chat_to_files.py", line 150, in overwrite_files_with_edits
    apply_edits(edits, dbs.workspace)

  File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/gpt_engineer/chat_to_files.py", line 207, in apply_edits
    workspace[filename] = workspace[filename].replace(

  File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/gpt_engineer/db.py", line 64, in __getitem__
    raise KeyError(f"File '{key}' could not be found in '{self.path}'")

KeyError: "File 'src/server/util/API.js' could not be found in '/home/axel/Software/Opensource-Contribution-Leaderboard/workspace'"

@ATheorell
Collaborator

I guess my error above is the one that is addressed in #728 @RareMojo ?

@joenas

joenas commented Sep 22, 2023

@UmerHA Is the idea that it will always improve existing files, not add new ones?

I'm a bit confused with how gpt-engineer is supposed to work in cases like mine. I have a generated project that I want to add more features to, not necessarily changing existing files only but also adding new ones.
If I do gpt-engineer [mydir] with existing files and ask it to add a new feature it will happily wipe my entire workspace.

As for your change, I think it's awesome, but it also needs to be able to add new files. In my case I wanted to add more database models, but GPT outputs everything in the

<<<<<<< HEAD
=======
>>>>>>> updated

format, even new files.

Also if GPT happens to add some other output to the reply, like bash commands, we get a ValueError

    OUTPUT:
    ```bash
    # In your terminal
    npm install package
    ```
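The thread does not show where the ValueError is raised, so the following is only one possible way to tolerate such extra output, not a description of the PR's behaviour. It assumes edits may themselves sit inside fenced blocks and drops only those fenced blocks that contain no `<<<<<<< HEAD` marker. The function name `drop_non_edit_fences` is hypothetical.

```python
import re

# Build the fence delimiter programmatically to keep this example readable.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r".*?" + TICKS, re.DOTALL)


def drop_non_edit_fences(llm_output: str) -> str:
    """Remove fenced blocks (e.g. bash snippets) that contain no conflict markers."""

    def keep_or_drop(match):
        block = match.group(0)
        # Keep blocks that carry an edit; discard everything else.
        return block if "<<<<<<< HEAD" in block else ""

    return FENCE.sub(keep_or_drop, llm_output)
```

Running this before the edit parser would leave shell instructions out of the parsed input while keeping the edit blocks untouched.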

@RareMojo
Contributor

RareMojo commented Sep 22, 2023

I guess my error above is the one that is addressed in #728 @RareMojo ?

Can't dig into it for a while, but I'm inclined to say no. It looks like the same fix I suggested is implemented in apply_edits already here.

@ATheorell
Collaborator

@RareMojo , I think you are right, these are different things.

@UmerHA
Collaborator Author

UmerHA commented Sep 22, 2023

(venv) axel@axel-ThinkPad-T470s:~/Software/Opensource-Contribution-Leaderboard$ gpt-engineer -i .
How do you want to select the files?
...
  File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/gpt_engineer/db.py", line 64, in __getitem__
    raise KeyError(f"File '{key}' could not be found in '{self.path}'")

KeyError: "File 'src/server/util/API.js' could not be found in '/home/axel/Software/Opensource-Contribution-Leaderboard/workspace'"

I can't seem to reproduce your error. Would it be possible for you to send me the entire project files, e.g. via Discord?
Of course only if the code is not sensitive in any way.

@UmerHA
Collaborator Author

UmerHA commented Sep 22, 2023

@UmerHA Is the idea that it will always improve existing files, not add new ones?

No, it can also add new files. In this case the before part (between HEAD and =======) will be empty.
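As a rough sketch of that rule (empty "before" part means "create the file"), assuming an in-memory mapping of path to content instead of the real workspace object. The function name `apply_edit` and the choice to raise when the file already exists are assumptions made for this illustration, not the PR's behaviour.

```python
from typing import Dict


def apply_edit(files: Dict[str, str], filename: str, before: str, after: str) -> None:
    """Apply one edit to an in-memory mapping of path -> file content."""
    if before.strip() == "":
        # Empty "before" part: the edit asks for a new file.
        if filename in files:
            # Assumption for this sketch: refuse to overwrite an existing file.
            raise ValueError(f"{filename!r} already exists but the edit has no 'before' text")
        files[filename] = after
    else:
        # Non-empty "before" part: replace its first occurrence in the existing file.
        # A missing file surfaces as a KeyError from the mapping lookup.
        files[filename] = files[filename].replace(before, after, 1)
```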

I'm a bit confused with how gpt-engineer is supposed to work in cases like mine. I have a generated project that I want to add more features to, not necessarily changing existing files only but also adding new ones. If I do gpt-engineer [mydir] with existing files and ask it to add a new feature it will happily wipe my entire workspace.

When you do gpt-engineer -i [mydir] it should only update the files you named, or create new ones. Can you describe in more detail under which circumstances it wipes out existing files?

Also if GPT happens to add some other output to the reply, like bash commands, we get a ValueError

    OUTPUT:
    ```bash
    # In your terminal
    npm install package
    ```

Could you post the 'ask' you made to gpte? That helps me understand better if we need to add handling of such cases. :)

@ATheorell
Collaborator

Here is the project that I'm trying to modify: https://github.com/ATheorell/Opensource-Contribution-Leaderboard/tree/improveCode

The prompt and the file selection are in the stack trace. I tried multiple times and always get the error @UmerHA

@SabareeshGC

I am using this for Java, but several times it has not actually updated the files, and I don't see any error logs. I'm trying to narrow it down, but no luck so far.
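One possible explanation, offered as a guess rather than a diagnosis: the traceback posted earlier in this thread shows apply_edits calling `.replace(` on the file content, and Python's `str.replace` returns the string unchanged when the search text is not found. If the model's HEAD block differs from the file by as little as a tab versus spaces, the edit would be skipped without any error. The helper `replace_strict` below is hypothetical and only illustrates how such a mismatch could be surfaced.

```python
def replace_strict(content: str, before: str, after: str) -> str:
    """Replace `before` with `after`, raising instead of silently doing nothing."""
    if before not in content:
        raise ValueError("'before' text not found; the edit would have no effect")
    return content.replace(before, after, 1)


source = "int total = 0;\n"
# A tab instead of a space is enough to make the match fail.
mismatched = "int\ttotal = 0;"
# Plain str.replace reports nothing: `unchanged` ends up identical to `source`.
unchanged = source.replace(mismatched, "long total = 0;")
```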

@UmerHA
Collaborator Author

UmerHA commented Sep 25, 2023

Here is the project that I'm trying to modify: https://github.com/ATheorell/Opensource-Contribution-Leaderboard/tree/improveCode

The prompt and the file selection are in the stack trace. I tried multiple times and always get the error @UmerHA

Okay, I can reproduce it now. The issue is that gpte currently expects the code to be in a subfolder named 'workspace'. I'll make the name of the workspace folder editable. In your case, you would then choose 'src'.
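A small reproduction of that failure mode, as a sketch: `FileStore` is a hypothetical stand-in for a folder-backed mapping, with the error message copied from the traceback above. The same relative path resolves when the store is rooted at the project folder and fails when it is rooted at a 'workspace' subfolder.

```python
import tempfile
from pathlib import Path


class FileStore:
    """Hypothetical stand-in for a folder-backed mapping of relative path -> content."""

    def __init__(self, path: Path):
        self.path = path

    def __getitem__(self, key: str) -> str:
        full_path = self.path / key
        if not full_path.is_file():
            raise KeyError(f"File '{key}' could not be found in '{self.path}'")
        return full_path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    target = project / "src" / "server" / "util" / "API.js"
    target.parent.mkdir(parents=True)
    target.write_text("module.exports = {}\n")

    rooted_at_project = FileStore(project)
    rooted_at_workspace = FileStore(project / "workspace")

    # The path from the model output is relative to the project root.
    found = rooted_at_project["src/server/util/API.js"]
    try:
        rooted_at_workspace["src/server/util/API.js"]
        lookup_failed = False
    except KeyError:
        # Same failure as in the reported traceback.
        lookup_failed = True
```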

@ATheorell
Collaborator

Here is the project that I'm trying to modify: https://github.com/ATheorell/Opensource-Contribution-Leaderboard/tree/improveCode
The prompt and the file selection are in the stack trace. I tried multiple times and always get the error @UmerHA

Okay, I can reproduce it now. The issue is that gpte currently expects the code to be in a subfolder named 'workspace'. I'll make the name of the workspace folder editable. In your case, you would then choose 'src'.

What is the downside of having it work with relative paths from the execution path, rather than an explicit code path? I'm thinking about the case that the improved code files may be scattered over the file tree (and also a bit about the general UX).

@UmerHA
Collaborator Author

UmerHA commented Sep 26, 2023

Here is the project that I'm trying to modify: https://github.com/ATheorell/Opensource-Contribution-Leaderboard/tree/improveCode
The prompt and the file selection are in the stack trace. I tried multiple times and always get the error @UmerHA

Okay, I can reproduce it now. The issue is that gpte currently expects the code to be in a subfolder named 'workspace'. I'll make the name of the workspace folder editable. In your case, you would then choose 'src'.

What is the downside of having it work with relative paths from the execution path, rather than an explicit code path? I'm thinking about the case that the improved code files may be scattered over the file tree (and also a bit about the general UX).

Do you mean project path (not execution path)? I mostly use gpte like this

$ cd path/to/gpte/venv
$ pipenv shell
$ gpt-engineer -i path/to/project

So, the execution path (path/to/gpte/venv) and project path (path/to/project) are different.

In principle, there is no downside to using locations relative to the project path. We would then have to make sure gpte-internal files are not editable.

Iirc, there's a proposal to move all of those things into a .gpt-engineer folder, right? We could then just exclude that folder.
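A minimal sketch of that idea, assuming edit paths are given relative to the project root. The function name `resolve_editable_path` and the default folder name passed as `internal_dir` are placeholders chosen for this example; the actual folder name was still under discussion at this point in the thread.

```python
from pathlib import Path


def resolve_editable_path(project_root: Path, relative: str,
                          internal_dir: str = ".gpt-engineer") -> Path:
    """Resolve `relative` under `project_root`, refusing internal or outside paths."""
    root = project_root.resolve()
    candidate = (root / relative).resolve()
    # Reject paths that escape the project, e.g. "../other/file.py".
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{relative!r} is outside the project root")
    # Reject anything inside the tool's own metadata folder.
    internal = root / internal_dir
    if candidate == internal or internal in candidate.parents:
        raise ValueError(f"{relative!r} is inside {internal_dir!r} and is not editable")
    return candidate
```

With a check like this, files scattered anywhere in the project tree stay editable while the tool's own bookkeeping files are excluded.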

@UmerHA
Collaborator Author

UmerHA commented Sep 29, 2023

I think we need to decide how we want to structure the gpte directory. We have two options:

  1. root dir is for gpte, a subdir (eg "workspace") is for the code
  2. root dir is for code, a subdir (eg ".gptengineer") is for gpte

We currently do 1, but when we want the ability to edit existing codebases, 2 would be easier. @AntonOsika @ATheorell @pbharrin thoughts?

When that's decided, then I can finish this PR.

@UmerHA UmerHA requested a review from pbharrin as a code owner October 1, 2023 09:29
@UmerHA
Collaborator Author

UmerHA commented Oct 1, 2023

I have now gone ahead and made a PR to use the project path as the workspace and move everything else into .gpteng - see #749. That needs to be merged first in order to fix & merge this PR.

@AntonOsika @ATheorell

Owner

@AntonOsika AntonOsika left a comment

Awesome.

Looking forward to trying it.

The prompt is pretty long, so I'd also like to compare it with sweep.dev's approach:)

I added some comments, feel free to merge without addressing, but nice if we at least address after merge!

gpt_engineer/chat_to_files.py (outdated, resolved)
gpt_engineer/chat_to_files.py (outdated, resolved)
gpt_engineer/main.py (outdated, resolved)
gpt_engineer/main.py (outdated, resolved)
gpt_engineer/preprompts/ht.txt (outdated, resolved)
tests/test_chat_to_files.py (outdated, resolved)
@AntonOsika
Owner

AntonOsika commented Oct 2, 2023

Good job! Check my comments; also, the tests + pre-commit are failing.

@UmerHA
Collaborator Author

UmerHA commented Oct 4, 2023

The prompt is pretty long, so I'd also like to compare it with sweep.dev's approach:)

@AntonOsika Agree the prompt is long & we should look into making it shorter. I would, however, prefer to separate it from this PR. Let's get this shipped, and then optimize it later.

I added some comments, feel free to merge without addressing, but nice if we at least address after merge!

Addressed all!

tests + pre-commit is failing

works now


Also, reminder we need to merge #749 first :)

Contributor

@pbharrin pbharrin left a comment

Looks great, thanks for updating the tests.

@UmerHA
Collaborator Author

UmerHA commented Oct 12, 2023

resolved merge conflicts, all tests pass, Peter has approved; let's merge.

@UmerHA UmerHA merged commit c7f2d0d into AntonOsika:main Oct 12, 2023
@UmerHA UmerHA deleted the improvement-prompt branch October 12, 2023 11:14
@rrmistry

rrmistry commented Nov 2, 2023

Has anyone tested the scenario where the model suggests multiple edits within the same file using the -i|--improve flag?

I found that only the first model-suggested edit is applied to the file; the subsequent edits are not applied, contrary to what I expected.

Tested using Docker on main branch as of Nov 2, 2023 (commit: 1408652)

Command line:

export OPENAI_API_KEY="sk-2kMT........................................AXRK"
export DOCKER_USER="$(id -u):$(id -g)"
export OPENAI_MODEL="gpt-4"

docker run -it --rm -e OPENAI_API_KEY="$OPENAI_API_KEY" -v "/.../my-blazor-pwa:/project:cached" --user "$DOCKER_USER" gpt-engineer -i $OPENAI_MODEL | tee ".gpteng/logs/output_`date +"%Y%m%d_%H%M%S"`.txt"

With output:

Running gpt-engineer in /project

File list detected at /project/.gpteng/file_list.txt. Edit or delete it if you want to select new files.

The following files will be used in the improvement process:
file_list.txt:
/project/Server/Data/MyCloudWebAppDbContext.cs

The inserted prompt is the following:
Can you convert the existing code that depends on Neo4j over to JanusGraph in C#? The client library is Gremlin.Net. The endpoint is available at the environment variable "MY_WEBAPP_JANUSGRAPH_ENDPOINT" with has the value "dev-my-cloud-janusgraph:8182" which should be dynamically parsed and split based on ":" character. Make sure that all the Graph Queries are updated to use the new endpoint.
Make sure that the queries are also translated to Gremlin equivalent. Both for querying as well as for mutations. The queries are already in-line in Neo4j client format and in comments. These need to be converted to Gremlin queries.

You can change these files in your project before proceeding.

Press enter to proceed with modifications.

PLANNING:
The task involves converting the existing code that depends on Neo4j over to JanusGraph in C#. The client library is Gremlin.Net. The endpoint is available at the environment variable "MY_WEBAPP_JANUSGRAPH_ENDPOINT" with has the value "dev-my-cloud-janusgraph:8182" which should be dynamically parsed and split based on ":" character.

The steps involved in this task are:

  1. Replace the Neo4j client initialization with the Gremlin.Net client initialization.
  2. Replace all the Neo4j Cypher queries with equivalent Gremlin queries.
  3. Update the method signatures and implementations to use the Gremlin.Net client.

OUTPUT:
Due to the complexity of the task and the length of the code, it's not feasible to provide a complete solution here. However, I will provide a few examples of how to convert the code.

Server/Data/MyCloudWebAppDbContext.cs
<<<<<<< HEAD
    private BoltGraphClient _neo4jClient { get; }
=======
    private GremlinClient _gremlinClient { get; }
>>>>>>> updated
Server/Data/MyCloudWebAppDbContext.cs
<<<<<<< HEAD
    _neo4jClient = new BoltGraphClient(
        uri: EnvironmentExtensions.GetEnvironmentVariableOrDefault(
            envVarName: "MY_WEBAPP_NEO4J_ENDPOINT",
            defaultValue: "bolt://my-cloud-neo4j:7687"
        ),
        username: neo4j_auth[0],
        password: neo4j_auth[1],
        encryptionLevel: Neo4j.Driver.EncryptionLevel.Encrypted
    );
=======
    var endpoint = EnvironmentExtensions.GetEnvironmentVariableOrDefault(
        envVarName: "MY_WEBAPP_JANUSGRAPH_ENDPOINT",
        defaultValue: "dev-my-cloud-janusgraph:8182"
    ).Split(":");
    _gremlinClient = new GremlinClient(new GremlinServer(endpoint[0], int.Parse(endpoint[1])));
>>>>>>> updated
Server/Data/MyCloudWebAppDbContext.cs
<<<<<<< HEAD
    var user = await _neo4jClient.Cypher
        .WithParam("id", userId)
        .Match("(u:MyCloudUser {id: $id})")
        .Where((MyCloudUser u) => u.Id == userId)
        .Return(u => u.As<MyCloudUser>())
        .Limit(1)
        .ResultsAsync;
=======
    var user = await _gremlinClient.SubmitAsync<MyCloudUser>("g.V().hasLabel('MyCloudUser').has('id', userId).limit(1)");
>>>>>>> updated

Please note that the above examples are just a starting point. The actual conversion might require more changes depending on the complexity of the queries and the specific requirements of the JanusGraph database.

Total api cost: $ ...

I've confirmed that only the first edit was made in the target file.
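To narrow down whether the later edits are lost while parsing the model output or while applying it, one first step could be to count the edit blocks per file in the raw output. This is a diagnostic sketch only; `BLOCK_START` and `count_edit_blocks` are hypothetical names and do not refer to gpt-engineer code.

```python
import re
from collections import Counter

# A filename line immediately followed by the opening conflict marker.
BLOCK_START = re.compile(r"^(?P<filename>[^\n]+)\n<<<<<<< HEAD$", re.MULTILINE)


def count_edit_blocks(llm_output: str) -> Counter:
    """Count how many edit blocks the model emitted for each file."""
    return Counter(m.group("filename").strip() for m in BLOCK_START.finditer(llm_output))
```

For the output above, the expectation would be three blocks for Server/Data/MyCloudWebAppDbContext.cs. If the count matches but only one change lands in the file, the apply step would be the more likely place to look.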

@UmerHA
Collaborator Author

UmerHA commented Nov 16, 2023

@ATheorell pinging you so @rrmistry's comment doesn't get lost. I don't have time to fix this, but wanted to let the team know.
