Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large uncommitted modifications make it impossible to use aider on a small file tracked in git #5

Closed
mobyvb opened this issue May 19, 2023 · 7 comments

Comments

@mobyvb
Copy link
Contributor

mobyvb commented May 19, 2023

Reproduction steps:

  • Pick any git repo and run aider smallfile, where smallfile is already committed and there are no uncommitted changes - you should successfully end up at the prompt smallfile>
  • Shut down aider
  • Modify some file other than smallfile and add or remove a bunch of lines (enough changes that there will be too many tokens in the prompt)
  • Run aider smallfile again - this time, you should see an error that looks like the following:
<large diff>
<big error message>
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 9132 tokens. Please reduce the length of the messages.

Current workaround:

cp smallfile untrackedsmallfile
aider untrackedsmallfile

^and since it is not tracked, the diff is irrelevant. I think it would be nice if this was still possible if the file was tracked
e.g. if the diff is too large, forget about committing files that were not explicitly mentioned in the aider invocation

Feel free to close this issue if you do not feel like this is a priority. The current workaround is fine for me at the moment. And it is simple enough to just make sure not to have uncommitted changes before running aider

@mobyvb
Copy link
Contributor Author

mobyvb commented May 19, 2023

also, I would be happy to implement a fix for this issue myself if you decide that it is in line with your vision for this project @paul-gauthier

@paul-gauthier
Copy link
Collaborator

aider notices if your repo is dirty, and asks if you want to auto commit the changes. You can suppress this behavior with the --no-auto-commit cmd line arg. But this will also stop auto committing the changes that aider itself makes.

The problem you are encountering is that Coder.get_commit_message() uses gpt-3.5-turbo to summarize the diffs into a sensible commit message. If the diff is large, it exceeds the 4K context window.

A simple fix would be:

  1. Check if the diff is close to or larger than 4 kbytes (4k tokens * 4 bytes/token) and don't even bother sending the messages to gpt. Just return.
  2. Catch the openai.error.InvalidRequestError and just return.

Maybe print a nice tool_error() message for each of these.

Actually, after typing the above I pasted it into aider and it fixed itself:
32e40a3

@paul-gauthier
Copy link
Collaborator

Let me know if the latest version solves the problem for you.

@mobyvb
Copy link
Contributor Author

mobyvb commented May 19, 2023

Let me know if the latest version solves the problem for you.

I will check and confirm later today. Thanks

@mobyvb
Copy link
Contributor Author

mobyvb commented May 19, 2023

@paul-gauthier hmm it doesn't seem like the latest version solves the issue for me, with or without the --no-auto-commit flag.

I actually just noticed that reproducing this issue does not even require unstaged git changes.
I was able to cause it with a few simple steps:

  1. check out very large repo (I used https://github.com/storj/storj)
  2. without making any modifications from HEAD, open aider on a small file, (I used aider private/apigen/common.go and aider private/apigen/common.go --no-auto-commit)
  3. make a request, e.g. "remove all comments"
    My output:
Traceback (most recent call last):
  File "/home/moby/.local/bin/aider", line 33, in <module>
    sys.exit(load_entry_point('aider', 'console_scripts', 'aider')())
  File "/home/moby/dev/storj/aider/aider/main.py", line 138, in main
    coder.run()
  File "/home/moby/dev/storj/aider/aider/coder.py", line 217, in run
    new_user_message = self.run_loop()
  File "/home/moby/dev/storj/aider/aider/coder.py", line 270, in run_loop
    return self.send_new_user_message(inp)
  File "/home/moby/dev/storj/aider/aider/coder.py", line 287, in send_new_user_message
    content, interrupted = self.send(messages)
  File "/home/moby/dev/storj/aider/aider/coder.py", line 376, in send
    completion = openai.ChatCompletion.create(
  File "/home/moby/.local/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/home/moby/.local/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/home/moby/.local/lib/python3.10/site-packages/openai/api_requestor.py", line 230, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/home/moby/.local/lib/python3.10/site-packages/openai/api_requestor.py", line 624, in _interpret_response
    self._interpret_response_line(
  File "/home/moby/.local/lib/python3.10/site-packages/openai/api_requestor.py", line 687, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 8192 tokens. However, your messages resulted in 32700 tokens. Please reduce the length of the messages.

It is worth noting that the repo I'm using is large enough that a list of filenames alone is sufficient to exceed the token limit. I'll look deeper into the internals of aider in case I can figure out a potential workaround.

@paul-gauthier
Copy link
Collaborator

I think this is a new issue you're seeing, likely caused by the giant file list as you say.

I just added a check, and aider will only send the file list to gpt if it is less than 100 files. In that case, gpt will only know about the specific files you add to the chat. It won't even know about the existence of other files in the repo, so it won't be able to ask to see any that are relevant.

But ya, if you add the files it needs to the chat then this should work better on large repos.

I am also working on a much more ambitious attempt to manage the limited context window size. But so far it's just some early experiments.

@mobyvb
Copy link
Contributor Author

mobyvb commented May 20, 2023

awesome, this works perfectly! I'll close this issue. Thanks!
Looking forward to seeing your experiments to manage the context window size. I've had some ideas about that for my own project but haven't spent any time on it yet.

@mobyvb mobyvb closed this as completed May 20, 2023
paul-gauthier added a commit that referenced this issue May 15, 2024
# Aider chat conversation:

USER: will this pickup the github pages jekyll site that is in this repy? Like the _layouts _data _posts etc dirs?

ASSISTANT: Yes, the current Dockerfile and scripts should pick up the GitHub Pages Jekyll site, including directories like `_layouts`, `_data`, `_posts`, etc., because the `COPY . /srv/jekyll` command in the Dockerfile copies the entire current directory (which should include those directories) into the Docker container.

However, to ensure everything works correctly, let's make sure the Dockerfile and scripts are set up to handle the Jekyll site properly. Here are the steps:

1. Ensure the Dockerfile is correctly set up to copy the entire site and install dependencies.
2. Ensure the build and run scripts are correctly set up to build the Docker image and run the container.

Let's review the current state of the files and make any necessary adjustments.

scripts/Dockerfile
```python
# Use the official Jekyll image from Docker Hub
FROM jekyll/jekyll:latest

# Set the working directory
WORKDIR /srv/jekyll

# Copy the current directory contents into the container at /srv/jekyll
COPY . /srv/jekyll

# Install any needed packages specified in Gemfile
RUN bundle install

# Expose port 4000 to the host
EXPOSE 4000

# Run Jekyll server
CMD ["jekyll", "serve", "--host", "0.0.0.0"]
```

scripts/build.sh
```python
#!/bin/bash

# Build the Docker image
docker build -t my-jekyll-site -f scripts/Dockerfile .
```

scripts/run.sh
```python
#!/bin/bash

# Run the Docker container
docker run --rm -v "$PWD:/srv/jekyll" -p 4000:4000 my-jekyll-site
```

These files should work as expected. The `COPY . /srv/jekyll` command in the Dockerfile will copy all the necessary Jekyll site files, including `_layouts`, `_data`, `_posts`, etc., into the Docker container. The `bundle install` command will install any dependencies specified in the `Gemfile`.

To verify, you can run the following commands in your project directory:

```sh
bash scripts/build.sh
bash scripts/run.sh
```

This will build the Docker image and start a container that serves your Jekyll site on `http://localhost:4000`. If you encounter any issues, please let me know!

USER: I ran this command:

 bash scripts/build.sh

And got this output:

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 435B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 101B 0.0s done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/jekyll/jekyll:latest
#3 DONE 0.6s

#4 [1/4] FROM docker.io/jekyll/jekyll:latest@sha256:400b8d1569f118bca8a3a09a25f32803b00a55d1ea241feaf5f904d66ca9c625
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 250.53kB 0.2s done
#5 DONE 0.2s

#6 [2/4] WORKDIR /srv/jekyll
#6 CACHED

#7 [3/4] COPY . /srv/jekyll
#7 CACHED

#8 [4/4] RUN bundle install
#8 6.277 Could not locate Gemfile
#8 ERROR: process "/bin/sh -c bundle install" did not complete successfully: exit code: 10
------
 > [4/4] RUN bundle install:
#8 6.277 Could not locate Gemfile
------
Dockerfile:11
--------------------
   9 |
  10 |     # Install any needed packages specified in Gemfile
  11 | >>> RUN bundle install
  12 |
  13 |     # Expose port 4000 to the host
--------------------
ERROR: failed to solve: process "/bin/sh -c bundle install" did not complete successfully: exit code: 10

ASSISTANT: The error indicates that the `Gemfile` is not being found in the context being copied into the Docker container. This could be because the `Gemfile` is not in the root directory of your project or it might be missing.

To resolve this, we need to ensure that the `Gemfile` is present in the root directory of your project. If it is not, you should add it. If it is present, we need to make sure it is being copied correctly.

Let's first verify that the `Gemfile` is in the root directory of your project. If it is not, please add it. If it is, we can proceed with the current Dockerfile and scripts.

Assuming the `Gemfile` is present in the root directory, the Dockerfile should work as expected. If the `Gemfile` is in a different directory, please let me know its location so we can adjust the Dockerfile accordingly.

If the `Gemfile` is missing, you can create a basic one with the following content:

Gemfile
```ruby
source 'https://rubygems.org'
gem 'jekyll'
```

Once the `Gemfile` is in place, you can re-run the build script:

```sh
bash scripts/build.sh
```

If you encounter any further issues, please let me know!

USER: please make it

ASSISTANT: Let's create the `Gemfile` in the root directory of your project.

Gemfile
```ruby
<<<<<<< SEARCH
=======
source 'https://rubygems.org'
gem 'jekyll'
>>>>>>> REPLACE
```

This `Gemfile` specifies the source for Ruby gems and includes the `jekyll` gem, which is necessary for your Jekyll site.

After adding this `Gemfile`, you can re-run the build script:

```sh
bash scripts/build.sh
```

This should resolve the issue with the missing `Gemfile` and allow the Docker image to be built successfully. If you encounter any further issues, please let me know!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants