
Bug: aider crashes on files with non-ASCII characters (when building the repository map) #82

Closed
cgrothaus opened this issue Jul 11, 2023 · 5 comments

Comments


cgrothaus commented Jul 11, 2023

Description

When aider calls the run_ctags function on a git repository containing filenames with non-ASCII characters, it crashes with a FileNotFoundError. The error message shows that the filename passed to os.path.getmtime in run_ctags is wrapped in literal double quotes and contains escaped non-ASCII characters (\303\244 instead of ä), which is likely causing the issue.
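For context, the sequences in that path are octal escapes of UTF-8 bytes: ä (U+00E4) is encoded as the two bytes 0xC3 0xA4, which are 303 and 244 in octal. The sketch below roughly mimics that escaping. git_style_escape is a hypothetical helper written for illustration; it only escapes non-ASCII bytes and ignores the other characters git also quotes (double quotes, backslashes, control characters).

```python
def git_style_escape(path: str) -> str:
    """Hypothetical helper: roughly mimic how git prints a path containing
    non-ASCII bytes when core.quotepath is on (simplified)."""
    raw = path.encode("utf-8")
    if all(b < 0x80 for b in raw):
        # Pure ASCII paths are printed unchanged.
        return path
    # Escape each non-ASCII byte as a backslash plus three octal digits,
    # then wrap the whole path in double quotes.
    body = "".join(chr(b) if b < 0x80 else f"\\{b:03o}" for b in raw)
    return f'"{body}"'


print(git_style_escape("doc/fänny_dirname/README.md"))  # "doc/f\303\244nny_dirname/README.md"
print(git_style_escape("README.md"))  # README.md
```

Joining that quoted form onto the repository root produces exactly the kind of non-existent path reported in the traceback.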

Steps to Reproduce

  1. Create a git repository with a file with a non-ASCII character in the file or directory name (e.g., doc/fänny_dirname/README.md), or clone such a repository (demo repo: https://github.com/cgrothaus/sample-repo-demonstrate-aider-bug-special-filenames).
  2. Run aider on the repository.
  3. Run the /tokens command, which causes the repo map to be built.

aider crashes with this error output:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/aider/bin/aider", line 33, in <module>
    sys.exit(load_entry_point('aider-chat', 'console_scripts', 'aider')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/main.py", line 371, in main
    coder.run()
  File "/Users/christoph.grothaus/projects/aider/coders/base_coder.py", line 382, in run
    new_user_message = self.run_loop()
                       ^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/coders/base_coder.py", line 446, in run_loop
    return self.commands.run(inp)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/commands.py", line 60, in run
    return self.do_run(matching_commands[0][1:], rest_inp)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/commands.py", line 45, in do_run
    return cmd_method(args)
           ^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/commands.py", line 113, in cmd_tokens
    repo_content = self.coder.repo_map.get_repo_map(self.coder.abs_fnames, other_files)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/repomap.py", line 107, in get_repo_map
    res = self.choose_files_listing(chat_files, other_files)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/repomap.py", line 138, in choose_files_listing
    files_listing = self.get_ranked_tags_map(chat_files, other_files)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/repomap.py", line 381, in get_ranked_tags_map
    ranked_tags = self.get_ranked_tags(chat_fnames, other_fnames)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/repomap.py", line 289, in get_ranked_tags
    data = self.run_ctags(fname)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christoph.grothaus/projects/aider/repomap.py", line 175, in run_ctags
    file_mtime = os.path.getmtime(filename)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 55, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/Users/christoph.grothaus/projects/sample-repo-demonstrate-aider-bug-special-filenames/"doc/f\\303\\244nny_dirname/README.md"'

Expected Behavior

aider should correctly handle filenames with non-ASCII characters and not crash when calling the run_ctags function.
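As a sketch of the "don't crash" part only: catching FileNotFoundError around the mtime lookup would let the repo map skip an unreadable file instead of aborting the whole session. get_mtime_or_none is a hypothetical helper, not aider's actual code, and this hides the symptom rather than fixing the quoted filenames.

```python
import os
from typing import Optional


def get_mtime_or_none(filename: str) -> Optional[float]:
    """Hypothetical helper: return the modification time, or None if the
    path does not exist, so a caller such as run_ctags could skip the file
    (ideally logging a warning) instead of raising FileNotFoundError."""
    try:
        return os.path.getmtime(filename)
    except FileNotFoundError:
        return None
```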

Actual Behavior

aider crashes with a FileNotFoundError when calling the run_ctags function on a repository containing filenames with non-ASCII characters. The error message indicates that the filename passed to os.path.getmtime contains escaped non-ASCII characters.

Possible Solution

Ensure that the filename is correctly encoded and escaped at all points in the code where it's used. This might involve changing how the filename is read from the file system, how it's stored in the cache, and how it's passed to the ctags command.
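One way to read the filenames correctly without changing how git is invoked is to decode git's quoted output back into the real path. The following is a minimal sketch, assuming the C-style quoting git applies to paths (surrounding double quotes, three-digit octal byte escapes, and backslash escapes such as \" \\ \t \n). unquote_git_path is a hypothetical name, not part of aider or git, and the sketch does not validate malformed input.

```python
# Single-character escapes git uses in C-style quoted paths.
_SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A,
    "v": 0x0B, "f": 0x0C, "r": 0x0D, '"': 0x22, "\\": 0x5C,
}


def unquote_git_path(line: str) -> str:
    """Hypothetical helper: turn a path as printed by git ls-files back
    into the real filename."""
    if not (len(line) >= 2 and line.startswith('"') and line.endswith('"')):
        # Unquoted output is already the literal path.
        return line
    body = line[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Outside escapes, quoted output consists of ASCII characters.
            out.extend(ch.encode("utf-8"))
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in "01234567":
            # Three octal digits encode one raw byte of the UTF-8 name.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
    # Reassemble the raw bytes and decode them as UTF-8.
    return out.decode("utf-8")


print(unquote_git_path('"doc/f\\303\\244nny_dirname/README.md"'))  # doc/fänny_dirname/README.md
```

Names that are not valid UTF-8 would raise UnicodeDecodeError here; the sketch leaves that case to the caller.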

Additional Context

This issue was discovered during a chat session with aider. The issue occurs regardless of the specific non-ASCII characters in the filenames.

@cgrothaus
Author

Note: I have close to no knowledge about Python. This issue was largely written by ChatGPT, with me driving it via aider.

@paul-gauthier
Collaborator

Thanks for trying aider, and sorry you ran into this issue. And thank you for such an excellent bug report including a sample repo to reproduce the bug!

I just made a PR that attempts to fix the root cause, which is that by default git ls-files mangles non-ASCII filenames by wrapping them in double quotes and escaping their bytes as octal:

$ git ls-files
README.md
"doc/f\303\244nny_dirname/README.md"
"doc/system\303\274berblick.md"

There is a git config setting to disable this behavior. Aider now invokes git -c core.quotepath=off ls-files to get unmangled filenames:

$ git -c core.quotepath=off ls-files
README.md
doc/fänny_dirname/README.md
doc/systemüberblick.md
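A minimal sketch of getting the file list this way from Python, assuming the standard subprocess module and a git executable on PATH. list_tracked_files and ls_files_command are hypothetical helpers, not aider's actual code. The sketch decodes git's output explicitly as UTF-8 instead of relying on the platform's default encoding, which on Windows is often not UTF-8. It also passes -z, which makes git ls-files terminate entries with NUL bytes and skip quoting entirely, so names containing newlines or other special characters come through intact as well.

```python
import subprocess
from typing import List


def ls_files_command() -> List[str]:
    # core.quotepath=off stops git from octal-escaping non-ASCII bytes;
    # -z switches to NUL-terminated output with no quoting at all.
    return ["git", "-c", "core.quotepath=off", "ls-files", "-z"]


def list_tracked_files(repo_root: str) -> List[str]:
    """Hypothetical helper: return tracked paths, relative to repo_root,
    as real filenames."""
    raw = subprocess.run(
        ls_files_command(),
        cwd=repo_root,
        check=True,
        capture_output=True,
    ).stdout
    # Decode explicitly as UTF-8, split on NUL, and drop the empty string
    # that follows the final terminator.
    return [name for name in raw.decode("utf-8").split("\0") if name]
```

Joining each returned name onto repo_root with os.path.join then yields a path that exists on disk, e.g. doc/fänny_dirname/README.md rather than its quoted form.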

@paul-gauthier
Collaborator

Ok, that turned out to be way more complicated than expected on Windows. Thanks again for pointing out this issue. Hopefully the fixes will make aider much more robust to all sorts of unusual filenames.

@cgrothaus
Author

Thanks for fixing it so quickly! I tested the new version and it works 🥳.

aider is such an astounding thing, thank you for creating it in the first place!

Regards from Germany

Christoph

@NickEdinburgh

I'm still struggling with this same issue on macOS. Aider works perfectly until I add files to the git repo; then it becomes unusable due to over 10k red alerts about encodings.

I've spent hours trying to fix the issue, including running "git config --global core.quotepath off" and adding various directories to .gitignore. No joy.

As a breadcrumb: a workaround I've found is to add directories such as my /scripts individually instead of using 'git add .'. It would be great if anyone comes up with further ideas.
