Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filters don't work correctly when a search path is specified #278

Open
nu4nu opened this issue Dec 11, 2016 · 5 comments
Open

filters don't work correctly when a search path is specified #278

nu4nu opened this issue Dec 11, 2016 · 5 comments
Labels
bug A bug. gitignore Bugs related to gitignore problems. icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon.

Comments

@nu4nu
Copy link

nu4nu commented Dec 11, 2016

ripgrep seems to ignore unintended files when the base directory is not the current directory.

How to reproduce:

% mkdir -p ../test/foo
% echo '.*' >../test/.gitignore
% echo baz >../test/foo/bar

% rg baz ../test
../test/foo/bar
1:baz

% rg baz ../test/foo
No files were searched, which means ripgrep probably applied a filter you didn't expect. Try running again with --debug.

I found the following lines in ignore/src/dir.rs don't make sense because path on the right side is a relative path from the current directory, not from abs_parent_path.

339         if let Some(abs_parent_path) = self.absolute_base() {
340             let path = abs_parent_path.join(path);

A possible fix to run the example above correctly is the following. However, I'm not sure this fixes other cases.

let path = path.canonicalize().unwrap()
@BurntSushi BurntSushi added the bug A bug. label Jan 10, 2017
@BurntSushi
Copy link
Owner

BurntSushi commented Jan 11, 2017

This is a tough one to fix.

First and foremost, the proposed solution can never work. Canonicalizing candidate paths does seem like the obvious solution---and I've been tempted to do it because it simplifies so much---but it has two significant problems:

  1. Canonicalization erases symlinks and ignore files are symlink aware. If we erased symlinks, lots of ignore files would break.
  2. Canonicalization is expensive.

Here are the key values in this particular case, assuming we've reproduced your test case inside /tmp/ripgrep-278, and searching from inside /tmp/ripgrep-278/nested for ../test/foo:

  • The absolute base path is /tmp/ripgrep-278/test/foo, which is derived by canonicalizing the search path ../test/foo.
  • The root directory that sets the context for the .gitignore file is /tmp/ripgrep-278/test.
  • The candidate file path is ../test/foo/bar.
  • The path that we actually search is foo/../test/foo/bar, which is generated by joining the absolute base path (/tmp/ripgrep-278/test/foo) with the candidate file path (../test/foo/bar) and then stripping the gitignore directory path prefix (/tmp/ripgrep-278/test).

That last step is where things go bad. In particular, the text that we actually match should be just foo/bar.

More generally, relative components like . and .. should probably never be present when doing a match, which potentially suggests that we need a realpath that does what it does today, but leaves symlinks in tact. However, doing that for every single file path is a non-starter. It would be too expensive.

Part of the problem here is that we're canonicalizing the parent directory paths, which is one way of traversing up the file hierarchy to find parent gitignore files. Doing this canonicalization inherently makes the parent directory paths get out of sync with the search paths, which is what causes the last step above to fail. If we found a way to go up the tree without canonicalization, then perhaps that would fix this. Let's try:

  • The absolute base path is ../test/foo, which is taken directly from the search path ../test/foo since ../test/foo is a directory.
  • The root directory that sets the context for the .gitignore file is ../test.
  • The candidate file path is ../test/foo/bar.
  • The path that we actually search is foo/bar, which is generated by simply stripping the gitignore directory path prefix (../test) from the candidate file path (../test/foo).

The interesting bits come from parent directories at .. and above. You wind up with absolute base paths like ../../../, which will infect the gitignore context directories. This means that a gitignore context directory at ../.. and above will never be a prefix of a candidate file path, which makes matching problematic. However, if we canonicalized file paths at .. and above and synced this up with our candidate file paths, then I think that could be make to work.


I'm not sure when I'm likely to fix this, but I thought it'd be useful to try to write up the problem and a possible solution.

Sorry it took me so long to get to this, but I had a feeling this was going to be a fairly intricate bug, so I stayed away. :-(

@BurntSushi
Copy link
Owner

There is one other solution. Notably, this command gives the correct output:

$ rg baz $(realpath ../test/foo)

ripgrep could canonicalize (sans symlinks) only the paths given by the user. This still might be costly, but we could limit it only to paths that contain ... Unfortunately, this becomes pervasive, because that canonicalization will influence the paths printed by ripgrep in the results, which could be annoying to users.

@kalekseev
Copy link

@BurntSushi a simple test case that reproduces the problem even without ..

    #[test]
    fn absolute_parent_anchored_2() {
        let td = TempDir::new("ignore-test-").unwrap();
        mkdirp(td.path().join(".git"));
        mkdirp(td.path().join("src/llvm"));
        wfile(td.path().join(".gitignore"), "/src/llvm");

        let ig0 = IgnoreBuilder::new().build();
        let (ig1, err) = ig0.add_parents(td.path().join("src"));
        assert!(err.is_none());
        let (ig2, err) = ig1.add_child("src");
        assert!(err.is_none());

        assert!(ig2.matched("src/llvm", true).is_ignore());
    }

@tmccombs
Copy link
Contributor

tmccombs commented Jun 9, 2018

I'd like to help fix this, but need a little help.

cross posted from related issue.

Why is path joined to absolute_base rather than the current directory? I don't entirely understand what the purpose of absolute_base is.

I'd be happy to help implement a fix for this, but I may need some help understanding what exactly absolute_base is used for and any gotchas I should watch out for.

@BurntSushi
Copy link
Owner

@tmccombs I wish I could be the oracle you want me to be, but I don't have the context to answer your query. Unfortunately, the quickest way to fix this bug probably requires you to read and understand the code. You'll need to experiment and debug.

tmccombs added a commit to tmccombs/ripgrep that referenced this issue Jun 24, 2018
Fixes BurntSushi#829 and BurntSushi#278.

This does change the Ignore struct to depend on the working directory at
the time of creation, I _think_ this is fine, since Ignore isn't
publicly accessible and the Walk structs already depend on the current
working directory implicitly.

I tried to make a minimal change to fix the issue. An alternative
implementation could be to call `current_dir` in the `matched_ignore`
method instead of in `add_parents`, but I'm not sure of the performance
implications of doing that.

Another possible solution would be to change the places that we call
`Ignore::matched` to change the path to be relative to the absolute_base
of the `Ignore`.
tmccombs added a commit to tmccombs/ripgrep that referenced this issue Sep 8, 2018
@BurntSushi BurntSushi added the icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon. label Jan 27, 2019
@BurntSushi BurntSushi added the gitignore Bugs related to gitignore problems. label May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug. gitignore Bugs related to gitignore problems. icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon.
Projects
None yet
Development

No branches or pull requests

4 participants