Implement such --files-from option #273
Comments
It seems like there are a few ways to do this without building it into ripgrep. Here are a couple:
I don't think either of these approaches is less efficient than what ripgrep would do if it were built-in. The only caveat here is that if your file list is big enough, you'll need to use Could you explain in more detail why these approaches don't work for you?
Sure. None of your proposed alternatives work with filenames containing spaces.
I guess the standard solution to that is to delimit your file paths with a NUL terminator (e.g., If you aren't generating files with
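A minimal sketch of the NUL-delimited approach mentioned above, assuming a POSIX shell and an xargs that supports `-0`. The names `SEARCH_CMD` and `search_nul_list` are hypothetical and exist only for illustration; they are not ripgrep features.

```shell
# Hypothetical knob: which search tool to run (defaults to ripgrep).
SEARCH_CMD="${SEARCH_CMD:-rg}"

# search_nul_list PATTERN
# Reads NUL-delimited file paths on stdin and prints the paths whose
# contents match PATTERN. NUL cannot occur inside a path, so names
# containing spaces or newlines pass through xargs intact.
search_nul_list() {
    xargs -0 "$SEARCH_CMD" --files-with-matches -- "$1"
}
```

A possible invocation is `find . -type f -print0 | search_nul_list 'pattern'`. Note that without a guard for empty input, some xargs implementations still run the command once with no paths.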
Fair enough. So, for the record, instead of the syntax I proposed I can achieve the same results using Not as convenient, but I can very well live with that.
Yes, I think I'd prefer that at this point. Popping up a level, do also note that ripgrep provides the
@BurntSushi My use-case is similar to his where I compile a list of files I'm interested in and search only those files instead of letting ripgrep loose on my entire project which would take a lot longer. Like you suggested, I've been using Next, I wanted to search only some specific filetypes (say C++ source files) within FILELIST so I tried to add a This is kinda bad as I've defined the csrc type elsewhere but I'm not able to use it in this context. Is there a better way to go about this? It'd be nice if ripgrep filters the list of files provided using the type/glob argument if one is provided eg.
@kshenoy ripgrep explicitly ignores, and probably always will ignore, any filtering for file paths that are explicitly given on the command line. I realize that for your particular niche case, this isn't what you want, but to do otherwise would grossly complicate the already complex filtering logic that ripgrep performs. e.g., running
You might instead consider using a
That's a reasonable rule to follow. I agree that doing anything else would involve prioritizing between different ways to include/exclude files. Thanks for the clarification.
I did consider doing that. However, we use Perforce at work and it's easier to compile a list to search through using
It would be nice to be able to pipe a list of files to ripgrep. Right now, I search for a second pattern in files matching a first pattern with
rg "pattern2" --files-without-match $(rg "pattern1" --files-with-matches)
when it would be nice to do the following, because I usually think of the first pattern first:
rg "pattern1" --files-with-matches | rg "pattern2" --files-without-match
Although this use case is unique since I'm using
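Until something like that exists, the two-stage search above can be approximated with a NUL-delimited pipe. This is only a sketch: `SEARCH_CMD` and `files_with_first_without_second` are hypothetical names, and it assumes an xargs that accepts `-0` and `-r` (skip the run on empty input).

```shell
# Hypothetical knob: which search tool to run (defaults to ripgrep).
SEARCH_CMD="${SEARCH_CMD:-rg}"

# files_with_first_without_second FIRST SECOND PATH...
# Prints the paths that contain FIRST but do not contain SECOND.
files_with_first_without_second() {
    first=$1
    second=$2
    shift 2
    # Stage 1 emits matching paths separated by NUL bytes (--null);
    # stage 2 receives them as explicit path arguments via xargs.
    "$SEARCH_CMD" --files-with-matches --null -- "$first" "$@" |
        xargs -0 -r "$SEARCH_CMD" --files-without-match -- "$second"
}
```

This keeps the "first pattern first" ordering and, unlike the `$( ... )` form, does not break on paths containing spaces.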
What about Windows? The main issue is that there is no xargs. And if you try to add all files to the command line:
then it exceeds the maximum command length and doesn't work. I want to search all Vim help files provided in the Vim runtime path, and there are a lot of files (including various plugin documentation).
I'm re-opening this because it seems impossible or difficult to work around this when xargs is not present.
What should the flag name for this be? Also, should files specified via this method be subject to smart filtering or globs? Files specified on the command line are not, so I would think these shouldn't be either. That is, files to be searched via this method should act as if they were given on the command line.
That's great news!
I proposed man tar|rg -s -A8 -- '-T, --files-from'
-T, --files-from=FILE
Get names to extract or create from FILE.
Unless specified otherwise, the FILE must contain a list of names separated by ASCII LF (i.e. one name per line). The names read are handled the same way as command line arguments. They undergo quote removal and
word splitting, and any string that starts with a - is handled as tar command line option.
If this behavior is undesirable, it can be turned off using the --verbatim-files-from option.
The --null option instructs tar that the names in FILE are separated by ASCII NUL character, instead of LF. It is useful if the list is generated by find(1) -print0 predicate.
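To make the proposed semantics concrete, here is a rough shell emulation of a "files from" option. It is a sketch under stated assumptions, not how ripgrep or tar implement anything: `SEARCH_CMD` and `search_files_from` are hypothetical names, the list holds one path per line, and each name is taken verbatim (closer to tar's `--verbatim-files-from` behavior than to its default quote removal and word splitting).

```shell
# Hypothetical knob: which search tool to run (defaults to ripgrep).
SEARCH_CMD="${SEARCH_CMD:-rg}"

# search_files_from LIST PATTERN
# LIST is a file with one path per line. Newlines are translated to NUL
# so xargs does no word splitting; paths that themselves contain a
# newline are not supported by this format.
search_files_from() {
    tr '\n' '\0' < "$1" |
        xargs -0 -r "$SEARCH_CMD" --files-with-matches -- "$2"
}
```

Because the paths are appended after `--`, a listed name that starts with `-` is treated as a path rather than as an option.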
Seconded.
@okdana Aye. I also think that if the list of files is given explicitly like this, then users can use other mechanisms of filtering very easily before passing the file to ripgrep. For example, you might use And also, come to think of it, if we did allow gitignore or other filters to apply to the list of files given, that would probably prevent this feature from being implemented in any reasonable time frame. gitignore matching, for example, is pretty heavily coupled to directory traversal. Applying
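One way to filter a list before it reaches the search tool is sketched below. The original example in this comment was not preserved, so this is an assumption about the general idea rather than the command that was suggested; `SEARCH_CMD` and `search_filtered` are hypothetical names, and an xargs with `-0` and `-r` is assumed.

```shell
# Hypothetical knob: which search tool to run (defaults to ripgrep).
SEARCH_CMD="${SEARCH_CMD:-rg}"

# search_filtered NAME_REGEX PATTERN
# Reads newline-delimited paths on stdin, keeps only the paths matching
# the extended regex NAME_REGEX, then searches the survivors for PATTERN.
search_filtered() {
    grep -E -- "$1" |
        tr '\n' '\0' |
        xargs -0 -r "$SEARCH_CMD" --files-with-matches -- "$2"
}
```

For instance, `search_filtered '\.(cc|cpp|h)$' 'pattern' < FILELIST` would restrict an existing list to C++-looking names before searching.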
@timotheecour I would expect you to have to write Executing the search before stdin is closed is interesting. That will require some re-factoring inside ripgrep, since right now, it stores the complete set of paths to search in memory. (Because it was always in memory via CLI arguments.) I agree that streaming is probably the right option, although that may be an enhancement that comes after the initial feature lands, depending on how difficult that refactoring is.
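Outside of ripgrep, a rough approximation of "search before stdin is closed" is to let xargs launch the search in fixed-size batches. This sketch says nothing about how ripgrep would implement streaming internally; `SEARCH_CMD` and `search_in_batches` are hypothetical names, and an xargs with `-0` and `-r` is assumed.

```shell
# Hypothetical knob: which search tool to run (defaults to ripgrep).
SEARCH_CMD="${SEARCH_CMD:-rg}"

# search_in_batches BATCH_SIZE PATTERN
# Reads NUL-delimited paths on stdin. xargs starts one search per
# BATCH_SIZE paths as they arrive, instead of waiting for the producer
# to finish writing the whole list.
search_in_batches() {
    xargs -0 -r -n "$1" "$SEARCH_CMD" --files-with-matches -- "$2"
}
```

The trade-off is one process start per batch, so a very small batch size gives earlier results at the cost of more process overhead.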
Totally fine, then let's keep this issue open till then :-)
No, if that happens, then I'll close this issue and open a new one.
Using xargs adds a few seconds to the execution time when the file list contains 20 000 paths.
Another possible motivation is that using
$ time git ls-files -z | xargs -0 rg symlinks >/dev/null
_______________________________________________________
Executed in 1,99 secs fish external
usr time 2,64 secs 434,00 micros 2,64 secs
sys time 2,67 secs 88,00 micros 2,67 secs
but if I just add
$ time git ls-files -z | xargs -0 -P4 rg symlinks >/dev/null
________________________________________________________
Executed in 811,66 millis fish external
usr time 3,41 secs 0,00 millis 3,41 secs
sys time 3,26 secs 1,77 millis 3,25 secs
That's a more than 2x improvement. IIUC the
@BurntSushi , let me share big feedback on using Several years ago I started writing script called So I wrote C++ program called And this First of all I needed to know whether Then I needed to know what rg options make rg fully compatible with grep (so that I can simply replace Bug # 1. Docs may be interpreted as saying that It would be great if Moreover: Bug # 2. I was unable to find a way to keep argument order intact. Fortunately,
What order So I did experiments and fortunately I found that
Bug # 3. I reported it here: #2418 When
Advice for xargs users. Add But what if particular
Advice for xargs users. Add Some of my files have actual
Bug # 4. I found a workaround:
General advice. Pass I have read the rg changelog and found that rg sometimes makes breaking changes. So I added this to my script:
if [ "$(rg --version | head -n 1)" != "ripgrep 13.0.0" ]; then
    echo "${0##*/}: rg problems" >&2
    exit 1
fi
Actual
my-find ... | { xargs -d '\n' --no-run-if-empty -- rg -uuu --heading --color=always --no-messages -B 10 -A 10 --no-config --sort=path --line-number -- "$REGEX" /dev/null || :; } | less -R
Notes:
So I eventually overcame all All these bugs are subjective, so I didn't report most of them as separate reports. But if you (@BurntSushi) want, I will do this.
Thanks for the feedback, but in the future, please just file a new issue instead of attaching to an existing one. The vast majority of your comment is irrelevant to this issue. Also, most of the problems you ran into are also quirks of grep. The To comment specifically on a couple things...
The README says, "Automatic filtering can be disabled with rg -uuu." And that's not a lie. And the docs for the
And in context, that is absolutely correct. It obviously doesn't mean that
It never will because ripgrep never was, is or will be fully compatible with grep. Once again: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#posix4ever
It's not documented because it's not guaranteed. No such documentation exists for grep either. |
Another motivation is that using it inside Vim with git ls-files is really cumbersome and not OS independent.
A recurring workflow of mine is to search within an existing list of files.
Currently I'm living by
generate_list_of_files | while IFS= read -r f; do rg pattern -- "$f"; done
which is both inconvenient and inefficient.
Ack does provide a --files-from option. Implementing it would allow me to type
rg pattern --list-from <(gen list)
to fulfill my needs.