Is there a way to skip large files while indexing ? #1646
Comments
There is an enhancement #534 filed to track this. As a workaround, specify the files/pattern as ignored (-i).
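Until a size limit exists in the indexer itself, the `-i` workaround can be scripted. The following is only a minimal sketch, not part of OpenGrok: it walks a source root, collects files above a size threshold, and prints one `-i <name>` pair per file name to paste into the indexer command line. The helper names (`find_large_files`, `ignore_args`) and the 5 MB default are hypothetical, and it assumes `-i` accepts a plain file name, so same-named small files elsewhere in the tree would be ignored too.

```python
import os
import shlex
import sys
from pathlib import Path


def find_large_files(root, max_bytes):
    """Return paths (relative to root) of regular files larger than max_bytes."""
    root = Path(root)
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # Skip symlinks so a link to a big file is not counted twice.
                if path.is_symlink() or not path.is_file():
                    continue
                if path.stat().st_size > max_bytes:
                    large.append(path.relative_to(root))
            except OSError:
                # Unreadable entry: leave the decision to the indexer.
                continue
    return sorted(large)


def ignore_args(paths):
    """Build ['-i', name, '-i', name, ...] with one pair per unique file name.

    Assumption: the indexer's -i option matches on the file name.
    """
    args = []
    for name in sorted({p.name for p in paths}):
        args += ["-i", name]
    return args


if __name__ == "__main__":
    src_root = sys.argv[1] if len(sys.argv) > 1 else "."
    limit = 5 * 1024 * 1024  # hypothetical threshold: 5 MB
    print(shlex.join(ignore_args(find_large_files(src_root, limit))))
```

Run it against the source root and append the printed options to the usual indexer invocation. Because matching is by name only in this sketch, review the list before using it on a tree where large and small files share names.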
OK. Thanks for the feedback.
@ChristopheBordieu FWIW, the 1.0 release fixes most of the big-file problems: ctags parsing is completely fixed, and for some languages we limit the length of tokens for the parser (https://github.com/OpenGrok/OpenGrok/blob/master/build.xml#L270).
Hi @tarzanek
So ... it could be a good improvement
As I wrote in #534, this is not as simple as it looks.
I do not know Java, so do not wait on me for a patch :-) And it is not simple!
Just wanted to chime in that we're seeing slow search performance after upgrading from a very old version of OpenGrok. We believe that it's caused by a few large JSON files (5-10MB). For certain terms, we're seeing search take a very long time or time out. However, when we filter out JSON files from the search it returns very quickly. We'll try to find a workaround in the meantime. EDIT: We're on OpenGrok 1.3.8
Incidentally, file processing times could be part of the statistics (#579).
In our case we don't care much about processing times since it's offline (up to a certain point, of course 😄). However, we care a lot about search speed since our engineers rely on interactive searching of OpenGrok and want it to be fast. It seems like these big JSON files are causing searches to be at least an order of magnitude slower. Using a file path or file type filter to exclude JSON files makes searches snappy again. We recently upgraded from OpenGrok 0.12.1.5, which didn't seem to have a In the short term we're going to try switching JSON files to use the
My guess is the Lucene Unified Highlighter, which forces the full source to be read into memory. That will still be active even if you use
Seems like you were exactly right. Switching to
Hello,
Not a bug, just a question.
When indexing, is there a way to skip large files whose size is greater than X kB (or MB or GB)?