history based incremental reindex for changeset based SCMs #3077
Comments
#3033 should also be considered, as it might share some code with the solution for this issue.
Ruminating on the possible solutions: The list of files from the The history cache stores the current indexed revision in the So, it makes more sense to implement this wholly in the indexer.
Thinking about this a bit more: I wanted to hijack Instead,
It may be worth noting that this approach will not avoid the syscall (I/O) churn completely, especially if there are a lot of renames: the files coming from the history traversal method would have to be checked for existence on the file system before being turned into It is also a question whether this makes sense for the initial indexing, or in general for incremental reindex of sizable (in terms of number of changed files) changesets. However, this is impossible to decide without actually traversing the history, getting the list of changed files, and comparing that to the number of files in the given repository. Also, this general approach would work only if all repositories for the given project supported the history traversal. That is, if
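The existence check and the "all repositories must support traversal" condition mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `split_by_existence` and `can_use_history_based_reindex` are hypothetical and are not part of OpenGrok, and the paths are assumed to be relative to the source root.

```python
import os
from typing import Iterable, List, Tuple


def split_by_existence(source_root: str,
                       rel_paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split candidate paths from history traversal.

    Returns (present, missing). Paths that still exist as regular files
    would be (re)indexed; paths that no longer exist (deleted files or the
    old names of renamed files) would be removed from the index.
    """
    present: List[str] = []
    missing: List[str] = []
    # De-duplicate first so each path costs at most one stat() call.
    for rel in sorted(set(rel_paths)):
        if os.path.isfile(os.path.join(source_root, rel)):
            present.append(rel)
        else:
            missing.append(rel)
    return present, missing


def can_use_history_based_reindex(repo_supports_traversal: Iterable[bool]) -> bool:
    """Hypothetical check: use the history-based path only if the project
    has at least one repository and every one supports history traversal.
    Otherwise the caller would fall back to a full directory walk."""
    flags = list(repo_supports_traversal)
    return bool(flags) and all(flags)
```

Note that this still issues one file system call per distinct changed path, which is the residual I/O churn mentioned above; it only avoids walking the files that no changeset touched.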
Fixed in #3951.
Is your feature request related to a problem? Please describe.
As mentioned in #3071 (comment), the indexer could take the list of files to process from the changesets that were used to update the given project. This way the reindex could be made truly incremental (currently only the history cache reindex is incremental) for repositories based on SCMs that operate on changesets. This would significantly reduce the time spent in indexDown() for repositories with a large number of files (depending on the number of files impacted by the added changesets).

Describe the solution you'd like
The opengrok-mirror script could generate the list of files (after all, it can already do the "incoming" check) and pass that to the indexer. This would be especially handy for per-project reindex.

Describe alternatives you've considered
The indexer would figure out the list of files itself. After all, the history cache stores the latest indexed changeset ID, and the repository classes already contain code for retrieving history entries and parsing the output of various SCM commands.
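To make the idea of deriving the file list from changesets more concrete, here is a minimal sketch for the Git case only. It is an assumption-laden illustration, not how opengrok-mirror or the indexer actually works: the helper names `git_log_command` and `parse_name_status` are hypothetical, and paths are assumed not to need Git's quoting (no tabs, newlines, or other special characters).

```python
from typing import List, Set, Tuple


def git_log_command(old_rev: str, new_rev: str) -> List[str]:
    """Build a git command listing the files touched by the changesets
    after old_rev up to and including new_rev (hypothetical helper)."""
    # An empty --pretty format suppresses commit headers, leaving only
    # the per-file status lines (plus blank separator lines).
    return ["git", "log", "--name-status", "--pretty=format:",
            f"{old_rev}..{new_rev}"]


def parse_name_status(output: str) -> Tuple[Set[str], Set[str]]:
    """Parse --name-status output into (changed, deleted) path sets.

    Lines look like "M<TAB>path", "A<TAB>path", "D<TAB>path", or
    "R100<TAB>old<TAB>new" for renames and "C75<TAB>src<TAB>dst" for copies.
    """
    changed: Set[str] = set()
    deleted: Set[str] = set()
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # blank separator between commits, or unexpected line
        status = fields[0]
        if status.startswith("D"):
            deleted.add(fields[1])
        elif status[0] in "RC" and len(fields) == 3:
            if status[0] == "R":
                deleted.add(fields[1])  # the old name of a renamed file
            changed.add(fields[2])
        else:
            changed.add(fields[1])
    return changed, deleted
```

The command could be run with `subprocess.run(git_log_command(old, new), cwd=repo_dir, capture_output=True, text=True, check=True)` and its `stdout` passed to `parse_name_status`. Because several changesets are traversed, the same path can end up in both sets (for example, deleted and later re-added), so the result would still need to be reconciled against the file system before indexing.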