Reindexing project after repositories were renamed/deleted #3421

Closed
ohm314 opened this issue Feb 12, 2021 · 6 comments

@ohm314
Contributor

ohm314 commented Feb 12, 2021

Describe the bug

We have an OpenGrok instance with several projects, each containing several repositories. A nightly cron job checks the various repositories on our remote git servers, clones new repositories, and (when this occasionally happens) deletes repositories that were deleted or moved on the remote git server. The rough directory organization looks like this:

/opengrok/src/proj_A
/opengrok/src/proj_A/repo_1
/opengrok/src/proj_A/repo_2
/opengrok/src/proj_A/repo_3
/opengrok/src/proj_B
/opengrok/src/proj_B/repo_1
/opengrok/src/proj_B/repo_2

Now, after our git repository synchronization runs, it looks like this:

/opengrok/src/proj_A
/opengrok/src/proj_A/repo_1
/opengrok/src/proj_A/repo_4
/opengrok/src/proj_B
/opengrok/src/proj_B/repo_1
/opengrok/src/proj_B/repo_2

We then ran the project reindexing tool as follows to refresh the index reflecting the changes in the filesystem:

opengrok-reindex-project -J=-Xmx16g -J=-server -a /opengrok/dist/lib/opengrok.jar -t /opengrok/etc/logging.properties.template -p "%PROJ%" -d /opengrok/log/hpc -P hpc -U http://localhost:8080/source -- --renamedHistory on -d /opengrok/data --depth 3 -R /opengrok/etc/read-only.xml -r on -G -m 256 -c /usr/local/bin/ctags -U http://localhost:8080/source -H proj_A

followed by the configuration merging as described in the wiki:

opengrok-projadm -b /opengrok/ -R /opengrok/etc/read-only.xml -r --jar /opengrok/dist/lib/opengrok.jar -u

When I then go to the web interface, I see that the project's repository list is still out of sync.

Digging further, I did a curl request:

curl -X GET -s --include "http://localhost:8080/source/api/v1/projects/proj_A/repositories"

which also returns an outdated list of repositories.
Looking further into configuration.xml, I still see the non-existent repositories listed.
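
One way to confirm the mismatch is to compare the repository list reported by the API with the directories actually present under the source root. The following is only a minimal sketch, assuming the endpoint returns a JSON array of paths relative to the source root (such as /proj_A/repo_1); the function names are hypothetical and not part of OpenGrok:

```python
import json
import os
import urllib.request


def fetch_repositories(base_url, project):
    """Return the repository paths the web app reports for a project.

    Assumption: the response body is a JSON array of paths relative
    to the source root.
    """
    url = f"{base_url}/api/v1/projects/{project}/repositories"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def stale_repositories(reported, src_root):
    """Return the reported repository paths that no longer exist on disk."""
    return sorted(
        path for path in reported
        if not os.path.isdir(os.path.join(src_root, path.lstrip("/")))
    )
```

If that assumption about the response format holds, calling stale_repositories(fetch_repositories("http://localhost:8080/source", "proj_A"), "/opengrok/src") should list the repositories that the web app still knows about but that were removed by the synchronization.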

To Reproduce

  1. setup a project with a repository
  2. rename the repository
  3. run opengrok-reindex-project

Expected behavior

I expect that changing repositories inside the project will be reflected in the index after rerunning the indexer. Or, if that is not possible, I would appreciate some guidance on how to update the index through the RESTful API.

Maybe more fundamentally: are repository changes supported at all, or do we have to delete the project entirely and index it from scratch if a repository is added, removed, or renamed?

Additional context
I tried deleting the project's index data with the following request and then reindexing, but now the project is completely gone from the UI:

curl -X DELETE --include "http://localhost:8080/source/api/v1/projects/proj_A/data"

The project no longer shows up in the UI, but curl -X GET -s --include "http://localhost:8080/source/api/v1/projects/proj_A/repositories" still returns the old, outdated repository list.

Versions

Ubuntu 18.04
opengrok: 1.5.11
jdk: openjdk 11.0.9.1 2020-11-04
tomcat: 9.0.16

@vladak
Member

vladak commented Feb 12, 2021

With the per-project workflow, you basically need to re-add the project whose repositories changed, using the RESTful API (opengrok-projadm will work too). This will take care of refreshing the list of repositories. The indexer will then make sure that the documents in the index corresponding to the deleted files are removed. This will work as long as the history of the repositories that remained was not overwritten.
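
For the RESTful API route, a rough sketch of what re-adding the project could look like. It assumes that the projects endpoint accepts a POST with the project name as a text/plain body; depending on the web app configuration an authentication token may also be needed, which is not shown. The helper name is hypothetical:

```python
import urllib.request


def build_add_project_request(base_url, project):
    """Build (but do not send) a POST request that re-adds a project.

    Assumption: the endpoint takes the project name as a text/plain body.
    """
    return urllib.request.Request(
        url=f"{base_url}/api/v1/projects",
        data=project.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen, for example with build_add_project_request("http://localhost:8080/source", "proj_A"), would send the request. Building it separately makes it easy to inspect before anything touches the running instance.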

@ohm314
Contributor Author

ohm314 commented Feb 15, 2021

Thanks!

> This will work as long as the history of the repositories that remained was not overwritten.

Not sure I fully understand this point, could you please elaborate?

I wonder, is my use-case very specialized? Is there a better group-project-repo arrangement that I should consider? The issue is just that we would like to index all the code in our institute, which is spread over many repositories that belong to different "groups" (as in entities at our institute). These repositories appear and disappear over time as work evolves.

Finally, you mention the issue in #3402. My Java is very rusty, but I see there is a fair bit of Python in there. With a bit of guidance, I could help extend the Python tools with this functionality, if I manage to dig out a little free time to work on this.

@vladak
Member

vladak commented Feb 15, 2021

> Thanks!
>
> > This will work as long as the history of the repositories that remained was not overwritten.
>
> Not sure I fully understand this point, could you please elaborate?
>
> I wonder, is my use-case very specialized? Is there a better group-project-repo arrangement that I should consider? The issue is just that we would like to index all the code in our institute, which is spread over many repositories that belong to different "groups" (as in entities at our institute). These repositories appear and disappear over time as work evolves.

I'd say it is a bit uncommon, at least to my knowledge. As I said, there is nothing wrong with that as long as you are not rewriting the history of the repositories that stick around.

> Finally, you mention the issue in #3402. My Java is very rusty, but I see there is a fair bit of Python in there. With a bit of guidance, I could help extend the Python tools with this functionality, if I manage to dig out a little free time to work on this.

That's already in progress. @tulinkry mentioned in the past that he would like to work on it, and my thinking was that it should not really take a lot of time. In the end I was surprised by how much effort was needed to make it happen. It was not too much work, I was just surprised.

@vladak
Member

vladak commented Feb 15, 2021

To wrap this up, for your use case it should be sufficient to run opengrok-projadm ... --add INSERT_PROJ_NAME_HERE before reindexing the project whose repositories have changed.
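
A rough sketch of that ordering as a small wrapper, for a nightly job. The function names are hypothetical, and the two base command lines are deliberately left to the caller because they are site-specific; only the --add option mentioned above is appended here:

```python
import subprocess


def refresh_commands(project, projadm_cmd, reindex_cmd):
    """Return the commands to run, in order, for a project whose repositories changed.

    projadm_cmd is the site-specific opengrok-projadm command line (as a list)
    without the --add part; reindex_cmd is the complete reindex command line.
    """
    return [
        list(projadm_cmd) + ["--add", project],  # refresh the repository list first
        list(reindex_cmd),                       # then reindex the project
    ]


def run_refresh(project, projadm_cmd, reindex_cmd):
    """Run each step in order, stopping at the first failure."""
    for cmd in refresh_commands(project, projadm_cmd, reindex_cmd):
        subprocess.run(cmd, check=True)
```

Using check=True means the reindex step is skipped if re-adding the project fails, so the index is not rebuilt against a stale repository list.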

@ohm314
Contributor Author

ohm314 commented Feb 15, 2021

OK, thank you very much! I'll include that in my workflow then 😄

@vladak
Member

vladak commented Feb 16, 2021

Cool. Let us know how it goes.
