git tag processing not efficient #2986

Closed
louie0817 opened this issue Nov 26, 2019 · 8 comments

@louie0817
Contributor

Is your feature request related to a problem? Please describe.
Git tag processing can be made more efficient.

OpenGrok version 1.3.3; other app/OS versions are not important.

For Git tag processing, the current procedure per Git repo is:
exec: git tag (to get the list of tags)
then, for each tag:
exec: git log --format=commit:%H%nDate:%at -n1 $tag --

In our installation, we have 90,000 GitHub Enterprise repos, and combined they have 1.25 million tags.

This results in 90,000 + 1.25 million execs of git.

It would be more efficient to run just one command per repo, which in our installation would save 1.25 million execs of git.

The gitTagParser function would change as well. The single git command needed would be:
exec: git log --tags --simplify-by-decoration --pretty="%D:%H:%at"
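
To illustrate how output from that single command might be parsed, here is a minimal sketch. It is not OpenGrok's actual gitTagParser: the class and method names (`GitTagLogParser`, `TaggedCommit`, `parseLine`) are hypothetical. It relies on `%D` printing decorations as a comma-separated list in which tags appear as `tag: <name>`, and on the fact that Git ref names cannot contain `:`, so the last two colons on a line safely delimit the hash and the timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenGrok.
class GitTagLogParser {

    /** One decorated commit: the tags pointing at it, its hash, and its %at timestamp. */
    static final class TaggedCommit {
        final List<String> tags;
        final String hash;
        final long epochSeconds;

        TaggedCommit(List<String> tags, String hash, long epochSeconds) {
            this.tags = tags;
            this.hash = hash;
            this.epochSeconds = epochSeconds;
        }
    }

    /**
     * Parses one line of
     *   git log --tags --simplify-by-decoration --pretty="%D:%H:%at"
     * Returns null when the line is malformed or carries no tag
     * (for example, a commit decorated only by a branch).
     */
    static TaggedCommit parseLine(String line) {
        // Split from the right: the decoration part may contain spaces and commas.
        int dateSep = line.lastIndexOf(':');
        int hashSep = line.lastIndexOf(':', dateSep - 1);
        if (dateSep < 0 || hashSep < 0) {
            return null;
        }

        String refs = line.substring(0, hashSep);
        String hash = line.substring(hashSep + 1, dateSep);
        long epochSeconds;
        try {
            epochSeconds = Long.parseLong(line.substring(dateSep + 1).trim());
        } catch (NumberFormatException e) {
            return null;
        }

        // Keep only "tag: <name>" decorations; skip branches and HEAD.
        List<String> tags = new ArrayList<>();
        for (String ref : refs.split(", ")) {
            if (ref.startsWith("tag: ")) {
                tags.add(ref.substring("tag: ".length()));
            }
        }
        return tags.isEmpty() ? null : new TaggedCommit(tags, hash, epochSeconds);
    }
}
```

Because one output line can carry several tags for the same commit, the caller receives them together instead of running one git process per tag.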

Describe the solution you'd like
I can provide the PR for the changes.

@tarzanek
Contributor

YES
I'd love to see a PR! :-)
(actually I am bugged by the same problem)

@tarzanek tarzanek added this to the 1.3 milestone Nov 26, 2019
@vladak vladak added the indexer label Dec 12, 2019
@vladak
Member

vladak commented Dec 12, 2019

Tagged with indexer; however, I believe this impacts the webapp as well.

@louie0817
Contributor Author

louie0817 commented Dec 13, 2019

There seems to be an existing bug in Git tag processing. If two or more tags are on the same date/SHA-1, only one is preserved, because the data structure is a TreeSet and the comparator is the date, so the first tag in wins. Two possible solutions are to make the tag field in TagEntry (GitTagEntry) a String[], or to have the tag String be a comma-separated list. I will go with the latter in my PR for this issue and see how that goes in testing.
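
The behavior described above can be reproduced with a small sketch. The `Entry` class here is a hypothetical stand-in, not OpenGrok's TagEntry or GitTagEntry, and the merge helper only illustrates the comma-separated approach under that assumption. A TreeSet treats two elements as duplicates when its comparator returns 0, so with a date-only comparator the second tag on the same date is silently rejected.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not OpenGrok code.
class TagMergeSketch {

    /** Simplified stand-in for a tag entry: a date plus a tag string. */
    static final class Entry {
        final long date;
        final String tags;

        Entry(long date, String tags) {
            this.date = date;
            this.tags = tags;
        }
    }

    /**
     * Adds every tag with the same date to a TreeSet ordered by date only
     * and returns the resulting size. Entries that compare equal are
     * dropped, so the result is 1 no matter how many tags are passed.
     */
    static int sizeWithDateOnlyComparator(long date, String... tags) {
        TreeSet<Entry> set = new TreeSet<>(Comparator.comparingLong((Entry e) -> e.date));
        for (String tag : tags) {
            set.add(new Entry(date, tag)); // returns false for the second and later tags
        }
        return set.size();
    }

    /**
     * Merges entries that share a date into one comma-separated tag string,
     * so no tag is lost while the date ordering is kept.
     */
    static TreeMap<Long, String> mergeByDate(List<Entry> entries) {
        TreeMap<Long, String> merged = new TreeMap<>();
        for (Entry e : entries) {
            merged.merge(e.date, e.tags, (existing, added) -> existing + ", " + added);
        }
        return merged;
    }
}
```

Keeping the merged value behind a method, as in `mergeByDate`, leaves room to switch to a String[] or a list later without touching callers.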

@vladak
Member

vladak commented Dec 16, 2019

Either solution sounds good as long as it is sufficiently wrapped in methods.

@idodeclare
Contributor

@louie0817, are you still working toward a PR?

idodeclare added a commit to idodeclare/OpenGrok that referenced this issue Jun 2, 2020
@idodeclare
Contributor

@louie0817, your Git skills are top-notch. I tried a couple of years ago to speed up tags but never arrived at the optimum, as you've done with your log-tags command. I wrote up a patch using your pointers.

@louie0817
Contributor Author

louie0817 commented Jun 3, 2020

Sorry, yes. I wrote the code months ago but got delayed testing it against our 90k+ GitHub Enterprise repos. While the tag processing worked as expected, I recall having an issue with some of the class comparator functions, but maybe you saw the same issues and solved them. I will take a look at your PR. Thanks for following up on this, as it would have been another month or so before I had time to wrap it up.

Note added: I see I did mention the comparator issue previously, and I see you addressed that. Thanks. Still reviewing. Very well commented.

@idodeclare
Contributor

Thank you, @louie0817

@vladak vladak closed this as completed in f6bdcf6 Jun 5, 2020