git tag processing not efficient #2986
Comments
There seems to be an existing bug in Git tag processing. If two or more tags are on the same date/SHA-1, only one is preserved, because the data structure is a TreeSet and the comparator uses the date. So the first tag in wins. Two possible solutions are to either make the tag field in TagEntry (GitTagEntry) a String[], or have the tag String be a comma-separated list. I will go with the latter in my PR for this issue and see how that goes in testing.
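To illustrate the collision described above, here is a minimal sketch in Java. It does not use OpenGrok's actual TagEntry/GitTagEntry classes; `Entry`, `BY_DATE`, and `addMerging` are hypothetical names made up for this example. It shows that a TreeSet ordered only by date silently drops a second tag with the same date, and how merging into a comma-separated string (the second option above) keeps both.

```java
import java.util.Comparator;
import java.util.TreeSet;

class TagCollisionDemo {
    /** Hypothetical stand-in for a tag entry: a date plus a tag string. */
    static final class Entry {
        final long date;   // commit time, seconds since the epoch
        String tags;       // one tag, or a comma-separated list of tags

        Entry(long date, String tags) {
            this.date = date;
            this.tags = tags;
        }
    }

    /** Orders entries by date only, which is what causes the collision. */
    static final Comparator<Entry> BY_DATE =
            Comparator.comparingLong((Entry e) -> e.date);

    /** Adds a tag, merging it into an existing entry that has the same date. */
    static void addMerging(TreeSet<Entry> set, long date, String tag) {
        Entry probe = new Entry(date, tag);
        Entry existing = set.floor(probe); // greatest entry <= probe, or null
        if (existing != null && BY_DATE.compare(existing, probe) == 0) {
            // Safe to mutate: the ordering depends on the date, not on tags.
            existing.tags = existing.tags + ", " + tag;
        } else {
            set.add(probe);
        }
    }

    public static void main(String[] args) {
        // Naive insertion: the second add() is rejected as a "duplicate".
        TreeSet<Entry> naive = new TreeSet<>(BY_DATE);
        naive.add(new Entry(1580000000L, "v1.0"));
        boolean added = naive.add(new Entry(1580000000L, "v1.0-final"));
        System.out.println(added + " / " + naive.first().tags);

        // Merging insertion: both tags survive in one entry.
        TreeSet<Entry> merged = new TreeSet<>(BY_DATE);
        addMerging(merged, 1580000000L, "v1.0");
        addMerging(merged, 1580000000L, "v1.0-final");
        System.out.println(merged.size() + " / " + merged.first().tags);
    }
}
```

The separator (`", "`) is an arbitrary choice for the sketch. Whichever representation is used, hiding it behind accessor methods keeps callers independent of it.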
Either solution sounds good as long as it is sufficiently wrapped in methods.
@louie0817, are you still working toward a PR?
@louie0817, your Git skills are top-notch. I tried a couple of years ago to speed up tags but never arrived at the optimum as you've done with your log-tags command. I wrote up a patch using your pointers.
Sorry, yes, I had written the code months ago, but got delayed testing it on our 90k+ GitHub Enterprise repos. While the tag processing worked as expected, I recall having some issues with some of the class comparator functions. But maybe you saw the same issues and solved them. I will take a look at your PR. Thanks for following up on this, as it would have been another month or so until I had time to wrap it up. Note added: I see I did mention the comparator issue previously, and I see you addressed that. Thanks. Still reviewing. Very well commented.
Thank you, @louie0817
Is your feature request related to a problem? Please describe.
Git tag processing can be made more efficient.
OpenGrok version 1.3.3; other app/OS versions are not important.
Current procedure for Git tag processing, per Git repo:
exec: git tag (to get the list of tags)
then, for each tag:
exec: git log --format=commit:%H%nDate:%at -n1 $tag --
In our installation, we have 90,000 GitHub Enterprise repos, and combined they have 1.25 million tags.
This results in 90,000 + 1.25 million (about 1.34 million) execs of git.
It would be more efficient to run just one command per repo, which in our installation would save 1.25 million execs of git.
The gitTagParser function would change as well. The single git command needed would be:
exec: git log --tags --simplify-by-decoration --pretty="%D:%H:%at"
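As a rough sketch of what parsing that single command's output could look like, here is a small Java example. It is not the OpenGrok implementation; `GitTagLogParser`, `TaggedCommit`, and `parseLine` are hypothetical names. It assumes each output line has the form `<decorations>:<hash>:<epoch seconds>`, where the `%D` decorations are separated by comma plus space and tags are prefixed with `tag: `. Because the decorations themselves contain colons (`tag: v1.0`), the line is split from the right. Commits decorated only with branches are skipped.

```java
import java.util.ArrayList;
import java.util.List;

class GitTagLogParser {
    /** One tagged commit: hash, author time (epoch seconds), and its tags. */
    static final class TaggedCommit {
        final String hash;
        final long epochSeconds;
        final List<String> tags;

        TaggedCommit(String hash, long epochSeconds, List<String> tags) {
            this.hash = hash;
            this.epochSeconds = epochSeconds;
            this.tags = tags;
        }
    }

    /**
     * Parses one line of `git log --pretty="%D:%H:%at"` output.
     * Returns null if the line is malformed or carries no tag decoration.
     */
    static TaggedCommit parseLine(String line) {
        // Split from the right: the decorations contain ':' ("tag: v1.0").
        int dateSep = line.lastIndexOf(':');
        if (dateSep <= 0) {
            return null;
        }
        int hashSep = line.lastIndexOf(':', dateSep - 1);
        if (hashSep < 0) {
            return null;
        }
        String decorations = line.substring(0, hashSep);
        String hash = line.substring(hashSep + 1, dateSep);
        long epochSeconds;
        try {
            epochSeconds = Long.parseLong(line.substring(dateSep + 1).trim());
        } catch (NumberFormatException e) {
            return null;
        }
        // Ref names cannot contain spaces, so ", " is a safe separator.
        List<String> tags = new ArrayList<>();
        for (String ref : decorations.split(", ")) {
            if (ref.startsWith("tag: ")) {
                tags.add(ref.substring("tag: ".length()));
            }
        }
        return tags.isEmpty() ? null : new TaggedCommit(hash, epochSeconds, tags);
    }

    public static void main(String[] args) {
        // Sample line made up for illustration, not real repository output.
        TaggedCommit c = parseLine(
                "HEAD -> master, tag: v1.1, tag: v1.1-rc2:abc123:1580000000");
        System.out.println(c.hash + " " + c.epochSeconds + " " + c.tags);
    }
}
```

A single line can carry several tags when they point at the same commit, so a parser along these lines would also surface the same-date/SHA-1 case discussed in the comments.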
Describe the solution you'd like
I can provide the PR for the changes.