-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve listing performance by using go-git #6478
Conversation
Not sure how to update the vendor modules. |
@filipnavara you have to commit the following changes after
After that, |
Thanks, I will try that. |
@typeless Unfortunately that didn't work for me. I suspect it has to do something with my local setup. I updated Go, but that alone didn't help.
|
@filipnavara |
Codecov Report
@@ Coverage Diff @@
## master #6478 +/- ##
==========================================
- Coverage 41.4% 41.35% -0.06%
==========================================
Files 415 415
Lines 56382 56337 -45
==========================================
- Hits 23344 23296 -48
- Misses 29906 29921 +15
+ Partials 3132 3120 -12
Continue to review full report at Codecov.
|
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
…t and never modified after. Signed-off-by: Filip Navara <[email protected]>
…ing (.editorconfig code path is still hit). Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
Signed-off-by: Filip Navara <[email protected]>
… thus all the loaded packfile indexes). Signed-off-by: Filip Navara <[email protected]>
@lunny Can you give me a link to the Git repository? I'll try to reproduce it locally. |
I think maybe you missed commits counting cache in this PR. |
I do get pretty abysmal times on that repository on Windows (~12-13 seconds). I'll investigate it. |
I assume the regression you are seeing happens because I dropped supported for the cache introduced in #6045. It's relatively trivial to bring it back and I hope to test and submit a patch soon. |
@filipnavara I don't think so because #6045 is not merged yet. As I know, for a big repo. Commits counting and Last Commit of every file or folder will spend most time when loading repo home page. I think maybe because you missed Commits counting which introduced on #2774 |
@lunny Right. I did remove the usage of I am pretty sure I didn't mess with the commit counting. I'll profile it and figure out what is wrong. |
Just a small update: The profile clearly shows the time is spent in |
@filipnavara thanks, you are doing great work! Really appreciate it 👍 |
Thanks for all the positive comments! I managed to find a flaw in my algorithm that severly degraded performance on repositories with huge number of merge commits. I'm testing a fix now and hope to submit it soon. Damn, not too soon though. There's still an issue with some 8 year old file in the root of the repository that is slow. |
Update: I am still tracking down the regression, but I managed to close in on it a bit. Draft PR #6686 significantly improves the speed on repositories containing massive amount of merge commits. The first commit in the PR fixes one correctness issue and speeds up the loads for cases where there are files with mostly recent commits. If I delete the ".yardopts" file (9 year old file in the root directory) I can load the listing in 2 seconds (compared to over 4 seconds before this PR). Second commit in the PR introduces optimization for reducing the search space in case of many merge commits. On stock Rails repository together it brings over 30% improvement for the listing times. Second problem is actually something I tried to tackle before and that's enormous amount of I/O in go-git when loading relatively small objects. This was mostly caused by unnecessary seeks throwing away buffered input and then re-reading it again. Due to some correctness issues my previous fix didn't apply to delta-encoded objects. Unfortunately this seems to be hit very hard on traversing the tree/commit history. Naïve fix by changing this line to allow deltas reduces the loading time by another 40% to around 5s. |
I have a WIP branch with few go-git fixes at https://github.com/filipnavara/gitea/tree/go-git-perf-fixes (directly in the vendor module directory for now). It contains PR #6686 + some optimizations to go-git. On my machine that's enough to push it to around 4 - 4.5 seconds on the root of the Rails repository. There's still room for some improvement but I wanted to address the low-hanging fruit first. For the root directory it's now similar to the performance I got before #6478. That may not necessarily be good enough since invoking git is super slow on Windows by comparison. For subdirectories it's significantly faster. Here are results from the profiler for the above branch and for a state before PR #6478: https://gist.github.com/filipnavara/0777cdeeba07f469314e4c05185fa29d Notice that the expensive listing operation on before- profile is over 7 seconds! Yet I am fairly certain that the displayed page load time was around 4.6s. It may be some quirk or a clever parallelization I missed. |
That's actually it. I was actually comparing apples to oranges. The previous implementation did parallelization on the maximum number of CPUs which is how it managed to achieve the speed. While that's beneficial in some cases (like a site with low load) the trade off comes with higher disk access and lower actual throughput under load. Obviously I can do that with the new algorithm too by partitioning the set of input entries into groups and then for each of these groups running the history algorithm in parallel. It may yield better results in some cases but it's unlikely to be a good idea under higher load. Long term there are much better approaches to solve it. Internally we used to run another fork of Gitea+go-git that implemented support for git commit-graph files and custom bloom filter format. While the commit-graph work is now part of official git, the bloom filter format is not and as such it was not viable to maintain. It was inspired by the work done for Azure DevOps (https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-bloom-filters/) and I highly recommend reading the whole series of the articles. Having this metadata allowed us to keep the load times almost always under one second for nearly any shape of repository. |
…pository. (#6686) * Fix flaw in the commit history lookup that caused unnecessary traversal when the repository contains a lot of merge commits. Also return the merge commit as the changed one if the file or directory was changed as part of the merge, eg. through conflict resolution. Signed-off-by: Filip Navara <[email protected]> * Perform history simplification. If a file is present on multiple parents in a merge commit follow only the first parent.
I'm sure I am already boring everyone to hell with all the comments. :-) Turns out that I was not the only person who noticed the performance shortcomings in go-git and started fixing them. I hope the pending PRs will get processed in timely manner. My latest measurements were with src-d/go-git#1119, src-d/go-git#1121, src-d/go-git#1123, src-d/go-git#1124, src-d/go-git#1125, src-d/go-git#1126 applied (for convenience I pushed the merged version to https://github.com/filipnavara/go-git/tree/wip-perf). Together they allow me to load the root of Rails repository in little less than 3 seconds (more than 50% improvement compared to currently released version). With pregenerated commit graph file (#6701) it drops to 1.7 seconds. I was actually worried that I caused some serious performance regression for some scenarios and that this PR, or part of it, would have to be reverted. I'm now fairly certain that should not be necessary. Thanks for feedback from everone. |
@filipnavara I don't fell boring about what you are doing, but very thanks for your effort on that. I will try https://github.com/filipnavara/go-git/tree/wip-perf ASAP. |
I have tested your wip-perf branch. The time on rails root page spent 2.5s ~ 4s and on release/v1.8 it's 2.3s. It's significant branch. But I found there are some different last commit display between two implements. Notice: I'm using an old rails repository. v1.8 : branch wip-perf : You can find |
@lunny Can you share the commit hash of the HEAD you are looking at? I believe this is result of my last tweaks which I compared to GitHub behavior. I'd like to verify the correctness locally and I need to ensure that we are looking at the same thing. |
The last commit id is 5abe612b0d on that rails repository. |
GitHub displays the same thing as you have on the It boils down to the treatment of merge commits on directories. I don't have hard preference on either one, but I would not consider it a bug. |
Keep in mind that these are numbers with basically no load. The At the moment Algorithmically computing the newest commit for this file means that all those 8 years of commit objects have to be loaded from disk. That's true for both Aside from the general |
@filipnavara It seems all above pipeline PRs on go-git merged. Could you send a PR from your wip-perf branch? |
@lunny don't we need to wait official go-git release for the vendor module to get updated? (I can ask the go-git guys to make one) |
Contributes to #491.
This is rebased version of my changes which switch most of the read-only repository access logic to use go-git. We have been running version of Gitea with these changes in production environment for several months so it received some stress testing.
It's becoming harder to maintain this as separate branch so it would be nice if someone can take it over and upstream the changes.
Updating the go-git.v4 vendor module should bring some more performance improvements.