-
Notifications
You must be signed in to change notification settings - Fork 42
Conversation
141bc21
to
f366cad
Compare
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute) | ||
defer cancel() | ||
|
||
args := []string{"log", getCommitsInfoPretty, "--name-status", "-c"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this command ever be faster than the targetedSearch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is faster for repos with many files (which would otherwise require many calls to targetedSearch
vs. one untargeted search). See the benchmark results for the manyfiles
repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have 2 benchmarks, one for each type of search, just to convince me 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkcsoft I believe the existing benchmarks are telling on their own. Using only the targetedSearch
is essentially the same as the old implementation (aside from a few small tweaks). The only explanation for the very large (>150x) performance improvement in the manyfiles
repo between the old and new implementations is that running an "untargeted" search in parallel is helpful (in this case, very helpful).
The advantage of using a "untargeted" approach in repos with many files is also reflected in the benchmarks for #53 (which solely used an untargeted approach)
commit_info.go
Outdated
} | ||
if line[0] >= 'A' && line[0] <= 'X' { // a file was changed by the current commit | ||
tabIndex := strings.IndexByte(line, '\t') | ||
if tabIndex < 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MM
is valid for Modified files in a Merge commit 🤔
c757efaaa0267ec823ddc0858cf6ec930e93a074 1510393589 Merge branch 'gitlab-git-f7537ce03a29b' into 'master'
MM CHANGELOG.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you have moves like this: (no clue what the digits are for 🤔 )
R088 internal/server/auth.go internal/server/auth/auth.go
R070 internal/ref/helper.go internal/git/ref.go
R055 ruby/vendor/gitlab_git/lib/gitlab/git/committer.rb ruby/vendor/gitlab_git/lib/gitlab/git/user.rb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarity-score apparently:
<X><score> The rename or copy score (denoting the percentage
of similarity between the source and target of the
move or copy). For example "R100" or "C75".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkcsoft The code will handle MM
correctly (as far as I can tell, since the line still begins with an upper case character). I'll update the code to correctly handle R...
and C...
(where there will be two filepaths, and we should look at the second one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, I read tabIndex > 1
which would fail. But it's tabIndex < 1
😅
commit_info.go
Outdated
func (state *getCommitsInfoState) numRemainingEntries() int { | ||
state.lock.Lock() | ||
numRemaining := len(state.entries) - len(state.commits) | ||
state.lock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to move this just after lock and use defer state.lock.Unlock()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally did not use defer
for performance. This function will be called many times, so small changes can have a meaningful impact on overall performance. See golang/go#6980
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will go on that with @ethantkoenig defer add some overhead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lafriks friendly ping
commit_info.go
Outdated
state.targetedPaths[entryPath] = struct{}{} | ||
break | ||
} | ||
state.lock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - move this just after lock and use defer state.lock.Unlock()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here since no return in between.
commit_info.go
Outdated
state.commits[entryPath] = commit | ||
updated = true | ||
} | ||
state.lock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here with defer
commit_info.go
Outdated
func (state *getCommitsInfoState) numRemainingEntries() int { | ||
state.lock.Lock() | ||
numRemaining := len(state.entries) - len(state.commits) | ||
state.lock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will go on that with @ethantkoenig defer add some overhead
commit_info.go
Outdated
state.targetedPaths[entryPath] = struct{}{} | ||
break | ||
} | ||
state.lock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here since no return in between.
LGTM |
@ethantkoenig Is it really worth risking locking up if unexpected error occurs? Can you retest test how big performance penalty in this case it is with |
@lafriks Overhead of using |
f39f745
to
acb3e0f
Compare
LGTM |
@ethantkoenig please update git dep in gitea repository |
A faster implementation of
GetCommitsInfo
, particularly for repositories with many files. Addresses go-gitea/gitea#491 and go-gitea/gitea#502.A revival of the failed #53 (which was reverted by #73).
BENCHMARKS
Old implementations:
New implementation:
In summary, we see a >150x improvement in repos with many files, and mild improvements in other repos.
IMPLEMENTATION DETAILS:
Runs targeted and untargeted searches in parallel. The untargeted search (1 thread) sifts through the output of a
git log --name-status -c -- <tree-path>
command, seeing which files the most recent commits have affected. The targeted search (multiple threads) runsgit rev-list -1 HEAD -- <entry>
commands to get the latest commit for specific file that have not been found by the untargeted search. The targeted search is similar to the old implementation, with a few enhancements (e.g. usesgit rev-list
instead ofgit log
).If there are only a few remaining entries, then the untargeted search stops to allow the targeted-search threads to finish without any interference (in my tests, this interference had a non-negligible impact on performance).
TESTING:
I have not been able to test this new implementation on a Windows machine. If someone could test the new implementation on Windows (to check for forward-slash vs. backslash bugs), that would be great.