This issue was moved to a discussion.
[performance+memory] Beating git in index-pack (as used for clones and fetches) #5
Comments
memory mode: in-memory, resolve-bases, resolve-deltas, resolve-deltas-and-bases
This was more of a test: I wanted to see whether keeping the decompressed data in memory speeds up downstream operations. It looks like this actually reduces performance, at least while the pack is also streamed to disk at the same time; the virtual memory system probably caches it entirely. When not streaming the pack to disk, in-memory operation appears to yield a mild speedup, but once writing a temporary file is allowed, the speedup is entirely gone. Thus keeping decompressed bytes doesn't seem to do any good.
For the actual performance tests on a 96-core machine, have a look at this comment. TL;DR: the time is dominated by creating an index by streaming the pack, and pack resolution is then done in about 10 seconds, or 14.6GB/s (of decoded objects).
The ARM git provided with macOS Big Sur changes everything:
With 3 threads (default)
Git is at least twice as fast when reading/streaming the pack. In our case this is limited by the deflate performance of millions of small streams, and there are still some improvements that we can make use of.
With 8 threads (as available cores)
Clearly contention reduces speed. This effect is not visible at all when verifying pack entries, making me think the amount of work
git index-pack streams a pack and creates an index from it. The difficulty arises from having to decompress every entry in the pack stream, which can be composed of many small objects. These are placed in some sort of index to accelerate the next stage, which is all about resolving the deltas in order to produce a SHA1. Per pack entry, the SHA1, pack offset and CRC32 are written into the index file to complete the operation.
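To make the per-entry record concrete, here is a minimal Rust sketch of the triple described above (SHA1, pack offset, CRC32). The `IndexEntry` type and `sort_and_fanout` function are hypothetical names for illustration, not gitoxide's API. The fan-out step reflects my understanding of git's version 2 pack index, which stores object ids sorted and prefixed by a 256-slot table counting entries by first byte; treat that part as an assumption about the file layout rather than something this issue states.

```rust
/// One record per pack entry: the fields the text says end up in the index.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexEntry {
    pub sha1: [u8; 20],
    pub pack_offset: u64,
    pub crc32: u32,
}

/// Sort entries by object id and build a 256-slot fan-out table where
/// `fanout[b]` is the number of entries whose first SHA1 byte is <= `b`.
/// (Assumed to mirror the version 2 pack index layout.)
pub fn sort_and_fanout(entries: &mut [IndexEntry]) -> [u32; 256] {
    entries.sort_by(|a, b| a.sha1.cmp(&b.sha1));

    // Count entries per first byte.
    let mut fanout = [0u32; 256];
    for entry in entries.iter() {
        fanout[entry.sha1[0] as usize] += 1;
    }
    // Turn per-byte counts into cumulative counts.
    for i in 1..256 {
        fanout[i] += fanout[i - 1];
    }
    fanout
}
```

With such a table, a lookup by object id only needs to binary-search the slice between `fanout[b - 1]` and `fanout[b]` for the id's first byte `b`.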
The indexing phase is inherently single-threaded with little potential for improvements, whereas the resolving phase is fully multithreaded and entirely lock-free. The first phase could be improved by writing the pack file in parallel - right now it happens after reading it (the pack file is used later for lookup to not hold everything in memory). However, IO doesn't appear to be the bottleneck at all.
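One way a resolving phase can run multithreaded without locks is to hand each thread a disjoint set of delta trees and let it return its own results. The sketch below illustrates that idea only; it is an assumption about the general approach, not gitoxide's actual implementation. `DeltaTree`, `toy_hash` and `resolve_parallel` are hypothetical names, the "delta" merely appends bytes to its base, and an FNV-1a checksum stands in for SHA1 so the example needs no external crates.

```rust
use std::thread;

/// Toy stand-in for a delta tree: one base object plus deltas against it.
/// Hypothetical type; real pack deltas are copy/insert instruction streams.
pub struct DeltaTree {
    pub base: Vec<u8>,
    pub deltas: Vec<Vec<u8>>, // each toy "delta" just appends bytes to the base
}

/// FNV-1a checksum used in place of SHA1 (illustration only).
fn toy_hash(data: &[u8]) -> u64 {
    data.iter().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ *b as u64).wrapping_mul(0x100000001b3)
    })
}

/// Resolve all trees using up to `threads` workers. Every worker owns a
/// disjoint chunk of trees and builds a private result vector, so there is
/// no shared mutable state and no lock.
pub fn resolve_parallel(trees: &[DeltaTree], threads: usize) -> Vec<u64> {
    let threads = threads.max(1);
    let chunk_size = ((trees.len() + threads - 1) / threads).max(1);

    thread::scope(|s| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = trees
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut out = Vec::new();
                    for tree in chunk {
                        out.push(toy_hash(&tree.base));
                        for delta in &tree.deltas {
                            // Apply the toy delta: base followed by delta bytes.
                            let mut object = tree.base.clone();
                            object.extend_from_slice(delta);
                            out.push(toy_hash(&object));
                        }
                    }
                    out
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Because results are concatenated in chunk order, the output is the same regardless of the thread count, which makes the parallel path easy to check against a single-threaded run.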
Compared to gitoxide, git is considerably faster when creating the index, averaging 54MB/s of reading uncompressed bytes. gitoxide clocks in at about 50MB/s, and slows down considerably during the end. Part of that slowdown might be attributed to this issue with resetting miniz_oxide's decompressor.
Luckily gitoxide is way faster when resolving deltas, which already gives it a good first place in the race, with some room for more if it manages to get as fast as git when decompressing and indexing objects.
The picture below shows the fastest git run I could produce, probably with everything being properly cached:
Without cache, it seems to look different:
The fastest gitoxide runs are pretty comparable in the amount of work done, as they also write out the pack and the index. The only difference is that they use the packfile directly instead of reading it from stdin; it is streamed nonetheless, and this is merely an oversight.
Memory consumption of git hovers consistently around 650MB (for the kernel pack), and is higher than the 580MB that gitoxide uses. However, gitoxide can temporarily use more memory as it keeps intermediate decompressed objects per thread, whose maximum sizes depend on the number of children and the base size. So I have seen this go up to 850MB for small fractions of time because of that.