mononoke: add a method to derive a simple stack of manifests
Summary:
Background: I've been looking into derived data performance and found that while overall performance is good, it depends heavily on blobstore latency: the higher the latency, the slower the derivation. What's worse, increasing blobstore latency by even 100ms might increase the time to derive 100 commits from 12 to 65 secs! [1]

However, we have ways to mitigate it:
* **Option 1** "backfill" mode makes derived data derivation less sensitive to put() latency.
* **Option 2** "parallel" mode makes derived data derivation less sensitive to get() latency.

We can use "backfill" mode for almost all derived data types (the only exception is filenodes), but "parallel" mode is only enabled for a few derived data types (e.g. fsnodes, skeleton manifests, filenodes). In particular, we didn't have a way to do batch derivation for unodes, so unodes derivation might get quite sensitive to blobstore get() latency. This diff tries to address that.

I considered three options:
* **Option 1** The simplest way to implement "parallel" mode for unodes is to do a unode warmup before starting a sequential derivation for a stack of commits. After the warmup all necessary entries should be in cache, so derivation should be less latency sensitive. This could work, but it has a few disadvantages:
  * We do an additional traversal - not the end of the world, but it might get expensive for large commits.
  * We might fetch large directories that don't fit in cache more often than we need to.

  That said, because of its simplicity it might be a reasonable option to keep in mind, and I might get back to it later.
* **Option 2** Do a derivation for a stack of commits. We have a function to derive a manifest for a single commit, but we could write a similar function to derive the whole stack at once. That means for each changed file or directory we generate not a single change but a stack of changes. I was able to implement it, but the code was too complicated. There were quite a few corner cases (particularly when a file was replaced with a directory, or when deriving a merge commit), and dealing with all of them was a pain. Moreover, we would need to make sure it works correctly in all scenarios, and that wouldn't be easy.
* **Option 3** Do a derivation for a "simple" stack of commits. That's basically a simplified version of option #2. Let's allow batch derivation only for stacks that have no a) merges b) path changes that are ancestors of each other (which cause file/dir conflicts). This implementation is significantly simpler than option #2, and it should cover most cases and hopefully bring perf benefits (though this is something I have yet to measure).

So this is what this diff implements.

Reviewed By: yancouto

Differential Revision: D30989888

fbshipit-source-id: 2c50dfa98300a94a566deac35de477f18706aca7
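
To illustrate the "simple stack" condition from Option 3, here is a minimal sketch of a check that rejects stacks containing merges or path changes that are ancestors of each other. It is not the code from this diff: `SimpleCommit` and `is_simple_stack` are hypothetical names, paths are plain `/`-separated strings, and the check is deliberately conservative in that it also rejects ancestor-related paths changed within a single commit.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of a commit for illustration only:
/// how many parents it has and which paths it changes.
/// This is not Mononoke's real changeset type.
pub struct SimpleCommit {
    pub parent_count: usize,
    pub changed_paths: Vec<String>,
}

/// Returns true if the stack has (a) no merge commits and (b) no two changed
/// paths where one is a directory ancestor of the other.
pub fn is_simple_stack(stack: &[SimpleCommit]) -> bool {
    // (a) A merge commit has more than one parent.
    if stack.iter().any(|commit| commit.parent_count > 1) {
        return false;
    }

    // Collect every path changed anywhere in the stack. Changing the same
    // path in several commits is fine, so duplicates are simply merged.
    let paths: BTreeSet<&str> = stack
        .iter()
        .flat_map(|commit| commit.changed_paths.iter().map(String::as_str))
        .collect();

    // (b) For each changed path, check whether any of its proper ancestor
    // directories ("a" and "a/b" for "a/b/c") is also a changed path.
    for &path in &paths {
        for (idx, byte) in path.bytes().enumerate() {
            if byte == b'/' && paths.contains(&path[..idx]) {
                return false;
            }
        }
    }
    true
}
```

Under these assumptions, a stack that changes `dir` in one commit and `dir/file` in another is rejected, while a stack that changes `dir` and `dir2/file` is accepted because `dir` is only a string prefix, not a path ancestor. A caller could use such a check to fall back to one-commit-at-a-time derivation for stacks that are not simple.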