-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement an incremental compilation cache for Cranelift #4551
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for working on this @bnjbvr -- I'm extremely excited for the compile-time improvements this will bring, and there's no question in my mind that we'll merge this!
I took a first look at the approach and I have a few high level thoughts. The main one is: the duplication of types into "cached" variants, and the translation between these and the existing IR types, has me thinking that it might be significantly easier for maintenance, readability, and correctness if we instead separate out the types completely so that we have a core function-IR type that is exactly the cache key, without any munging, and then a separate "specialization" / "instantiation" / "finalization" (unsure of the best verb here) stage to plug in the bits we are resilient to: source locs, value labels, referenced names, etc.
In other words, I think we should take a scalpel to Function
and DataFlowGraph
and (i) make references to source locs always relative to a base, (ii) make ExternalName
references indirect through a table, etc.
Then we can put that source-loc base index, that referenced-name table, and anything else, in a separate struct. And Function
is a struct that composes FunctionCore
and FunctionParams
.
Then the optimizer and backends have access only to FunctionCore
, and compile down to a MachCompileResultCore
. And we have a final relocation-application bit that turns that into MachCompileResult
. (Or something like that anyway; I'm not too picky about the type names and exact divisions, just that we make sure that the core of the compiler does not have any access to the "parameters" that are fixed up later.)
I don't think that's too much of a departure from much of the logic here, it's just rotating it a bit. The advantage is that it will almost be correct by construction; we're using the type system to ensure that we don't have any data-dependence leaks from things not in the cache key to the bit that we cache. And the key idea of "relocations" or "fixups" is still there, just made more explicit in the types.
Sorry I didn't come up with this earlier -- it wasn't apparent exactly what the thing would look like until I saw CachedDataFlowGraph
and friends :-)
Happy to talk further about this of course!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, I didn't mean to check the r+ bit in that last review-comment. Adding a "request changes" review here as per above (but I'm still very excited about this).
8753cdf
to
5567177
Compare
Thanks all for the first batch of reviews! @cfallin this makes sense to me. At this point there's not much that needs those relocations, if I'm not mistaken; mostly things that weren't trivially |
split `Function` into `FunctionParameters` and `FunctionStencil`, so the stencil can be compiled and cached, and compilation parameters can be applied onto the result of compiling a FunctionStencil (CompiledCodeBase<Stencil>) later on to become a consumable CompiledCode.
5567177
to
7a3d562
Compare
I've updated the initial text of this PR, and while not ideal/perfect, I claim this is ready for review and could be merged as a first step. Remaining things to do as follow-ups, either myself in my copious free time, or other interested contributors 😍 :
|
Unfortunately the previous implementation was more powerful, in the sense that a function call (in a function body) could change a call target, and that would result in a cache hit. With the new FunctionStencil approach this is not the case anymore because that changes the `FunctionStencil` identity. A future refactoring could make it possible to support caching for that specific case.
@cfallin This new version incorporates changes to address your review feedback during the live review session (that we organized to help going through the changes, as this is a large patch). This is now ready and passes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great to me! Thanks @bnjbvr for the patience and perseverance on this one. After our live review today I'm confident that this is likely to work well and be maintainable, and have the safety properties we would expect; in particular, the type-level separation of cacheable vs. change-resilient state, and the embedding of version markers in the key and value both.
It looks like there's a CI failure that may be the result of the new Rust version (a new warning appeared?); I'll look into it. I don't think it's caused by this PR.
I just merged in |
Hmm, @bnjbvr it looks like some of the changes broke the fuzz targets -- happy to merge once that is fixed up. |
fbcd271
to
570b634
Compare
fn declare_func_in_data(&self, func: FuncId, ctx: &mut DataContext) -> ir::FuncRef { | ||
(**self).declare_func_in_data(func, ctx) | ||
} | ||
|
||
fn declare_data_in_data(&self, data: DataId, ctx: &mut DataContext) -> ir::GlobalValue { | ||
(**self).declare_data_in_data(data, ctx) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These haven't yet been re-added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, seems I added it back in one implementation, but not in the trait. Is that something that you were using, @bjorn3? I could add it back later, if required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change means that declare_func_in_data
and declare_data_in_data
are no longer forwarded to FooModule
when you call them on a &mut FooModule
, which is incorrect if those functions have been overwritten by FooModule
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see; it's likely subpar design if the default implementation is incorrect for some implementations by default, but that's a different discussion to have :) Will post a small patch for this chunk.
This is the implementation of #4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris,
Function
has been split into mostly two parts:FunctionStencil
contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (CompiledCodeBase<Stencil>
) will be the same. This makes caching trivial, as the only thing to cache is theFunctionStencil
.FunctionParameters
contain the... function parameters that are required to finalize the result of compilation into aCompiledCode
(akaCompiledCodeBase<Final>
) with proper final relocations etc., by applying fixups and so on.Most changes are here to accomodate those requirements, in particular that
FunctionStencil
should beHash
able to be used as a key in the cache:RelSourceLoc
in theFunctionStencil
. This required changes so that there's no need to explicitly mark aSourceLoc
as the base source location, it's automatically detected instead the first time a non-defaultSourceLoc
is set.FunctionStencil
(aka before this patchExternalName::User { namespace, index }
) are now references into an external table ofUserExternalNameRef -> UserExternalName
, present in theFunctionParameters
, and must be explicitly declared usingFunction::declare_imported_user_function
.ExternalName
was used as the type for aFunction
's name; while it thus allowedExternalName::Libcall
in this place, this would have been quite confusing to use it there. Instead, a new enumUserFuncName
is introduced for this name, that's either a user-defined function name (the aboveUserExternalName
) or a test case name.ExternalName
is likely to become a full reference into theFunctionParameters
's mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.The cache computes a sha256 hash of the
FunctionStencil
, and uses this as the cache key. No equality check (usingPartialEq
) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.A basic fuzz target has been introduced that tries to do the bare minimum:
UserExternalNameRef -> UserExternalName
hits the cache, and that other modifications don't hit the cache.Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many
FunctionStencil
s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.Fixes #4155.