Commit
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a dedicated calling convention, so callers just choose the version that works for them. This is as opposed to the previous behavior, where we would chain together many trampolines that converted between calling conventions, sometimes up to four on the way into Wasm and four more on the way back out. See [0] for details.

[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths

Thanks to @bjorn3 for the initial idea of having multiple function pointers for different calling conventions.

This is generally a nice ~5-10% speed up to our call benchmarks across the board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from Wasm to the host, which have a minor regression. We hypothesize that this is because the old hand-written assembly trampolines did not maintain a call frame and did a tail call, whereas the new Cranelift-generated trampolines do maintain a call frame and do a regular call. The regression is only a couple of nanoseconds, which seems well explained by these differences, and ultimately is not a big deal.

However, this does lead to a ~5% code size regression for compiled modules. Before, we compiled one trampoline per escaping function's signature, and we deduplicated these trampolines by signature. Now we compile two trampolines per escaping function: one for when the host calls via the array calling convention and one for when the host calls via the native calling convention. Additionally, we compile a trampoline for every type in the module, in case there is a native calling convention function from the host that we `call_indirect` with that type. Much of this regression is in the `.eh_frame` section of the compiled module, because each of our trampolines needs an entry there.
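
As a rough illustration of the "one function pointer per calling convention" idea, here is a minimal sketch in safe Rust. It is not Wasmtime's actual definition: `FuncRef`, `add_array_trampoline`, `call_untyped`, and `call_typed` are hypothetical names, the signature is fixed to `(i32, i32) -> i32`, and it assumes for illustration that the array convention passes arguments and results through one shared buffer of raw 64-bit slots.

```rust
/// Hypothetical stand-in for a function reference that carries one entry
/// point per calling convention (not Wasmtime's real layout).
pub struct FuncRef {
    /// Array convention (assumed): arguments and results share one buffer.
    pub array_call: fn(&mut [u64]),
    /// Native convention: typed arguments, as a typed host caller sees them.
    pub native_call: fn(i32, i32) -> i32,
    /// Wasm convention: the entry point compiled Wasm code would call.
    pub wasm_call: fn(i32, i32) -> i32,
}

/// The underlying function, written once.
fn add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Array-convention trampoline: unpack the raw slots, call the function,
/// and write the result back into slot 0.
fn add_array_trampoline(values: &mut [u64]) {
    let a = values[0] as i32;
    let b = values[1] as i32;
    values[0] = add(a, b) as u32 as u64;
}

/// Build the reference with all three entry points filled in up front.
pub fn add_func_ref() -> FuncRef {
    FuncRef {
        array_call: add_array_trampoline,
        native_call: add,
        wasm_call: add,
    }
}

/// An untyped caller picks the array entry point directly.
pub fn call_untyped(f: &FuncRef, a: i32, b: i32) -> i32 {
    let mut values = [a as u32 as u64, b as u32 as u64];
    (f.array_call)(&mut values);
    values[0] as i32
}

/// A typed caller picks the native entry point directly; no conversion
/// trampolines are chained in between.
pub fn call_typed(f: &FuncRef, a: i32, b: i32) -> i32 {
    (f.native_call)(a, b)
}
```

In this sketch each caller selects the single entry point that matches its own convention, so any conversion happens at most once, inside the one trampoline behind that pointer.
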
Note that the `.eh_frame` section is not required for Wasmtime's correctness, and you can disable its generation to shrink compiled module code size; we just emit it to play nice with external unwinders and profilers. We believe there are code size gains available for follow up work to offset this code size regression in the future.

Backing up a bit: the reason each Wasm module needs to provide these Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow embedders to create functions even when there is no compiler available, so they cannot bring their own trampoline. Instead the Wasm module has to supply it. This in turn means that we need to look up and patch in these Wasm-to-native trampolines during roughly instantiation time. But instantiation is super hot, and we don't want to add more passes over imports or any extra work on this path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in ahead of time.

Co-Authored-By: Jamey Sharp <[email protected]>
Co-Authored-By: Alex Crichton <[email protected]>

prtest:full