Skip to content

Commit

Permalink
wasmtime: Overhaul trampolines
Browse files Browse the repository at this point in the history
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.

[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths

Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.

This is generally a nice ~5-10% speed up to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple nanoseconds,
which seems well-explained by these differences explain, and ultimately is not a
big deal.

However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for if the host calls via the array calling convention
and one for it the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case there is a native
calling convention function from the host that we `call_indirect` of that
type. Much of this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow up work to offset this code size regression in the future.

Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines during roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.

Co-Authored-By: Jamey Sharp <[email protected]>
Co-Authored-By: Alex Crichton <[email protected]>

prtest:full
  • Loading branch information
fitzgen committed Apr 24, 2023
1 parent 7ad2fe3 commit 0bb9d76
Show file tree
Hide file tree
Showing 68 changed files with 3,517 additions and 1,836 deletions.
5 changes: 3 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions RELEASES.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,18 @@ Unreleased.

### Changed

* Overhauled the way that Wasmtime calls into Wasm and Wasm calls back out to
the host. Instead of chaining together trampolines to convert between calling
conventions, we now represent `funcref`s with multiple function pointers, one
per calling convention. This paves the way for supporting Wasm tail calls and
also results in ~10% speed ups to a variety of function call benchmarks,
however there are some slight compiled Wasm module code size regressions
(which can be alleviated by disabling optional `.eh_frame`
generation). Additionally, in the C API the `wasmtime_func_call_unchecked`
function gained one more parameter, which is the capacity of the
args-and-results
buffer. [#6262](https://github.com/bytecodealliance/wasmtime/pull/6262)

--------------------------------------------------------------------------------

## 8.0.0
Expand Down
2 changes: 1 addition & 1 deletion benches/call.rs
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ fn bench_host_to_wasm<Params, Results>(
space[i] = param.to_raw(&mut *store);
}
untyped
.call_unchecked(&mut *store, space.as_mut_ptr())
.call_unchecked(&mut *store, space.as_mut_ptr(), space.len())
.unwrap();
for (i, expected) in results.iter().enumerate() {
assert_vals_eq(
Expand Down
8 changes: 8 additions & 0 deletions benches/instantiation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,10 @@ fn bench_sequential(c: &mut Criterion, path: &Path) {
panic!("failed to load benchmark `{}`: {:?}", path.display(), e)
});
let mut linker = Linker::new(&engine);
// Add these imports so we can benchmark instantiation of Sightglass
// benchmark programs.
linker.func_wrap("bench", "start", || {}).unwrap();
linker.func_wrap("bench", "end", || {}).unwrap();
wasmtime_wasi::add_to_linker(&mut linker, |cx| cx).unwrap();
let pre = linker
.instantiate_pre(&module)
Expand Down Expand Up @@ -74,6 +78,10 @@ fn bench_parallel(c: &mut Criterion, path: &Path) {
let module =
Module::from_file(&engine, path).expect("failed to load WASI example module");
let mut linker = Linker::new(&engine);
// Add these imports so we can benchmark instantiation of Sightglass
// benchmark programs.
linker.func_wrap("bench", "start", || {}).unwrap();
linker.func_wrap("bench", "end", || {}).unwrap();
wasmtime_wasi::add_to_linker(&mut linker, |cx| cx).unwrap();
let pre = Arc::new(
linker
Expand Down
Empty file modified cranelift/codegen/meta/src/shared/instructions.rs
100755 → 100644
Empty file.
3 changes: 3 additions & 0 deletions crates/c-api/include/wasmtime/func.h
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,8 @@ WASM_API_EXTERN wasmtime_error_t *wasmtime_func_call(
*
* * The `args_and_results` pointer has enough space to hold all the parameters
* and all the results (but not at the same time).
* * The `args_and_results_len` contains the length of the `args_and_results`
* buffer.
* * Parameters must all be configured as if they were the correct type.
* * Values such as `externref` and `funcref` are valid within the store being
* called.
Expand All @@ -245,6 +247,7 @@ WASM_API_EXTERN wasmtime_error_t *wasmtime_func_call_unchecked(
wasmtime_context_t *store,
const wasmtime_func_t *func,
wasmtime_val_raw_t *args_and_results,
size_t args_and_results_len,
wasm_trap_t **trap
);

Expand Down
3 changes: 2 additions & 1 deletion crates/c-api/src/func.rs
Original file line number Diff line number Diff line change
Expand Up @@ -365,9 +365,10 @@ pub unsafe extern "C" fn wasmtime_func_call_unchecked(
store: CStoreContextMut<'_>,
func: &Func,
args_and_results: *mut ValRaw,
args_and_results_len: usize,
trap_ret: &mut *mut wasm_trap_t,
) -> Option<Box<wasmtime_error_t>> {
match func.call_unchecked(store, args_and_results) {
match func.call_unchecked(store, args_and_results, args_and_results_len) {
Ok(()) => None,
Err(trap) => store_err(trap, trap_ret),
}
Expand Down
2 changes: 1 addition & 1 deletion crates/cranelift-shared/src/obj.rs
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ impl<'a> ModuleTextBuilder<'a> {
// loop could also be updated to forward the relocation to
// the final object file as well.
panic!(
"unresolved relocation could not be procesed against \
"unresolved relocation could not be processed against \
{index:?}: {r:?}"
);
}
Expand Down
Loading

0 comments on commit 0bb9d76

Please sign in to comment.