Skip to content

Commit

Permalink
wasmtime: Overhaul trampolines
Browse files Browse the repository at this point in the history
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.

[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths

Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.

This is generally a nice ~5-10% speed up to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple nanoseconds,
which seems well-explained by these differences explain, and ultimately is not a
big deal.

However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for if the host calls via the array calling convention
and one for it the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case there is a native
calling convention function from the host that we `call_indirect` of that
type. Much of this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow up work to offset this code size regression in the future.

Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines during roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.

Co-Authored-By: Jamey Sharp <[email protected]>
Co-Authored-By: Alex Crichton <[email protected]>

prtest:full
  • Loading branch information
fitzgen committed Apr 21, 2023
1 parent 7ad2fe3 commit ea06742
Show file tree
Hide file tree
Showing 61 changed files with 3,337 additions and 1,555 deletions.
5 changes: 3 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion benches/call.rs
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ fn bench_host_to_wasm<Params, Results>(
space[i] = param.to_raw(&mut *store);
}
untyped
.call_unchecked(&mut *store, space.as_mut_ptr())
.call_unchecked(&mut *store, space.as_mut_ptr(), space.len())
.unwrap();
for (i, expected) in results.iter().enumerate() {
assert_vals_eq(
Expand Down
4 changes: 4 additions & 0 deletions benches/instantiation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,8 @@ fn bench_sequential(c: &mut Criterion, path: &Path) {
panic!("failed to load benchmark `{}`: {:?}", path.display(), e)
});
let mut linker = Linker::new(&engine);
linker.func_wrap("bench", "start", || {}).unwrap();
linker.func_wrap("bench", "end", || {}).unwrap();
wasmtime_wasi::add_to_linker(&mut linker, |cx| cx).unwrap();
let pre = linker
.instantiate_pre(&module)
Expand Down Expand Up @@ -74,6 +76,8 @@ fn bench_parallel(c: &mut Criterion, path: &Path) {
let module =
Module::from_file(&engine, path).expect("failed to load WASI example module");
let mut linker = Linker::new(&engine);
linker.func_wrap("bench", "start", || {}).unwrap();
linker.func_wrap("bench", "end", || {}).unwrap();
wasmtime_wasi::add_to_linker(&mut linker, |cx| cx).unwrap();
let pre = Arc::new(
linker
Expand Down
1 change: 1 addition & 0 deletions crates/c-api/include/wasmtime/func.h
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,7 @@ WASM_API_EXTERN wasmtime_error_t *wasmtime_func_call_unchecked(
wasmtime_context_t *store,
const wasmtime_func_t *func,
wasmtime_val_raw_t *args_and_results,
size_t args_and_results_len,
wasm_trap_t **trap
);

Expand Down
3 changes: 2 additions & 1 deletion crates/c-api/src/func.rs
Original file line number Diff line number Diff line change
Expand Up @@ -365,9 +365,10 @@ pub unsafe extern "C" fn wasmtime_func_call_unchecked(
store: CStoreContextMut<'_>,
func: &Func,
args_and_results: *mut ValRaw,
args_and_results_len: usize,
trap_ret: &mut *mut wasm_trap_t,
) -> Option<Box<wasmtime_error_t>> {
match func.call_unchecked(store, args_and_results) {
match func.call_unchecked(store, args_and_results, args_and_results_len) {
Ok(()) => None,
Err(trap) => store_err(trap, trap_ret),
}
Expand Down
2 changes: 1 addition & 1 deletion crates/cranelift-shared/src/obj.rs
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ impl<'a> ModuleTextBuilder<'a> {
// loop could also be updated to forward the relocation to
// the final object file as well.
panic!(
"unresolved relocation could not be procesed against \
"unresolved relocation could not be processed against \
{index:?}: {r:?}"
);
}
Expand Down
Loading

0 comments on commit ea06742

Please sign in to comment.