Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift: Make inline stackprobe unroll sequence valgrind compliant #7470

Merged
merged 5 commits into from
Nov 3, 2023

Conversation

afonso360
Copy link
Contributor

👋 Hey,

This PR alters our stackprobe unroll sequences to move the stackpointer before writing to the stack. I don't think there was anything wrong with what we were doing before but it makes valgrind really unhappy. (See #7454)

I've tested this with cg_clif on x86, and it now cleanly passes valgrind for the original reported testcase.

Fixes: #7454
prtest:full

@afonso360 afonso360 requested a review from a team as a code owner November 3, 2023 12:59
@afonso360 afonso360 requested review from fitzgen and removed request for a team November 3, 2023 12:59
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Nov 3, 2023
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +1181 to +1199
// When manually unrolling adjust the stack pointer and then write a zero
// to the stack at that offset. This generates something like
// `sub sp, sp, #1, lsl #12` followed by `stur wzr, [sp]`.
//
// We do this because valgrind expects us to never write beyond the stack
// pointer and associated redzone.
// See: https://github.com/bytecodealliance/wasmtime/issues/7454
for _ in 0..probe_count {
insts.extend(Self::gen_sp_reg_adjust(-(guard_size as i32)));

insts.push(Self::gen_store_stack(
StackAMode::SPOffset(0, I8),
zero_reg(),
I32,
));
}

// Restore the stack pointer to its original value
insts.extend(Self::gen_sp_reg_adjust((guard_size * probe_count) as i32));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity / unrelated to this PR: do you know why it is necessary to probe via storing to the stack rather than loading from the stack? It seems like either would accomplish the task AFAICT, and loads should be cheaper than stores.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the point of view of global cost, stores might actually be better, I suspect: if it faults in a new page, a load would cause a read-only mapping to the global zero page to be created, and the first write would then cause another page fault (at least if the stack is an ordinary MAP_ANON region). Or at least, we've seen elsewhere that it's better for first touch to be write when TLB contention is high...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay that makes sense. Thanks!

@afonso360 afonso360 added this pull request to the merge queue Nov 3, 2023
Merged via the queue into bytecodealliance:main with commit 2e8c488 Nov 3, 2023
@afonso360 afonso360 deleted the unroll-valgrind branch November 3, 2023 20:59
elliottt pushed a commit to elliottt/wasmtime that referenced this pull request Nov 3, 2023
…ytecodealliance#7470)

* x64: Make inline probe sequence valgrind compliant

* aarch64: Move stackprobe implementation into separate functions

* aarch64: Make inline probe sequence valgrind compliant

* riscv64: Make inline probe sequence valgrind compliant

* riscv64: Avoid reloading probe amount when unrolling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cranelift: inline probestack makes valgrind unhappy
3 participants