cranelift: Make inline stackprobe unroll sequence valgrind compliant #7470
Conversation
Thanks!
```rust
// When manually unrolling, adjust the stack pointer and then write a zero
// to the stack at that offset. This generates something like
// `sub sp, sp, #1, lsl #12` followed by `stur wzr, [sp]`.
//
// We do this because valgrind expects us to never write beyond the stack
// pointer and its associated redzone.
// See: https://github.com/bytecodealliance/wasmtime/issues/7454
for _ in 0..probe_count {
    insts.extend(Self::gen_sp_reg_adjust(-(guard_size as i32)));

    insts.push(Self::gen_store_stack(
        StackAMode::SPOffset(0, I8),
        zero_reg(),
        I32,
    ));
}

// Restore the stack pointer to its original value.
insts.extend(Self::gen_sp_reg_adjust((guard_size * probe_count) as i32));
```
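For contrast, here is a rough sketch of what the unrolled probe presumably looked like before this change, reconstructed from the PR description rather than from the removed code: each probe stored directly below the live stack pointer, which is exactly the pattern valgrind flags.

```rust
// Hypothetical reconstruction of the old unroll (based on the PR
// description, not the exact removed code): store a zero at a growing
// negative offset below SP without moving SP first, e.g.
// `stur wzr, [sp, #-4096]`. Because these stores land beyond the stack
// pointer and its redzone, valgrind reports them as invalid writes.
for i in 0..probe_count {
    insts.push(Self::gen_store_stack(
        StackAMode::SPOffset(-(((i + 1) * guard_size) as i64), I8),
        zero_reg(),
        I32,
    ));
}
```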
Out of curiosity / unrelated to this PR: do you know why it is necessary to probe via storing to the stack rather than loading from the stack? It seems like either would accomplish the task AFAICT, and loads should be cheaper than stores.
From a global-cost point of view, stores might actually be better, I suspect: if the probe faults in a new page, a load would cause a read-only mapping to the global zero page to be created, and the first write would then cause another page fault (at least if the stack is an ordinary MAP_ANON region). Or at least, we've seen elsewhere that it's better for the first touch to be a write when TLB contention is high...
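To make that concrete, here is a minimal standalone sketch (assuming Linux and the `libc` crate; this example is illustrative and not from the PR) of the read-first versus write-first touch behavior on a fresh anonymous mapping:

```rust
use std::ptr;

fn main() {
    unsafe {
        // Map one fresh anonymous page, analogous to an untouched stack page.
        let page = libc::mmap(
            ptr::null_mut(),
            4096,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANON,
            -1,
            0,
        ) as *mut u8;
        assert_ne!(page, libc::MAP_FAILED as *mut u8);

        // Read-first touch: fault #1 typically maps the page copy-on-write
        // to the kernel's shared zero page; the write below then takes
        // fault #2 to allocate a private writable page.
        let _ = ptr::read_volatile(page);
        ptr::write_volatile(page, 1);

        // Write-first touch on a second page takes a single fault that
        // allocates a writable page immediately, which is why probing
        // with stores can be cheaper overall.
        let page2 = libc::mmap(
            ptr::null_mut(),
            4096,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANON,
            -1,
            0,
        ) as *mut u8;
        assert_ne!(page2, libc::MAP_FAILED as *mut u8);
        ptr::write_volatile(page2, 1);

        libc::munmap(page as *mut libc::c_void, 4096);
        libc::munmap(page2 as *mut libc::c_void, 4096);
    }
}
```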
Ah okay that makes sense. Thanks!
…ytecodealliance#7470)
* x64: Make inline probe sequence valgrind compliant
* aarch64: Move stackprobe implementation into separate functions
* aarch64: Make inline probe sequence valgrind compliant
* riscv64: Make inline probe sequence valgrind compliant
* riscv64: Avoid reloading probe amount when unrolling
👋 Hey,
This PR alters our stackprobe unroll sequences to move the stack pointer before writing to the stack. I don't think there was anything wrong with what we were doing before, but it makes valgrind really unhappy. (See #7454)
I've tested this with cg_clif on x86, and it now cleanly passes valgrind for the originally reported test case.
Fixes: #7454
prtest:full