Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interrupt blocking instructions #1921

Conversation

eloparco
Copy link
Contributor

Following the discussion in #1910, blocking operations (e.g. atomic wait, sleep, user-defined blocking functions) should be interruptible by exceptions (e.g. trap, proc_exit).

To achieve that, before calling a wasm function, the stack context/environment is saved (using sigsetjmp() in posix). When an exception is encountered in one thread, a signal is sent to all the other threads, interrupting their execution and passing the control to the registered signal handler. The signal handler then restores (using siglongjmp() in posix) the most recent stack context saved for the threads.

@eloparco
Copy link
Contributor Author

eloparco commented Jan 29, 2023

TODO:

  • test in different modes (only classic interpreter for now)
  • add support for other platforms (Windows at least? add dummy/empty functions for the definitions in platform_api_extension.h for the other platforms)
  • try with low level tests in Add some low level tests WebAssembly/wasi-threads#19

@eloparco eloparco force-pushed the eloparco/interrupt-blocking-instructions branch from 2d87749 to d70da2f Compare January 29, 2023 23:39
{
korp_tid self = pthread_self();

stack_ctx_block_sig_and_lock();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you make it thread-local?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, otherwise it would be shared between threads (and would then require access protection).

Actually, I wanted to avoid globals and return the oldmask from stack_ctx_block_sig_and_lock(), but returning anything from that function was corrupting the stack when used inside os_is_returning_from_signal(), I'm still not sure I understand why

@@ -4183,79 +4183,86 @@ wasm_interp_call_wasm(WASMModuleInstance *module_inst, WASMExecEnv *exec_env,

wasm_exec_env_set_cur_frame(exec_env, frame);

if (function->is_import_func) {
sigjmp_buf *context = os_create_stack_context();
if (!os_is_returning_from_signal(exec_env->handle, context)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about resource cleanup i mentioned in #1910 (comment) ?

@eloparco eloparco force-pushed the eloparco/interrupt-blocking-instructions branch from f21e33a to 57d208d Compare January 31, 2023 22:38
os_is_returning_from_signal(korp_tid handle, sigjmp_buf *context)
{
if (sigsetjmp(*context, 1))
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks unsafe to me.
when we return here with a longjmp, the call frame of os_is_returning_from_signal is not valid anymore.

to make this work, os_is_returning_from_signal should always be inlined.
maybe it's more straightforward to use a macro instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, that might explain some problems that I was seeing when trying to use it with AOT. In any case, I realized that I missed this PR #1309 that is basically doing something very similar to what I want to do. The approach is better because we don't need to keep a map <thread_id, stack contexts> if we use tls. I'll try to apply a similar approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eloparco Yes, in the memory access boundary check with hardware trap feature, both interpreter and AOT use the setjmp/longjmp mechanism in posix like platforms. The local variable jmpbuf is pushed into exec_env's jmpbuf stack before setjmp, runtime doesn't allocate memory for it.
I think that the principle of this PR is similar to the hw bound check feature: firstly register a signal handler, then setjmp before call, and when running wasm function, if there is a signal raised, the signal handler is trigger and the handler longjmp into the place of setjmp.

I think there may be two ways to implement this PR: one is to extend the hw bound check feature, e.g. reuse the current signal handler, the current setjmp and longjmp process, but add another macro to control them respectively (for example, OS_ENABLE_XXX and OS_ENABLE_HW_BOUND_CHECK). The other is to register new signal handler, add new setjmp/longjmp process, but I suggest not to add setjmp in wasm_interp_class.c, had better add in wasm_runtime.c, before calling wasm function:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/interpreter/wasm_runtime.c#L2163
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/interpreter/wasm_runtime.c#LL2459C3-L2459C3

And note that setjmp is only supported by posix platforms, there should be macro to control the feature, and had better use os_setjmp/os_longjmp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And could you submit this PR to main branch since the main branch also has the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/interpreter/wasm_runtime.c#L2163

yes that works, for aot I'm not sure what to wrap with os_setjmp since there are 2 calls, I need to check

ret = invoke_native_internal(exec_env, function->u.func.func_ptr,

ret = invoke_native_internal(exec_env, function->u.func.func_ptr,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And could you submit this PR to main branch since the main branch also has the issue?

I'll try to have it working here first since I'm testing it with wasi threads, then I'll create a PR in main to merge it there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm closing this PR in favour of #1930, where I rely on what's already implemented for hardware bound checks.

@eloparco eloparco closed this Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants