-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make longjmp/exceptions helpers thread-safe #12056
Conversation
system/lib/libc/extras.c
Outdated
@@ -31,3 +31,15 @@ long* _get_timezone() { | |||
void __lock(void* ptr) {} | |||
void __unlock(void* ptr) {} | |||
|
|||
/* References to these longjmp- and exceptions-supporting things are generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why move into this file?
These symbols seems to live more logically in compiler-rt, which is a library that hold internal symbols that the compiler can generate during codegee.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason is that compiler-rt doesn't have a multithreaded variation atm, and adding one just for this seemed silly to me. Given libc already had a kind of "misc" file, this seemed reasonable. But I don't feel strongly, what do you prefer?
OK, maybe just mention that in the PR. Maybe be worth putting it these in their own source file too, since they are locally part of the same functionality? How about |
@sbc100 sounds good, split off and renamed. |
@tlively Are there any known issues with TLS that we should be aware of here? |
No bugs, but our TLS is limited to the local-exec model, which means that thread_local data can only be accessed from the main executable and not from any shared libraries. As long as the functions that directly access these are statically linked in, there should be no problems. |
I see, thanks @tlively That sounds fine then. But trying to write a test for pthreads + exceptions, it fails for me. Here is my test and the diff: https://gist.github.com/kripken/6c9cd1d8e575c09e5636e68cae435bdd That's all with this PR, and then compiling the program without edit: example compile command: Maybe my testcase is bad somehow? |
Hmm, I can't see anything wrong with it. I'll clone this branch locally and see if I can figure out what is happening. |
Ok, I poked around a bit and I think this change is going to require an upstream LLVM change. Right now the Emscripten SjLj lowering pass constructs the |
Oh, that makes sense, yeah... only the linker sees that these are TLS, but if there are also necessary codegen changes then it's not a simple relocation the linker can fix up. Ok, we'll wait on a backend change then. Not sure there is any other option here other than to go back to JS variables for these, which is slow. |
*/ | ||
|
||
int _Thread_local __THREW__ = 0; | ||
int _Thread_local __threwValue = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you can use the thread_local
keyword if you include
#include <threads.h>
thread_local int foo = 0;
I think that i slightly more readable and also the same keyword as C++ so less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, done.
LLVM patch is up: https://reviews.llvm.org/D88262 |
Emscripten's longjump and exception mechanism depends on two global variables, `__THREW__` and `__threwValue`, which are changed to be defined as thread-local in emscripten-core/emscripten#12056. This patch updates the corresponding code in the WebAssembly backend to properly declare these globals as thread-local as well. Differential Revision: https://reviews.llvm.org/D88262
I'm tempted to say we should add a thread variant of compiler-rt just because this code seems to make so more sense in compiler-rt .. since its literally defining compiler-internally-generated symbols. |
Then we could avoid the file rename and this patch would be way tiny. |
This patch is already pretty tiny isn't it @sbc100 ? Not sure what you are thinking of trimming out. Making compiler-rt have a threaded variant seems like a larger change. However, I do not feel strongly about this either way. |
Changed to use a compiler-rt variant. Slightly sad this increases our SDK downloads by 150K, but if this feels cleaner sgtm. |
This now includes undoing the test disabling in #12337. After LLVM rolls in this should pass. Locally it looks ok for me on the disabled tests. Note that I had to fix up one test, |
A bunch more libraries must now be built with It seems like when the user builds with pthreads, we now need all libraries to be built with atomics. So it's not just some libraries we annotate with Or is this just libraries that use setjmp or exceptions, and so they have |
@tlively I'm very confused here, please advise... |
tools/system_libs.py
Outdated
@@ -1086,7 +1086,7 @@ class libwebgpu_cpp(MTLibrary): | |||
src_files = ['webgpu_cpp.cpp'] | |||
|
|||
|
|||
class libembind(Library): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be needed as part of this change. At least that is not the intention behind the recent llvm changes.
What error were you seeing without this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual error on features not matching: can't link in this .o because it disallows atomics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically
wasm-ld: error: --shared-memory is disallowed by bind.o because it was not compiled with 'atomics' or 'bulk-memory' features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange.. I can't see how that would start happening as a result of this change. @tlively will hopefully know what is going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building bind.o
with an -mt
variant fixes it, of course.
What's not clear to me is why this worked before, and what @tlively 's change in LLVM makes different so that it doesn't work now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think I know what's going on. I'll do another LLVM patch to explain and fix it.
Does this really need to be in compiler-rt? It seems we could avoid the -mt build issues just by moving it to libcxxabi? edit: Nevermind, I guess this is also used for setjmp/longjmp. We could potentially split it so setjmp and EH use different variables, and put one in libcxxabi and the other in libc (both of those already have threaded variants, right?) |
It may be good to land this as-is (with the comment cleanup @sbc100 mentioned) as it now passes tests while enabling the ones that were disabled, so it is strictly better than what is in trunk atm. Then after the LLVM update we can adjust. However if we think the LLVM update may end up not rolling in by us re-enabling tests here, we should probably just wait, to avoid any annoying back-and-forth. |
I think maybe we should get the bottom of things before landing. I am rather confused about why you started seeing |
The way my previous LLVM patch worked, it unconditionally made |
Passes locally for me with that patch. Uploaded simplified version here that can be run once that has rolled in. |
Look like Thomas's change is rolling in now: https://chromium-review.googlesource.com/c/emscripten-releases/+/2432788 One possible downside I just realized is that we now have a very hard dependency on this version llvm... but maybe its only for programs that use exceptions of setjmp/longjmp which isn't so bad. |
Did we decide to try to support other versions of LLVM? |
AFAIK we haven't really committed to that, although we also think that it should mostly just work (although we aren't testing it anywhere either, so it's probably broken in some way or other). |
This kind of ABI change that requires a simultaneous Emscripten and LLVM change seems kind of rare, but is definitely the kind of thing that would make support stable LLVM difficult. |
Emscripten's longjump and exception mechanism depends on two global variables, `__THREW__` and `__threwValue`, which are changed to be defined as thread-local in emscripten-core/emscripten#12056. This patch updates the corresponding code in the WebAssembly backend to properly declare these globals as thread-local as well. Differential Revision: https://reviews.llvm.org/D88262
Emscripten's longjump and exception mechanism depends on two global variables, `__THREW__` and `__threwValue`, which are changed to be defined as thread-local in emscripten-core/emscripten#12056. This patch updates the corresponding code in the WebAssembly backend to properly declare these globals as thread-local as well. Differential Revision: https://reviews.llvm.org/D88262
This adds the thread-local annotation to those globals. Previously they were in
JS, which is effectively thread-local, but then we moved them to C which meant
they were stored in the same shared memory for all threads. A race could
happen if threads (or longjmp) operations happened at just the right time, one
writing to those globals and another reading.
Also make compiler-rt now build with a multithreaded variation, as the implementation
of those globals is in that library.
Also add a testcase that runs exceptions on multiple threads. It's not a guaranteed
way to notice a race, but it may help, and this is an area we didn't have coverage
of.
Fixes #12035
This has been a possible race condition for a very long time. I think it went
unnoticed because exceptions and longjmp tend to be used for exceptional
things, and not constantly being executed. And also until we get wasm
exceptions support they are also slow, so people have been trying to avoid
them as much as possible anyhow.