Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make longjmp/exceptions helpers thread-safe #12056

Merged
merged 29 commits into from
Sep 26, 2020
Merged

Make longjmp/exceptions helpers thread-safe #12056

merged 29 commits into from
Sep 26, 2020

Conversation

kripken
Copy link
Member

@kripken kripken commented Aug 26, 2020

This adds the thread-local annotation to those globals. Previously they were in
JS, which is effectively thread-local, but then we moved them to C which meant
they were stored in the same shared memory for all threads. A race could
happen if threads (or longjmp) operations happened at just the right time, one
writing to those globals and another reading.

Also make compiler-rt now build with a multithreaded variation, as the implementation
of those globals is in that library.

Also add a testcase that runs exceptions on multiple threads. It's not a guaranteed
way to notice a race, but it may help, and this is an area we didn't have coverage
of.

Fixes #12035

This has been a possible race condition for a very long time. I think it went
unnoticed because exceptions and longjmp tend to be used for exceptional
things, and not constantly being executed. And also until we get wasm
exceptions support they are also slow, so people have been trying to avoid
them as much as possible anyhow.

@@ -31,3 +31,15 @@ long* _get_timezone() {
void __lock(void* ptr) {}
void __unlock(void* ptr) {}

/* References to these longjmp- and exceptions-supporting things are generated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move into this file?

These symbols seems to live more logically in compiler-rt, which is a library that hold internal symbols that the compiler can generate during codegee.

Copy link
Member Author

@kripken kripken Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is that compiler-rt doesn't have a multithreaded variation atm, and adding one just for this seemed silly to me. Given libc already had a kind of "misc" file, this seemed reasonable. But I don't feel strongly, what do you prefer?

@sbc100
Copy link
Collaborator

sbc100 commented Aug 26, 2020

OK, maybe just mention that in the PR.

Maybe be worth putting it these in their own source file too, since they are locally part of the same functionality?

How about emscripten_exception_builtins.c? I can't really think of a better name...

@kripken
Copy link
Member Author

kripken commented Aug 26, 2020

@sbc100 sounds good, split off and renamed.

@kripken
Copy link
Member Author

kripken commented Aug 26, 2020

@tlively Are there any known issues with TLS that we should be aware of here?

@tlively
Copy link
Member

tlively commented Aug 27, 2020

No bugs, but our TLS is limited to the local-exec model, which means that thread_local data can only be accessed from the main executable and not from any shared libraries. As long as the functions that directly access these are statically linked in, there should be no problems.

@kripken
Copy link
Member Author

kripken commented Aug 27, 2020

I see, thanks @tlively That sounds fine then.

But trying to write a test for pthreads + exceptions, it fails for me. Here is my test and the diff: https://gist.github.com/kripken/6c9cd1d8e575c09e5636e68cae435bdd That's all with this PR, and then compiling the program without _Thread_local (on those special globals), and then with (as in the PR's default). The diff of the ThreadMain program shows that what was a global address, 2948 (which I am guessing is __THREW__) becomes just 0.

edit: example compile command: ./emcc exceptions.cpp -pthread -O2 -fexceptions --profiling

Maybe my testcase is bad somehow?

@tlively
Copy link
Member

tlively commented Aug 27, 2020

Hmm, I can't see anything wrong with it. I'll clone this branch locally and see if I can figure out what is happening.

@tlively
Copy link
Member

tlively commented Aug 27, 2020

Ok, I poked around a bit and I think this change is going to require an upstream LLVM change. Right now the Emscripten SjLj lowering pass constructs the __THREW__ and __threwValue globals if they are not already defined in the module, but it doesn't set any TLS mode on them. So codegen lowers accesses to them as normal accesses rather than as TLS accesses. That 0 address is probably supposed to be the offset into the TLS segment, but codegen didn't know to add in the TLS base.

@kripken
Copy link
Member Author

kripken commented Aug 27, 2020

Oh, that makes sense, yeah... only the linker sees that these are TLS, but if there are also necessary codegen changes then it's not a simple relocation the linker can fix up.

Ok, we'll wait on a backend change then. Not sure there is any other option here other than to go back to JS variables for these, which is slow.

*/

int _Thread_local __THREW__ = 0;
int _Thread_local __threwValue = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you can use the thread_local keyword if you include

#include <threads.h>
thread_local int foo = 0;

I think that i slightly more readable and also the same keyword as C++ so less confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done.

@tlively
Copy link
Member

tlively commented Sep 24, 2020

LLVM patch is up: https://reviews.llvm.org/D88262

tlively added a commit to llvm/llvm-project that referenced this pull request Sep 24, 2020
Emscripten's longjump and exception mechanism depends on two global variables,
`__THREW__` and `__threwValue`, which are changed to be defined as thread-local
in emscripten-core/emscripten#12056. This patch updates
the corresponding code in the WebAssembly backend to properly declare these
globals as thread-local as well.

Differential Revision: https://reviews.llvm.org/D88262
@sbc100
Copy link
Collaborator

sbc100 commented Sep 24, 2020

I'm tempted to say we should add a thread variant of compiler-rt just because this code seems to make so more sense in compiler-rt .. since its literally defining compiler-internally-generated symbols.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 24, 2020

Then we could avoid the file rename and this patch would be way tiny.

@kripken
Copy link
Member Author

kripken commented Sep 24, 2020

This patch is already pretty tiny isn't it @sbc100 ? Not sure what you are thinking of trimming out.

Making compiler-rt have a threaded variant seems like a larger change.

However, I do not feel strongly about this either way.

@kripken
Copy link
Member Author

kripken commented Sep 24, 2020

Changed to use a compiler-rt variant. Slightly sad this increases our SDK downloads by 150K, but if this feels cleaner sgtm.

@kripken
Copy link
Member Author

kripken commented Sep 25, 2020

This now includes undoing the test disabling in #12337. After LLVM rolls in this should pass. Locally it looks ok for me on the disabled tests.

Note that I had to fix up one test, test_warning_flags, and also I had to add libc_rt_wasm to also have a multithreaded variation.

@kripken
Copy link
Member Author

kripken commented Sep 25, 2020

A bunch more libraries must now be built with -mt variations, asan_js and embind. I am not actually sure why these and not others, so this worries me - will users hit more library combinations we don't test?

It seems like when the user builds with pthreads, we now need all libraries to be built with atomics. So it's not just some libraries we annotate with -mt, but they all must be. So like relocatable, we may as well have another cache dir, really.

Or is this just libraries that use setjmp or exceptions, and so they have __THREW__ which is now atomic?

@kripken
Copy link
Member Author

kripken commented Sep 25, 2020

asan_js does not use exceptions or longjmp. Literally the only difference in the .o files (between the normal and -mt versions of it) is the features section. So I'm not sure why the LLVM change made us need to build it with an -mt variation while before we didn't...

@tlively I'm very confused here, please advise...

@@ -1086,7 +1086,7 @@ class libwebgpu_cpp(MTLibrary):
src_files = ['webgpu_cpp.cpp']


class libembind(Library):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be needed as part of this change. At least that is not the intention behind the recent llvm changes.

What error were you seeing without this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual error on features not matching: can't link in this .o because it disallows atomics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically

wasm-ld: error: --shared-memory is disallowed by bind.o because it was not compiled with 'atomics' or 'bulk-memory' features.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange.. I can't see how that would start happening as a result of this change. @tlively will hopefully know what is going on here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building bind.o with an -mt variant fixes it, of course.

What's not clear to me is why this worked before, and what @tlively 's change in LLVM makes different so that it doesn't work now...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think I know what's going on. I'll do another LLVM patch to explain and fix it.

@dschuff
Copy link
Member

dschuff commented Sep 25, 2020

Does this really need to be in compiler-rt? It seems we could avoid the -mt build issues just by moving it to libcxxabi?

edit: Nevermind, I guess this is also used for setjmp/longjmp. We could potentially split it so setjmp and EH use different variables, and put one in libcxxabi and the other in libc (both of those already have threaded variants, right?)

@kripken
Copy link
Member Author

kripken commented Sep 25, 2020

It may be good to land this as-is (with the comment cleanup @sbc100 mentioned) as it now passes tests while enabling the ones that were disabled, so it is strictly better than what is in trunk atm. Then after the LLVM update we can adjust.

However if we think the LLVM update may end up not rolling in by us re-enabling tests here, we should probably just wait, to avoid any annoying back-and-forth.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 25, 2020

I think maybe we should get the bottom of things before landing.

I am rather confused about why you started seeing wasm-ld: error: --shared-memory is disallowed by bind.o because it was not compiled with 'atomics' or 'bulk-memory' features.

@tlively
Copy link
Member

tlively commented Sep 25, 2020

The way my previous LLVM patch worked, it unconditionally made __THREW__ and __threwValue thread-local and depended on the CoalesceFeaturesAndStripAtomics pass to downgrade it to a normal variable if necessary. Unfortunately that had the unintended side effect of making every non-TLS-supporting object strip TLS, which normally makes the object thread-unsafe and causes it to be disallowed from being linked into a shared-memory module. The fix is in https://reviews.llvm.org/D88323, which checks features before making the variables thread-local.

@kripken
Copy link
Member Author

kripken commented Sep 25, 2020

Passes locally for me with that patch. Uploaded simplified version here that can be run once that has rolled in.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 25, 2020

Look like Thomas's change is rolling in now: https://chromium-review.googlesource.com/c/emscripten-releases/+/2432788

One possible downside I just realized is that we now have a very hard dependency on this version llvm... but maybe its only for programs that use exceptions of setjmp/longjmp which isn't so bad.

@tlively
Copy link
Member

tlively commented Sep 25, 2020

Look like Thomas's change is rolling in now: https://chromium-review.googlesource.com/c/emscripten-releases/+/2432788

One possible downside I just realized is that we now have a very hard dependency on this version llvm... but maybe its only for programs that use exceptions of setjmp/longjmp which isn't so bad.

Did we decide to try to support other versions of LLVM?

@dschuff
Copy link
Member

dschuff commented Sep 25, 2020

AFAIK we haven't really committed to that, although we also think that it should mostly just work (although we aren't testing it anywhere either, so it's probably broken in some way or other).

@tlively
Copy link
Member

tlively commented Sep 25, 2020

This kind of ABI change that requires a simultaneous Emscripten and LLVM change seems kind of rare, but is definitely the kind of thing that would make support stable LLVM difficult.

@kripken kripken merged commit e43a870 into master Sep 26, 2020
@kripken kripken deleted the exc branch September 26, 2020 00:20
arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Mar 24, 2021
Emscripten's longjump and exception mechanism depends on two global variables,
`__THREW__` and `__threwValue`, which are changed to be defined as thread-local
in emscripten-core/emscripten#12056. This patch updates
the corresponding code in the WebAssembly backend to properly declare these
globals as thread-local as well.

Differential Revision: https://reviews.llvm.org/D88262
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this pull request Oct 7, 2022
Emscripten's longjump and exception mechanism depends on two global variables,
`__THREW__` and `__threwValue`, which are changed to be defined as thread-local
in emscripten-core/emscripten#12056. This patch updates
the corresponding code in the WebAssembly backend to properly declare these
globals as thread-local as well.

Differential Revision: https://reviews.llvm.org/D88262
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception Handling - exception on the wrong thread when using pthreads
4 participants