Exception Handling - exception on the wrong thread when using pthreads #12035

Closed
rsielken opened this issue Aug 25, 2020 · 23 comments · Fixed by #12056

Comments

@rsielken

We have a large project which now runs pretty well except for the one module that uses C++ exceptions extensively. We generally have 3 pthreads running, and what we see is that an exception thrown on pthread A is caught by any of the 3 pthreads (A, B or C) when it should always be A. Pthreads B and C don't have catch handlers for that exception type, so they fall over (worker.js onmessage() captured an uncaught exception: RuntimeError: unreachable executed), and pthread A, which needed the exception, goes down error paths and often gets an index out of bounds exception trying to run code it shouldn't be running. If I remove the other pthreads and keep only the one main pthread (losing the functionality of those other threads for the sake of this investigation, so not a viable solution), we don't seem to have this issue, but we are not convinced that this conclusion is more than circumstantial.

We have seen these issues with EMSDK 1.39.17, 1.39.20 and 2.0.0. We are building with -O1 which then requires -s DISABLE_EXCEPTION_CATCHING=0 per https://emscripten.org/docs/optimizing/Optimizing-Code.html#optimizing-code-exception-catching, but we have seen the issue with -O0 too.

Per aheejin's May 28th comment from #11233, we tried to build with -s DISABLE_EXCEPTION_CATCHING=0 removed and -fwasm-exceptions added. 2 modules got clang++ errors saying to submit a bug to bugs.llvm.org including some files, but our company legal hasn't cleared us to do that because the files are basically the source code. I did try building those 2 modules without -fwasm-exceptions and all the other modules with -fwasm-exceptions, but that didn't run:

wasm streaming compile failed: CompileError: wasm validation error: at offset 146711: unknown section before code section
falling back to ArrayBuffer instantiation
failed to asynchronously prepare wasm: CompileError: wasm validation error: at offset 146711: expected code section
CompileError: wasm validation error: at offset 146711: expected code section

These exceptions are thrown and should be caught in the C++ code (not JavaScript), so https://emscripten.org/docs/porting/Debugging.html#handling-c-exceptions-from-javascript should not apply.

Are there additional settings that we need in order to have these exceptions thrown and caught by the correct (same) thread? I have read through the Emscripten docs and tried all sorts of combinations of -s and other settings, all to no avail. Is there anything to do until the new exception handling (-fwasm-exceptions) is done and ready? Is this a known issue for the existing exception handling that will be fixed? I don't have a test case outside of our real code that I can share. Is it worth my time to work on a test case that I can share, or is this already known and going to be addressed? I don't want to waste time on the test case if it isn't needed, but if this can/should be fixed in the existing exception handling infrastructure, I'm certainly willing to go spend that time.
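For readers who want a shareable starting point, here is a minimal sketch of what a standalone test case for the symptom above could look like. It is not the reporter's code: the `WorkerError` type, the function names, and the iteration counts are all hypothetical, and it uses `std::thread` rather than raw pthreads for brevity. Each worker throws an exception tagged with its own id and checks that it catches the same one.

```cpp
#include <cassert>    // only needed for the usage shown below
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical exception type; each instance records which worker threw it.
struct WorkerError : std::runtime_error {
    int thrower;
    explicit WorkerError(int id)
        : std::runtime_error("worker error"), thrower(id) {}
};

// Throws and catches `iterations` exceptions on the calling thread and
// returns how many of the caught exceptions carried this thread's own id.
int throw_and_catch(int id, int iterations) {
    int caught_own = 0;
    for (int i = 0; i < iterations; ++i) {
        try {
            throw WorkerError(id);
        } catch (const WorkerError& e) {
            if (e.thrower == id) ++caught_own;
        }
    }
    return caught_own;
}

// Runs `num_threads` workers concurrently. Returns true only if every
// worker caught exactly the exceptions it threw itself.
bool run_stress(int num_threads, int iterations) {
    std::vector<int> results(num_threads, 0);
    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id) {
        threads.emplace_back([&results, id, iterations] {
            results[id] = throw_and_catch(id, iterations);
        });
    }
    for (auto& t : threads) t.join();
    for (int r : results) {
        if (r != iterations) return false;
    }
    return true;
}
```

On a conforming native toolchain `assert(run_stress(3, 1000));` should hold. Under the setup described above (a build with -s USE_PTHREADS=1 and -s DISABLE_EXCEPTION_CATCHING=0) a failure of that assertion, or an uncaught exception in a worker, would point at the same cross-thread problem; whether this sketch actually reproduces it is untested.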

@sbc100
Collaborator

sbc100 commented Aug 25, 2020

According to #11233, exception handling is not currently thread safe. So what you are seeing makes sense. Also according to #11233, it seems that it was considered not worth fixing.

However, if you would like to work on it, it's certainly something that we could accept fixes for.

Since you are at least the third person to run into this issue I think we should disable exception handling when threading is enabled. There is clearly no way it works today.

@rsielken
Author

We had read all of #11233 and the thread safety concerned us and we figured we were running into that very issue. Disabling exceptions doesn't really help us as the code will still be broken without the exceptions being thrown and caught.

What we don't have a good feel for is what to do next. Is the native exception handling (-fwasm-exceptions) going to be ready any time soon (it has been 3 months since the "several more months" comment in #11233), and should we wait for or help with that rather than spending time on the current exception handling? sbc100 mentioned here and in #11233 that fixes to make the current exception handling thread safe would be accepted in the meantime, but is it worth that effort if it is about to be replaced?

@kripken
Member

kripken commented Aug 25, 2020

I updated that issue, now - #11518 fixed most of the known problem there. If you still see an issue, maybe you are hitting the specific corner case mentioned in the comment in the source there that is not handled yet? If so, fixing that specific issue may be worth considering.

cc @aheejin for the status of -fwasm-exceptions.

@rsielken
Author

We had seen #11518 and waited for EMSDK 2.0.0 in hopes that it would fix the issue we were having, but alas, it has not.

// TODO: Unfortunately this approach still cannot be considered thread-safe because single
// exception object can be simultaneously thrown in several threads and its state (except
// reference counter) is not protected from that. Also protection is not enough, separate state
// should be allocated. libcxxabi has concept of dependent exception which is used for that
// purpose, it references the primary exception.

While we have 3 pthreads, the one pthread is the primary thread and the only thread that is really using the exception handling. Therefore, we wouldn't have the same exception object being thrown on multiple threads - it might be thrown multiple times but it would always be on that primary pthread. Therefore, I wouldn't think the comment would apply, but perhaps I'm misreading it.

However, it does bring up a separate question. We have cases where an exception is thrown and in the catch clause, something is done and then the exception is thrown again (and caught again). Is throwing an exception from a catch block a safe operation? Would it matter if it was a new exception or the same/caught exception? Would it matter if we saved a reference to the exception (or the fact that we needed an exception and created a new exception), got out of the catch block, and then rethrow the exception?
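To make the three cases in that question concrete, here is a small sketch in standard C++: rethrowing the caught exception, throwing a new exception from inside the catch block, and saving the exception to rethrow after the catch block has ended. The function names are hypothetical. All three are well-defined in the C++ standard; the sketch says nothing about whether a particular Emscripten version handles them correctly across threads.

```cpp
#include <cassert>    // only needed for the usage shown below
#include <exception>
#include <stdexcept>
#include <string>

// Case 1: rethrow the exception currently being handled with a bare `throw;`.
std::string rethrow_same() {
    try {
        try {
            throw std::runtime_error("first");
        } catch (const std::runtime_error&) {
            // ... do something, then pass the same exception object on ...
            throw;
        }
    } catch (const std::exception& e) {
        return e.what();
    }
    return "not reached";
}

// Case 2: throw a brand-new exception from inside the catch block.
std::string throw_new() {
    try {
        try {
            throw std::runtime_error("first");
        } catch (const std::runtime_error& e) {
            throw std::logic_error(std::string("wrapped: ") + e.what());
        }
    } catch (const std::exception& e) {
        return e.what();
    }
    return "not reached";
}

// Case 3: save the exception, leave the catch block, and rethrow it later.
std::string rethrow_saved() {
    std::exception_ptr saved;
    try {
        throw std::runtime_error("first");
    } catch (...) {
        saved = std::current_exception();  // keeps the exception object alive
    }
    try {
        if (saved) std::rethrow_exception(saved);
    } catch (const std::exception& e) {
        return e.what();
    }
    return "nothing saved";
}
```

For example, `assert(rethrow_same() == "first");` and `assert(throw_new() == "wrapped: first");` hold on a conforming implementation. Saving a plain reference to the caught exception instead of a `std::exception_ptr` would not be safe, because the exception object can be destroyed once its handler exits.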

@kripken
Member

kripken commented Aug 25, 2020

I think those are safe, but I'm not enough of a C++ exceptions expert to know for sure. A quick thing you can do is build with -fsanitize=undefined, as UBSan may report something if those are undefined behavior. ASan may also be worth trying. (You can also do those on a native build, which may be simpler.)

In general it sounds like you may be hitting undefined behavior or an unknown bug. It might be good to create a small standalone testcase for investigation; it could be an easy fix given that.

@kripken
Member

kripken commented Aug 25, 2020

Btw, completely unrelated to this, I believe #12039 may fix a thread safety issue. Long shot, but might be worth seeing if that PR helps you @rsielken

@aheejin
Member

aheejin commented Aug 26, 2020

Wasm native exception handling, enabled in clang by -fwasm-exceptions, is not really reliable at the moment. We had been working on stabilizing it, but the situation changed and we might undergo some spec changes, so the stabilization effort is on hold for now. And even setting aside its unreliable status, it works only with V8 (Chrome) and only under a flag. So if you are planning to run your program in all mainstream browsers, it is not really available.

That said, I'd still like to know the full clang error message you encountered (if the source code can't be shared) and the version of the toolchain you are using, just in case I can get any info from that. An error message from a debug build of the toolchain would be better, but I'm not sure whether EMSDK provides one. Does it? @kripken

However, it does bring up a separate question. We have cases where an exception is thrown and in the catch clause, something is done and then the exception is thrown again (and caught again). Is throwing an exception from a catch block a safe operation? Would it matter if it was a new exception or the same/caught exception? Would it matter if we saved a reference to the exception (or the fact that we needed an exception and created a new exception), got out of the catch block, and then rethrow the exception?

It's hard to know what the exact situation is; I don't see why rethrowing or throwing from a catch block would be a problem, but I think a source code snippet would help. Can you show a small code example that does what you said?

@rsielken
Author

rsielken commented Aug 26, 2020

I am working on sample code that I can share that recreates the problem.

I am trying out the #12039 fix.

There were two -fwasm-exceptions errors with clang. I'll post them as two comments. Here is the first.

MAKE: Processing tmg/make.mak for WebAssembly platform with EMCC and GCC3+ compilers em++ -DEMSDK -DMOBILE -DOPENGL -DNO_DYNAMIC_LOADING -DNO_EM -g2 -fwasm-exceptions -s USE_PTHREADS=1 -s PROXY_TO_PTHREAD=1 -s PTHREAD_POOL_SIZE=4 -s FETCH=1 -s FETCH_SUPPORT_INDEXEDDB=0 -s FULL_ES2=1 -s USE_SDL=2 -s USE_SDL_TTF=2 -s USE_FREETYPE=2 -s USE_LIBPNG=1 -s INITIAL_MEMORY=268435456 -s ALLOW_MEMORY_GROWTH=1 -s FORCE_FILESYSTEM=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 -s OFFSCREEN_FRAMEBUFFER=1 -s DEMANGLE_SUPPORT=1 -c -I/usr/local/include/g++ -I/usr/include/g++ -O0 -w -m32 -march=pentium3 -DGCC3 -DGCC4 -fno-strict-aliasing -DGCC_LBLB_NOT_SUPPORTED -DUNIX -DLINUX -DLINUX86 -DW -DW32 -DEMSDK -DWSS_CLIENT -DWSS_CLIENT_NET -fcheck-new -DPTHREAD_KERNEL -D_REENTRANT -DUSE_THREADSAFE_INTERFACES -D_POSIX_THREAD_SAFE_FUNCTIONS -DHANDLE_IS_32BITS -DHAS_IOCP -DHAS_BOOL -DHAS_DLOPEN -DUSE_PTHREAD_INTERFACES -DLARGE64_FILES -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fPIC -I../syssrv -I../lsi -I../idepm -I../ideim -I../iparse -I../unicod -I../tmg -I. -I/vhome/builder/sandboxes/myproject/unix/cham -I/vhome/builder/sandboxes/myproject/inc -o tmgapi.wasm.o -DSEG_LS_TMG tmgapi.cpp root:WARNING: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 clang++: /b/s/w/ir/cache/builder/emscripten-releases/llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp:995: bool (anonymous namespace)::WebAssemblyLowerEmscriptenEHSjLj::runSjLjOnFunction(llvm::Function &): Assertion!isa(&I)' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /local/emsdk/upstream/bin/clang++ -target wasm32-unknown-emscripten -D__EMSCRIPTEN_major__=2 -D__EMSCRIPTEN_minor__=0 -D__EMSCRIPTEN_tiny__=0 -D_LIBCPP_ABI_VERSION=2 -Dunix -D__unix -D__unix__ -Werror=implicit-function-declaration -Xclang -nostdsysteminc -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/libcxx -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/libcxxabi/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/compat -Xclang -isystem/local/emsdk/upstream/emscripten/system/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/libc -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/libc/musl/arch/emscripten -Xclang -isystem/local/emsdk/upstream/emscripten/system/local/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/SSE -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/compiler-rt/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/libunwind/include -Xclang -isystem/local/emsdk/upstream/emscripten/cache/wasm/include -DEMSCRIPTEN -D__EMSCRIPTEN_PTHREADS__=1 -DEMSDK -DMOBILE -DOPENGL -DNO_DYNAMIC_LOADING -DNO_EM -fwasm-exceptions -c -I/usr/local/include/g++ -I/usr/include/g++ -O0 -w -m32 -march=pentium3 -DGCC3 -DGCC4 -fno-strict-aliasing -DGCC_LBLB_NOT_SUPPORTED -DUNIX -DLINUX -DLINUX86 -DW -DW32 -DEMSDK -DWSS_CLIENT -DWSS_CLIENT_NET -fcheck-new -DPTHREAD_KERNEL -D_REENTRANT -DUSE_THREADSAFE_INTERFACES -D_POSIX_THREAD_SAFE_FUNCTIONS -DHANDLE_IS_32BITS -DHAS_IOCP -DHAS_BOOL -DHAS_DLOPEN -DUSE_PTHREAD_INTERFACES -DLARGE64_FILES -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fPIC -I../syssrv -I../lsi -I../idepm -I../ideim -I../iparse -I../unicod -I../tmg -I. 
-I/vhome/builder/sandboxes/myproject/unix/cham -I/vhome/builder/sandboxes/myproject/inc -o tmgapi.wasm.o -DSEG_LS_TMG -pthread -pthread tmgapi.cpp -I/local/emsdk/upstream/emscripten/cache/wasm/include/freetype2/freetype -Xclang -isystem/local/emsdk/upstream/emscripten/cache/wasm/include/SDL2 -c -o tmgapi.wasm.o -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr

  1. parser at end of file
  2. Code generation
  3. Running pass 'WebAssembly Lower Emscripten Exceptions' on module 'tmgapi.cpp'.
    #0 0x00007f3e602bbd04 PrintStackTraceSignalHandler(void*) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x839d04)
    #1 0x00007f3e602b98ee llvm::sys::RunSignalHandlers() (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x8378ee)
    #2 0x00007f3e602baead llvm::sys::CleanupOnSignal(unsigned long) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x838ead)
    #3 0x00007f3e601e7e73 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x765e73)
    #4 0x00007f3e601e7fac CrashRecoverySignalHandler(int) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x765fac)
    #5 0x00007f3e5fa6e3c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
    #6 0x00007f3e5be1418b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4618b)
    #7 0x00007f3e5bdf3859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x25859)
    #8 0x00007f3e5bdf3729 (/lib/x86_64-linux-gnu/libc.so.6+0x25729)
    #9 0x00007f3e5be04f36 (/lib/x86_64-linux-gnu/libc.so.6+0x36f36)
    #10 0x00007f3e6200411b (anonymous namespace)::WebAssemblyLowerEmscriptenEHSjLj::runSjLjOnFunction(llvm::Function&) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x258211b)
    #11 0x00007f3e620014c8 (anonymous namespace)::WebAssemblyLowerEmscriptenEHSjLj::runOnModule(llvm::Module&) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x257f4c8)
    #12 0x00007f3e6041c4c7 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x99a4c7)
    #13 0x00007f3e5d9be200 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_deletellvm::raw_pwrite_stream >) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1894200)
    #14 0x00007f3e5dcd1516 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1ba7516)
    #15 0x00007f3e5ca3d823 clang::ParseAST(clang::Sema&, bool, bool) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x913823)
    #16 0x00007f3e5e4f0873 clang::FrontendAction::Execute() (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x23c6873)
    #17 0x00007f3e5e489f43 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x235ff43)
    #18 0x00007f3e5e563972 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x2439972)
    #19 0x0000000000410ecf cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/local/emsdk/upstream/bin/clang+++0x410ecf)
    #20 0x000000000040f09c ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/local/emsdk/upstream/bin/clang+++0x40f09c)
    #21 0x00007f3e5e13eef2 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optionalllvm::StringRef >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool) const::$_1>(long) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x2014ef2)
    #22 0x00007f3e601e7d87 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x765d87)
    #23 0x00007f3e5e13e4ed clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optionalllvm::StringRef >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool) const (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x20144ed)
    #24 0x00007f3e5e10be6b clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1fe1e6b)
    #25 0x00007f3e5e10c257 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1fe2257)
    #26 0x00007f3e5e125748 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1ffb748)
    #27 0x000000000040ea43 main (/local/emsdk/upstream/bin/clang+++0x40ea43)
    #28 0x00007f3e5bdf50b3 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b3)
    #29 0x000000000040be5a _start (/local/emsdk/upstream/bin/clang+++0x40be5a)
    clang-12: error: clang frontend command failed due to signal (use -v to see invocation)
    clang version 12.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-jackfan.us.kg-llvm-llvm--project a3036b386383f1c1e9d32c2c8dba995087959da3)
    Target: wasm32-unknown-emscripten
    Thread model: posix
    InstalledDir: /local/emsdk/upstream/bin
    clang-12: note: diagnostic msg:

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-12: note: diagnostic msg: /tmp/tmgapi-48c79a.cpp
clang-12: note: diagnostic msg: /tmp/tmgapi-48c79a.sh
clang-12: note: diagnostic msg: `

@rsielken
Author

Second....

MAKE: Processing make.mak for WebAssembly platform with EMCC and GCC3+ compilers em++ -DEMSDK -DMOBILE -DOPENGL -DNO_DYNAMIC_LOADING -DNO_EM -g2 -fwasm-exceptions -s USE_PTHREADS=1 -s PROXY_TO_PTHREAD=1 -s PTHREAD_POOL_SIZE=4 -s FETCH=1 -s FETCH_SUPPORT_INDEXEDDB=0 -s FULL_ES2=1 -s USE_SDL=2 -s USE_SDL_TTF=2 -s USE_FREETYPE=2 -s USE_LIBPNG=1 -s INITIAL_MEMORY=268435456 -s ALLOW_MEMORY_GROWTH=1 -s FORCE_FILESYSTEM=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 -s OFFSCREEN_FRAMEBUFFER=1 -s DEMANGLE_SUPPORT=1 -c -I/usr/local/include/g++ -I/usr/include/g++ -O0 -w -m32 -march=pentium3 -DGCC3 -DGCC4 -fno-strict-aliasing -DGCC_LBLB_NOT_SUPPORTED -DUNIX -DLINUX -DLINUX86 -DW -DW32 -DEMSDK -DWSS_CLIENT -DWSS_CLIENT_NET -fcheck-new -DPTHREAD_KERNEL -D_REENTRANT -DUSE_THREADSAFE_INTERFACES -D_POSIX_THREAD_SAFE_FUNCTIONS -DHANDLE_IS_32BITS -DHAS_IOCP -DHAS_BOOL -DHAS_DLOPEN -DUSE_PTHREAD_INTERFACES -DLARGE64_FILES -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fPIC -I. -I/vhome/builder/sandboxes/myproject/unix/cham -I/vhome/builder/sandboxes/myproject/inc -o snaccvda/cpp-lib/src/sm_vdasnacc.wasm.o -DSEG_BSAFE_CPP_TEXT -I./ -I./include -I./snaccvda/cpp-lib/inc -I./include/cmapi -I./snaccvda -I./alg_libs/sm_myproject -I./alg_libs/sm_abc -DDEBUG -DNO_SCCS_ID -DSNACC_DEEP_COPY -DVDADER_RULES -D_WINDOWS -DSFL_BASE64 -DSNACCDLL_NONE -DSM_ABC_USED -DABCDLL_NONE snaccvda/cpp-lib/src/sm_vdasnacc.cpp root:WARNING: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 clang++: /b/s/w/ir/cache/builder/emscripten-releases/llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp:995: bool (anonymous namespace)::WebAssemblyLowerEmscriptenEHSjLj::runSjLjOnFunction(llvm::Function &): Assertion !isa(&I)' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /local/emsdk/upstream/bin/clang++ -target wasm32-unknown-emscripten -D__EMSCRIPTEN_major__=2 -D__EMSCRIPTEN_minor__=0 -D__EMSCRIPTEN_tiny__=0 -D_LIBCPP_ABI_VERSION=2 -Dunix -D__unix -D__unix__ -Werror=implicit-function-declaration -Xclang -nostdsysteminc -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/libcxx -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/libcxxabi/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/compat -Xclang -isystem/local/emsdk/upstream/emscripten/system/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/libc -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/libc/musl/arch/emscripten -Xclang -isystem/local/emsdk/upstream/emscripten/system/local/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/include/SSE -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/compiler-rt/include -Xclang -isystem/local/emsdk/upstream/emscripten/system/lib/libunwind/include -Xclang -isystem/local/emsdk/upstream/emscripten/cache/wasm/include -DEMSCRIPTEN -D__EMSCRIPTEN_PTHREADS__=1 -DEMSDK -DMOBILE -DOPENGL -DNO_DYNAMIC_LOADING -DNO_EM -fwasm-exceptions -c -I/usr/local/include/g++ -I/usr/include/g++ -O0 -w -m32 -march=pentium3 -DGCC3 -DGCC4 -fno-strict-aliasing -DGCC_LBLB_NOT_SUPPORTED -DUNIX -DLINUX -DLINUX86 -DW -DW32 -DEMSDK -DWSS_CLIENT -DWSS_CLIENT_NET -fcheck-new -DPTHREAD_KERNEL -D_REENTRANT -DUSE_THREADSAFE_INTERFACES -D_POSIX_THREAD_SAFE_FUNCTIONS -DHANDLE_IS_32BITS -DHAS_IOCP -DHAS_BOOL -DHAS_DLOPEN -DUSE_PTHREAD_INTERFACES -DLARGE64_FILES -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fPIC -I. 
-I/vhome/builder/sandboxes/myproject/unix/cham -I/vhome/builder/sandboxes/myproject/inc -o snaccvda/cpp-lib/src/sm_vdasnacc.wasm.o -DSEG_BSAFE_CPP_TEXT -I./ -I./include -I./snaccvda/cpp-lib/inc -I./include/cmapi -I./snaccvda -I./alg_libs/sm_myproject -I./alg_libs/sm_rsa -DDEBUG -DNO_SCCS_ID -DSNACC_DEEP_COPY -DVDADER_RULES -D_WINDOWS -DSFL_BASE64 -DSNACCDLL_NONE -DSM_ABC_USED -DABCDLL_NONE -pthread -pthread snaccvda/cpp-lib/src/sm_vdasnacc.cpp -I/local/emsdk/upstream/emscripten/cache/wasm/include/freetype2/freetype -Xclang -isystem/local/emsdk/upstream/emscripten/cache/wasm/include/SDL2 -c -o snaccvda/cpp-lib/src/sm_vdasnacc.wasm.o -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr

  1. parser at end of file
  2. Code generation
  3. Running pass 'WebAssembly Lower Emscripten Exceptions' on module 'snaccvda/cpp-lib/src/sm_vdasnacc.cpp'.
    #0 0x00007f7a75e98d04 PrintStackTraceSignalHandler(void*) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x839d04)
    #1 0x00007f7a75e968ee llvm::sys::RunSignalHandlers() (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x8378ee)
    #2 0x00007f7a75e97ead llvm::sys::CleanupOnSignal(unsigned long) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x838ead)
    #3 0x00007f7a75dc4e73 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x765e73)
    #4 0x00007f7a75dc4fac CrashRecoverySignalHandler(int) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x765fac)
    #5 0x00007f7a7564b3c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
    #6 0x00007f7a719f118b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4618b)
    #7 0x00007f7a719d0859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x25859)
    #8 0x00007f7a719d0729 (/lib/x86_64-linux-gnu/libc.so.6+0x25729)
    #9 0x00007f7a719e1f36 (/lib/x86_64-linux-gnu/libc.so.6+0x36f36)
    #10 0x00007f7a77be111b (anonymous namespace)::WebAssemblyLowerEmscriptenEHSjLj::runSjLjOnFunction(llvm::Function&) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x258211b)
    #11 0x00007f7a77bde4c8 (anonymous namespace)::WebAssemblyLowerEmscriptenEHSjLj::runOnModule(llvm::Module&) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x257f4c8)
    #12 0x00007f7a75ff94c7 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x99a4c7)
    #13 0x00007f7a7359b200 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_deletellvm::raw_pwrite_stream >) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1894200)
    #14 0x00007f7a738ae516 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1ba7516)
    #15 0x00007f7a7261a823 clang::ParseAST(clang::Sema&, bool, bool) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x913823)
    #16 0x00007f7a740cd873 clang::FrontendAction::Execute() (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x23c6873)
    #17 0x00007f7a74066f43 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x235ff43)
    #18 0x00007f7a74140972 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x2439972)
    #19 0x0000000000410ecf cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/local/emsdk/upstream/bin/clang+++0x410ecf)
    #20 0x000000000040f09c ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/local/emsdk/upstream/bin/clang+++0x40f09c)
    #21 0x00007f7a73d1bef2 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optionalllvm::StringRef >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool) const::$_1>(long) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x2014ef2)
    #22 0x00007f7a75dc4d87 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/local/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x765d87)
    #23 0x00007f7a73d1b4ed clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optionalllvm::StringRef >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool) const (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x20144ed)
    #24 0x00007f7a73ce8e6b clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1fe1e6b)
    #25 0x00007f7a73ce9257 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1fe2257)
    #26 0x00007f7a73d02748 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (/local/emsdk/upstream/bin/../lib/libclang-cpp.so.12git+0x1ffb748)
    #27 0x000000000040ea43 main (/local/emsdk/upstream/bin/clang+++0x40ea43)
    #28 0x00007f7a719d20b3 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b3)
    #29 0x000000000040be5a _start (/local/emsdk/upstream/bin/clang+++0x40be5a)
    clang-12: error: clang frontend command failed due to signal (use -v to see invocation)
    clang version 12.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-jackfan.us.kg-llvm-llvm--project a3036b386383f1c1e9d32c2c8dba995087959da3)
    Target: wasm32-unknown-emscripten
    Thread model: posix
    InstalledDir: /local/emsdk/upstream/bin
    clang-12: note: diagnostic msg:

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-12: note: diagnostic msg: /tmp/sm_vdasnacc-0ccb83.cpp
clang-12: note: diagnostic msg: /tmp/sm_vdasnacc-0ccb83.sh
clang-12: note: diagnostic msg: `

@rsielken
Author

I tested the #12039 fix by replacing ./upstream/emscripten/src/library_exceptions.js, rebuilding the project, confirming my primary.js file has the changed code inserted into it, and running the program. Alas, it has the same errors as before.

While I work on a test program for this, I did want to mention one other detail in case it matters. The primary pthread is the one that creates the other pthreads; they are not all created from main at startup. Perhaps there is some inherent hierarchy based on who creates the pthreads? For example, pthread A creates pthreads B and C; pthread A continues running and throws the exception, which is given to pthread B (the first pthread that pthread A created) instead of pthread A.

@aheejin
Member

aheejin commented Aug 26, 2020

Both clang errors seem to be caused by using the new wasm EH (-fwasm-exceptions) and Emscripten SjLj (-mllvm -enable-emscripten-sjlj) together, which is not currently supported. Emscripten SjLj (setjmp-longjmp handling) is enabled by default, so it is hard to get around. One fix could be replacing setjmp-longjmp with try-catch. But as I said, the new EH itself is not stable at the moment, so I wouldn't recommend bending over backwards to make it work. The best bet here still seems to be Emscripten EH, as long as the recent patches @kripken pointed to solve your problems.

And about the test case, if you already have it I'd appreciate it, but if not, please don't spend time on creating it; I think I know what's going on.
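For anyone who does want to try the setjmp-longjmp to try-catch replacement mentioned above, here is a minimal sketch of the idea in plain C++. The names `ParseAbort`, `parse_digit` and `parse_or_default` are made up for illustration and are not from the project in this issue; the assumption is that the original code used `longjmp()` only as an error escape back to a single `setjmp()` site.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical error type standing in for the value longjmp() used to carry.
struct ParseAbort : std::runtime_error {
    explicit ParseAbort(int code_)
        : std::runtime_error("parse aborted"), code(code_) {}
    int code;
};

// Hypothetical worker: where C-style code would call longjmp(env, 2),
// this version throws instead.
int parse_digit(char c) {
    if (c < '0' || c > '9') {
        throw ParseAbort(2);
    }
    return c - '0';
}

// Where C-style code would call setjmp(env) and branch on its return value,
// this version wraps the work in try/catch.
int parse_or_default(const std::string& s, int fallback) {
    try {
        int value = 0;
        for (char c : s) {
            value = value * 10 + parse_digit(c);
        }
        return value;
    } catch (const ParseAbort&) {
        return fallback;
    }
}
```

Calling `parse_or_default("12x", -1)` takes the catch path and returns the fallback, while a string of digits returns its numeric value. Unlike `longjmp()`, the throw also runs destructors for locals on the way out, so this is not always a drop-in swap.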

@kripken
Member

kripken commented Aug 26, 2020

@rsielken

Perhaps there is some inherent hierarchy based on who creates the pthreads?

I don't think that could matter to exceptions (but I may be missing something).

Meanwhile, btw, talking to @sbc100 we realized that #7203 and followups to that file may have regressed this. When they were in JS, they were essentially thread-local. However when in C they are linked once so all threads use the same location.

Those should probably be marked _Thread_local. That would require some changes in system_libs.py to build that library as multithreaded when necessary - it's in compiler-rt atm so maybe it could just be moved out to an already multithreaded-aware one, what do you think @sbc100 ?
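To illustrate the difference between a location that is linked once and one that is thread-local, here is a small sketch in standard C++. The variables `shared_flag` and `local_flag` are hypothetical stand-ins, not the actual symbols in the Emscripten library, and `thread_local` is the C++ spelling of C's `_Thread_local`.

```cpp
#include <thread>

// Hypothetical stand-ins for per-throw bookkeeping state.
int shared_flag = 0;              // one location shared by every thread
thread_local int local_flag = 0;  // a separate location for each thread

struct Observed {
    int shared;
    int local;
};

// A helper thread writes both flags; the calling thread then reads them.
Observed set_on_other_thread_then_read() {
    std::thread worker([] {
        shared_flag = 1;  // visible to all threads
        local_flag = 1;   // only changes the worker's own copy
    });
    worker.join();  // join() makes the worker's writes visible here
    return Observed{shared_flag, local_flag};
}
```

The caller sees the worker's write to `shared_flag` but still reads 0 from its own `local_flag`. If exception state lives in something like `shared_flag`, one thread can observe a throw that happened on another, which matches the symptom described in this issue.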

@kripken
Member

kripken commented Aug 26, 2020

@rsielken Ok, #12056 is up with a proposed fix for that issue, and CI looks green. Please test that + the other PR together, and hopefully that helps.

@rsielken
Author

With -fsanitize=undefined, the build hits the following error right after startup. We had tried the sanitizers before (at least with EMSDK 1.39.20, but I think we tried them with 1.39.17 too) and hit similar errors.

Uncaught InternalError: too much recursion
    ___sys_write https://localhost:8080/130dcf4f36163aa258e9480f5e33b8ca4a9f12f5.js:8543 130dcf4f36163aa258e9480f5e33b8ca4a9f12f5.js:8543:36
    ___sys_write https://localhost:8080/130dcf4f36163aa258e9480f5e33b8ca4a9f12f5.js:8543
    __sanitizer::internal_write(int, void const*, unsigned long) https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:129054882
    __sanitizer::IsAccessibleMemoryRange(unsigned long, unsigned long) https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:129041273
    __ubsan::checkDynamicType(void*, void*, unsigned long) https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:129028651
    HandleDynamicTypeCacheMiss(__ubsan::DynamicTypeCacheMissData*, unsigned long, unsigned long, __ubsan::ReportOptions) https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:129029501
    __ubsan_handle_dynamic_type_cache_miss https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:129029456
    std::type_info::operator==(std::type_info const&) const https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:69223022
    is_equal(std::type_info const*, std::type_info const*, bool) https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:128979144
    __dynamic_cast https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:128979456
    __ubsan::checkDynamicType(void*, void*, unsigned long) https://localhost:8080/ed51d4b873a9b4996945d2925851d48345b50302.wasm:129028709
    [the six frames from HandleDynamicTypeCacheMiss through __ubsan::checkDynamicType (wasm:129028709) repeat, unchanged, for the rest of the trace]

@rsielken
Author

rsielken commented Aug 27, 2020

With the files replaced/patched from #12039 and #12056 (except the /test files - I didn't bother to patch them), I still see the exception on the wrong thread.

As a sanity check, I grepped all of our build outputs for "exception_builtin" and didn't get any hits (I do get the expected hits when grepping for "exceptionThrowBuf"). I don't know whether any of the changes should have ended up in the outputs versus just being part of the build process, but I wanted to mention it in case it points to something being missing in my patching and building.

I am still working on coming up with a test case.

@rsielken
Author

rsielken commented Sep 3, 2020

Just an update - I have written a test case and tried all sorts of things, but nothing recreates the error so far. I have been going back to our real code to look for complexities to add to the test case, and disabling more and more of that code to narrow down what I need to simulate. The bright side is that basic threading and exception handling are probably working, and it is more likely that something else in our real code is corrupting memory or has some other side effect whose symptom is these exceptions on the seemingly incorrect threads. I am still working on it.

@rsielken
Author

I was finally able to create a test case that reproduces the issue. Here is the test case which includes comments on the errors:

// This test case is associated with https://github.com/emscripten-core/emscripten/issues/12035.

// When running this test case, we have seen the following errors:
//
// 		worker.js onmessage() captured an uncaught exception: 1668509029
// 		pthread sent an error! undefined:undefined: undefined
// 		worker exited - TODO: update the worker queue?
//
// 		worker.js onmessage() captured an uncaught exception: RuntimeError: unreachable
// 		pthread sent an error! undefined:undefined: unreachable
// 		worker exited - TODO: update the worker queue?
//
// 		worker.js onmessage() captured an uncaught exception: RangeError: Invalid atomic access index
// 		pthread sent an error! undefined:undefined: Invalid atomic access index
// 		worker exited - TODO: update the worker queue?
//
// At times, we get several of these on a single run, but sometimes it is only one, and every once in
// a while (not very often though) we get zero errors and the test case passes, which indicates there is
// probably some amount of timing involved.

// cd /local/emsdk/upstream/emscripten/tests/pthread

// To run WITHOUT the UI:
// emcc -s USE_PTHREADS=1 -s PROXY_TO_PTHREAD=1 -s PTHREAD_POOL_SIZE=12 -s INITIAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -s DISABLE_EXCEPTION_CATCHING=0 test_pthread_12035_timers.cpp -o test_pthread_12035_timers.js
// node --experimental-wasm-threads --trace-uncaught test_pthread_12035_timers.js

// To run WITH the UI:
// emcc -s USE_PTHREADS=1 -s PROXY_TO_PTHREAD=1 -s PTHREAD_POOL_SIZE=12 -s INITIAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -s DISABLE_EXCEPTION_CATCHING=0 -s EXIT_RUNTIME=0 --emrun test_pthread_12035_timers.cpp -o test_pthread_12035_timers.html
// emrun --port 6933 --hostname 127.0.0.1 --no_browser .
// http://127.0.0.1:6933/test_pthread_12035_timers.html

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <emscripten.h>
#include <emscripten/threading.h>
#include <list>
#include <sys/time.h>
#include <unistd.h>
#include <semaphore.h>

typedef unsigned short WORD;
typedef WORD STATUS; /* STATUS = Status code (ERR_xxx) */
typedef unsigned long DWORD;
#define MAXDWORD ((DWORD) 0xffffffff)
#define MAXINT ((int) (((unsigned int) -1) >> 1))
#define FALSE 0
#define TRUE !FALSE

typedef unsigned int UINT;
typedef int (*FARPROC)();
typedef FARPROC TIMERPROC;

pthread_mutex_t g_timerListMutex;

// It appears that the TimerLockList management of the mutex (via the constructor and destructor)
// at least partially causes the errors.  By undefining USE_TIMER_LIST_LOCK, you can run with
// code that manages the mutex directly and doesn't appear to cause the errors.
#define USE_TIMER_LIST_LOCK 1

int thread1Finished = 0;
int thread2RunningCount = 0;
int thread2ShouldFinish = 0;

int isPrintLoopIndicatorThread1 = 0;
int isPrintLoopIndicatorThread2 = 0;

struct args {
	int id;
	int sleepTime;
};

DWORD GetSystemTimer(void) {
	struct timeval t;
	gettimeofday(&t, NULL); /* the use of struct timezone *tp is obsolete. */
	return ((t.tv_sec * 1000) + (t.tv_usec / 1000));
}

class GLTimerEntry {
public:
	GLTimerEntry(UINT uIDEvent, UINT uElapse, TIMERPROC lpTimerFunc) :
			m_id(uIDEvent), m_interval(uElapse), m_pFunc(lpTimerFunc), m_timer(
					0), m_refcnt(1) {
	}
	~GLTimerEntry() {
		m_refcnt--;
	}
	inline UINT getId() {
		return m_id;
	}
	inline UINT getInterval() {
		return m_interval;
	}
	inline TIMERPROC getFunc() {
		return m_pFunc;
	}
	inline void setInterval(UINT interval) {
		m_interval = interval;
	}

	inline void resetTimer() {
		m_timer = GetSystemTimer() + m_interval;
	}
	inline int expiresIn(DWORD now) {
		if (now >= m_timer)
			return 0 - (int) (now - m_timer);
		DWORD diff = m_timer - now;
		if (diff < MAXINT)
			return diff;
		// handle timer wrap here
		diff = MAXDWORD - m_timer + now;
		return (int) 0 - (int) diff;
	}

private:
	UINT m_id;
	UINT m_interval;
	TIMERPROC m_pFunc;
	DWORD m_timer;
	int m_refcnt;
};

class GLTimer {
public:
	static GLTimer* getInstance();
	UINT setTimer(UINT nIDEvent, UINT uElapse, TIMERPROC lpTimerFunc);
	void threadProc();

private:
	GLTimer();
	~GLTimer();
	static GLTimer *s_instance;
	std::list<GLTimerEntry*> m_timers;
	sem_t m_sem;
	void updateTimer();
	int dispatchTimers();
};

GLTimer *GLTimer::s_instance = NULL;

UINT SetTimer(UINT uIDEvent, UINT uElapse, TIMERPROC lpTimerFunc) {
	GLTimer *instance = GLTimer::getInstance();
	if (!instance)
		return FALSE; // Error
	return instance->setTimer(uIDEvent, uElapse, lpTimerFunc);
}

void* TimerThreadProc(void *param) {
	((GLTimer*) param)->threadProc();
	return NULL;
}

GLTimer* GLTimer::getInstance() {
	if (!s_instance) {
		s_instance = new GLTimer();
		if (s_instance) {
			printf("[0x%x] %s#%d: Calling pthread_create() for GLTimer\n",
					(int) pthread_self(), __func__,
					__LINE__);
			pthread_t thread;
			pthread_attr_t attr;
			pthread_attr_init(&attr);
			pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
			pthread_attr_setstacksize(&attr, 2 * 1024 * 1024);
			pthread_create(&thread, &attr, TimerThreadProc, (void*) s_instance);
			pthread_attr_destroy(&attr);
		}
	}
	return s_instance;
}

struct TimerListLock {
	TimerListLock() :
			owner_(false) {
		Lock();
	}
	~TimerListLock() {
		Unlock();
	}
	void Lock() {
		pthread_mutex_lock(&g_timerListMutex);
		owner_ = true;
	}
	void Unlock() {
		if (owner_) {
			owner_ = false;
			pthread_mutex_unlock(&g_timerListMutex);
		}
	}
private:
	bool owner_;
};

GLTimer::GLTimer() {
	sem_init(&m_sem, 0, 0);
}

GLTimer::~GLTimer() {
	sem_destroy(&m_sem);
}

UINT GLTimer::setTimer(UINT uIDEvent, UINT uElapse, TIMERPROC lpTimerFunc) {
#ifdef USE_TIMER_LIST_LOCK
		TimerListLock lock;
#endif
	GLTimerEntry *pExisting = NULL;
	if (uIDEvent) {
#ifndef USE_TIMER_LIST_LOCK
			pthread_mutex_lock(&g_timerListMutex);
#endif
		for (std::list<GLTimerEntry*>::iterator it = m_timers.begin();
				it != m_timers.end(); ++it) {
			GLTimerEntry *pNext = *it;
			if (pNext->getId() == uIDEvent) {
				if (lpTimerFunc && pNext->getFunc() == lpTimerFunc) {
					pExisting = pNext;
					break;
				}
			}
		}
	}

	if (!pExisting) {
		pExisting = new GLTimerEntry(uIDEvent, uElapse, lpTimerFunc);
		if (!pExisting) {
#ifndef USE_TIMER_LIST_LOCK
				pthread_mutex_unlock(&g_timerListMutex);
#endif
				return FALSE; // error
		}
		m_timers.push_front(pExisting);
	} else {
		pExisting->setInterval(uElapse);
	}
	pExisting->resetTimer();
	updateTimer();
#ifndef USE_TIMER_LIST_LOCK
		pthread_mutex_unlock(&g_timerListMutex);
#endif
	return uIDEvent;
}

void GLTimer::updateTimer() {
	sem_post(&m_sem);
}

int GLTimer::dispatchTimers() {
#ifdef USE_TIMER_LIST_LOCK
		TimerListLock lock;
#endif
		int minSleep = MAXINT;
	DWORD now = GetSystemTimer();
#ifndef USE_TIMER_LIST_LOCK
		pthread_mutex_lock(&g_timerListMutex);
#endif
	for (std::list<GLTimerEntry*>::iterator it = m_timers.begin();
			it != m_timers.end(); ++it) {
		GLTimerEntry *pNext = *it;
		int expires = pNext->expiresIn(now);
		if (expires <= 0) {
			UINT id = pNext->getId();
			TIMERPROC proc = pNext->getFunc();
			pNext->resetTimer();
			// don't hold the lock while we are dispatching messages
#ifdef USE_TIMER_LIST_LOCK
				lock.Unlock(); // don't hold the lock while we are dispatching messages
#else
				pthread_mutex_unlock(&g_timerListMutex);
#endif
			// Normally, this would call some function to do some real work.  For this test
			// case, it just indicates that it has run by printing out a message.
			// Timers are set to run often, so the following is verbose and disabled by default
			if (FALSE && (0 == thread1Finished)) {
				printf("[0x%x] %s#%d: timer %d\n", (int) pthread_self(),
						__func__, __LINE__, id);
			}
			return expires;
		}
		if (minSleep > expires)
			minSleep = expires;
	}
#ifndef USE_TIMER_LIST_LOCK
		pthread_mutex_unlock(&g_timerListMutex);
#endif
		return minSleep;
}

void GLTimer::threadProc() {
	while (1) {
		int sleep = 5000; // default sleep time
		int minSleep;
		do {
			minSleep = dispatchTimers();
		} while (minSleep <= 0);
		if (minSleep < sleep)
			sleep = minSleep;

		struct timespec ts;
		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_sec += sleep / 1000;
		ts.tv_nsec += (sleep % 1000) * 1000000;
		sem_timedwait(&m_sem, &ts);
	}
}

class ComputeException {
	/* This is an error that is not supposed to be usable as
	 a value during a computation.
	 Examples include failed I/O operations and type mismatch errors */
public:
	inline ComputeException(const STATUS err);
	inline STATUS Err() const;
private:
	STATUS err;
};

inline ComputeException::ComputeException(const STATUS err_) {
	err = err_;
}

inline STATUS ComputeException::Err() const {
	return err;
}

void* thread2_func(void *vptr_args) {
	int threadID = ((struct args*) vptr_args)->id;
	int sleep = ((struct args*) vptr_args)->sleepTime;
	free(vptr_args);
	printf("[0x%x %d] %s#%d: ENTERING sleep=%d\n", (int) pthread_self(),
			threadID, __func__,
			__LINE__, sleep);
	thread2RunningCount++;

	sem_t m_sem;
	sem_init(&m_sem, 0, 0); // initialize before sem_timedwait() uses it as a sleep timer
	while (0 == thread2ShouldFinish) {
		if (isPrintLoopIndicatorThread2) {
			printf("[0x%x %d] %s#%d: TICK\n", (int) pthread_self(), threadID,
					__func__,
					__LINE__);
		}

		struct timespec ts;
		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_sec += sleep / 1000;
		ts.tv_nsec += (sleep % 1000) * 1000000;
		sem_timedwait(&m_sem, &ts);
	}

	thread2RunningCount--;
	printf("[0x%x %d] %s#%d: EXITING\n", (int) pthread_self(), threadID,
			__func__, __LINE__);
	return NULL;
}

void* thread_func(void *vptr_args) {
	printf("[0x%x] %s#%d: ENTERING\n", (int) pthread_self(), __func__,
	__LINE__);

	if (!SetTimer(100, 3, NULL)) {
		printf("[0x%x] %s#%d: SetTimer failed\n", (int) pthread_self(),
				__func__, __LINE__);
	}
	if (!SetTimer(101, 7, NULL)) {
		printf("[0x%x] %s#%d: SetTimer failed\n", (int) pthread_self(),
				__func__, __LINE__);
	}

	// Does throwing and catching exceptions before the other thread creations make a difference?  No.
	int count = 20;
	while (0 < count) {

		usleep(2000); // sleep 2 ms

		try {
			try {
				// Periodically, do NOT throw the exception
				if (count % 9 != 0) {
					throw ComputeException(1000 + count);
				}
			} catch (ComputeException &e) {
				if (isPrintLoopIndicatorThread1) {
					printf("[0x%x] %s#%d: Caught INNER %d\n",
							(int) pthread_self(), __func__, __LINE__, e.Err());
				}
				// Periodically, throw an exception (e or new)
				if (count % 3 == 0) {
					if (count % 2 == 0) {
						throw e;
					} else {
						throw ComputeException(4000 + count);
					}
				}
			}
		} catch (ComputeException &e) {
			if (isPrintLoopIndicatorThread1) {
				printf("[0x%x] %s#%d: Caught OUTER %d\n", (int) pthread_self(),
						__func__, __LINE__, e.Err());
			}
		}

		count--;
	}

	// Does the number of other threads matter?  Not yet.
	int thread2Count = 10;
	for (int i = 0; i < thread2Count; i++) {
		pthread_t thread;
		pthread_attr_t attr;
		pthread_attr_init(&attr);
		pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
		pthread_attr_setstacksize(&attr, 1 * 1024 * 1024);
		struct args *arguments = (struct args*) malloc(sizeof(struct args));
		arguments->id = i;
		arguments->sleepTime = 10 + (3 * i);
		pthread_create(&thread, &attr, thread2_func, (void*) arguments);
		pthread_attr_destroy(&attr);
		usleep(10000); // sleep 10 ms to give thread time to get started
		// thread will free(arguments);
	}

	// Wait for all the other threads to start running
	while (thread2Count != thread2RunningCount) {
		printf("[0x%x] %s#%d: Waiting for %d of %d thread2's to start...\n",
				(int) pthread_self(), __func__,
				__LINE__, (thread2Count - thread2RunningCount), thread2Count);
		usleep(10000); // sleep 10 ms
	}

	printf(
			"[0x%x] %s#%d: Running the main part of the test - check the console for errors related to exceptions on the wrong thread...\n",
			(int) pthread_self(), __func__, __LINE__);

	// Now that all the other threads are running, have some exceptions.
	count = 1000;
	while (0 < count) {

		usleep(2000); // sleep 2 ms

		try {
			try {
				// Periodically, do NOT throw the exception
				if (count % 9 != 0) {
					throw ComputeException(1000 + count);
				}
			} catch (ComputeException &e) {
				if (isPrintLoopIndicatorThread1) {
					printf("[0x%x] %s#%d: Caught INNER %d\n",
							(int) pthread_self(), __func__, __LINE__, e.Err());
				}
				// Periodically, throw an exception (e or new)
				if (count % 3 == 0) {
					if (count % 2 == 0) {
						throw e;
					} else {
						throw ComputeException(4000 + count);
					}
				}
			}
		} catch (ComputeException &e) {
			if (isPrintLoopIndicatorThread1) {
				printf("[0x%x] %s#%d: Caught OUTER %d\n", (int) pthread_self(),
						__func__, __LINE__, e.Err());
			}
		}

		count--;
	}

	// Set the flag so that the other threads can finish
	thread2ShouldFinish = 1;

	while (0 != thread2RunningCount) {
		printf("[0x%x] %s#%d: Waiting for %d thread2's to complete...\n",
				(int) pthread_self(), __func__,
				__LINE__, thread2RunningCount);
		usleep(10000); // sleep 10 ms
	}

	// Look for unhandled exceptions in the console
	printf(
			"[0x%x] %s#%d: Check the console for errors related to exceptions on the wrong thread and then Ctrl+C to quit.\n",
			(int) pthread_self(), __func__, __LINE__);

	// Done
	printf("[0x%x] %s#%d: EXITING\n", (int) pthread_self(), __func__, __LINE__);
	thread1Finished = 1;
	return NULL;
}

int main(void) {
	printf("[main 0x%x] %s#%d: ENTERING\n", (int) pthread_self(), __func__,
	__LINE__);

	pthread_mutexattr_t attrMutext;
	pthread_mutexattr_init(&attrMutext);
	pthread_mutexattr_settype(&attrMutext, PTHREAD_MUTEX_RECURSIVE);

	//timer
	pthread_mutex_init(&g_timerListMutex, &attrMutext);

	pthread_mutexattr_destroy(&attrMutext);

	pthread_t thread;
	pthread_attr_t attr;
	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
	pthread_attr_setstacksize(&attr, 2 * 1024 * 1024);
	if (EM_ASM_INT(return !!(Module['canvas']))) {
		emscripten_pthread_attr_settransferredcanvases(&attr, "#canvas");
	}
	pthread_create(&thread, &attr, thread_func, NULL);
	pthread_attr_destroy(&attr);

	printf("[main 0x%x] %s#%d: EXITING\n", (int) pthread_self(), __func__,
	__LINE__);
	return 0;
}

@kripken
Member

kripken commented Sep 17, 2020

@rsielken One quick thing worth trying is to see if the combination of all 3 of #12243, #12244, and #12245 fixes this, as they seem to fix at least one known issue with mutexes.

@rsielken
Author

Starting with EMSDK 2.0.3, I modified the files from the 3 fixes, rebuilt my test program, ran the test program using node, and still hit the errors:

[0xc02628] thread_func#431: Running the main part of the test - check the console for errors related to exceptions on the wrong thread...
worker.js onmessage() captured an uncaught exception: RuntimeError: unreachable
worker.js onmessage() captured an uncaught exception: 1668509029
pthread sent an error! undefined:undefined: undefined
pthread sent an error! undefined:undefined: unreachable

@kripken
Member

kripken commented Sep 24, 2020

I see, thanks @rsielken

I spent some time looking at this now. I do see unreachables being hit, which I assume is the error condition. Before looking into those I tried some tools, and while ASan finds nothing, SAFE_HEAP does - we read from a null pointer in the exceptions support code.

That happens because __resumeException is called with a null pointer. @aheejin is that ever a valid thing to do?

This happens from GLTimer::dispatchTimers. It doesn't have a catch or throw, but it has a class with a destructor so it ends up using an invoke. To check the invoke it looks in C memory for __THREW__, which is not currently thread-safe, see #12056.
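
As a rough illustration of the invoke-and-check pattern described above, here is a minimal sketch. It is a simplified model under stated assumptions, not Emscripten's actual code: `threw_flag`, `invoke`, `call_and_check`, and `invoke_self_check` are hypothetical names, with `threw_flag` standing in for `__THREW__`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for __THREW__: one flag in ordinary (shared) memory.
static int threw_flag = 0;

// Hypothetical invoke wrapper: runs fn and records a throw in the flag
// instead of letting the exception propagate to the caller.
template <typename Fn>
void invoke(Fn fn) {
    try {
        fn();
    } catch (...) {
        threw_flag = 1;
    }
}

// The caller inspects the flag after the call to decide whether to unwind.
// Between invoke() returning and this read, another thread sharing the same
// flag could overwrite it; that window is the race discussed in this issue.
bool call_and_check(bool should_throw) {
    threw_flag = 0;
    invoke([should_throw] {
        if (should_throw) {
            throw std::runtime_error("simulated failure");
        }
    });
    bool threw = (threw_flag != 0);
    threw_flag = 0;  // reset for the next call
    return threw;
}

// Usage: the outcome is reported through the flag, not through a catch
// block in the caller.
void invoke_self_check() {
    assert(call_and_check(true));
    assert(!call_and_check(false));
}
```

In a single thread this is harmless. The problem only appears when several threads run such wrappers against the same flag at once.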

It's possible that's a red herring, but it seems plausible it isn't. In that case, we should really fix that issue, which is blocked on a TLS fix in upstream LLVM, #12056 (comment)

@tlively @aheejin any updates on the status there? If there isn't a quick fix, I think we have no choice but to go back to (slow) JS helper variables for __THREW__ which are threadsafe.

@tlively
Member

tlively commented Sep 24, 2020

I'll go take a look at it now. Sorry for dropping that without a clear owner.

@kripken
Member

kripken commented Sep 25, 2020

I verified this is fixed by #12056 which will land shortly.

Thanks again for the testcase @rsielken, it's been very helpful!

kripken added a commit that referenced this issue Sep 26, 2020
This adds the thread-local annotation to those globals. Previously they were in
JS, which is effectively thread-local, but then we moved them to C which meant
they were stored in the same shared memory for all threads. A race could
happen if thread (or longjmp) operations happened at just the right time, one
writing to those globals and another reading.

Also make compiler-rt now build with a multithreaded variation, as the implementation
of those globals is in that library.

Also add a testcase that runs exceptions on multiple threads. It's not a guaranteed
way to notice a race, but it may help, and this is an area we didn't have coverage
of.

Fixes #12035

This has been a possible race condition for a very long time. I think it went
unnoticed because exceptions and longjmp tend to be used for exceptional
things, and not constantly being executed. Also, until we get wasm
exceptions support they are slow, so people have been trying to avoid
them as much as possible anyhow.
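
The difference the thread-local annotation makes can be sketched in plain C++. This is only an illustration under stated assumptions, not the code from the commit: `shared_threw`, `local_threw`, `Observed`, `observe_from_second_thread`, and `tls_self_check` are hypothetical names. One thread sets both flags as if it had thrown, and a second thread that never threw then reads them.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the runtime's "threw" flag; not Emscripten's code.
static int shared_threw = 0;              // one copy visible to every thread
static thread_local int local_threw = 0;  // one independent copy per thread

struct Observed {
    int shared_seen;
    int local_seen;
};

// Thread A sets both flags as if it had thrown. Thread B, which never threw,
// then reads both. The joins order the accesses, so there is no data race
// here and the outcome is deterministic.
Observed observe_from_second_thread() {
    shared_threw = 0;
    std::thread a([] {
        shared_threw = 1;
        local_threw = 1;  // only thread A's copy changes
    });
    a.join();

    Observed seen{0, 0};
    std::thread b([&seen] {
        seen.shared_seen = shared_threw;  // sees thread A's write
        seen.local_seen = local_threw;    // thread B's own copy, still 0
    });
    b.join();
    return seen;
}

// Usage: the shared flag leaks A's state into B; the thread-local one does not.
void tls_self_check() {
    Observed seen = observe_from_second_thread();
    assert(seen.shared_seen == 1);
    assert(seen.local_seen == 0);
}
```

Building a sketch like this with Emscripten would additionally need pthread support enabled (for example with `-pthread`). On a native toolchain it should compile as ordinary C++11.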
@rsielken
Author

rsielken commented Oct 3, 2020

I updated to 2.0.5 to pick up this fix and the test case passes for me also. Thank you for the fix.
