-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Threading without using patched LLVM #14083
Conversation
67a6369
to
1bd69dd
Compare
1bd69dd
to
393bddc
Compare
2ecc7e5
to
b2339b8
Compare
Squashed and all the TODO's in this PR are done now! Remove This essentially implements a runtime weak symbol. It would be nice if the dynamic linker can do this for us (I tried Getting the address of the TLS variable is not as fast as (~30-60% slower) the optimal case (a single function returning a static TLS address) but is also much faster (2x-3x) than using the global dynamic model. Note that this is also only relavant for code in the The advantage of this approach is that it should be able to support all platforms (non-linux have a lot of issues with unresolved symbols in a shared library) and doesn't require another shim for embedding. |
70c2643
to
c8ae160
Compare
@@ -5640,6 +5625,25 @@ static void init_julia_llvm_env(Module *m) | |||
"jl_float_temp")); | |||
#endif | |||
|
|||
#ifndef JULIA_ENABLE_THREADING | |||
jltls_states_var = | |||
new GlobalVariable(*m, ArrayType::get(T_int8, sizeof(jl_tls_states_t)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be declared as an ArrayType of T_pint8
so that alignment is correct and to avoid the BitCastOp on most usages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had that at first but was a little bit paranoid about the size I should supply since jl_tls_states_t
has a int16_t
and is not exactly multiple of pointer size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, although although sizeof must be a multiple of __alignof in C, so the over-allocation is guaranteed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I'll rearange the members a little and declare it as a pointer array to LLVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. You are right that sizeof(jl_tls_states_t) % sizeof(void*) == 0
. The bitcast was actually not necessary anyway since it's always casted to T_ppjlvalue
by emit_nthptr_addr
anyway....
lgtm. thanks, 🎅 🐧 |
fixes d:/code/msys64/home/Tony/julia/src/jltypes.c(1910) : error C2664: 'LONG64 _Inter lockedCompareExchange64(volatile LONG64 *,LONG64,LONG64)' : cannot convert argum ent 1 from 'volatile uint64_t *' to 'volatile LONG64 *' Types pointed to are unrelated; conversion requires reinterpret_cast, C- style cast or function-style cast d:/code/msys64/home/Tony/julia/src/jltypes.c(1913) : error C2664: 'LONG64 _Inter lockedCompareExchange64(volatile LONG64 *,LONG64,LONG64)' : cannot convert argum ent 1 from 'volatile uint64_t *' to 'volatile LONG64 *' Types pointed to are unrelated; conversion requires reinterpret_cast, C- style cast or function-style cast d:/code/msys64/home/Tony/julia/src/jltypes.c(1993) : error C2664: 'LONG64 _Inter lockedCompareExchange64(volatile LONG64 *,LONG64,LONG64)' : cannot convert argum ent 1 from 'volatile uint64_t *' to 'volatile LONG64 *' Types pointed to are unrelated; conversion requires reinterpret_cast, C- style cast or function-style cast d:/code/msys64/home/Tony/julia/src/jltypes.c(2000) : error C2664: 'LONG64 _Inter lockedCompareExchange64(volatile LONG64 *,LONG64,LONG64)' : cannot convert argum ent 1 from 'volatile uint64_t *' to 'volatile LONG64 *' Types pointed to are unrelated; conversion requires reinterpret_cast, C- style cast or function-style cast also define sleep for MSVC Kernel32 Sleep takes milliseconds, ref https://msdn.microsoft.com/en-us/library/windows/desktop/ms686298(v=vs.85).aspx fixes d:/code/msys64/home/Tony/julia/src/threading.c(212) : error C3861: 'sleep': iden tifier not found d:/code/msys64/home/Tony/julia/src/threading.c(290) : error C3861: 'sleep': iden tifier not found
…e always `extern "C"`'d so codegen_mutex and codegen_lock_count do not get c++ mangled
c8ae160
to
2986358
Compare
So I've just run some micro-benchmarks to compare the performance of creating a useless gc frame. The functions I used to benchmark this is (they are the @noinline function tls_test(a)
1 > 2 && "a$a"
a
end
@noinline function non_tls_test(a)
a
end The first function generates a useless gcframe for the branch eliminated by llvm. I've also precompiled these in the sysimg and verified in GDB that for this PR with threading, the sysimg version has one more level of indirection through Numbers below (time in ns)
The only possible regression is the non-threading GC frame is slower by 1-2 cycles (0.217ns). The extra level of indirection for tls access in sysimg isn't as bad as what I thought. Overall, I'm pretty happy with the current performance. Maybe we can do better for the tls access (trick the linker or emitting static TLS access if we can recognize that (with llvm support)?) but not for this PR ;-p Will merge tomorrow if there's no other issues. |
Let's merge now? |
Threading without using patched LLVM
Sure..... |
Does this mean we can turn on travis? |
The build should work, but I'm pretty sure there's still bugs (not sure if we can hit those bugs without actually running many threads though). Maybe at least have a buildbot for it? @tkelman @staticfloat |
It is dev master. I say let's turn threads on by default, everywhere except MSVC. |
I'd prefer to leave it off until we have a stop-the-world GC (rather than a wait-for-the-world GC as @vtjnash phrased it....) |
I thought there was a buildbot for threading. |
@yuyichao What build configuration would you want? Just tacking |
I thought there wasn't since I've heard there's issues getting newer LLVM to build.
Normal build and test with |
What is the minimum version of LLVM we can build against? We have |
Cxx nightlies are broken on gcc 5 because they are based off a broken point in upstream's history, see JuliaInterop/Cxx.jl#73 (comment). Threading with LLVM 3.3 should now work, which was the purpose of this PR. |
Perhaps I was thinking of the cxx buildbot. Sorry for the noise. |
Alright, I've got a nightly builder that will attempt this stuff, and have a test build forcing its way through right now. |
Seems like it was broken by #14103 (comment) |
This tries to implement what @vtjnash suggested for making threading working without LLVM patch. This should also make it possible to support threading on windows since (I was told that) the LLVM tls patch doesn't support emitting TLS on windows.
A few notes
and TODOs(all done):The first few commits in this PR are not directly related to TLS, they are either slightly relatedclean-up's or fix of the bug discovered when testing this PR.
(merged as Preserve function attributes in prepare_call (plus clean up) #14084)
This is currently GCC/Clang only due to the(fixed, MSVC doesn't/won't support it, use__attribute__((const))
but it is only for performanceand not for correctness so can easily be disabled for other compilers (MSVC) or replaced with
equivalent attributes.
__declspec(noalias)
on MSVC)#ifdef
'sin the struct in order to keep it clean and to make the threading and non-threading versions ABI
compatible (to certain degree anyway...). This also effectively export all the TLS variables but we
could easily fix this using a nested struct (and only exposing the pointer to the inner, exported
struct) if it is a concern. (FWIW, this is C and there's way more things you can do to break stuff
other than accessing a non-exported TLS variable....)
LLVM37.
(The compilation of(reported as Race condition in threads initialization. #14091)inference0.ji
with threading on had one SegFault that I couldn'treproduce).
Codegen is always accessing the TLS variables through a (const) function call. I get a freeze(fixed). OTOH, with thebefore enterning the REPL if I let it use the global variable directly for non-threading version.
(probably because I didn't declare the struct correctly for LLVM).
function attributes set (copied from Clang), llvm is able to emit only on call to
jl_get_ptls_states()
per function.
TLS model is stillUsing a dynamic model by default. Override to a static model in repl. The override has to be done at initialization time and can be done for embedding as well if needed.global-dynamic
.c.c @tkelman @Keno
edit: closes #13975