llvmcall test failure with LLVM 3.6.0 on Win64 #10394
Comments
Oh boy, julia-debug.exe segfaults immediately in either win32 or win64. https://gist.github.com/3b9ddfa24b74b53814c9 edit: as of efb878e this no longer happens |
With LLVM assertions enabled, we get
also, updated the original gist with a backtrace from the core segfault in |
I think most of the issue is something about ccall'ing into Fortran libraries, since openspecfun has trouble too. llvmcall fails an assertion:
ccall segfaults https://gist.github.com/tkelman/697c8f73f072cfdecc9f |
While debugging #10638 I noticed the following: if I run the test lines on a local build against LLVM3.6, I get thousands of lines of |
I'm fairly confident the |
Cc: @vtjnash |
It's possible/likely that llvm3.6 is using a different windows calling convention than previous versions of llvm. usually that has only affected C++11 code, but sometimes it crosses over to the C ABI.
I've been struggling with that one since the new GC merged. I think Keno finally found the fix for it in da3d7d5 |
I've tested this with all the calling conventions but haven't seen any change. It's possible I am setting things up wrong, but the error is in parameter 1 which should be the simplest. |
if you're feeling particularly ambitious, it would be informative to single step by instructions through the |
Well, looking at the registers in an LLVM33 build, it gets the first argument in RCX (which is correctly a pointer to a char array containing |
what does it point to? it should be a dynamically-allocated stack box, but perhaps it is declared in the wrong scope so llvm is deallocating it early? |
Nevermind, my function was still
|
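For reference on the register discussion above, here is a minimal sketch of where integer and pointer arguments travel under the two common x86-64 C ABIs. The names `Abi` and `integer_arg_register` are hypothetical and exist only for this illustration; they are not part of Julia or LLVM. The sketch ignores floating-point and by-value aggregate arguments, which follow additional rules.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not part of Julia or LLVM.
enum class Abi { Win64, SysV };

// Returns the register that carries the integer/pointer argument at
// zero-based position `index`, or "stack" once the registers are used up.
// Assumes every earlier argument is also an integer or pointer.
inline std::string integer_arg_register(Abi abi, std::size_t index) {
    static const std::array<const char*, 4> win64 = {"rcx", "rdx", "r8", "r9"};
    static const std::array<const char*, 6> sysv = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    if (abi == Abi::Win64)
        return index < win64.size() ? win64[index] : "stack";
    return index < sysv.size() ? sysv[index] : "stack";
}
```

On Win64 the first argument is expected in RCX, which matches what the LLVM 3.3 build showed; a value turning up in a different register for parameter 1 would be consistent with a calling convention mismatch.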
I checked the IR generated by clang (svn) for the same call, and it is basically the same, so this doesn't seem like our bug. Tonight I'll try turning off various passes, and also disabling copy propagation (#9199). |
Looks like this is just a deprecation issue with (At some point we will need to change all the Fortran wrappers to use the new syntax, but that is orthogonal. We do need the deprecation to work for now). |
those go through very different code paths, so it still doesn't point to what part of the llvm bytecode was emitted incorrectly |
Here are several reduced examples (...compared to the 1000 lines of IR for the real This is what we emit with 3.3. Here are Here is a stand-alone version that is runnable with |
It looks like an llvm bug then? llvm comes with a bugpoint tool for isolating errors in lowering code. Alternatively, differencing the assembly generated by the good and bad versions of the llvm bytecode can help with identifying what went wrong. Since llvm36 added a custom lowering for the win64 alloca instruction (that turns it into a function call), I'm going to guess it might be not preserving this register correctly across that function
Yes, unless you or @Keno have a chance to look, and see anything obviously
|
Since you have the llvm bytecode now, you can run llc on any machine to investigate the resulting assembly |
The redundant alloca might be an artifact of not having marked the arguments as isSA in the initial codegen assignments. I don't think that should matter here though? |
Thanks, I do have Is there that much difference in codegen paths between 3.5 and 3.6 (on our side)? @tkelman do you have an LLVM3.5 build -- if so, could you check the IR for this function? |
the |
Here is the |
I can't get a build using 3.5.1 to finish bootstrapping on current master, unknown function access violation in linalg1? |
we should cherry-pick the following commit onto our llvm-3.6.0, otherwise the MCJIT will remain DOA for win64 (#9339):
|
We should be able to pull that in as a patch file, unless we're also planning on relying on a more substantially modified branch soon. |
Not sure what was wrong yesterday, works better with 3.5.1 today.
|
Looks like it's still a calling convention issue; 3.6 is putting args to |
wow, it's my second copyprop bug of the day! Lines 5607 to 5609 in d2ee85d
|
😭 |
cross link #9199 |
How did you test that? I get |
in various indirect ways. i may have missed some combination of julia and llvm versions? |
Sorry, I just saw your comments on the gist (because github). Replied there (yes, they were swapped. too many consoles). |
I tested with |
Looking better now: llvmcall and linalg2 both pass. Now I get a weird failure in the markdown test, not sure if related:
|
Ok, false alarm. Seems that the error was caused by having windows line endings in |
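As an aside on the line-ending false alarm: one way to make text comparisons insensitive to CRLF versus LF is to normalize the input before comparing. This is only a sketch of that idea, not what was done in this thread; `normalize_line_endings` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts every "\r\n" pair to "\n".
// A lone '\r' that is not followed by '\n' is left untouched.
inline std::string normalize_line_endings(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue; // drop the '\r'; the '\n' is copied on the next iteration
        out.push_back(text[i]);
    }
    return out;
}
```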
Current status: sparse and parallel failed, but they both pass when run individually. |
parallel failed because sparse had previously failed (and killed the worker). so it looks like just sparse had some alignment or corruption issue. |
the sparse failure looks quite similar to what I saw above #10394 (comment), something in umfpack |
Note that I still see the same assertion failure in llvmcall when I have |
Was having trouble getting a breakpoint to work at the assertion failure, so resorted to the trusty printf debugger. With this patch
diff --git a/src/debuginfo.cpp b/src/debuginfo.cpp
index 3b7465a..c443ef4 100644
--- a/src/debuginfo.cpp
+++ b/src/debuginfo.cpp
@@ -251,8 +251,9 @@ public:
UnwindData = (uint8_t*)Addr;
if (SectionAddrCheck)
assert(SectionAddrCheck == SectionAddr);
- else
+ else {
SectionAddrCheck = SectionAddr;
+ printf("Section %s has SectionAddr = %lld\n", sName.data(), SectionAddr); }
}
if (sName.equals("__catchjmp")) {
sym_iter.getAddress(Addr);
@@ -269,10 +270,12 @@ public:
catchjmp = (uint8_t*)Addr;
if (SectionAddrCheck)
assert(SectionAddrCheck == SectionAddr);
- else
+ else {
SectionAddrCheck = SectionAddr;
+ printf("Section %s has SectionAddr = %lld\n", sName.data(), SectionAddr); }
}
}
+ // should we reset SectionAddrCheck = 0; after this loop?
assert(catchjmp);
assert(UnwindData);
catchjmp[0] = 0x48;
@@ -333,11 +336,15 @@ public:
Section->getSize(SectionSize);
#endif
sym_iter.getName(sName);
+ printf("Section %s has SectionAddr = %lld\n", sName.data(), SectionAddr);
#ifdef _CPU_X86_
if (sName[0] == '_') sName = sName.substr(1);
#endif
- if (SectionAddrCheck)
- assert(SectionAddrCheck == SectionAddr);
+ if (SectionAddrCheck) {
+ if (SectionAddrCheck != SectionAddr) {
+ printf("SectionAddrCheck = %lld\n", SectionAddrCheck);
+ printf("SectionAddr = %lld\n", SectionAddr); }
+ assert(SectionAddrCheck == SectionAddr); }
else
SectionAddrCheck = SectionAddr;
create_PRUNTIME_FUNCTION(
I get the following output
|
but from this, it appears that there are some bugs in the way llvmcall emits its extra function: |
Nice, that does fix the llvmcall assertion failure. The umfpack problem might be worth opening a new issue for. Or repurposing this one. |
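For reference, the invariant that the printf patch above probes can be sketched in isolation. This is a simplified stand-in, not the code in `src/debuginfo.cpp`: `SectionAddrChecker` is a hypothetical type, and it assumes the addresses are unsigned 64-bit values (in which case `PRIu64` is the portable format specifier, rather than `%lld`). It also shows what the reset asked about in the patch comment would change.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the SectionAddrCheck bookkeeping in the patch.
struct SectionAddrChecker {
    std::uint64_t expected = 0; // 0 means "no section seen yet", as in the patch

    // Records the first address seen; afterwards returns false (and logs)
    // whenever a later section reports a different address.
    bool check(const char* name, std::uint64_t addr) {
        if (expected == 0) {
            expected = addr;
            return true;
        }
        if (expected != addr) {
            std::printf("%s: SectionAddr = %" PRIu64 ", expected %" PRIu64 "\n",
                        name, addr, expected);
            return false;
        }
        return true;
    }

    // Forget the recorded address, e.g. before processing the next object.
    void reset() { expected = 0; }
};
```

Without a call to `reset()`, an address recorded for one object carries over to the next, so a second object loaded at a different address would trip the check even if it is internally consistent.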
let's go the new issue route. with 59 comments, this one is getting a bit long to keep track of. and i think it should now be possible to open narrow issues |
👏 |
bump. can you open the new issues for remaining bugs and cross link them here for reference? I think the only remaining one was the linalg/arnoldi when running all linalg tests on one process? |
Sure. Also the bad_alloc in hashing on win32 still happens last I checked. |
See https://gist.github.com/tkelman/adddb1018adabbb6711b
Mostly lapack / ccall seems broken. The core failure in jl_method_table_assoc_exact is also worrying. Dict and hashing just die with no error.
On win32 things are reasonable if you run the tests one at a time (hits #10377 but otherwise ok), though trying to run all of the tests in the same process results in this: