USE LBT 5.0 and the new MKL ILP64 suffixes #104
Conversation
1.6 failures are due to #95.
Not sure why we get this error when it clearly seems to have loaded fine on the earlier line: https://github.com/JuliaLinearAlgebra/MKL.jl/runs/4903803170?check_suite_focus=true#step:6:136
Is it because MKL itself is not getting installed? @staticfloat @giordano Any ideas?
ComplexF32
Fixed the issues with LBT v5.0.1, so we're ready to push this forward as soon as JuliaLang/julia#44017 is merged.
@giordano @staticfloat Shouldn't this now be picking up a Julia nightly with LBT 5? It is still failing in the same place, on the complex dot product.
The nightly build used is shown in the log: this is JuliaLang/julia@b486f0d, so it is after the update of libblastrampoline.
OK, mac is old (1444), presumably because the buildbots are behind and in transition. Windows is the right build, but it is segfaulting.
Okay, for Windows my best guess is that there's a gfortran ABI issue: the x86_64 ABI on Windows is different from the one on Linux, and the "gfortran ABI" I'm expecting from Linux may not be correct on Windows. I'm a little surprised, though, since the LBT tests themselves pass on Windows. Can we get some printf debugging done here to figure out which BLAS call is crashing?
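A rough sketch of the kind of printf-style narrowing-down being asked for here, assuming the `MKL` package loads on the failing platform (the routines and sizes are illustrative, not taken from the CI logs):

```julia
using MKL, LinearAlgebra

# Confirm which backend libblastrampoline is actually forwarding to.
@show BLAS.get_config()

# Exercise the suspect level-1 kernels one element type at a time, printing before
# each call so a segfault identifies the exact routine.
for T in (Float32, Float64, ComplexF32, ComplexF64)
    x = rand(T, 8)
    y = rand(T, 8)
    println("calling dot with eltype ", T)
    println("  result = ", dot(x, y))
end
```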
I'll see if I can find a reproducer. Surprisingly, Windows x86 seems to be working almost fine (and goes a long way before failing). x86 Linux is also crashing much later in the tests. In both cases they seem to run out of memory. Both x86 platforms are also failing with Julia 1.7, so I wonder if something has changed on the CI side and we are getting less memory than before.
Maybe it's just taking too long? It has been running for 4 hours now.
Force-pushed from a3ae11a to edc68d8
I have updated this PR to use LBT 5 and bump the release to 0.6; it will also require Julia 1.8. We should merge this closer to the 1.8 release (and after fixing the various issues that have shown up with LBT 5). In the meantime we should lock down the 0.5 releases for Julia 1.7. Cc @KristofferC
We have a bunch of new stuff coming in #104 for Julia 1.8, so it doesn't make sense to test this package on anything but Julia 1.7.
Force-pushed from 67ddf51 to b0aa1e3
I can't reproduce the mac test failure locally. Combined with the other general hangs and failures on the other platforms, I suspect there is some corruption happening with the new autodetection code in LBT 5. I think we need to track that down first, and I am hoping that also fixes the other failures elsewhere. Basically everything is failing in different places.
@chriselrod In case you are curious about MKL progress: this is the PR that we need to get merged next.
I spent some time looking into this, and I'm not sure what's going wrong here. I'm starting to suspect a Julia bug. :P The issue is independent of LBT; it's triggerable by loading MKL directly and calling the offending routine by hand, roughly as sketched below.
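The original code block didn't survive this page. A minimal sketch of that kind of LBT-free reproducer, assuming MKL_jll and a direct `ccall` into the LP64 `cdotc_` symbol (the vectors and sizes are illustrative, not from the original comment):

```julia
using MKL_jll, Libdl, LinearAlgebra

# Open MKL's single dynamic library directly, bypassing libblastrampoline entirely.
libmkl = Libdl.dlopen(MKL_jll.libmkl_rt)

x = ComplexF32[1 + 2im, 3 + 4im]
y = ComplexF32[5 + 6im, 7 + 8im]
n, incx, incy = Ref{Int32}(2), Ref{Int32}(1), Ref{Int32}(1)

# Call the LP64 complex dot product and take the result as a plain return value,
# i.e. the "gfortran"-style return convention suspected to be the problem on Windows.
r = ccall(Libdl.dlsym(libmkl, :cdotc_), ComplexF32,
          (Ref{Int32}, Ptr{ComplexF32}, Ref{Int32}, Ptr{ComplexF32}, Ref{Int32}),
          n, x, incx, y, incy)

@show r dot(x, y)   # the two should agree when the calling convention matches
```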
Note that I have no problems on Linux, and no problems on Julia v1.7.2.
I wanted to see if this might be a problem with MKL itself, but my C test case works flawlessly, even through LBT. I also wanted to check whether something funny was going on in the argument passing, so I used
At this point, I think I need someone more knowledgeable to step in and help debug why we're getting these inconsistent results.
I see the same issues. The problem is multi-threading (also discussed in #98). This works on mac, while your example does fail for me:
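The snippet from this comment was also stripped from the page. A sketch of the single-threaded variant being described, assuming the failing call is the ComplexF32 dot product:

```julia
using MKL, LinearAlgebra

# Force MKL down to a single thread before touching the suspect routine.
BLAS.set_num_threads(1)

x = rand(ComplexF32, 1_000)
y = rand(ComplexF32, 1_000)
@show dot(x, y)   # passes single-threaded in this report, while the multi-threaded path fails
```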
Force-pushed from 0297925 to e9954bd
On mac, there is one gsvd test that gives slightly different answers, which we can ignore.
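For context, "slightly different answers" presumably means factors that differ only in the last bits between BLAS/LAPACK backends, so the comparison should be made against the reconstruction with a floating-point tolerance rather than bit-for-bit. A small illustration, assuming the generalized SVD from LinearAlgebra (the matrices here are arbitrary):

```julia
using LinearAlgebra

A = rand(Float64, 5, 3)
B = rand(Float64, 4, 3)

F = svd(A, B)   # generalized SVD, computed by LAPACK

# Check the defining identities up to floating-point tolerance instead of exact equality.
@show F.U * F.D1 * F.R0 * F.Q' ≈ A
@show F.V * F.D2 * F.R0 * F.Q' ≈ B
```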
Force-pushed from 3fe2ea6 to 144ce9d
We keep on running out of memory, so let's see if this is any better.
Also see if this helps with the OOM errors we've been getting, since the generated code should now be split across multiple workers.
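A minimal sketch of the general idea, not this repository's actual test harness (the chunking by element type is purely illustrative):

```julia
using Distributed
addprocs(2)                      # one process per chunk of the test suite

@everywhere using MKL, LinearAlgebra, Test

@everywhere function run_chunk(T)
    A = rand(T, 100, 100)
    @testset "lu $(T)" begin
        F = lu(A)
        @test F.L * F.U ≈ A[F.p, :]
    end
    return nothing               # keep test state and allocations on the worker
end

# Each worker compiles and runs only its own chunk, so no single process
# accumulates the whole suite's generated code and allocations.
pmap(run_chunk, [Float32, Float64, ComplexF32, ComplexF64])
```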
Testing in parallel has improved the situation significantly; some issues remain.
My plan is to focus on Base CI and get the nightlies back in better shape, then return to this.
@giordano Is the pipeline.yml in this PR ok? Does it need to be bumped to a recent version? |
The test failures don't seem to have anything to do with LBT. On mac, the results are slightly different on gsvd, and on Windows things seem to run out of memory (I believe). I am merging this for now, but we should hold off a bit on registering. It would be good if people can try this out and report issues.
Also have the ability to load LP64
Fix #97
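A hedged sketch of what forwarding MKL through LBT 5 looks like from the Julia side, using the public `lbt_forward`/`get_config` entry points in `LinearAlgebra.BLAS` (this is not the package's internal loading code, and the exact sequence of forwards MKL.jl performs may differ):

```julia
using MKL_jll, LinearAlgebra

# Forward Julia's BLAS/LAPACK to MKL through libblastrampoline.  With LBT 5,
# ILP64 symbols are matched via their `_64` suffix, while the plain names keep
# the LP64 interface reachable as well.
BLAS.lbt_forward(MKL_jll.libmkl_rt; clear = true)

# Inspect which libraries and interfaces ended up loaded.
for lib in BLAS.get_config().loaded_libs
    @info "forwarded BLAS" lib.libname lib.interface
end
```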