
USE LBT 5.0 and the new MKL ILP64 suffixes #104

Merged: 6 commits, Aug 24, 2022

Conversation

ViralBShah
Contributor

Also have the ability to load LP64

Fix #97
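For background, the LP64/ILP64 distinction is only about the width of the integer type BLAS uses for dimensions and strides; the practical limit is easy to illustrate (sketched in Python with NumPy, since the limit itself is language-independent):

```python
import numpy as np

# LP64 BLAS passes dimensions/strides as 32-bit integers, ILP64 as 64-bit.
LP64_MAX = np.iinfo(np.int32).max   # largest dimension an LP64 BLAS can address
ILP64_MAX = np.iinfo(np.int64).max

# Arrays with more than 2^31 - 1 elements along a dimension need ILP64,
# which is why Julia's 64-bit builds default to the ILP64 interface.
print(LP64_MAX)   # 2147483647
```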

@ViralBShah ViralBShah mentioned this pull request Jan 22, 2022
Review comment on .github/workflows/ci.yml (outdated, resolved)
@ViralBShah
Contributor Author

1.6 failures are due to #95.

@ViralBShah
Contributor Author

ViralBShah commented Jan 22, 2022

Not sure why we get this error when it clearly seems to have loaded fine on the earlier line:

https://github.com/JuliaLinearAlgebra/MKL.jl/runs/4903803170?check_suite_focus=true#step:6:136

LinearAlgebra.BLAS.get_config() = LBTConfig([ILP64] libmkl_rt.so, [LP64] libmkl_rt.so)
Error: no BLAS/LAPACK library loaded!

Is it because MKL itself is not getting installed? @staticfloat @giordano Any ideas?

@ViralBShah
Contributor Author

ComplexF32 dot has an issue due to LBT. Filed JuliaLinearAlgebra/libblastrampoline#56
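For reference, the ComplexF32 dot in question lowers to BLAS cdotc, i.e. a conjugated dot product; the expected value is easy to compute independently (a NumPy sketch with arbitrary values; the linked LBT issue concerns how the complex return value crosses the ABI, not the math itself):

```python
import numpy as np

# BLAS cdotc computes sum(conj(x) * y) in single-precision complex,
# which is what a ComplexF32 dot dispatches to through LBT.
x = np.array([1 + 2j, 3 + 4j], dtype=np.complex64)
y = np.array([5 + 6j, 7 + 8j], dtype=np.complex64)

ref = np.vdot(x, y)   # vdot conjugates its first argument, like cdotc
print(ref)            # (70-8j)
```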

@staticfloat
Member

Fixed the issues with LBT v5.0.1, so we're ready to push this forward as soon as JuliaLang/julia#44017 is merged.

@ViralBShah ViralBShah changed the title USE LBT 4.0 and the new MKL ILP64 suffixes USE LBT 5.0 and the new MKL ILP64 suffixes Feb 3, 2022
@ViralBShah ViralBShah closed this Feb 4, 2022
@ViralBShah ViralBShah reopened this Feb 4, 2022
@ViralBShah
Contributor Author

@giordano @staticfloat Shouldn't this now be picking up a Julia nightly with LBT 5? It's still failing in the same place on complex dot.

@giordano
Contributor

giordano commented Feb 4, 2022

The nightly build used is shown in the log:

Julia Version 1.8.0-DEV.1450
Commit b486f0dec2 (2022-02-04 08:56 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.0 (ORCJIT, skylake-avx5

This is JuliaLang/julia@b486f0d, so after the update of libblastrampoline.

@ViralBShah
Contributor Author

ViralBShah commented Feb 4, 2022

Ok, mac is old (1444), presumably because the buildbots are behind and in transition. Windows is the right build, but it is segfaulting.

@staticfloat
Member

Okay, for Windows my best guess is that there's a gfortran ABI issue.

On Windows the x86_64 calling convention differs from Linux, and my guess is that the "gfortran ABI" I'm expecting from Linux isn't correct on Windows. I'm a little surprised, though, since the LBT tests themselves pass on Windows. Can we get some printf debugging done here to figure out which BLAS call is crashing?

@ViralBShah
Contributor Author

ViralBShah commented Feb 4, 2022

I'll see if I can find a reproducer. Surprisingly, windows x86 seems to be working almost fine (it goes a long way before failing). x86 linux is also crashing much later in the tests. In both cases, they seem to run out of memory.

Both x86 platforms are also failing with Julia 1.7, so I wonder if something has changed on the CI side and we are getting less memory than before.

@giordano
Contributor

giordano commented Feb 4, 2022

and goes a long way before failing

Maybe too long? It has been running for 4 hours now.

@ViralBShah
Contributor Author

ViralBShah commented Feb 19, 2022

I have updated this PR to use LBT 5, bump the release to 0.6, and require Julia 1.8. We should merge this closer to the 1.8 release (and after fixing the various issues that have shown up with LBT 5).

In the meantime, we should lock down the 0.5 releases for Julia 1.7.

Cc @KristofferC

Review comment on Project.toml (outdated, resolved)
ViralBShah added a commit that referenced this pull request Feb 19, 2022
We have a bunch of new stuff coming in #104 for Julia 1.8 - and it doesn't make sense to test this package on anything but Julia 1.7.
@ViralBShah ViralBShah force-pushed the vs/lp64 branch 2 times, most recently from 67ddf51 to b0aa1e3 on February 19, 2022 22:08
@ViralBShah
Contributor Author

ViralBShah commented Feb 19, 2022

I can't reproduce the mac test failure locally. Combined with the other general hangs and failures on the other platforms, I suspect there is some corruption happening with the new autodetection code in LBT 5. I think we need to track that down first, and I am hoping that also fixes the other failures elsewhere.

Basically everything is failing in different places.

@ViralBShah
Contributor Author

@chriselrod In case you are curious about MKL progress - this is the PR that we need to get merged next.

@staticfloat
Member

staticfloat commented Mar 2, 2022

I spent some time looking into this, and I'm not sure what's going wrong here. I'm starting to suspect a Julia bug. :P

The issue is independent of LBT; it's triggerable with the following:

using Libdl, MKL_jll, OpenBLAS32_jll
BlasInt = Int32
N = 2000
A = ones(N, N)
B = ones(N, N)
C = ones(N, N)
for lib_handle in (OpenBLAS32_jll.libopenblas_handle, MKL_jll.libmkl_rt_handle)
    ccall(dlsym(lib_handle, :dgemm_), Cvoid, (Ref{UInt8}, Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt},
                        Ref{BlasInt}, Ref{Float64}, Ptr{Float64}, Ref{BlasInt},
                        Ptr{Float64}, Ref{BlasInt}, Ref{Float64}, Ptr{Float64},
                        Ref{BlasInt}),
                        'N', 'N', N, N, N, 1.0, A, N, B, N, 0.0, C, N)
    @show C[1,1]
end

Here, we load MKL and OpenBLAS32 and directly call dgemm_ with some NxN matrices where N is 2000. On my x86_64 mac mini, I get unpredictable results from MKL for this, while OpenBLAS is solidly 2000:

$ julia-master --project=. blas_test.jl 
C[1, 1] = 2000.0
C[1, 1] = 3808.0
$ julia-master --project=. blas_test.jl 
C[1, 1] = 2000.0
C[1, 1] = 3152.0

Note that I have no problems on Linux, and I have no problems on Julia v1.7.2:

$ julia --project=. blas_test.jl 
C[1, 1] = 2000.0
C[1, 1] = 2000.0
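For what it's worth, the value a correct dgemm must produce here is easy to pin down: with alpha = 1, beta = 0, and all-ones inputs, every entry of C is the dot product of two length-N ones vectors, i.e. exactly N. A quick independent check (NumPy sketch):

```python
import numpy as np

N = 2000
A = np.ones((N, N))
B = np.ones((N, N))

# Equivalent to dgemm with alpha = 1.0 and beta = 0.0.
C = A @ B

# Every entry should be exactly N; anything else indicates corruption.
print(C[0, 0])   # 2000.0
```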

I wanted to see if this might be a problem with MKL itself, but my C test case works flawlessly, even through LBT.

I also wanted to check to see if perhaps something funny was going on in the argument passing, so I used lldb to print out the function arguments when we entered dgemm_ for both libraries, but they both looked flawless:

# Transpose characters
p ((char *)$rdi)[0]
p ((char *)$rsi)[0]

# N sizes (should be 2000)
p ((int32_t *)$rdx)[0]
p ((int32_t *)$rcx)[0]
p ((int32_t *)$r8)[0]

# alpha (should be 1.0)
p ((double *)$r9)[0]

# A (should be 1.0)
p (*(double **)($sp+8))[0]

# N size (should be 2000)
p (*(int32_t **)($sp+16))[0]

# B (should be 1.0)
p (*(double **)($sp+24))[0]

# N size (should be 2000)
p (*(int32_t **)($sp+32))[0]

# beta (should be 0)
p (*(double **)($sp+40))[0]

# C (should be 1.0, not that it matters)
p (*(double **)($sp+48))[0]

# N size (should be 2000)
p (*(int32_t **)($sp+56))[0]

At this point, I think I need someone more knowledgeable to step in and help debug why we're getting these inconsistent results. Is it possible Julia itself is corrupting the values somehow? But why would it only be corrupting them for MKL?

EDIT: I quickly tested this by using lldb to print the value of the array as it comes back out after finish'ing the dgemm_ call, and it looks like it is indeed corrupted from within MKL.

@ViralBShah
Contributor Author

ViralBShah commented Mar 3, 2022

I see the same issue: it's multi-threading (also discussed in #98). The following works on mac (while your example does fail for me):

using Libdl, MKL_jll, OpenBLAS32_jll
BlasInt = Int32

@enum Threading begin
    THREADING_INTEL
    THREADING_SEQUENTIAL
    THREADING_PGI
    THREADING_GNU
    THREADING_TBB
end

err = ccall((:MKL_Set_Threading_Layer, libmkl_rt), Cint, (Cint,),
            THREADING_SEQUENTIAL)

N = 2000
A = ones(N, N)
B = ones(N, N)
C = ones(N, N)
for lib_handle in (MKL_jll.libmkl_rt_handle,)
    ccall(dlsym(lib_handle, :dgemm_), Cvoid, (Ref{UInt8}, Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt},
                        Ref{BlasInt}, Ref{Float64}, Ptr{Float64}, Ref{BlasInt},
                        Ptr{Float64}, Ref{BlasInt}, Ref{Float64}, Ptr{Float64},
                        Ref{BlasInt}),
                        'N', 'N', N, N, N, 1.0, A, N, B, N, 0.0, C, N)
    @show C[1,1]
end
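As a usage note, the same workaround can be applied without a ccall at all: oneMKL's single-dynamic library reads the MKL_THREADING_LAYER environment variable when libmkl_rt is first loaded (sketched in Python; setting the variable via Julia's ENV before MKL_jll loads should behave the same way, though that equivalence is my assumption here):

```python
import os

# Must be set before libmkl_rt is loaded; equivalent to calling
# MKL_Set_Threading_Layer(MKL_THREADING_SEQUENTIAL) afterwards.
os.environ["MKL_THREADING_LAYER"] = "SEQUENTIAL"
print(os.environ["MKL_THREADING_LAYER"])   # SEQUENTIAL
```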

@staticfloat staticfloat force-pushed the vs/lp64 branch 3 times, most recently from 0297925 to e9954bd on March 12, 2022 19:01
@ViralBShah
Contributor Author

On mac, there is one gsvd test that gives slightly different answers, which we can ignore.

@staticfloat staticfloat force-pushed the vs/lp64 branch 4 times, most recently from 3fe2ea6 to 144ce9d on March 15, 2022 20:44
ViralBShah and others added 3 commits March 15, 2022 20:47
We keep on running out of memory, so let's see if this is any better.
Also see if this helps with the OOM errors we've been getting, as the
code generated should be split across multiple workers.
@staticfloat
Member

Testing in parallel has improved the situation significantly; remaining issues:

  • Windows x86_64 nightly is 5 days old, so it's still failing. :(
  • Linux i686 nightly is also quite old and is failing, although I'm not 100% certain why (there are NaNs in places that I don't recognize)

My plan is to focus on Base CI and get the nightlies back in better shape, then return to this.

@staticfloat staticfloat reopened this Mar 24, 2022
@ViralBShah ViralBShah closed this Apr 24, 2022
@ViralBShah ViralBShah reopened this Apr 24, 2022
@ViralBShah ViralBShah closed this Jun 19, 2022
@ViralBShah ViralBShah reopened this Jun 19, 2022
@ViralBShah ViralBShah closed this Jul 14, 2022
@ViralBShah ViralBShah reopened this Jul 14, 2022
@ViralBShah
Contributor Author

@giordano Is the pipeline.yml in this PR ok? Does it need to be bumped to a recent version?

@ViralBShah
Contributor Author

ViralBShah commented Aug 24, 2022

The test failures don't seem to have anything to do with LBT. On mac, the results are slightly different on gsvd, and on windows, things seem to run out of memory (I believe).

I am merging this for now, but we should hold off a bit on registering. It would be good if people could try this out and report issues.

Successfully merging this pull request may close these issues.

Segfault with Python