1.7.0-rc2: StackOverflowError with complex-valued matrix exp #886

Closed
daviehh opened this issue Nov 8, 2021 · 9 comments · Fixed by JuliaLang/julia#43300
Labels
bug: Something isn't working
external dependencies: Involves LLVM, OpenBLAS, or other linked libraries
regression: Regression in behavior compared to a previous version

Comments

@daviehh

daviehh commented Nov 8, 2021

On macOS with Julia 1.7.0-rc2, taking the matrix exponential of a roughly 300x300 complex-valued matrix fails with a bare StackOverflowError and no other information. Minimal example:

using LinearAlgebra

n = 300
m = rand(ComplexF64, n, n);
mex = exp(m);

Also ran with --startup-file=no to make sure it's not some clash with other packages.

my versioninfo():

Julia Version 1.7.0-rc2
Commit f23fc0d27a (2021-10-20 12:45 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_MATHLINK = /Applications/Mathematica.app/Contents/Frameworks/mathlink.framework
  JULIA_MATHKERNEL = /Applications/Mathematica.app/Contents/MacOS/MathKernel

Same code runs fine with Version 1.6.3 (2021-09-23).

Hardware: MacBook Pro (Intel) with 16 GB RAM; Activity Monitor shows low memory usage.

In addition, the same code sometimes gives ERROR: LoadError: ReadOnlyMemoryError(), not sure how to reproduce that one...

Thanks!

@vtjnash
Member

vtjnash commented Nov 8, 2021

I can confirm (and seems fixed on master). Looks like an openblas issue:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread'
  * frame #0: 0x000000011b9e23a8 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 24
    frame #1: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #2: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #3: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #4: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #5: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #6: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #7: 0x000000011b9e2533 libopenblas64_.0.3.13.dylib`zgetrf_parallel + 419
    frame #8: 0x000000011b7e0372 libopenblas64_.0.3.13.dylib`zgesv_64_ + 402
    frame #9: 0x00000001100fc1a4
    frame #10: 0x00000001100fead8
    frame #11: 0x0000000110100ea4
    frame #12: 0x00000001063385f0 libjulia-internal.1.7.dylib`do_call + 208
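
Since the backtrace points at the bundled OpenBLAS, it can help to check which backend a given Julia session is actually using. A minimal sketch, assuming Julia 1.7 or later (where BLAS.get_config() is available through libblastrampoline); the printed format is just for illustration:

```julia
using LinearAlgebra

# List the libraries that libblastrampoline currently forwards BLAS/LAPACK calls to.
cfg = BLAS.get_config()
for lib in cfg.loaded_libs
    println(lib.libname, "  (interface: ", lib.interface, ")")
end

# Number of threads the BLAS backend reports it will use.
println("BLAS threads: ", BLAS.get_num_threads())
```

On an affected build the library name would be expected to contain the OpenBLAS version seen in the backtrace, but that depends on the local install.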

vtjnash added the external dependencies label Nov 8, 2021
@daviehh
Author

daviehh commented Nov 9, 2021

Looks like it. Current master uses libopenblas64_.0.3.17.dylib, and running

BLAS.lbt_forward("/path/to/libopenblas64_.0.3.17.dylib"; clear=true)

in 1.7.0-rc2 resolves the issue,

so maybe just bump openblas_jll for 1.7?
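
For anyone wanting to try the same swap, here is a hedged sketch of the workaround wrapped around the original reproducer. The path is a placeholder (the variable name `candidate` is made up for this example), and the sketch simply keeps the current backend if the file is not there:

```julia
using LinearAlgebra

# Hypothetical location: point this at a real ILP64 OpenBLAS build on your machine.
candidate = "/path/to/libopenblas64_.0.3.17.dylib"

if isfile(candidate)
    # clear=true removes the existing forwards before installing the new ones.
    BLAS.lbt_forward(candidate; clear=true)
else
    @warn "Candidate library not found; leaving the current backend in place" candidate
end

# Re-run the reproducer from the first comment.
n = 300
m = rand(ComplexF64, n, n)
mex = exp(m)
println("exp(m) finished; all entries finite: ", all(isfinite, mex))
```

This only changes the backend for the current session; it is not a substitute for an updated openblas_jll.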

@carstenbauer
Member

https://discourse.julialang.org/t/inv-causes-stack-overflow-on-julia-1-7-0-and-mac-os/72411/12 seems to be very similar. Perhaps the same cause?

@stevengj
Member

stevengj commented Dec 2, 2021

I think it's the same cause as the above-mentioned Discourse thread.

I'm getting the same issue with exp as above on 1.7.0 on macOS (x86_64). It boils down to a LAPACK call: I get StackOverflowError from

using LinearAlgebra
n = 300
A = rand(ComplexF64,n,n)
B = copy(A)
LAPACK.gesv!(A,B)

or even if I directly ccall to LAPACK:

import LinearAlgebra.BLAS: @blasfunc, libblastrampoline, BlasInt
ipiv = similar(A, BlasInt, n)
info = Ref{BlasInt}()
# zgesv_ is the ComplexF64 routine, matching the element type of A and B above
ccall((@blasfunc(zgesv_), libblastrampoline), Cvoid,
    (Ref{BlasInt}, Ref{BlasInt}, Ptr{ComplexF64}, Ref{BlasInt}, Ptr{BlasInt},
    Ptr{ComplexF64}, Ref{BlasInt}, Ptr{BlasInt}),
    n, size(B,2), A, max(1,stride(A,2)), ipiv, B, max(1,stride(B,2)), info)

Similarly, the StackOverflowError in the discourse thread for inv(A) boils down to a ccall((@blasfunc(sgetrf_), libblastrampoline), ...).
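
The inv(A) case can be approximated with the LAPACK wrappers rather than a raw ccall. A minimal sketch, assuming a Float32 matrix (so that getrf! resolves to sgetrf) and the same 300x300 size as above; the variable names are illustrative, and this only roughly mirrors what inv does for a general dense matrix:

```julia
using LinearAlgebra

n = 300
A = rand(Float32, n, n)

# LU factorization in place on a copy; for Float32 this ends up in sgetrf.
LUfac, ipiv, info = LAPACK.getrf!(copy(A))
println("getrf! info = ", info)

# Build the inverse from the LU factors (sgetri), overwriting LUfac.
Ainv = LAPACK.getri!(LUfac, ipiv)
println("residual norm(Ainv * A - I) = ", norm(Ainv * A - I))
```

On an affected build the getrf! call is where the overflow would be expected to show up.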

@carstenbauer
Member

Mentioned by @KristofferC on Slack: JuliaPackaging/Yggdrasil#3996

@gbaraldi
Member

gbaraldi commented Dec 2, 2021

macOS might be more susceptible to these stack overflows because, unless I understood incorrectly, the default pthread stack size is 512 KB on macOS and larger on other OSs: Linux seems to be 2 MB and Windows 1 MB.
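
One way to probe whether threading is involved is to repeat the reproducer with a single BLAS thread. This is only a sketch: that one thread keeps OpenBLAS off its parallel LU path (and so avoids the overflow on affected builds) is an assumption, not something confirmed in this thread.

```julia
using LinearAlgebra

old = BLAS.get_num_threads()
BLAS.set_num_threads(1)   # assumption: one thread keeps OpenBLAS off zgetrf_parallel
try
    m = rand(ComplexF64, 300, 300)
    mex = exp(m)
    println("exp(m) completed with 1 BLAS thread; size = ", size(mex))
finally
    BLAS.set_num_threads(old)   # restore the previous thread count
end
```

If this runs cleanly where the default configuration overflows, that would point toward the threaded code path rather than the matrix itself.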

giordano referenced this issue in giordano/julia Dec 2, 2021
This version has been rebuilt to have 32 threads by default, instead of 512 as
it accidentally happened before.  The large number of threads caused problems on
some platforms, including `StackOverflowError`s.

Fix #43008.
vchuravy linked a pull request Dec 2, 2021 that will close this issue
KristofferC referenced this issue in JuliaLang/julia Dec 2, 2021
@BSnelling

I also saw this with v1.6.4, but v1.6.3 worked fine.

@KristofferC
Member

Yes, work is in progress on releasing 1.6.5 and 1.7.1 to address this.

@giordano
Contributor

This issue should have been fixed by JuliaLang/julia#43300

KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024