Use SYSTEM BLAS on OS X, without using system LAPACK #3365

Closed
ViralBShah opened this issue Jun 12, 2013 · 47 comments
Labels: building (Build system, or building Julia or its dependencies); system:mac (Affects only macOS)

Comments

@ViralBShah
Member

I uncommented the USE_SYSTEM_LAPACK stuff in deps/Makefile and did make USE_SYSTEM_BLAS=1 USE_SYSTEM_LAPACK=0 on OS X. The idea is to try replacing OpenBLAS with Apple's BLAS, but we need to build LAPACK ourselves since the one that is included with OS X is quite old.

The spotrf test fails in test-linalg, which is the first test to run. However, only the Float32 case seems to fail; dpotrf seems to work just fine. I wonder if this is an issue with ccall or something else.

@ViralBShah
Member Author

Cc: @staticfloat

@staticfloat
Member

Last time I tried this I got crashes because Julia was expecting function endpoints that were provided by OpenBLAS and not by Accelerate/VecLib. I'll try again soon, I think my problem was I wasn't building LAPACK.

If we get this working just fine, I'll definitely include this in my eventual performance regressions framework.

@ViralBShah
Member Author

You need to do just this. No crashes, but there's something funny. I verified that it is calling the compiled LAPACK as well, and not the one from vecLib.

My Make.user is:

override USE_SYSTEM_BLAS = 1
override USE_SYSTEM_LAPACK = 0
override USE_BLAS64 = 0

And this is the patch to deps/Makefile

diff --git a/deps/Makefile b/deps/Makefile
index f11063c..950e5d6 100644
--- a/deps/Makefile
+++ b/deps/Makefile
@@ -116,9 +116,9 @@ ifeq ($(USE_SYSTEM_SUITESPARSE), 0)
 STAGE2_DEPS += suitesparse
 endif

-#ifeq ($(USE_SYSTEM_LAPACK), 0)
-#STAGE2_DEPS += lapack
-#endif
+ifeq ($(USE_SYSTEM_LAPACK), 0)
+STAGE2_DEPS += lapack
+endif

 #Platform specific flags

@ViralBShah
Member Author

The motivation for all this is #3369

@staticfloat
Member

I tried, but compilation of LAPACK hung, and I'm not fluent enough with FORTRAN to follow the instructions here:

$ make
  ASCII character set
  Tests completed
  Epsilon                      =    5.96046448E-08
  Safe minimum                 =    1.17549435E-38
  Base                         =    2.00000000    
  Precision                    =    1.19209290E-07
  Number of digits in mantissa =    24.0000000    
  Rounding mode                =    1.00000000    
  Minimum exponent             =   -125.000000    
  Underflow threshold          =    1.17549435E-38
  Largest exponent             =    128.000000
  Overflow threshold           =    3.40282347E+38
  Reciprocal of safe minimum   =    8.50705917E+37
  Epsilon                      =    1.1102230246251565E-016
  Safe minimum                 =    2.2250738585072014E-308
  Base                         =    2.0000000000000000     
  Precision                    =    2.2204460492503131E-016
  Number of digits in mantissa =    53.000000000000000     
  Rounding mode                =    1.0000000000000000     
  Minimum exponent             =   -1021.0000000000000     
  Underflow threshold          =    2.2250738585072014E-308
  Largest exponent             =    1024.0000000000000     
  Overflow threshold           =    1.7976931348623157E+308
  Reciprocal of safe minimum   =    4.4942328371557898E+307
 Time for  0.100E+09 SAXPY ops =   0.00     seconds
 *** Warning:  Time for operations was less or equal than zero => timing in TESTING might be dubious
 Including SECOND, time        =  0.565E-02 seconds
 Average time for SECOND       =  0.113E-03 milliseconds
 Time for  0.100E+09 DAXPY ops =  0.100E-05 seconds
 DAXPY performance rate        =  0.100E+09 mflops 
 Including DSECND, time        =  0.567E-02 seconds
 Average time for DSECND       =  0.113E-03 milliseconds
 Equivalent floating point ops =  0.113E+08 ops
 We are about to check whether infinity arithmetic
 can be trusted.  If this test hangs, set
 ILAENV = 0 for ISPEC = 10 in LAPACK/SRC/ilaenv.f

 Infinity arithmetic performed as per the ieee spec.
 However, this is not an exhaustive test and does not
 guarantee that infinity arithmetic meets the ieee spec.

 We are about to check whether NaN arithmetic
 can be trusted.  If this test hangs, set
 ILAENV = 0 for ISPEC = 11 in LAPACK/SRC/ilaenv.f

 NaN arithmetic performed as per the ieee spec.
 However, this is not an exhaustive test and does not
 guarantee that NaN arithmetic meets the ieee spec.

 LAPACK            3 .           4 .           2

In short, I'm not sure what setting ILAENV = 0 for ISPEC = 11 means.

@staticfloat
Member

Scratch that, I was just too impatient. I'm running into errors involving zgesdd_ missing in libblas, but I'm not convinced I'm doing everything right yet. Story developing....

@ViralBShah
Member Author

zgesdd is not present in older LAPACK versions such as Apple's. You need to change build_h.jl manually and replace libblas with liblapack in order to use your compiled LAPACK.

@ViralBShah
Member Author

The above commit should help try this out easily.

@ViralBShah
Member Author

This is the failure I get.

$ make test-linalg
    JULIA test/linalg
     * linalg
Warning: Possible conflict in library symbol spotrf_
exception on 1: ERROR: PosDefException(9)
 in cholfact! at linalg/factorization.jl:13
 in cholfact at linalg/factorization.jl:18
 in anonymous at no file:11
 in runtests at /Users/viral/julia-systemblas/test/testdefs.jl:5
 in anonymous at multi.jl:493
 in run_work_thunk at multi.jl:457
 in anonymous at task.jl:59
at linalg.jl:132

ERROR: PosDefException(9)
 in cholfact! at linalg/factorization.jl:13
 in cholfact at linalg/factorization.jl:18
 in anonymous at no file:11
 in runtests at /Users/viral/julia-systemblas/test/testdefs.jl:5
 in anonymous at multi.jl:493
 in run_work_thunk at multi.jl:457
 in anonymous at task.jl:59
at linalg.jl:132
at /Users/viral/julia-systemblas/test/runtests.jl:18

make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2

@ViralBShah
Member Author

Here is how to reproduce this easily:

julia> a = rand(10,10)
10x10 Float64 Array:
 0.98917   0.614096  0.167126   …  0.756016   0.0528819  0.000520931
 0.362091  0.972038  0.556411      0.331961   0.350804   0.457314   
 0.450018  0.221509  0.0323143     0.756397   0.0151316  0.454139   
 0.116847  0.370899  0.434958      0.177969   0.97688    0.469024   
 0.670582  0.988428  0.664748      0.63442    0.319027   0.0489275  
 0.520839  0.369563  0.845793   …  0.752077   0.0793401  0.7665     
 0.66043   0.403248  0.028529      0.646566   0.681145   0.0503269  
 0.482224  0.45909   0.977687      0.978137   0.802664   0.413535   
 0.713525  0.357882  0.589802      0.594471   0.41824    0.390158   
 0.559475  0.936918  0.298123      0.0794276  0.326732   0.620815   

julia> cholfact(a'*a)
Cholesky{Float64}(10x10 Float64 Array:
 1.88083  1.71466  1.27411   1.55326   …   1.81497     0.996198    0.912195
 0.0      1.03024  0.531615  0.291271     -0.18402     0.460702    0.406083
 0.0      0.0      1.08281   0.37497       0.558875    0.501188    0.551226
 0.0      0.0      0.0       0.467606      0.118399   -0.276056   -0.194191
 0.0      0.0      0.0       0.0           0.0618378   0.441066    0.465689
 0.0      0.0      0.0       0.0       …  -0.271411   -0.227609   -0.223965
 0.0      0.0      0.0       0.0          -0.180328   -0.200068   -0.273415
 0.0      0.0      0.0       0.0           0.473103    0.0170495  -0.211094
 0.0      0.0      0.0       0.0           0.0         0.871579   -0.343059
 0.0      0.0      0.0       0.0           0.0         0.0         0.306369,'U')

julia> cholfact(float32(a'*a))
ERROR: PosDefException(3)
 in cholfact! at linalg/factorization.jl:13
 in cholfact at linalg/factorization.jl:18

Cc: @andreasnoackjensen

@staticfloat
Member

I'm getting the same thing now, and the angle I'm attacking is Warning: Possible conflict in library symbol spotrf_.

I've figured out that linking to vecLib in the way we do brings in vecLib's LAPACK. I'm experimenting with linking against only -lBLAS, which means changing some lines in deps/Makefile to use $(LIBBLAS) $(LIBLAPACK) instead of just $(LIBBLAS).

@ViralBShah
Member Author

Seems like a good plan. Perhaps we should write an independent test program and see if spotrf barfs outside of julia in this configuration.

@staticfloat
Member

Update: You need to clean LAPACK as well, because it was linking against vecLib which itself links against Apple's libLAPACK. I'm recompiling now, hopefully this will eradicate the issues we were having.

@staticfloat
Member

Alright, I've tracked down some troubling news: vecLib/libBLAS.dylib contains symbols that are also in our liblapack.dylib. Specifically, when running your test code above (and getting the spotrf_ warning), using vmmap to list the dylibs loaded by the running julia process and nm to search each of them for spotrf gives us:

$ LIBLIST=$(vmmap julia | awk '{ print $7;}' | grep -i dylib | sort | uniq)
$ for f in $LIBLIST; do if [[ ! -z $(nm $f | grep -i spotrf) ]]; then echo $f; fi; done
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
/Users/sabae/src/julia/usr/lib/liblapack.dylib

I'm completely unfamiliar with where the boundary between BLAS and LAPACK should be drawn; if I google spotrf, the first hit suggests that it belongs to LAPACK, so having it exported from libBLAS.dylib makes me a little uncomfortable.

Investigating further, it seems that Apple agreed with this sentiment after OSX 10.6: we can see that the symbol has been "moved" from libBLAS.dylib to libLAPACK.dylib as we might expect:

$ nm libBLAS.dylib | grep spotrf
0000000000010c05 T $ld$hide$os10.7$_spotrf
0000000000010c05 T $ld$hide$os10.8$_spotrf
000000000000746e T _spotrf
$ nm libLAPACK.dylib | grep spotrf
00000000000010c8 T $ld$hide$os10.4$_spotrf
00000000000010c8 T $ld$hide$os10.5$_spotrf
00000000000010c8 T $ld$hide$os10.6$_spotrf
000000000000765b T _spotrf

I have an SO question open as I have no idea how to inform the compiler that we want to link as if we're on 10.8 and to hide spotrf from us inside libBLAS.dylib.

@nolta
Member

nolta commented Jun 19, 2013

Have you tried changing -mmacosx-version-min=10.6 in Make.inc?

@staticfloat
Member

I just tried changing that to 10.8, but it doesn't seem to have changed
anything.

@dancasimiro
Contributor

I don't know if this changed, but I think that you need to change the base SDK. The hidden symbols are a feature of weak linking. Here is some info that I found from [1]:

Build settings in Makefiles

In a makefile-based project (often based on the Autotools), SDK build settings occur via environment variables [2]:

  • Using SDKs requires GCC 4.0 or later
  • To choose an SDK, specify the full path to the desired SDK directory
    for the following options:
    • Use the -isysroot option with the compiler. This is typically
      specified using the environment variable CPPFLAGS.
    • Use the -syslibroot option with the linker. This is typically
      specified using the environment variable LDFLAGS.
  • Set the deployment target as follows:
    • Use the environment variable MACOSX_DEPLOYMENT_TARGET for Mac OS X
      builds
    • Use the environment variable IPHONEOS_DEPLOYMENT_TARGET for iOS
      builds

[1] http://wiki.herzbube.ch/index.php/Mac_OS_X_Programming
[2] http://developer.apple.com/library/ios/documentation/DeveloperTools/Conceptual/cross_development/Configuring/configuring.html

~Dan

@staticfloat
Member

Hmmm. This is definitely worth looking into. Thanks for the info! I tried setting MACOSX_DEPLOYMENT_TARGET, but I'm not sure I did it right. :P

@staticfloat
Member

I've tried everything listed on that page, and unless I've done something very wrong, I don't think it's working. I'm not entirely sure why that is, other than the fact that most of the weak linking documentation I've found has been for objective-c, and not C/C++, so perhaps the documentation just isn't there for this language. I'm giving up on this for now, pending an answer on SO, and just hoping that we can resolve the OpenBLAS issues quickly.

@ViralBShah
Member Author

@vtjnash Do you have any pointers that may help fix this issue?

@ViralBShah
Member Author

This is a useful read on the topic, but did not help resolve the issue:

http://www.cs.umd.edu/Library/TRs/CS-TR-4585/CS-TR-4585.pdf

@vtjnash
Member

vtjnash commented Jun 24, 2013

While interesting, this line of work is unrelated to the source of the warning, which is the julia codegen. The warning currently has no adverse effects -- it is there to suggest that static compilation may have different results. Jeff / Keno / I have discussed a solution to remove it entirely.

There are also more interesting failures that avoid the warnings entirely:

julia> eigvals(a'*a)'
1x10 Float64 Array:
 0.0282608  0.0475024  0.211544  0.361698  0.419391  0.760586  1.35504  1.681  2.73861  26.6356

julia> eigvals(float32(a'*a))'
1x10 Float32 Array:
 -3.68935f19  -129505.0f0  -50746.7f0  -1383.94f0  -53.4826f0  4.07395f0  316.834f0  10145.9f0  23074.5f0  3.68935f19

@vtjnash
Member

vtjnash commented Jun 24, 2013

Compiling lapack with -ff2c fixes some apparent calling convention differences and gets LAPACK to successfully pass all of its builtin tests when compiled in this manner (and the julia tests mostly -- I haven't completed the analysis and correct compilation of everything)

(edit: note that the easiest way to do this is to edit lapack/make.inc and add -ff2c to all 5 lines OPTS=, make cleanall && make)
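
For readers unfamiliar with the convention difference that -ff2c addresses, here is a minimal C sketch. It relies on the documented meaning of gfortran's -ff2c flag: functions of type default REAL return C double, whereas gfortran's default is to return float. The names f2c_sdot, gnu_sdot, misread_as_float, and self_check are hypothetical, and the misread helper only simulates what a caller might observe on a little-endian x86-64 machine; none of this is taken from Julia or LAPACK.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a library built with the f2c/g77 convention:
   a Fortran REAL function returns C double. Positive increments only. */
static double f2c_sdot(const int *n, const float *x, const int *incx,
                       const float *y, const int *incy)
{
    double acc = 0.0;
    for (int i = 0; i < *n; i++)
        acc += (double)x[i * *incx] * (double)y[i * *incy];
    return acc;
}

/* Simulates a caller that expects a float where a double was returned.
   Assumption: little-endian x86-64, where both types come back in the same
   register and the caller reads only the low 32 bits. */
static float misread_as_float(double returned)
{
    float seen;
    memcpy(&seen, &returned, sizeof seen);
    return seen;
}

/* Adapter exposing the default gfortran convention: REAL returns float. */
static float gnu_sdot(const int *n, const float *x, const int *incx,
                      const float *y, const int *incy)
{
    return (float)f2c_sdot(n, x, incx, y, incy);
}

/* Small self-check covering the three paths above. */
static void self_check(void)
{
    const int n = 2, inc = 1;
    const float x[] = {1.0f, 2.0f}, y[] = {1.0f, 1.0f};
    assert(f2c_sdot(&n, x, &inc, y, &inc) == 3.0);
    assert(gnu_sdot(&n, x, &inc, y, &inc) == 3.0f);
    /* The low 32 bits of the double 3.0 are all zero. */
    assert(misread_as_float(3.0) == 0.0f);
}
```

Calling self_check() from a main function exercises all three paths. A single-precision result misread this way could plausibly produce failures like the ones reported above, while real double-precision functions return double under both conventions.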

@vtjnash
Member

vtjnash commented Jun 24, 2013

For more specific information on the incompatibilities, and another way to fix it, see http://savannah.gnu.org/file/blaswrap.c?file_id=22779 (LGPL)

@ViralBShah
Member Author

@vtjnash This is really good detective work! It would be nice to go with the -ff2c approach if it works rather than introducing yet another wrapper. Perhaps we should be using the CBLAS and LAPACKE interfaces rather than the Fortran interfaces.

@ViralBShah
Member Author

Of course, the -ff2c approach cannot be used for Apple BLAS.

@ViralBShah
Member Author

Useful discussion here: http://www.macresearch.org/lapackblas-fortran-106

@StefanKarpinski
Member

Another alternative approach would be to implement fcall (see #2167) and have it not just be a simple wrapper around ccall but rather correctly implement the Fortran ABI.

@vtjnash
Member

vtjnash commented Jun 24, 2013

@StefanKarpinski the main problem is that gcc decided to change the fortran ABI, rather than maintaining binary compatibility with everyone else. Implementing fcall is not a true fix for gcc/gfortran's behavior.

@StefanKarpinski
Member

Ah, I see. Bad gcc/gfortran.

@ViralBShah
Member Author

In theory, fcall could be smart enough to work with gcc/gfortran as well as the other Fortran compilers that do the standard stuff. Even though gcc made that decision, gfortran is widely used enough that we should support its ABI. I guess all this can be deferred as long as they still support -ff2c.

@vtjnash
Member

vtjnash commented Jun 25, 2013

Completed interface file can be downloaded here: https://gist.github.com/vtjnash/5855643
Instructions for basic usage & testing within the context of LAPACK:

  1. download file to lapack-3.4.2 folder, name it blaswrap.c
  2. run clang -fPIC blaswrap.c -o blaswrap.o -DUSE_BLASWRAP -c
  3. run echo BLASLIB = ../../blaswrap.o -framework vecLib -lBLAS >> make.inc
  4. make cleanall && make

Note that all we need to do now is make sure that we use the symbols in our (statically-compiled) blaswrap.o file before libBLAS. I have it working (read: passing julia and lapack tests) on my machine, but I need to clean it up before I can commit anything.
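
The wrapper idea can be sketched for a COMPLEX-returning routine. This is a minimal illustration and not the contents of the gist: ref_cdotu_sub is a hypothetical stand-in shaped like CBLAS's cblas_cdotu_sub (the result is written through a pointer), and wrap_cdotu is a hypothetical name for what a real wrapper would export as cdotu_. It assumes gfortran's default convention of passing arguments by reference and returning COMPLEX by value.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical stand-in for a CBLAS-style routine: value arguments, result
   stored through a pointer. Positive increments only. */
static void ref_cdotu_sub(int n, const float complex *x, int incx,
                          const float complex *y, int incy,
                          float complex *result)
{
    float complex acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += x[i * incx] * y[i * incy];   /* unconjugated dot product */
    *result = acc;
}

/* gfortran-style entry point: arguments by reference, COMPLEX by value.
   A real wrapper would export this under the Fortran symbol name. */
static float complex wrap_cdotu(const int *n,
                                const float complex *x, const int *incx,
                                const float complex *y, const int *incy)
{
    float complex result;
    ref_cdotu_sub(*n, x, *incx, y, *incy, &result);
    return result;
}

/* Small self-check: (1+i)*1 + 2*i = 1+3i. */
static void self_check(void)
{
    const int n = 2, inc = 1;
    const float complex x[] = {1.0f + 1.0f * I, 2.0f};
    const float complex y[] = {1.0f, 1.0f * I};
    assert(wrap_cdotu(&n, x, &inc, y, &inc) == 1.0f + 3.0f * I);
}
```

The design point is that the wrapper's symbols must be resolved before libBLAS so that gfortran-compiled callers see the return convention they expect.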

@ViralBShah
Member Author

Great. Looking forward to the commit.

vtjnash added a commit that referenced this issue Jun 26, 2013
vtjnash added a commit that referenced this issue Jun 27, 2013
@ViralBShah
Member Author

I just did a distclean and noticed that libgfortblas.dylib did not get cleaned. What other rules should trigger cleaning it?

@vtjnash
Member

vtjnash commented Jun 27, 2013

I only added just enough rules to get it to build. You're welcome to add the standard set of rules if you think they would be helpful. make clean-lapack could also be made to trigger this, since it is a tiny dependency anyways.

@staticfloat
Member

This is some neat work, Jameson. I tried testing it out on my mac mini, and everything is fine except for ARPACK. When I run the tests, ARPACK fails:

From worker 4:       * arpack
exception on 4: ERROR: ARPACKException(-8)
 in aupd_wrapper at linalg/arpack.jl:48
 in eigs at linalg/arnoldi.jl:17
 in anonymous at arpack.jl:11
 in runtests at /Users/sabae/src/julia/test/testdefs.jl:5
 in anonymous at multi.jl:762
 in run_work_thunk at multi.jl:503
 in anonymous at task.jl:59
at arpack.jl:20

I'm not doing anything special, I'm just compiling with the following Make.user, and expecting everything to work out:

override USE_SYSTEM_BLAS = 1
override USE_SYSTEM_LAPACK = 0
override USE_BLAS64 = 0
override USE_QUIET = 0

I did a make -C deps distclean-{arpack,lapack,openblas,suitesparse}, so I don't think it's a problem with any of those packages getting confused by switching BLAS implementations.

@ViralBShah ViralBShah reopened this Jun 27, 2013
@ViralBShah
Member Author

My bad - ignore the previous comment. I now get the arpack failure as well. Looking into it.

@ViralBShah
Member Author

The ARPACK exception decodes to an LAPACK error.

c          = -8: Error return from LAPACK eigenvalue calculation;

@ViralBShah
Member Author

The failures occur in the Float32 and Complex64 tests in test/arpack.jl. It does not fail for Float64 and Complex128.

@ViralBShah
Member Author

@vtjnash Do we still need to build all our fortran libraries with -ff2c?

@vtjnash
Member

vtjnash commented Jun 27, 2013

No, I just made a stupid error and Mac's case-insensitive file system covered up for me.

@staticfloat
Member

After a make -C deps distclean-arpack, it's fixed for me! Thanks @vtjnash!

@vtjnash
Member

vtjnash commented Jun 28, 2013

That's good. For everyone else, I suspect rm -r deps/arpack-ng-3.1.3 deps/lapack-3.4.2/liblapack.dylib usr/lib/libspqr.dylib may be necessary to apply that patch.

@ViralBShah
Member Author

Works for me too - but I had to also delete deps/libgfortblas.dylib. Thanks.

@ViralBShah
Member Author

It is amazing to have this flexibility on the mac now!

@ViralBShah
Member Author

@vtjnash Will gfortblas also likely solve some of the other issues we are having calling 32-bit BLAS on 64-bit on both OS X and linux such as #1804 and #3500 ?

@vtjnash
Member

vtjnash commented Jun 29, 2013

gfortblas is a wrapper around CBLAS that presents a gfortran interface to the BLAS library. This isn't an issue if everything was compiled with gfortran anyway.
