Tests failing with USE_MKL #3902

Closed
ViralBShah opened this issue Aug 1, 2013 · 32 comments
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@ViralBShah
Member

The arpack test crashes when julia is built with USE_MKL. This needs to be verified again and fixed.

@staticfloat
Member

Confirmed. ARPACK worker silently dies, amidst a sea of library symbol conflict warnings. I won't be able to look into this further tonight, however.

@ViralBShah
Member Author

Cc: @nolta

@ViralBShah
Member Author

This is what I get with runtests("arpack"):

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff1524450 in mkl_blas_cdotc ()
   from /home/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_intel_thread.so
(gdb) bt
#0  0x00007ffff1524450 in mkl_blas_cdotc ()
   from /home/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_intel_thread.so
#1  0x00007ffff0dd9e27 in cdotc_ ()
   from /home/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_intel_ilp64.so
#2  0x00007ffff4e8399a in cdotc_ ()
   from /home/intel/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_rt.so
#3  0x00007fffee9c1e20 in cneupd_ () from /home/viral/julia-mkl/usr/bin/../lib/libarpack.so

@nolta
Member

nolta commented Aug 1, 2013

My MKL machine is down at the moment, but the last time I tried, the tests passed.

Does setting USE_BLAS64=0 fix it?

@staticfloat
Member

It doesn't for me. I ran distclean on ARPACK and SuiteSparse and rebuilt without
BLAS64. Perhaps this is something particular to julia.mit.edu?

@ViralBShah
Member Author

This worked for me until recently too, so it is certainly a recent regression. We should also make it work with USE_BLAS64 while we fix this.

@Keno
Member

Keno commented Aug 26, 2013

I managed to compile ARPACK with ifort, which makes the segfault go away, so I assume that's the problem here. We should see if there are any ABI incompatibilities between gfortran and ifort.

@Keno
Member

Keno commented Aug 26, 2013

@vtjnash Perhaps we need something similar to gfortblas here? Can you comment?

@vtjnash
Member

vtjnash commented Aug 26, 2013

@loladiro is correct. ifort uses the standard g77/f2c calling convention rather than the more c-like calling convention that gfortran decided to introduce (arbitrarily, IMHO) much later.

Maybe we should just "fix" our gfortran compiler on Mac by passing "-ff2c", and delete the gfortblas wrapper. Note, however, that this would require changing many of the function signatures in base.
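
To make the calling-convention difference concrete, here is a minimal C sketch of the two ways a complex-valued Fortran function such as cdotc can hand back its result. It is an illustration under assumptions, not the actual BLAS or MKL interface: the type `complex32` and the functions `dotc_by_value` and `dotc_hidden_result` are hypothetical names, and the real `cdotc_` also takes stride arguments (`incx`, `incy`) that are left out here.

```c
/* Hypothetical stand-in for a single-precision Fortran COMPLEX value. */
typedef struct {
    float re;
    float im;
} complex32;

/* Convention A (gfortran's default, as described in the thread):
 * the complex result is returned by value.
 * Computes sum(conj(x[i]) * y[i]); strides are omitted for brevity. */
complex32 dotc_by_value(const int *n, const complex32 *x, const complex32 *y)
{
    complex32 acc = {0.0f, 0.0f};
    for (int i = 0; i < *n; ++i) {
        /* conj(x) * y = (xr*yr + xi*yi) + i*(xr*yi - xi*yr) */
        acc.re += x[i].re * y[i].re + x[i].im * y[i].im;
        acc.im += x[i].re * y[i].im - x[i].im * y[i].re;
    }
    return acc;
}

/* Convention B (f2c/g77 style): the caller passes a hidden leading
 * pointer and the callee writes the complex result through it. */
void dotc_hidden_result(complex32 *result, const int *n,
                        const complex32 *x, const complex32 *y)
{
    *result = dotc_by_value(n, x, y);
}
```

Both functions compute the same value; only the way the result travels differs. If a caller built for convention A invokes a routine built for convention B, every argument is shifted by one position, so the callee would treat the pointer to `n` as the result slot. That is consistent with (though not proof of) the segfault inside cdotc in the backtrace above.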

@Keno
Member

Keno commented Aug 27, 2013

Maybe push this issue to 0.3 then and try to implement #2167?

@ViralBShah
Member Author

Passing the f2c option seems like the right thing to do. I think we should do it in 0.2 since mkl is important for performance for a lot of people.

@ViralBShah
Member Author

fcall would be nice but that is certainly for the next release.

@vtjnash
Member

vtjnash commented Aug 27, 2013

What @loladiro and I are trying to say is that those two issues aren't separable. We can't reasonably/easily switch to -f2c until we implement fcall.

@vtjnash
Member

vtjnash commented Aug 27, 2013

For 0.2, I think the message needs to be that we don't support MKL. (The ifort manual mentions that you can't link gfortran and ifort, and julia = gfortran for all intents and purposes.)

@ViralBShah
Member Author

Ok. Got it. I thought that they were two separate things - as in one was a quick fix and the other one longer term.

@Keno
Member

Keno commented Jan 24, 2014

This needs fcall, which will happen as part of the ccall rework for 0.4.

@ViralBShah
Member Author

MKL has a version compiled with gfortran, but we are unable to successfully link to it and use it.

@ViralBShah
Member Author

Note that fcall may not fix this. The issue here is the different calling conventions between the gfortran-compiled ARPACK in the julia build and the MKL it calls, which is compiled with IFC.

@morrisonlevi

This may or may not be helpful: https://software.intel.com/en-us/articles/how-to-resolve-arpack-issues-with-intel-mkl-110-update-3

I hit this same problem while building Julia this last month.

@vtjnash
Member

vtjnash commented Jun 26, 2014

@ViralBShah fcall would help address this, because we could then build arpack with -ff2c and switch everything over to using the f2c calling convention.

@ViralBShah
Member Author

It may be simpler to just build everything with Intel compilers when using MKL. MKL does ship with gfortran-linked libraries, which should have fixed this, but they didn't work the last time I tried.

Did that blog post resolve the issue?

@ViralBShah
Member Author

@vtjnash Does Intel use the f2c calling convention? In that case, yes, fcall would do it.

@vtjnash
Member

vtjnash commented Jun 26, 2014

It is my understanding currently that they are the same. Of course, if you have a license to MKL, you're more likely to have a license to the intel compiler too.

@morrisonlevi

I am willing to try using Intel compilers if you can tell me how I should do it.

@tkelman
Contributor

tkelman commented Jun 28, 2014

Theoretically, we would like things to work using just the environment variables that were tried as part of #6917. We probably need additional rpath flags in order to get LLVM to link properly with Intel compilers.

@ViralBShah
Member Author

Try make CC=icc CXX=icpc FC=ifort USE_MKL=1.

@morrisonlevi

I have built Julia using Intel compilers and am getting this problem when running make test:

    From worker 8:       * math
exception on 8: ERROR: assertion failed: |sinpi(convert(T,x))::T - convert(T,sin(pi * x))| <= 9.536743e-7
  sinpi(convert(T,x))::T = -0.8090169
  convert(T,sin(pi * x)) = -1.0
  difference = 0.19098312 > 9.536743e-7
 in error at error.jl:22
 in test_approx_eq at test.jl:109
 in anonymous at no file:19
 in runtests at /apps/src/julia/src/216ecacc39/test/testdefs.jl:5
 in anonymous at multi.jl:847
 in run_work_thunk at multi.jl:613
 in anonymous at task.jl:847
while loading math.jl, in expression starting on line 8

After which tests seem to stall and not run anymore.

Here's what I used:

$ icc --version
icc (ICC) 14.0.2 20140120
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.
$ make -j 12 CC=icc CXX=icpc FC=ifort USE_MKL=1 USE_INTEL_IMF=1 \
       CFLAGS="-O3 -xhost -fp-model strict" \
       CXXFLAGS="-O3 -xhost -fp-model strict" \
       FFLAGS="-O3 -xhost -fp-model strict" 
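
The failed assertion in the math test log above is an absolute-difference check against a tolerance. The following C sketch reproduces that arithmetic with the numbers from the log. It assumes the check is nothing more than `|a - b| <= tol`; the Julia implementation of test_approx_eq is not shown in this thread, and `approx_eq` is a hypothetical name.

```c
/* Returns 1 when |a - b| <= tol and 0 otherwise.
 * Sketch of the comparison reported in the log, not Julia's actual code. */
int approx_eq(float a, float b, float tol)
{
    float diff = a - b;
    if (diff < 0.0f) {
        diff = -diff;
    }
    return diff <= tol;
}
```

With the logged values, `approx_eq(-0.8090169f, -1.0f, 9.536743e-7f)` is false: the difference of roughly 0.19 is about five orders of magnitude above the tolerance, which suggests one of the two results is simply wrong rather than slightly misrounded.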

@tkelman
Contributor

tkelman commented Jun 30, 2014

That looks a little bit like JuliaMath/openlibm#57, but with the difference being intel compiler rather than i486 arch. Openlibm has some problems, apparently.

I don't see any signs of USE_INTEL_IMF anywhere in the Julia codebase at the moment. I believe that should be USE_INTEL_LIBM now.

@morrisonlevi

Hmm. I changed USE_INTEL_IMF to USE_INTEL_LIBM and get these errors now:

    From worker 8:       * math
/apps/src/julia/src/216ecacc39/usr/bin/./julia: symbol lookup error: /apps/src/julia/src/216ecacc39/usr/bin/../lib/libopenspecfun.so: undefined symbol: complex_
Worker 8 terminated.
exception on 5: ERROR: ccall: could not find function __ieee754_rem_pio2 in library libimf
 in read at ./iobuffer.jl:74
 in read at stream.jl:691
 in anonymous at task.jl:829
while loading linalg4.jl, in expression starting on line 235
ERROR: ccall: could not find function __ieee754_rem_pio2 in library libimf
 in anonymous at task.jl:1350
while loading linalg4.jl, in expression starting on line 235
while loading /apps/src/julia/src/216ecacc39/test/runtests.jl, in expression starting on line 46

make[1]: *** [all] Error 1

@tkelman
Contributor

tkelman commented Jun 30, 2014

(Just checking) Was that trying to reuse a previously built libopenspecfun? If so, you may need make -C deps distclean-openspecfun.

The issue with __ieee754_rem_pio2 is already open as #5365; apparently it's as much of a problem with the system libm as with Intel's.

@ViralBShah
Member Author

As part of #7547, when compiling everything with intel compilers, everything works fine. I am satisfied enough with this and will close this issue when the icc PR is merged.

@ViralBShah
Member Author

Fixed in #7547. Resolution is to build with Intel compilers when using MKL.

@ViralBShah ViralBShah modified the milestones: 0.3, 0.4 Jul 10, 2014