set OPENBLAS_DIR in openblas.yaml when build dependencies #709

Merged
merged 1 commit into from
Apr 1, 2015
Conversation

@cmaurini (Contributor) commented Apr 1, 2015

I found that the numpy package is not able to find openblas at compile time, because it looks in `$OPENBLAS_DIR/lib` and `OPENBLAS_DIR` is not set in `openblas.yaml`. I think this setting should be enabled; at least with this change numpy finds openblas for me.
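Concretely, the change amounts to exporting the artifact directory from `openblas.yaml` along these lines (a sketch reconstructed from the PR title; the exact position in the existing `when_build_dependency` list may differ):

```yaml
when_build_dependency:
- {set: 'OPENBLAS_DIR', value: '${ARTIFACT}'}
```

With this in place, the numpy build sees `$OPENBLAS_DIR/lib` whenever openblas is a build dependency.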
@ahmadia (Contributor) commented Apr 1, 2015

Hi @cmaurini. This looks good and I'm going to merge it. I just want to note that the reason this is happening is that we use openblas to fill the role of the blas interface, so that BLAS_DIR may be set. There's no risk in ensuring that this environment variable is always set, no matter where openblas is being used.

ahmadia added a commit that referenced this pull request Apr 1, 2015
set OPENBLAS_DIR in openblas.yaml when build dependencies
@ahmadia ahmadia merged commit ebc69c9 into hashdist:master Apr 1, 2015
@ahmadia (Contributor) commented Apr 1, 2015

Thanks for the PR.

@certik (Member) commented Apr 1, 2015

This fixes the problem at hand, since numpy indeed uses library_dirs = ${OPENBLAS_DIR}/lib:

library_dirs = ${OPENBLAS_DIR}/lib

But this is not the right solution in my opinion. Numpy should not use openblas like this; it should use BLAS_DIR and LAPACK_DIR, and the user should specify in the profile which implementation to use. If more variables are needed (which might be the case), we can add them, but they should be for a general BLAS and LAPACK. The choice of implementation should be made in one place: in the profile, by the user. The numpy package needs to be fixed to work with any blas implementation that we provide.

cc @cekees, @dagss

@ahmadia (Contributor) commented Apr 1, 2015

@certik - I'm in semi-agreement. In an ideal world, all packages would produce and consume the BLAS interface. As it is, each vendor tends to do something just slightly different, which is enough to justify something like this.

The actual setting which implementation should be used should be done at one place, in the profile by the user.

This is still true. NumPy requires a blas, and the openblas environment variable is ONLY exported for the numpy build when openblas is a dependency. There is no danger of two blas implementations accidentally being used unless somebody modifies the numpy dependency list.

@certik (Member) commented Apr 1, 2015

I am positive that things can be fixed so that the numpy package has no reference to a particular blas and works with all of them.

Here are the relevant parts of our various blas and lapack implementations:
host-lapack:

when_build_dependency:
  - {set: 'LAPACK_LDFLAGS', value: '-llapack'}

host-blas:

when_build_dependency:
  - {set: 'BLAS_LDFLAGS', value: '-lblas'}
  - when platform == 'Cygwin':
    - {prepend_path: PATH, value: '/usr/lib/lapack'}

lapack:

when_build_dependency:
- {set: 'LAPACK_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib -llapack'}
- {set: 'BLAS_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib -lblas'}

openblas (before this PR):

when_build_dependency:
- {set: 'BLAS_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib -lopenblas'}
- {set: 'LAPACK_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib -lopenblas'}

Then the user sets a particular implementation (from above) in a profile, e.g.:

  blas:
    use: lapack
  lapack:
    use: lapack

or

  blas:
    use: openblas
  lapack:
    use: openblas

So BLAS_LDFLAGS and LAPACK_LDFLAGS are our "Lapack interface", and numpy should just use it. However, since the numpy build requires more granular info, we need to extend our "Lapack interface". In particular, it seems we need at least the following (I'll use openblas as an example; other implementations are analogous):

- {set: 'BLAS_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib'}
- {set: 'BLAS_LIBRARIES', value: '-lopenblas'}
- {set: 'LAPACK_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib'}
- {set: 'LAPACK_LIBRARIES', value: '-lopenblas'}

This should allow the numpy package to do what it needs, e.g. this line:

 export LDFLAGS="-shared -Wl,-rpath=${PYTHON_DIR}/lib -Wl,-rpath=${OPENBLAS_DIR}/lib $(${PYTHON_DIR}/bin/python-config --ldflags)"

Needs to be modified to:

 export LDFLAGS="-shared -Wl,-rpath=${PYTHON_DIR}/lib ${LAPACK_LDFLAGS} $(${PYTHON_DIR}/bin/python-config --ldflags)"

This should fix everything, except this part:

bash: |
  cat > site.cfg << EOF
  [atlas]
  atlas_libs = openblas
  library_dirs = ${OPENBLAS_DIR}/lib
  EOF

which I currently don't know how best to fix --- this seems to be very specific to openblas, and if the user wants to use a different lapack implementation, it seems to me this works just by accident. @ahmadia, or do you consider this to be a good design? I think this is extremely fragile.

One solution using my new "Lapack interface" from above is:

bash: |
  cat > site.cfg << EOF
  [atlas]
  atlas_libs = ${LAPACK_LIBRARIES}
  library_dirs = ${LAPACK_LDFLAGS}
  EOF

however, in the above ${LAPACK_LIBRARIES} would contain -lopenblas (for openblas) and it seems numpy requires just openblas. Maybe both would work. If not, we need to upgrade our "Lapack interface" even further, perhaps to:

- {set: 'BLAS_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib'}
- {set: 'BLAS_LIBRARIES', value: '-lopenblas'}
- {set: 'BLAS_LIBRARIES_NAMES', value: 'openblas'}
- {set: 'LAPACK_LDFLAGS', value: '-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib'}
- {set: 'LAPACK_LIBRARIES', value: '-lopenblas'}
- {set: 'LAPACK_LIBRARIES_NAMES', value: 'openblas'}

And one would use atlas_libs = ${LAPACK_LIBRARIES_NAMES}.
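With those extra variables, the `site.cfg` generation could then read roughly as follows (a sketch; `LAPACK_DIR` is a hypothetical additional variable here, since `library_dirs` expects a plain directory rather than linker flags):

```yaml
bash: |
  cat > site.cfg << EOF
  [atlas]
  atlas_libs = ${LAPACK_LIBRARIES_NAMES}
  library_dirs = ${LAPACK_DIR}/lib
  EOF
```

The point is that nothing in this fragment names openblas; switching the profile to another implementation would change only the exported variables.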

The point is to make our "Lapack interface" general enough, so that any package can use it and just work. We also better make sure our "Lapack interface" supports MKL. Here are sample options from
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor:

 -L${MKLROOT}/lib/mic -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64 -lpthread -lm

 -DMKL_ILP64 -openmp -I${MKLROOT}/include -mmic

So this fits nicely with our interface above, but we also need to introduce LAPACK_INCLUDE=-I${MKLROOT}/include.
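For illustration, an MKL package could export the same interface roughly like this (a sketch; the library set shown is a sequential LP64 configuration, and the exact library names and directory layout must be taken from the link-line advisor for the MKL version in question):

```yaml
when_build_dependency:
- {set: 'LAPACK_LDFLAGS', value: '-L${MKLROOT}/lib/intel64'}
- {set: 'LAPACK_LIBRARIES', value: '-lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm'}
- {set: 'LAPACK_INCLUDE', value: '-I${MKLROOT}/include'}
```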

Note: -Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib seems Linux (and OS X?) specific. So maybe we can have an OS switch when we export these variables.

Finally, the last possible problem that I can see is that we also want to support multiple compilers. I don't know if the above options work with all compilers on all platforms. If not, then we perhaps need to add some machinery to `hit` that would take a compiler-independent specification of libraries, perhaps:

${ARTIFACT}/lib/openblas.so

and it would convert it to the proper compiler+platform specific way, e.g.:

-Wl,-rpath,${ARTIFACT}/lib -L${ARTIFACT}/lib -lopenblas
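A minimal sketch of such a conversion helper (hypothetical, not part of hashdist; the function name and the GCC/Clang-style flag syntax are assumptions, and real support would need per-compiler and per-platform cases, e.g. `.dylib` on OS X):

```shell
#!/bin/sh
# Convert a full shared-library path into GCC/Clang-style link flags.
lib_to_ldflags() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    name=${base#lib}    # drop an optional "lib" prefix
    name=${name%.so}    # drop the ".so" suffix
    echo "-Wl,-rpath,${dir} -L${dir} -l${name}"
}

lib_to_ldflags "/artifact/lib/openblas.so"
# -Wl,-rpath,/artifact/lib -L/artifact/lib -lopenblas
```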

@cmaurini (Contributor, Author) commented Apr 1, 2015

Thanks for looking at it. This is very useful. As a user, I have personally found (in the past and now) that linking to the correct blas/lapack is one of the most difficult things when compiling on clusters. Having the possibility in hashdist to compile your own blas is very practical; at least it lets you compile things. However, I would like to be able to link against mkl when available.

Further, are you aware of this thread pointing out the dangers in the use of a threaded blas?
http://fenicsproject.org/pipermail/fenics/2015-March/002660.html. The conclusion is that one should compile by default with

blas:
    use: openblas
    extra_flags: USE_THREAD=0

@certik (Member) commented Apr 1, 2015

Definitely, the point is to allow using any lapack implementation from Hashdist (all of them should just work), so it is easy to use, as well as any installation already on the cluster (MKL, ...), which might be faster. The same goes for MPI, where you really have to use the one provided on the cluster, since it is configured properly --- the MPI in Hashstack works well, but doesn't know about the cluster layout, so it won't work to actually run jobs beyond just the local node.

Good point --- I didn't know that if your application is multithreaded, it will conflict with the openblas multithreading. It seems one can also just set OPENBLAS_NUM_THREADS=1 at runtime.

@cekees (Contributor) commented Apr 1, 2015

I just pushed two branches for supporting some new clusters as they related
to this issue, although they're not ready to merge.

As @ondrej mentioned, MKL is one thing that has to be addressed. It's
a pain because you have to use multiple libraries, typically sent to the
linker as a group and often coming from multiple directories. The current
approach is to write the required stuff into the numpy and scipy site.cfg
files, but the problem is that the way you do this is host blas/lapack
specific. The info currently isn't being obtained from the host-blas and host-lapack packages directly. We have to do it inside numpy.yaml and
scipy.yaml. I think we're going to have to make the host package
definitions richer in order to pull out the platform specific stuff from
scipy.yaml and numpy.yaml.

https://github.com/hashdist/hashstack/tree/cekees/add_us_support
https://github.com/hashdist/hashstack/tree/cekees/add_spirit_support


@dagss (Member) commented Apr 1, 2015

What Mark, Johannes, and I worked on and discussed touches on a lot in this thread. Mark and I had a spec ready on Thursday that may be a step forward, but I wanted to discuss it with Johannes first (we were separated due to too much snow in Oslo), and I've been too swamped with other things to get it cleaned up and mail it out. I'll do it one of these days.

More concretely on MKL etc. etc. it really is a total mess -- like, if you link with MKL, you must also be careful to not use a specific set of FFTs from FFTW3 because MKL clobbers the namespace with its own FFTW3-API -- and at least last time I looked at it NumPy won't work unless there's specific MKL hacks to load the libraries with shared symbols rather than private symbols -- and so on. I've been bitten by this more than once, having my packages suddenly fail to work with Anaconda/EPD...

I'm sceptical about a generic BLAS_LDFLAGS and LAPACK_LDFLAGS (unless somebody already pulled it off and proves me wrong) -- there probably has to be special support in the NumPy build for every case.

I think ideally the NumPy package should take the BLAS to use as a parameter. But it doesn't hurt IMO that it supports 4 different BLASes as special cases in its configure scripts like it's always been, as long as it's all nicely exposed at the Hashdist level.

Multi-threading:

Some packages may want single-threaded openblas and other packages may want multi-threaded openblas, we may need to support both just like Julia and Numba likely needs different builds of LLVM. (BTW, @ahmadia and others who know, what's the best way to make sure a library plays nicely with optional threading support? Stick with OpenMP only? I wish for something a bit lower level and more standard that both OpenMP and other threading frameworks could use to avoid this problem, does that exist?...)

@dagss (Member) commented Apr 1, 2015

"I think ideally the NumPy package should take the BLAS to use as a parameter." That's perhaps not really possible in current hashdist/hashstack, though, and is the problem the spec from Thursday aims to solve. I'll try to get back to you tomorrow; sorry for being in a rush.
