Build OpenBLAS with MAX_STACK_ALLOC=2048 #10780
We can easily do that. @xianyi Is it possible to make this the default in OpenBLAS? In the meantime, is there any downside to using this in the Julia build? If we end up doing this, we may also want to backport it.
@hiccup7 A possible workaround for you may be to roll your own matrix-vector multiplication in pure Julia, writing the doubly nested for loop. In some cases that could get you the performance you need on small problems, until we sort this out.
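For illustration, here is the doubly nested loop that workaround refers to, written in C rather than Julia. The function names `naive_gemv_n` and `naive_gemv_t` are hypothetical, and this is a minimal sketch under the assumption of column-major storage (the layout used by both BLAS and Julia arrays), not tuned code. It covers both the plain product and the transposed product that `dgemv_t` computes.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x (the "N" case of GEMV). A is m-by-n, stored column-major
   with leading dimension lda >= m; x has length n, y has length m. */
void naive_gemv_n(size_t m, size_t n, const double *A, size_t lda,
                  const double *x, double *y)
{
    assert(lda >= m);
    for (size_t i = 0; i < m; i++)
        y[i] = 0.0;
    /* Outer loop over columns, inner loop over rows, so the inner loop
       reads each column of A from contiguous memory. */
    for (size_t j = 0; j < n; j++) {
        const double xj = x[j];
        const double *col = A + j * lda;
        for (size_t i = 0; i < m; i++)
            y[i] += col[i] * xj;
    }
}

/* y = A^T * x (the "T" case, as in dgemv_t): y[j] is the dot product of
   column j of A with x, so x has length m and y has length n. */
void naive_gemv_t(size_t m, size_t n, const double *A, size_t lda,
                  const double *x, double *y)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; j++) {
        const double *col = A + j * lda;
        double acc = 0.0;
        for (size_t i = 0; i < m; i++)
            acc += col[i] * x[i];
        y[j] = acc;
    }
}
```

The loop order is the main design choice. With column-major storage, keeping the row index in the innermost loop walks memory sequentially, which matters most for the small problems mentioned above.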
@ViralBShah, I can enable this flag by default.
What are the next steps?
Until then, we don't mind having this flag in the Julia build.
I found that this flag can cause a segfault in dgemv_t. I am working on it.
Zhang Xianyi
Good to know, thanks for the warning. We should be able to upgrade to the latest tagged v0.2.14, though it looks like DESTDIR support may not work correctly with MinGW prior to OpenMathLib/OpenBLAS@0ac787e? We recently switched to using DESTDIR for more dependencies with the "more atomic dependency install" change, but we might be handling this specially in
@xianyi Can we have a 0.2.15 with this bugfix? Thanks.
I see here that Julia v0.4 is estimated to be 2.3 months away: https://github.com/JuliaLang/julia/milestones
Potentially. We don't always backport dependency version bumps, but the BLAS/LAPACK bindings haven't changed too dramatically on the Julia side between 0.3 and 0.4. So if things get tested on master for at least a few days, preferably a week or so, without anyone seeing any problems, then it should be okay to backport an OpenBLAS upgrade.
Note that MAX_STACK_ALLOC cannot be used until the next OpenBLAS release, as discussed. After that, a few days of testing on master, and then 0.3.8.
I recently hit bugs (segfaults) on Haswell, 64-bit Windows, with MAX_STACK_ALLOC=2048 and OpenBLAS 0.2.14 in unofficial numpy/scipy OpenBLAS builds. Increasing the stack size didn't help.
@carlkl, please try the develop branch. I just pushed the code to fix this segfault.
@xianyi, the develop branch now works with MAX_STACK_ALLOC=2048 and the latest patches on Haswell, thanks.
@carlkl Do all Julia tests pass with this version? That would be useful input for the next OpenBLAS release.
I accidentally commented on this repo. Usually I test numpy and scipy, but I can
@carlkl, I volunteer to do some Julia testing if someone kindly puts libopenblas.dll online. It would need to be a drop-in replacement for my current libopenblas.dll in Julia 0.3.7, where versioninfo() shows libopenblas as "USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell".
Latest OpenBLAS develop, compiled with USE64BITINT:
Thanks Carl, though I don't think that'll be a drop-in replacement for Julia, since we also rename the symbols when using 64-bit ints via
@hiccup7 (and anyone else who's got some spare disk space and CPU cycles) give http://julianightlies.s3.amazonaws.com/bin/winnt/x64/0.4/julia-0.4.0-9e5eda975c-win64.exe a try, checking
Off-topic: if there is an easy way to get a Windows VM and do things through Vagrant, I don't mind becoming a somewhat more regular Windows tester.
You can build one with Packer without too much hassle (it's been a while since I last did it); see staticfloat/julia-vagrant#2 (comment) for instructions and a Vagrantfile with a provisioning script.
There are also images available at http://modern.ie. Only 32-bit, but maybe
I eventually ran into #7942, but assuming all the linalg tests run as a block, WFM.
@tkelman, are you sure you used the
To clarify my last post, my test today with Julia v0.3.7 and the new libopenblas.dll file ran about 4 times slower than with the released Julia v0.3.7. Good thing we tested before OpenBLAS did a release!
It would also be good to compare against OpenBLAS from a standard 0.4-dev nightly, which as of a few days ago ships OpenBLAS 0.2.14, to see whether the difference lies between 0.2.13 and 0.2.14 or between 0.2.14 and develop.
Today, I downloaded a standard Julia v0.4-dev nightly, extracted libopenblas.dll v0.2.14, and used it with Julia v0.3.7 on Windows. On the test from OpenMathLib/OpenBLAS#530 it performs the same as the develop branch of OpenBLAS, which is 15% faster than the OpenBLAS v0.2.13 bundled with the original Julia v0.3.7. On the tests from OpenMathLib/OpenBLAS#532, OpenBLAS v0.2.14 also performed the same as OpenBLAS v0.2.13, which is 4 times faster than the develop branch of OpenBLAS. My first conclusion from these tests is that OpenBLAS v0.2.14, as used on the Julia master branch, works best so far. I recommend keeping MAX_STACK_ALLOC=2048 in the OpenBLAS build options for now. With the small-vector test code from OpenMathLib/OpenBLAS#532, Julia allocates 2GB of memory with both v0.2.14 and the develop branch of OpenBLAS, so there is still a serious problem with GEMV. Also, the test in OpenMathLib/OpenBLAS#530 shows that the develop branch of OpenBLAS is still 3 times slower than MKL BLAS, so it didn't fix the issue. @xianyi, we need some further diagnosis of the problems with the develop branch of OpenBLAS.
We are not currently setting
See OpenMathLib/OpenBLAS@6c3a0b5.
To unravel the multiple OpenBLAS problems, I suggest merging #10783 now. This will resolve some of the GEMV problems. Once it is in the standard Julia v0.4-dev nightly, I can extract the libopenblas.dll and retest with Julia v0.3.7. My results will help the OpenBLAS team understand exactly which problems are in the
Dot products and GEMV() dominate my CPU usage. With OpenBLAS being 3 and 6 times slower than MKL BLAS on these, respectively, these OpenBLAS bugs dominate my experience of Julia's speed compared with Python using MKL BLAS. Note that MKL BLAS is available in at least 3 Python binary distributions, for free. I predict that solving these OpenBLAS bugs will lead to wider adoption of Julia.
Solving OpenBLAS bugs and improving it is a good thing in and of itself. "Wider adoption of Julia" and comparisons to MKL are another matter entirely, and beyond the scope of the issue tracker. Do you have evidence that merging #10783 and building OpenBLAS 0.2.14 with
The evidence:
By evidence, I meant measured performance improvements: data, numbers. Looking at the commit log, I see multiple changes indicating that MAX_STACK_ALLOC is likely to have bugs in the OpenBLAS code base as of 0.2.14. There's one way to find out, and we can always revert if this proves to be the case, but this really calls for more thorough evaluation IMO.
If the Julia team provides a build, I volunteer to measure the performance (with the tests I did before).
Okay, the commit a48cada is a rebased version of #10783 (so it builds OpenBLAS 0.2.14). Wait an hour or two for http://buildbot.e.ip.saba.us:8010/builders/package_win8.1-x64/builds/589 to finish, then check http://s3.amazonaws.com/julianightlies for a julia-0.4.0-a48cada4b0-win64.exe when it's done.
See the latest comments on OpenMathLib/OpenBLAS#478.
Today, I downloaded julia-0.4.0-a48cada4b0-win64.exe, extracted libopenblas.dll v0.2.14, and used it with Julia v0.3.7 on Windows. The tests at OpenMathLib/OpenBLAS#532 performed the same as with v0.2.14 built without MAX_STACK_ALLOC=2048. The test at OpenMathLib/OpenBLAS#530 ran 15% slower than with v0.2.14 built without MAX_STACK_ALLOC=2048. Therefore, it seems best to avoid building OpenBLAS v0.2.14 with MAX_STACK_ALLOC=2048. Since one change was committed to the OpenBLAS develop branch in the last day and another is waiting to be committed, how about we wait for @xianyi to let us know when we should test the OpenBLAS develop branch next?
Yes, I think that would be the most appropriate thing to do here. I'd like to tag Julia 0.3.8 within the next few days, after just a few more small backports. This can wait, but we appreciate the benchmarking of OpenBLAS while they work on their next release.
Building OpenBLAS with MAX_STACK_ALLOC=2048 seems necessary to get proper multi-threading performance from the gemv() and ger() functions. See OpenMathLib/OpenBLAS#532.
Options I see:
a) Negotiate with the OpenBLAS project to default to MAX_STACK_ALLOC=2048 if MAX_STACK_ALLOC is unspecified in the make options.
b) Update the Julia build environment to include MAX_STACK_ALLOC=2048 for OpenBLAS.
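The thread does not spell out what MAX_STACK_ALLOC changes inside OpenBLAS. As a rough mental model only, assume the setting caps how large a temporary buffer may be placed on the calling thread's stack before the code falls back to heap allocation. Under that assumption, the pattern looks like the C sketch below. `SKETCH_MAX_STACK_ALLOC` and `gemv_t_packed` are hypothetical names. This is not OpenBLAS source, and packing a strided `x` is just one illustrative use of a scratch buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold in bytes, standing in for a MAX_STACK_ALLOC=2048
   build setting; this is NOT OpenBLAS's actual macro or code path. */
#define SKETCH_MAX_STACK_ALLOC 2048

/* y = A^T * x, where A is m-by-n column-major and x has stride incx.
   x is first packed into a contiguous scratch buffer. If that buffer fits
   under the threshold it lives on the stack; otherwise it comes from malloc.
   Returns 1 if the stack buffer was used, 0 if the heap was used,
   and -1 if the heap allocation failed. */
int gemv_t_packed(size_t m, size_t n, const double *A, size_t lda,
                  const double *x, size_t incx, double *y)
{
    assert(lda >= m);
    double stack_buf[SKETCH_MAX_STACK_ALLOC / sizeof(double)];
    const int on_stack = m * sizeof(double) <= SKETCH_MAX_STACK_ALLOC;
    double *xbuf = on_stack ? stack_buf : malloc(m * sizeof(double));
    if (xbuf == NULL)
        return -1;

    /* Pack the strided input vector into contiguous memory. */
    for (size_t i = 0; i < m; i++)
        xbuf[i] = x[i * incx];

    /* Each output element is the dot product of one column with xbuf. */
    for (size_t j = 0; j < n; j++) {
        const double *col = A + j * lda;
        double acc = 0.0;
        for (size_t i = 0; i < m; i++)
            acc += col[i] * xbuf[i];
        y[j] = acc;
    }

    if (!on_stack)
        free(xbuf);
    return on_stack;
}
```

One plausible benefit of the stack path in multi-threaded code is that each thread gets its own small buffer without going through a shared allocator. Whether that explains the gemv() and ger() behaviour reported in OpenMathLib/OpenBLAS#532 is an assumption, not something established in this thread.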