Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different parca output on Sherlock vs local #931

Open
tahorst opened this issue Jul 24, 2020 · 38 comments
Open

Different parca output on Sherlock vs local #931

tahorst opened this issue Jul 24, 2020 · 38 comments
Labels

Comments

@tahorst
Copy link
Member

tahorst commented Jul 24, 2020

I noticed that Sherlock output with python3 does not match my local output. This starts with small floating point differences in nearly all expression related items in sim_data. Does anyone else see a difference in parca output on their local environment vs Sherlock? I was using python runscripts/debug/compareParca.py out/local out/sherlock to compare local output to Sherlock output that I copied locally. Just wondering if it's an installation difference or a machine difference.

@U8NWXD
Copy link
Member

U8NWXD commented Jul 24, 2020

I've found simData from runParca.py to differ between my local machine and google cloud machines, and between different google cloud machines (one was running Debian, the other Ubuntu). That was true even before the py3 migration

@1fish2
Copy link
Contributor

1fish2 commented Jul 24, 2020

Hypotheses:

  1. Different libraries installed, or compiled differently, e.g. Sherlock has an older version of OpenBLAS.
  2. Hardware- and OS-dependencies.
  3. Something like psuedo-random number seeding in our code that gets different results on different machines or even different runs.

@1fish2 1fish2 added the bug label Jul 24, 2020
@tahorst
Copy link
Member Author

tahorst commented Jul 24, 2020

Serial vs parallel results were the same locally and on Sherlock so it seems to be reproducible on a given machine, which is good, but not reproducible between. It is interesting that this showed up before py3 on google cloud but it was at least consistent on my machine vs sherlock before the switch. Certainly some good hypotheses Jerry!

@tahorst
Copy link
Member Author

tahorst commented Aug 3, 2020

I checked again and it looks like we might have consistent parca outputs (maybe it was a library issue with the recent update to Theano) but there is still a sim diff between the two. It looks like we've run into another OpenBLAS thread issue. I pushed a branch that does a dot product (arrays were pulled from a sim that was producing different results) and produces different results depending on the OPENBLAS_NUM_THREADS environment variable. I'm not sure if there's an easy way to turn this into a unit test since I don't think we can change the number of threads after loading numpy but this can at least show the problem.

Local machine:

$ cd test-openblas
$ for c in {1..12}; do OPENBLAS_NUM_THREADS=$c ./test.py; done
0.016683805584112754
0.016683805584112615
0.016683805584112702
0.016683805584112643
0.016683805584112643
0.016683805584112647
0.016683805584112532
0.016683805584112532
0.016683805584112647
0.016683805584112643
0.01668380558411267
0.016683805584112536

Sherlock with wcEcoli3 and 8 cpus:

$ for c in {1..12}; do OPENBLAS_NUM_THREADS=$c ./test.py; done
0.016683805584112754
0.016683805584112615
0.016683805584112702
0.016683805584112643
0.016683805584112643
0.016683805584112647
0.016683805584112532
0.016683805584112532
0.016683805584112532
0.016683805584112532
0.016683805584112532
0.016683805584112532

The old pyenv (wcEcoli2) on Sherlock gave consistent results:

$ for c in {1..12}; do OPENBLAS_NUM_THREADS=$c ./test.py; done
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754
0.016683805584112754

@1fish2
Copy link
Contributor

1fish2 commented Aug 3, 2020

Awesome! A compact, repeatable test case is usually the hardest step in debugging.

We could probably make a unit test that uses multiprocessing to exec a process with a different OPENBLAS_NUM_THREADS.

The environment variable NUM_THREADS supposedly affects compiling OpenBLAS by setting the max number of threads. Otherwise it tries to figure that out from the hardware it compiles on, which means we should be careful to not compile it on a Sherlock login node.

@1fish2
Copy link
Contributor

1fish2 commented Aug 5, 2020

I turned test-openblas into a unit test.

Here's the output on my Mac, running with Numpy's embedded OpenBLAS:

$ pytest wholecell/tests/utils/test_openblas_threads.py
...

_____________________________________________________________________ test_openblas _____________________________________________________________________

    def test_openblas():
    	"""Compare a dot product running with various numbers of OpenBLAS threads."""
    	...
>   	np.testing.assert_allclose(np_products, reference, rtol=1e-17)
E    AssertionError:
E    Not equal to tolerance rtol=1e-17, atol=0
E
E    Mismatched elements: 11 / 12 (91.7%)
E    Max absolute difference: 1.38777878e-16
E    Max relative difference: 8.31811887e-15
E     x: array([0.016684, 0.016684, 0.016684, 0.016684, 0.016684, 0.016684,
E           0.016684, 0.016684, 0.016684, 0.016684, 0.016684, 0.016684])
E     y: array([0.016684, 0.016684, 0.016684, 0.016684, 0.016684, 0.016684,
E           0.016684, 0.016684, 0.016684, 0.016684, 0.016684, 0.016684])

wholecell/tests/utils/test_openblas_threads.py:46: AssertionError
----------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------
THREADS               DOT PRODUCT                      DIFF
      1      0.016683805584112754                       0.0
      2      0.016683805584112615   -1.3877787807814457e-16
      3      0.016683805584112702    -5.204170427930421e-17
      4      0.016683805584112643   -1.1102230246251565e-16
      5      0.016683805584112643   -1.1102230246251565e-16
      6      0.016683805584112647   -1.0755285551056204e-16
      7      0.016683805584112647   -1.0755285551056204e-16
      8      0.016683805584112647   -1.0755285551056204e-16
      9      0.016683805584112647   -1.0755285551056204e-16
     10      0.016683805584112647   -1.0755285551056204e-16
     11      0.016683805584112647   -1.0755285551056204e-16
     12      0.016683805584112647   -1.0755285551056204e-16

@tahorst
Copy link
Member Author

tahorst commented Aug 5, 2020

Nice! I guess now we just need to check versions of openblas that would produce the right results and then we can add that unit test in. We could also check the number of cpus available and only run up to that number instead of running 12 each time.

@tahorst
Copy link
Member Author

tahorst commented Aug 5, 2020

I made a small update to the branch to include a test with OPENBLAS_NUM_THREADS being unset and only up to the number of available CPUs

@1fish2
Copy link
Contributor

1fish2 commented Aug 5, 2020

Cool.

I'm testing this in various environments, so far just on my Mac:

  • wcEcoli2,
  • wcEcoli3,
  • in Docker with NO_AVX2=1,
  • after rebuilding wcEcoli3 with numpy/scipy linked to the brew-installed OpenBLAS instead of their embedded version (This does not successfully install a working numpy! Symbol not found: ___addtf3. I don't know if that's somehow a consequence of recreating a python 2.7 environment recently or if it never worked. Various cleanup steps like deleting the pip cache haven't helped.)
  • ditto after compiling OpenBLAS from source code (didn't get there yet)

@U8NWXD were your tests on GCloud in a Docker Container? What was OPENBLAS_NUM_THREADS set to?

Q: Parallelizing vector dot presumably divides the two vectors into one chunk per thread and adds their portions of the dot product. Floating point rounding will thus vary with the number of chunks. How much variation should be acceptable? Where to set the np.testing.assert_allclose(rtol=?, atol=?)?

Some measurements:

[1] (wcEcoli3)
THREADS               DOT PRODUCT                      DIFF
      1      0.016683805584112754                       0.0
      2      0.016683805584112615   -1.3877787807814457e-16
      3      0.016683805584112702    -5.204170427930421e-17
      4      0.016683805584112643   -1.1102230246251565e-16
      5      0.016683805584112643   -1.1102230246251565e-16
      6      0.016683805584112647   -1.0755285551056204e-16
      7      0.016683805584112647   -1.0755285551056204e-16
      8      0.016683805584112647   -1.0755285551056204e-16
      9      0.016683805584112647   -1.0755285551056204e-16
     10      0.016683805584112647   -1.0755285551056204e-16
     11      0.016683805584112647   -1.0755285551056204e-16
     12      0.016683805584112647   -1.0755285551056204e-16

[2] Quitting Docker Desktop didn't change that at all, even though it might've freed up some CPUs.

[3] (wcEcoli2)
      1      0.016683805584112754      0.000000000000000000
      2      0.016683805584112615     -0.000000000000000139
      3      0.016683805584112702     -0.000000000000000052
      4      0.016683805584112643     -0.000000000000000111
      5      0.016683805584112643     -0.000000000000000111
      6      0.016683805584112647     -0.000000000000000108
      7      0.016683805584112647     -0.000000000000000108
      8      0.016683805584112647     -0.000000000000000108
      9      0.016683805584112647     -0.000000000000000108
     10      0.016683805584112647     -0.000000000000000108
     11      0.016683805584112647     -0.000000000000000108
     12      0.016683805584112647     -0.000000000000000108

[4] (wcEcoli5) on Python 3.8.5; I don't remember whether it has numpy & scipy installed from binary vs. source linked to brew openblas. I deleted it when trying to rebuild wcEcoli3 but will recreate it.
      1      0.016683805584112754      0.000000000000000000
      2      0.016683805584112615     -0.000000000000000139
      3      0.016683805584112702     -0.000000000000000052
      4      0.016683805584112643     -0.000000000000000111
      5      0.016683805584112643     -0.000000000000000111
      6      0.016683805584112647     -0.000000000000000108
      7      0.016683805584112647     -0.000000000000000108
      8      0.016683805584112647     -0.000000000000000108
      9      0.016683805584112647     -0.000000000000000108
     10      0.016683805584112647     -0.000000000000000108
     11      0.016683805584112647     -0.000000000000000108
     12      0.016683805584112647     -0.000000000000000108

[5] # In wcm Docker Image, with NO_AVX2=1

THREADS               DOT PRODUCT                      DIFF
      1      0.016683805584112754                       0.0
      2      0.016683805584112615   -1.3877787807814457e-16
      3      0.016683805584112702    -5.204170427930421e-17
      4      0.016683805584112643   -1.1102230246251565e-16
      5      0.016683805584112643   -1.1102230246251565e-16
      6      0.016683805584112702    -5.204170427930421e-17
      7      0.016683805584112702    -5.204170427930421e-17
      8      0.016683805584112702    -5.204170427930421e-17
      9      0.016683805584112702    -5.204170427930421e-17
     10      0.016683805584112702    -5.204170427930421e-17
     11      0.016683805584112702    -5.204170427930421e-17
     12      0.016683805584112702    -5.204170427930421e-17

[6] wcEcoli3 rebuilt with numpy, scipy no-binary, linked to brew-installed openblas

No success yet installing a working numpy this way.

[7] wcEcoli3 rebuilt with numpy, scipy no-binary, linked to manually-compiled openblas with NO_AVX2=1

TODO

@tahorst
Copy link
Member Author

tahorst commented Aug 5, 2020

Q: Parallelizing vector dot presumably divides the two vectors into one chunk per thread and adds their portions of the dot product. Floating point rounding will thus vary with the number of chunks. How much variation should be acceptable? Where to set the np.testing.assert_allclose(rtol=?, atol=?)?

Maybe this is just a problem that can't be solved if we allow parallelization of numpy if we'll always have issues with floating point rounding. I think the main issue arises from the stochastic nature of our sims. Any bit of floating point imprecision could lead to a slightly different random result which produces one molecule instead of another which can greatly amplify the difference (even if it's only off by 1e-17 but produces an essential protein that is needed vs some other protein). I think for that reason it would need to have 0 tolerance for the difference. wcEcoli2 on Sherlock produced the same results no matter how many threads were specified but that might just be because the OpenBLAS library was installed to only use 1 thread and not actually an issue with any of the other OpenBLAS versions tried. I've run with OPENBLAS_NUM_THREADS=1 for a long time which is why I could always reproduce the issues from Jenkins run on Sherlock with wcEcoli2.

@U8NWXD
Copy link
Member

U8NWXD commented Aug 5, 2020

@1fish2 I was not using Docker, and OPENBLAS_NUM_THREADS was not set. The number of threads available to openblas could have explained the different behavior

@1fish2
Copy link
Contributor

1fish2 commented Aug 6, 2020

Agreed. We might have to limit to OPENBLAS_NUM_THREADS=1. Is it sufficient to make that bold in the pyenv setup instructions and tell everyone? Print a warning if the value is different in the parca and sim Firetasks?

[6] numpy cannot load the brew-installed openblas on my Mac despite many tries to link them up. I don't know if that's specific to this Mac after messing with gcc@9 et al to reproduce the Python 2 runtime environment. It makes the Docker approach more attractive.

[7] wcEcoli5 on Python 3.8.5 rebuilt with numpy, scipy no-binary, linked to manually-compiled
openblas with "NO_AVX2=1" (and a table tweak)
THREADS               DOT PRODUCT        DIFF FROM 1 THREAD
      1      0.016683805584112754                       0.0
      2      0.016683805584112615   -1.3877787807814457e-16
      3      0.016683805584112702    -5.204170427930421e-17
      4      0.016683805584112643   -1.1102230246251565e-16
      5      0.016683805584112643   -1.1102230246251565e-16
      6      0.016683805584112702    -5.204170427930421e-17
      7      0.016683805584112702    -5.204170427930421e-17
      8      0.016683805584112702    -5.204170427930421e-17
      9      0.016683805584112702    -5.204170427930421e-17
     10      0.016683805584112702    -5.204170427930421e-17
     11      0.016683805584112702    -5.204170427930421e-17
     12      0.016683805584112702    -5.204170427930421e-17
             0.016683805584112702    -5.204170427930421e-17

[8] ditto but "NO_AVX2=0"
THREADS               DOT PRODUCT        DIFF FROM 1 THREAD
      1      0.016683805584112754                       0.0
      2      0.016683805584112615   -1.3877787807814457e-16
      3      0.016683805584112702    -5.204170427930421e-17
      4      0.016683805584112643   -1.1102230246251565e-16
      5      0.016683805584112643   -1.1102230246251565e-16
      6      0.016683805584112647   -1.0755285551056204e-16
      7      0.016683805584112647   -1.0755285551056204e-16
      8      0.016683805584112647   -1.0755285551056204e-16
      9      0.016683805584112647   -1.0755285551056204e-16
     10      0.016683805584112647   -1.0755285551056204e-16
     11      0.016683805584112647   -1.0755285551056204e-16
     12      0.016683805584112647   -1.0755285551056204e-16
             0.016683805584112647   -1.0755285551056204e-16

"NO_AVX2=1" produced identical results with 1-5 threads and slightly closer results with 6+ threads. What does that imply? It's not the wildly wrong results that made us set "NO_AVX2=1" for Docker for Mac. I guess using the vectorization hardware entails breaking the chunks into smaller vectorization subchunks.

(FWIW, 3 runs each at "NO_AVX2=1" and "NO_AVX2=0" took essentially the same amount of elapsed time, but this duration is probably dominated by process exec/quit, piping, and printing.)

@1fish2
Copy link
Contributor

1fish2 commented Aug 6, 2020

BTW the embedded openblas in numpy & scipy gets the same results in this test as the manually-compiled
openblas with "NO_AVX2=1" and in the same amount of time.

@1fish2
Copy link
Contributor

1fish2 commented Aug 6, 2020

[9] Results on Mac in pyenv wcEcoli2 (with tweaked floating point formatting):
THREADS                DOT PRODUCT         DIFF FROM 1 THREAD
      1       0.016683805584112754                          0
      2       0.016683805584112615    -1.3877787807814457e-16
      3       0.016683805584112702    -5.2041704279304213e-17
      4       0.016683805584112643    -1.1102230246251565e-16
      5       0.016683805584112643    -1.1102230246251565e-16
      6       0.016683805584112647    -1.0755285551056204e-16
      7       0.016683805584112647    -1.0755285551056204e-16
      8       0.016683805584112647    -1.0755285551056204e-16
      9       0.016683805584112647    -1.0755285551056204e-16
     10       0.016683805584112647    -1.0755285551056204e-16
     11       0.016683805584112647    -1.0755285551056204e-16
     12       0.016683805584112647    -1.0755285551056204e-16
              0.016683805584112647    -1.0755285551056204e-16
[10] on a 4-CPU Sherlock node in pyenv wcEcoli2 (ditto wcEcoli-paper):
THREADS                DOT PRODUCT         DIFF FROM 1 THREAD
      1       0.016683805584112754                          0
      2       0.016683805584112754                          0
      3       0.016683805584112754                          0
      4       0.016683805584112754                          0
              0.016683805584112754                          0
[11] in pyenv wcEcoli3:
THREADS                DOT PRODUCT         DIFF FROM 1 THREAD
      1       0.016683805584112754                          0
      2       0.016683805584112615    -1.3877787807814457e-16
      3       0.016683805584112702    -5.2041704279304213e-17
      4       0.016683805584112643    -1.1102230246251565e-16
              0.016683805584112643    -1.1102230246251565e-16

Hypotheses:

  • [Likely] A change in OpenBLAS versions, either a bug or they parallelized dot product.
  • A change from Numpy 1.14.6 to 1.19.0.
  • [Unlikely] A change from Python 2 to 3.

Q. Is this OpenBLAS THREADs difference the only cause of parca differences?

@tahorst did this test data come from a parca run?

@1fish2
Copy link
Contributor

1fish2 commented Aug 6, 2020

More tests on Mac
We can rule out a change to OpenBLAS, Numpy, or Python (except maybe combinations):

[12] Python 3.8.3, numpy==1.19.0, openblas-0.3.5:
THREADS                DOT PRODUCT         DIFF FROM 1 THREAD
      1       0.016683805584112754                          0
      2       0.016683805584112615    -1.3877787807814457e-16
      3       0.016683805584112702    -5.2041704279304213e-17
      4       0.016683805584112643    -1.1102230246251565e-16
      5       0.016683805584112643    -1.1102230246251565e-16
      6       0.016683805584112702    -5.2041704279304213e-17
      7       0.016683805584112702    -5.2041704279304213e-17
      8       0.016683805584112702    -5.2041704279304213e-17
      9       0.016683805584112702    -5.2041704279304213e-17
     10       0.016683805584112702    -5.2041704279304213e-17
     11       0.016683805584112702    -5.2041704279304213e-17
     12       0.016683805584112702    -5.2041704279304213e-17
              0.016683805584112702    -5.2041704279304213e-17
[13] Python 3.8.3, numpy==1.14.6, openblas-0.3.5:
THREADS                DOT PRODUCT         DIFF FROM 1 THREAD
      1       0.016683805584112754                          0
      2       0.016683805584112615    -1.3877787807814457e-16
      3       0.016683805584112702    -5.2041704279304213e-17
      4       0.016683805584112643    -1.1102230246251565e-16
      5       0.016683805584112643    -1.1102230246251565e-16
      6       0.016683805584112702    -5.2041704279304213e-17
      7       0.016683805584112702    -5.2041704279304213e-17
      8       0.016683805584112702    -5.2041704279304213e-17
      9       0.016683805584112702    -5.2041704279304213e-17
     10       0.016683805584112702    -5.2041704279304213e-17
     11       0.016683805584112702    -5.2041704279304213e-17
     12       0.016683805584112702    -5.2041704279304213e-17
              0.016683805584112702    -5.2041704279304213e-17
[14] Python 2.7.16, numpy==1.14.6, openblas-0.3.5:
THREADS                DOT PRODUCT         DIFF FROM 1 THREAD
      1       0.016683805584112754                          0
      2       0.016683805584112615    -1.3877787807814457e-16
      3       0.016683805584112702    -5.2041704279304213e-17
      4       0.016683805584112643    -1.1102230246251565e-16
      5       0.016683805584112643    -1.1102230246251565e-16
      6       0.016683805584112702    -5.2041704279304213e-17
      7       0.016683805584112702    -5.2041704279304213e-17
      8       0.016683805584112702    -5.2041704279304213e-17
      9       0.016683805584112702    -5.2041704279304213e-17
     10       0.016683805584112702    -5.2041704279304213e-17
     11       0.016683805584112702    -5.2041704279304213e-17
     12       0.016683805584112702    -5.2041704279304213e-17
              0.016683805584112702    -5.2041704279304213e-17

@tahorst are you getting similar results on your local machine?

Did it ever produce consistent results on macOS with varying OPENBLAS_NUM_THREADS?

Hypotheses left?

  • gcc 10?
  • OS?

@tahorst
Copy link
Member Author

tahorst commented Aug 6, 2020

Q. Is this OpenBLAS THREADs difference the only cause of parca differences? did this test data come from a parca run?

I first noticed a difference between Sherlock and local runs when I couldn't reproduce the Jenkins failures locally. I checked the parca output and it showed differences. Then there were some updates (newer Theano version and other changes) and it looked like parca output was the same but sim output differed so I don't think there are actually any parca differences any more (it could be worth it to confirm again with THREADS unset). This test data is actually from a sim calculating the mass update for a time step.

are you getting similar results on your local machine?

I get the same results for all local environments tested and it looks like it matches you up to your 5 threads:

Python 3.8.3, numpy==1.19.0, openblas default from numpy
Python 3.8.3, numpy==1.19.0, openblas-0.3.5
Python 2.7.16, numpy==1.14.6, openblas-0.3.5

THREADS                DOT PRODUCT         DIFF FROM 1 THREAD                                                                                  
      1       0.016683805584112754                          0                                                                                  
      2       0.016683805584112615    -1.3877787807814457e-16                               
      3       0.016683805584112702    -5.2041704279304213e-17                                      
      4       0.016683805584112643    -1.1102230246251565e-16                                                                                  
      5       0.016683805584112643    -1.1102230246251565e-16
      6       0.016683805584112647    -1.0755285551056204e-16                        
      7       0.016683805584112532    -2.2204460492503131e-16                         
      8       0.016683805584112532    -2.2204460492503131e-16                                
      9       0.016683805584112647    -1.0755285551056204e-16                              
     10       0.016683805584112643    -1.1102230246251565e-16                       
     11       0.016683805584112671    -8.3266726846886741e-17
     12       0.016683805584112536    -2.1857515797307769e-16                
              0.016683805584112536    -2.1857515797307769e-16

I'm thinking at this point we just make sure we set OPENBLAS_NUM_THREADS=1 (or install openblas with an option for only one thread). It seems like the OpenBLAS library linked on Sherlock for wcEcoli2 (0.3.9) is the only one that is independent from the threads specified. Did you test OpenBLAS version 0.3.9 in any other setups?

@tahorst
Copy link
Member Author

tahorst commented Aug 7, 2020

I got the unit test pass locally by installing a single thread version of OpenBLAS (instructions from Sherlock with USE_THREAD=0 added) and linking numpy to that version with pip install numpy==1.19.0 --no-binary numpy and proper .numpy-site.cfg paths:

git clone https://github.com/xianyi/OpenBLAS
cd OpenBLAS/
git checkout v0.3.9
USE_THREAD=0 make FC=gfortran
make PREFIX=/opt/OpenBLAS-0.3.9 install

I think we can either specify instructions to install OpenBLAS with a single thread or use OPENBLAS_NUM_THREADS=1 and that should fix this issue.

@1fish2
Copy link
Contributor

1fish2 commented Aug 19, 2020

Did you test OpenBLAS version 0.3.9 in any other setups?

Yes, and lately I've been using 0.3.10 on my Mac and in wcm-runtime/Dockerfile, including (IIRC) in the test runs above.

Initially we had to clone the OpenBLAS repo to get an unreleased bug fix. Now there are downloadable releases up to 0.3.10.

Travis found that building OpenBLAS with NO_AVX2=1 produces different results, so it's better to avoid that when possible, and building on Linux with NO_AVX2=0 inside a Dockerfile produces an Image that ran just fine on Mac! Let's test that with a Cloud Build and if it all works, edit cloud/docker/runtime/Dockerfile and related files.

@1fish2
Copy link
Contributor

1fish2 commented Aug 20, 2020

Testing a cloud-built wcm-code using the test-openblas branch plus these adjustments not checked in:

  • NO_AVX2=0
  • numpy==1.19.1 scipy==1.5.2 (I forgot what bug fixes led me to pick up these "dot-dot" updates)

(1) test_openblas_threads.py gets results consistent with pyenv wcEcoli2 on Sherlock:

THREADS                DOT PRODUCT         DIFF FROM 1 THREAD
      1       0.016683805584112754                          0
      2       0.016683805584112754                          0
      3       0.016683805584112754                          0
      4       0.016683805584112754                          0
      5       0.016683805584112754                          0
      6       0.016683805584112754                          0
              0.016683805584112754                          0

That further validates Travis' idea that we can build Docker Images with NO_AVX2=0 on Linux and still run on Mac, and it disproves the OpenBLAS team's hypothesis about where the AVX2 problem is.

(2) That's surprising since the macOS host outside Docker was not getting consistent results (see above) but -- whoa! -- it does now! I do not know change made that happen but testing with a Python 3.8.5 virtualenv that has numpy==1.19.0 shows it's not numpy update.

(3) Inside Docker-for-Mac without the --user $(id -u):$(id -g) option, $(id -u):$(id -g) is 0:0; in the host it's 501:20, and yet it creates files with the right host-side owner in a mapped directory! The --user option was a workaround we needed in Google Compute Engine, but maybe it's irrelevant in Docker Desktop.

@tahorst, does this happen in Docker-for-Windows?

(4) I tried keeping the openblas build directory, then running make tests in that directory inside the container. It doesn't get far:

touch libopenblas_haswellp-r0.3.10.a
make -j 6 -C test all
make[1]: Entering directory '/openblas/OpenBLAS-0.3.10/test'
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64   -o sblat1 sblat1.o ../libopenblas_haswellp-r0.3.10.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../..  -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64   -o dblat1 dblat1.o ../libopenblas_haswellp-r0.3.10.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../..  -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64   -o cblat1 cblat1.o ../libopenblas_haswellp-r0.3.10.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../..  -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64   -o zblat1 zblat1.o ../libopenblas_haswellp-r0.3.10.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../..  -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64   -o sblat2 sblat2.o ../libopenblas_haswellp-r0.3.10.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../..  -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64   -o dblat2 dblat2.o ../libopenblas_haswellp-r0.3.10.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../..  -lc
///usrusrusr//binbin///binldld/::ld  :.. .././libopenblas_haswellp.libopenblas_haswellp-/-r0.3.10.alibopenblas_haswellp:r0.3.10.a -:filer0.3.10.a  :notfile   filerecognizednot : not/ recognized :file usrfilerecognized  /truncatedtruncated:bin

 /fileld :truncated
.collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
./libopenblas_haswellp-r0.3.10.a: file not recognized: file truncated
collect2: error: ld returned 1 exit status
...

(5) runParca gets nearly done then prints:

Fitting promoter binding
Killed

and exits with code 137.

==> I had to adjust Docker Desktop Preferences/Resources/Memory from 2GB to 4GB. (Would 3GB suffice?) I'm adding a note to the README.

(6) compareParca.py reports that the parca output from this Docker container is the same as with numpy 1.19.0 but has lots of numeric differences from Mac outside Docker.

@tahorst
Copy link
Member Author

tahorst commented Aug 20, 2020

(2) That's surprising since the macOS host outside Docker was not getting consistent results (see above) but -- whoa! -- it does now! I do not know change made that happen but testing with a Python 3.8.5 virtualenv that has numpy==1.19.0 shows it's not numpy update.

The results were inside a Docker container? Do you think there was only 1 CPU available when it was built in the cloud so when compiling OpenBLAS, the max number of threads was set to 1?

@tahorst, does this happen in Docker-for-Windows?

Docker-for-Windows actually runs through WSL2 which is actually a Linux kernel so unsurprisingly it gets the same behavior that we see with other Linux environments and writes as root if the --user option isn't supplied.

(4) I tried keeping the openblas build directory, then running make tests in that directory inside the container. It doesn't get far:

That's for the new image run on Mac? You still get good results when running the parca/sim though?

==> I had to adjust Docker Desktop Preferences/Resources/Memory from 2GB to 4GB. (Would 3GB suffice?) I'm adding a note to the README.

There's actually already a note about this in docs/README.md:

NOTE: If you encounter memory issues while using Docker Desktop (the default allocated memory is 2GB) causing the simulation to get killed midway, click the Docker icon > Preferences > Advanced > adjust memory to 4GB.

@1fish2
Copy link
Contributor

1fish2 commented Aug 21, 2020

(2) That's surprising since the macOS host outside Docker was not getting consistent results (see above) but -- whoa! -- it does now! I do not know change made that happen but testing with a Python 3.8.5 virtualenv that has numpy==1.19.0 shows it's not numpy update.

The results were inside a Docker container?

Results (1) were inside the container (where it thinks there are 6 CPUs -- default Docker allocation), while results (2) were outside the container (where it thinks there are 12 CPUs).

Do you think there was only 1 CPU available when it was built in the cloud so when compiling OpenBLAS, the max number of threads was set to 1?

Could be. Any idea how to tell by examining the /openblas/OpenBLAS-0.3.10 build directory or the libopenblas_haswell-r0.3.10.so file?

  1. I tried keeping the openblas build directory, then running make tests in that directory inside the container. It doesn't get far:

That's for the new image run on Mac?

Yes.

You still get good results when running the parca/sim though?

I just started 2 sim gens. We'll see.

There's actually already a note about this in docs/README.md:

Arg. I'll change it to a setup step so it's harder to miss.

@tahorst
Copy link
Member Author

tahorst commented Aug 21, 2020

Could be. Any idea how to tell by examining the /openblas/OpenBLAS-0.3.10 build directory or the libopenblas_haswell-r0.3.10.so file?

Do you have the build log? It should say if it was single threaded when it was compiled or how many CPUs it was configured for. Not sure how to get that info after compilation...

@1fish2
Copy link
Contributor

1fish2 commented Aug 21, 2020

The build log confirms that OpenBLAS was single threaded: libopenblas_haswell-r0.3.10.a (Single-threading).

I'll test USE_THREAD=0 on Mac to see if that gets results consistent with the others.

In the previous tests inside and outside Docker, the parca and sim outs differ although I didn't run analysis plots yet. Here are the behavior metrics, if that helps judge for good results:

docker-numpy-1.19.1:

                         metric                      mode  expected_min  expected_max     value  passes
0                       dnaMass           end_start_ratio       1.98150      1.981600   1.98511   False
1                  doublingTime             doubling_time    2910.00000   2920.000000      2950   False
2            fluxomeCorrelation               correlation       0.63497      0.634980  0.658865   False
3          limiting_metabolites      limiting_metabolites           NaN           NaN        {}   False
4          limiting_metabolites  num_limiting_metabolites       2.00000      2.000000         0   False
5                      mRnaMass           end_start_ratio       2.43290      2.433000   2.18449   False
6                   proteinMass           end_start_ratio       2.11340      2.113500   2.12372   False
7                   proteinMass                       max     342.72000    342.730000   344.604   False
8                   proteinMass                      mean     241.33000    241.340000   242.366   False
9                   proteinMass                       min     162.16000    162.170000   162.264   False
10                  proteinMass                     stdev      51.85900     51.860000   52.4276   False
11         proteome_correlation       schmidt_correlation       0.75000      0.750010  0.746113   False
12         proteome_correlation    wisniewski_correlation       0.61569      0.615700  0.612515   False
13                     rRnaMass           end_start_ratio       2.25150      2.251600   2.22151   False
14  replication_elongation_rate                      mean    1932.40000   1932.500000   1932.46    True
15                     ribosome      mean_active_fraction       0.86770      0.867702  0.870178   False
16                     ribosome        mean_concentration   16186.00000  16187.000000   15907.1   False
17                rnaPolymerase      mean_active_fraction       0.24678      0.246790   0.24749   False
18                rnaPolymerase        mean_concentration    3294.60000   3294.700000   3194.87   False
19            smallMoleculeMass           end_start_ratio       2.07820      2.078300   2.07858   False
20                     tRnaMass           end_start_ratio       2.28240      2.282500   2.25283   False
21  transcriptionElongationRate                      mean   59155.00000  59156.000000   57590.1   False
22    translationElongationRate                      mean      15.79300     15.794000    16.035   False

mac-numpy-1.19.1:

                         metric                      mode  expected_min  expected_max             value  passes
0                       dnaMass           end_start_ratio       1.98150      1.981600           1.97915   False
1                  doublingTime             doubling_time    2910.00000   2920.000000              2944   False
2            fluxomeCorrelation               correlation       0.63497      0.634980          0.658657   False
3          limiting_metabolites      limiting_metabolites           NaN           NaN  {VAL[c], GLT[c]}    True
4          limiting_metabolites  num_limiting_metabolites       2.00000      2.000000                 2    True
5                      mRnaMass           end_start_ratio       2.43290      2.433000           2.17393   False
6                   proteinMass           end_start_ratio       2.11340      2.113500           2.12721   False
7                   proteinMass                       max     342.72000    342.730000           345.171   False
8                   proteinMass                      mean     241.33000    241.340000           242.583   False
9                   proteinMass                       min     162.16000    162.170000           162.264   False
10                  proteinMass                     stdev      51.85900     51.860000           52.5483   False
11         proteome_correlation       schmidt_correlation       0.75000      0.750010          0.750456   False
12         proteome_correlation    wisniewski_correlation       0.61569      0.615700          0.615472   False
13                     rRnaMass           end_start_ratio       2.25150      2.251600           2.21675   False
14  replication_elongation_rate                      mean    1932.40000   1932.500000           1932.46    True
15                     ribosome      mean_active_fraction       0.86770      0.867702          0.872894   False
16                     ribosome        mean_concentration   16186.00000  16187.000000           15693.1   False
17                rnaPolymerase      mean_active_fraction       0.24678      0.246790          0.246992   False
18                rnaPolymerase        mean_concentration    3294.60000   3294.700000           3195.17   False
19            smallMoleculeMass           end_start_ratio       2.07820      2.078300           2.07505   False
20                     tRnaMass           end_start_ratio       2.28240      2.282500           2.25239   False
21  transcriptionElongationRate                      mean   59155.00000  59156.000000           57537.5   False
22    translationElongationRate                      mean      15.79300     15.794000           16.3066   False

@1fish2
Copy link
Contributor

1fish2 commented Aug 21, 2020

  • OpenBLAS with USE_THREAD=0 on Mac produced lots of parca output differences than without that flag, also different than the docker runs.
  • At least, two USE_THREAD=0 runs in a row on Mac produced matching parca outputs.
  • Docker runs with numpy 1.19.0 and 1.19.1 produced matching parca outputs.

@1fish2
Copy link
Contributor

1fish2 commented Aug 21, 2020

@tahorst do you have a test case for different output with NO_AVX2=1?

@tahorst
Copy link
Member Author

tahorst commented Aug 21, 2020

What kind of output do you want for the test case?

@1fish2
Copy link
Contributor

1fish2 commented Aug 23, 2020

Maybe more than one test case.

  • A NO_AVX2 test -- along with the OPENBLAS_NUM_THREADS test and maybe more -- to tell if we've fixed specific problems that caused inconsistent model output. Cumulatively, these should help us find more inconsistencies and eventually predict whether the model will produce consistent output across repeat runs and wherever it runs (OS, CPU, gcc, laptop, desktop, Sherlock, GCE, inside/outside Docker).
  • A minimal reproducible test case to report to the OpenBLAS team that NO_AVX2 changes the results. Does that vary by where we run it?

@tahorst
Copy link
Member Author

tahorst commented Aug 24, 2020

I'm not sure exactly where the differences are coming from so I don't know if we can create a simple test case for the NO_AVX2 difference but can look into it.

@1fish2
Copy link
Contributor

1fish2 commented Aug 24, 2020

We could brainstorm debugging techniques. E.g. run w/ and w/o NO_AVX2 in different Docker containers and compare a sequence of data snapshots.

@tahorst
Copy link
Member Author

tahorst commented Aug 25, 2020

You're thinking of looking at different points when the parca is running? My guess from the diffs that show up when comparing the final sim_data objects from the different Docker containers w/ and w/o NO_AVX2 is that the difference might be coming when we fit gene expression so it could be the Theano functions or the solver there but I would have to look into it to confirm. Comparing sim_data before and after that in different containers to confirm could be a good way to isolate a simpler test case.

@1fish2
Copy link
Contributor

1fish2 commented Aug 25, 2020

Good idea.

Top level questions:

  • Whether/how to install openblas and how to install numpy and scipy in each environment?
  • How to test libraries and future releases to see if/where we can just install numpy and scipy from binary wheels?
  • What else is needed to get consistent results from run to run and between environments?

Findings on Sherlock (pyenv wcEcoli3-staging):

  1. The ~/.numpy-site.cfg file needs a runtime_library_dirs = ... line to install numpy in a way that it reliably loads the named openblas at runtime.
    • Sherlock now has an environment module for openblas 0.3.10 but the module has to be loaded for numpy to load it at runtime, which clashes with the old openblas module. Our compiled openblas 0.3.9 doesn't have that limitation.
  2. Installing scipy linked to that openblas needs LDFLAGS="-shared $LDFLAGS", otherwise the compiler will fail with "crt1.o in function _start undefined reference to main".
  3. A login node now has 32 CPU cores and (with OPENBLAS_NUM_THREADS=1) it runs test_library_performance faster than a 5 year old compute node with 1 allocated CPU.

Findings in Docker:

  1. Always build openblas and build numpy and scipy linked to it.
    • Document how we reached that conclusion...
    • Build openblas with NO_AVX2=0 in Docker-for-Linux, NO_AVX2=1 in Docker-for-Mac, regardless of where it'll run, because NO_AVX2=0 computes more consistent results and runs fine in Docker-for-Mac (contrary the openblas team's hypothesis).
    • Set runtime_library_dirs for good measure and as a better reference for manual pyenv setup?

Findings on Mac & Linux?

Also:

  • Always run with OPENBLAS_NUM_THREADS=1 for more consistent results (see test_openblas_threads.py) and faster performance. It's possible to avoid that need by compiling openblas with USE_THREAD=0.

@1fish2
Copy link
Contributor

1fish2 commented Sep 22, 2020

The different ways to install OpenBLAS and NumPy × different runtime environments produce at least 7 different equivalence classes of Parca outputs. There does not seem to be an installation approach (even if the instructions are environment-specific) to get cross-platform consistent Parca output.

openblas variations

See OpenMathLib/OpenBLAS#2244 (comment)

It could be useful to investigate where these Parca computations diverge, but I'm not inclined to spend the time on that.

1fish2 added a commit that referenced this issue Sep 24, 2020
* `test_openblas_threads.py` tests openblas with varying numbers of threads. The results are inconsistent in some installations. So **always** set `OPENBLAS_NUM_THREADS=1`.
* I looked for ways to set up runtime environments on each platform for consistent Parca results but it did not pan out! See #931 (comment)
  * Get bug fix releases Python 3.8.5, numpy 1.19.2, and scipy 1.5.2, but they didn't help this problem. I updated the Dockerfile and the pyenv `wcEcoli3-staging` on Sherlock and will update `wcEcoli3`, but it's unclear if it matters for y'all to follow suit in your development environments since results will still vary by OS and other details.
  * It turns out that building a Docker Image on Linux needn't disable AVX2 vector instructions to use the Image on macOS. We have to do that only when building the Image on macOS. It'll get different Parca results so consider cloud-building Images with `cloud/build.sh` (= `cloud/build-runtime.sh` + `cloud/build-wcm.sh`) rather than `cloud/build-containers-locally.sh`.
  * Improve the docs on setting up your pyenv, explaining this stuff, fixing some confusions like Sherlock setup, and adding workarounds like `runtime_library_dirs` and `LDFLAGS` for roadblocks. These docs are more informative but more complicated. A tech writer could help.
  * I don't know when it's better to compile openblas vs. let numpy & scipy install their embedded copies, but I've written off using a package manager to install openblas.
* `compareParca.py` and `diff_simouts.py` now count the difference lines as a single figure-of-merit, and there's a CLI option to print just the count. This helped compare environments (table in #931 (comment)).
* Catch attempts to use Atlas MongoDB with GCloud. Atlas could be made to work there but there's no point since the GCloud MongoDB is better. We just can't use the GCloud MongoDB without opening it to external access.
* Adjust `augment-pycharm-package-list.sh` for the new Vivarium package names. People might have to manually replace the old Vivarium package names in PyCharm's exception list. This'd be easier if the package and PyPI names matched: `vivarium_core`, `vivarium_cell`.
* `mypy.ini`: Include the new `colony/` directory in source dirs to type-check.
* Drop Python 2.7 from PyCharm compatibility inspections.
@1fish2
Copy link
Contributor

1fish2 commented Nov 13, 2020

The bug fix #980 for portability issue #979 raises this question: Is numpy's random number generator returning different results between @ggsun's local machine and Sherlock?

What could cause that?

  • non-portable seeds;
  • non-portable random number generator;
  • different versions of numpy on different machines, and different random number sequences by version;
  • variations in OpenBlas (if it's used by the random number generator to compute normal distributions and such) or in the compilers used to build it;
  • calling the same random number generator instance from multiple threads [with high impact in @U8NWXD's multi-cell Vivarium].

There are uses of the default random instance in prototypes/, tests, and sim variants (those should be OK), analysis scripts (problematic when analyses run in parallel but won't affect the sims), and polymerize (that's concerning).

When moving to Python 3, we also moved from numpy 1.14.6 to 1.19.2. numpy.random did change in 1.17.0 (see NEP 19), mainly by adding a new generator PCG64 "which has better statistical properties than the legacy MT19937 used in RandomState". (It's faster, too.)

NEP 19 also says RandomState overpromised on compatibility.

1fish2 added a commit that referenced this issue Feb 17, 2021
When building a `wcm-runtime` Docker Image, make the default be to install numpy & scipy from wheels with their embedded copies of OpenBLAS.

It's **much faster** and it's the usual approach so we're less likely to hit installer bugs like #1040.

The reason for compiling OpenBLAS from source was originally to get a bug fix and later to avoid the AVX2 problem building it in Docker-for-Mac.

This does change Parca outputs but that's all part of #931.
@1fish2
Copy link
Contributor

1fish2 commented Jun 23, 2021

runSim --intermediates-directory and comparePickles are useful tools to help localize this problem.

Also see #1097

@1fish2
Copy link
Contributor

1fish2 commented Jun 24, 2021

The CACM article Keeping Science on Keel When Software Moves describes techniques and tools that a climate modeling project uses to debug reproducibility problems in their large models.

The article lacks key details, but the FLiT tool's technique as described might be relevant.

Also, all the examples in the article came down to compile optimizations that used Fused Multiply-Add (FMA) floating point instructions.

@tahorst
Copy link
Member Author

tahorst commented Jun 25, 2021

It's nice to see there's a tool to test for the floating point issues and it's interesting that other people have run into these problems. Do you think there are compiler flags for BLAS (I'm assuming that's where this is coming from but could be other dependencies) that make it slightly less efficient but reproducible? I also wonder if we can truncate/round floating point results in our code where these differences come up so our precision is less but we get above the machine level noise.

@1fish2
Copy link
Contributor

1fish2 commented Jun 25, 2021

It'd be good to brainstorm on this problem.

More observations:

  • I meant to mention that C++ compiler optimizations can't be the only cause of these problems, let alone the Fused Multiply-Add instruction. We've already isolated one floating point difference down to order of evaluation (and rounding) from OpenBLAS threading, and another down to Alpine Linux (in Docker) replacing libc with musl.
  • We run native floating point code in other libraries like NumPy, SciPy, cvxpy, glpk, cobrapy, stochastic-arrow, other code compiled by Cython, Aesara, and Numba.
  • There are OpenBLAS compiler flags to disable AVX2, AVX512 instructions and it looks like SSE3 and SSE4 instructions, and maybe FMA instructions, also to set a specific CPU target.
  • OpenBLAS frequently releases bug fixes. It's possible that one problem is due to a bug that they've fixed. I think installing NumPy and SciPy wheels includes OpenBLAS 0.3.13. We could go back to manually compiling OpenBLAS 0.3.15, NumPy, and SciPy (yuk). Changelog says v0.3.15 corrected the Haswell DROT kernel to require AVX2/FMA3 rather than AVX512.
  • I'm getting ready to test NumPy 1.21. It uses more SIMD instructions and includes mypy type stubs. It could make reproducibility better or worse. If the file openblas_support.py is a good clue, it's still on OpenBLAS 0.3.13.

I also wonder if we can truncate/round floating point results in our code where these differences come up so our precision is less but we get above the machine level noise.

Interesting idea! It might require localizing the cause to particular pieces of code. Then it might be able to erase some hardware variability and some order-of-evaluation variability.

This option sounds promising for performance in Docker but riskier for reproducibility:

OpenBLAS can be built for multiple targets with runtime detection of the target cpu by specifiying DYNAMIC_ARCH=1 in Makefile.rule, on the gmake command line or as -DDYNAMIC_ARCH=TRUE in cmake.

@1fish2
Copy link
Contributor

1fish2 commented Jun 29, 2021

Another variable is the C++ compiler: clang on macOS vs. gcc on Ubuntu, Docker, and CentOS (Sherlock). We could set up clang (how?) for use by pyenv, pip, Aesara, Cython, & Numba.

What triggered this realization is Santiago and Niels hit a problem on Ubuntu where Parca calls Aesara which fails to compile & link some code. A workaround is to install Python using PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.8.7.

1fish2 added a commit that referenced this issue Nov 8, 2022
The list of alternate expected values in `test_openblas_threads.py` didn't pan out. Revert it to a single expected value and make that just a warning about non-portable simulation results.

See #931 and #1310
1fish2 added a commit that referenced this issue Nov 10, 2022
The list of alternate expected values in `test_openblas_threads.py` didn't pan out. Revert it to a single expected value and make that just a warning about non-portable simulation results.

See #931 and #1310
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants