ch5n_nbo QA test fails from v7.2.0 #864
Comments
@drew-parsons could you try this patch for the ch5n_nbo failure?
ch4-scf-dft-prop and ch4-dft-scf-prop were not part of the default QA script.
Good suggestion, adding
(was -94.679444920261 without
in the reference file, which is within the tolerance required for the QA tests.
Thanks for the feedback.
OK. I'll apply this patch to the Debian tests in the meantime.
Current master and hotfix/release-7-2-0 now have a more reliable fix for this issue.
v7.2.1 is now available for download.
Thanks. Debian builds are underway.
v7.2.1 is now built and available for Debian: https://buildd.debian.org/status/package.php?p=nwchem
Great.
Should the
I don't have a loong64 machine to log into and check, but I can try it one way or the other in the next build upload.
You are right. Thanks for spotting this.
loong64 has now built successfully (via loongarch64): https://buildd.debian.org/status/fetch.php?pkg=nwchem&arch=loong64&ver=7.2.1-3&stamp=1697658700&raw=0

I have one more general question related to tests. In this issue we got the ch5n_nbo test passing, and it now passes on all architectures (apart from some MPI trouble on s390x and ia64 [gcc]). But on some of the less common architectures, some of the other tests fail in trivial ways (just a small difference in energy). Pragmatically, it might be simplest to drop selected failing tests on the less common architectures so that CI can pass on the remaining tests. Tests with only trivial differences in energy compared to the amd64 reference include
Tests with more serious failures include
A number of tests ran out of heap memory on s390x; I don't think that's a bug.

Test logs are a combination of run-time (CI) testing at https://ci.debian.net/packages/n/nwchem/ and build-time testing at https://buildd.debian.org/status/package.php?p=nwchem (Debian build-time test failures are currently ignored; the green bands indicate that the build succeeded, not necessarily that all the tests passed).

I can start skipping these failing tests in the Debian builds for these minor architectures so that we can monitor reliable passing of the other tests (i.e. without ignoring test failures). Let me know if you want me to apply any specific patches.
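A per-architecture skip list of the kind proposed above might look like the following minimal sketch. The architecture/test pairs are only examples taken from failures reported in this thread (n2_ccsd on x32, cosmo_h3co and cosmo_h2o on sparc64). The function names `qa_should_skip` and `qa_filter_tests` are hypothetical, and this is not how the Debian packaging actually implements its skips.

```shell
# Hypothetical skip list: return 0 (skip) if test $2 is known to fail
# on Debian architecture $1. The pairs below are illustrative only.
qa_should_skip() {
    skip_arch=$1
    skip_test=$2
    case "$skip_arch:$skip_test" in
        x32:n2_ccsd|sparc64:cosmo_h3co|sparc64:cosmo_h2o)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Print, one per line, the tests (arguments after the architecture)
# that should still be run on architecture $1.
qa_filter_tests() {
    arch=$1
    shift
    for t in "$@"; do
        if ! qa_should_skip "$arch" "$t"; then
            printf '%s\n' "$t"
        fi
    done
}
```

In a test script the architecture could come from `dpkg --print-architecture`, e.g. `qa_filter_tests "$(dpkg --print-architecture)" ch5n_nbo n2_ccsd`. Keeping the list in a single `case` statement makes it easy to drop an entry once the corresponding failure is fixed.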
Thanks for the detailed update. I will try to see what I can reproduce with a generic build on some of the architectures. I have never tried s390x ... I am surprised that the code even builds and runs.
I am surprised the code works on x32 ... do you get just a single failure, on n2_ccsd?
It certainly means the code is in good condition if it builds and passes most tests on most minor architectures.
That's right, all the other tests passed on x32; n2_ccsd is the only one that failed. The x32 build log is
In patch 02_makefile_flags.patch we added
I see cosmo_h3co mentioned twice for sparc64 ... does it result in a small failure or a SIGBUS?
The sparc64 log is https://buildd.debian.org/status/fetch.php?pkg=nwchem&arch=sparc64&ver=7.2.1-3&stamp=1697877511&raw=0
cosmo_h3co fails with the SIGBUS with mpich. It was cosmo_h2o, not cosmo_h3co, that fails with the small energy difference.
alpha was slow to build but has finally declared it wants to join the DONTHAVEM64OPT group.
The following bit of
USE_HWOPT will help the generic Debian package builds. I'll try it along with removing the LINUX64 patch in the next upload.
powerpc architecture: I am not able to reproduce your failures (other than n2_ccsd).
Strange about powerpc. We have an external GA, but it's still 5.8.2.
With alpha (build log), it's getting the wrong flags for gcc when building peigs:
That's coming from Line 496 in 060f945
What's the best way to handle it?
x32 is apparently not happy having to handle LINUXCPU: https://buildd.debian.org/status/fetch.php?pkg=nwchem&arch=x32&ver=7.2.1-4&stamp=1698161945&raw=0
I'm a bit confused by the problem. NWChem is just including stdlib.h, and stdlib.h should know which bits it needs. There's no special handling for LINUXCPU=x32 anyway, so the behaviour shouldn't have changed.
Let me have a look at the
One point about BLAS: Debian policy is to build against the generic BLAS (libblas-dev). libblas.so is ABI-compatible, so any (optimised) BLAS package can be installed on end-user systems to provide the desired optimised BLAS implementation (OpenBLAS or one of the others) at runtime. But in the Debian CI tests we haven't configured installation of an optimised BLAS package, so the s390x failure is using generic BLAS.
Could you elaborate a bit on how to switch from the generic libblas-dev to libopenblas-dev at runtime?
It just needs libopenblas-dev to be installed, nothing else. The install scripts then use the Debian alternatives mechanism to set a symlink for the preferred libblas.so. The preference can be set manually using
There is an additional choice about which OpenBLAS build to use. libopenblas-dev will install the pthread build (
64-bit BLAS is also an option (libblas64-dev, etc.). We've had a request to provide a longint NWChem build, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=935993
But that's a separate library,
64-bit BLAS is not enough, since we have ScaLAPACK, too. ScaLAPACK has been ported to use 64-bit integers for a while (this is what I use by default and strongly recommend). I wish these changes would make it into Debian soon.
Release 2.2.0 New features: Allow compilation in ILP64 mode, PR Reference-ScaLAPACK/scalapack#19
I am testing the
My earlier analysis of the s390x failure just proved to be wrong. It was an issue with 64-bit to 32-bit conversion.
Endianness can be tricky. As far as ScaLAPACK goes, we have a bug report requesting a 64-bit build at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961186
I might craft a 7.2.2 release with all these latest changes.
I believe there are no Cray-T3D or Cray-T3E machines powered on at this point in time. They are only present in computer museums. We should delete all those makefile lines at some point.
It sounds like the time has come to consider ia64 obsolete.
It will be interesting to see how long Debian keeps it on the build list. Debian will likely have to drop it too if it's no longer in the Linux kernel.
NWChem 7.2.2 is out with all the latest fixes to make the build work on big-endian architectures (the fix for Python 3.12, #892, is there, too). GA 5.8.2 needs to be patched to work on big-endian architectures with the following patches available in the GA develop branch
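Because the remaining fixes are specific to big-endian hosts such as s390x, a packaging script may need to know the host byte order before deciding whether the GA big-endian patches apply. The following is a minimal POSIX-shell sketch. The function name `host_byte_order` is hypothetical, and the script is not part of NWChem or GA. It writes the bytes 0x01 0x02 and reads them back with `od` as one unsigned 16-bit integer in host order.

```shell
# Report the host byte order ("little" or "big") using only POSIX tools.
# Bytes 0x01 0x02 read as one unsigned 16-bit integer give 513 (0x0201)
# on a little-endian host and 258 (0x0102) on a big-endian host.
host_byte_order() {
    value=$(printf '\001\002' | od -An -tu2 | tr -d ' \n')
    case "$value" in
        513) echo little ;;
        258) echo big ;;
        *)   echo unknown; return 1 ;;
    esac
}
```

On Debian build machines, `dpkg-architecture -qDEB_HOST_ARCH_ENDIAN` should give the same answer for the host architecture without any byte-level probing.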
Describe the bug
I'm running QA tests on a Debian package build of nwchem 7.2.0. All the CI tests that Debian runs are passing, except for ch5n_nbo, which fails trivially (energy difference 1e-5 Hartree). What's the best way to handle it?
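For illustration only, the kind of tolerance check at stake (an energy that differs from the reference by about 1e-5 Hartree) can be sketched with awk. The function name `energies_match` is hypothetical, and this is not the comparison that the NWChem QA scripts actually perform.

```shell
# Hypothetical helper: succeed (exit status 0) if |REF - NEW| <= TOL.
# Usage: energies_match REF NEW TOL   (all values in Hartree)
energies_match() {
    awk -v ref="$1" -v new="$2" -v tol="$3" 'BEGIN {
        diff = (ref + 0) - (new + 0)   # force numeric conversion
        if (diff < 0) diff = -diff     # absolute difference
        if (diff <= tol + 0) exit 0    # within tolerance
        exit 1                         # outside tolerance
    }'
}
```

Because the tolerance is a parameter, a test that is known to be numerically sensitive could be compared with a slightly looser TOL than the others, instead of being dropped entirely.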
Describe settings used
Building with gcc 13.2.0 with both openmpi 4.1.5 and mpich 4.1.2.
Linux (Debian unstable)
To Reproduce
1. cd QA
2a. MPIRUN_PATH=/usr/bin/mpirun.openmpi NWCHEM_TARGET=LINUX64 NWCHEM_EXECUTABLE=/usr/bin/nwchem.openmpi NWCHEM_TOP=$PWD/.. ./runtests.mpi.unix procs 1 ch5n_nbo
2b. MPIRUN_PATH=/usr/bin/mpirun.mpich NWCHEM_TARGET=LINUX64 NWCHEM_EXECUTABLE=/usr/bin/nwchem.mpich NWCHEM_TOP=$PWD/.. ./runtests.mpi.unix procs 1 ch5n_nbo
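The two reproduction commands differ only in the MPI flavour, so they can be wrapped in a loop. This sketch assumes it is run from the QA directory with both Debian MPI builds installed. `qa_run_both_mpi` and the `DRY_RUN` switch are hypothetical names and are not part of the NWChem QA scripts.

```shell
# Run one QA test with both Debian MPI builds (openmpi, then mpich).
# Assumes the current directory is NWChem's QA directory.
# Set DRY_RUN=1 to print the full commands instead of running them.
qa_run_both_mpi() {
    test_name=$1
    for flavour in openmpi mpich; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "MPIRUN_PATH=/usr/bin/mpirun.$flavour NWCHEM_TARGET=LINUX64 NWCHEM_EXECUTABLE=/usr/bin/nwchem.$flavour NWCHEM_TOP=$PWD/.. ./runtests.mpi.unix procs 1 $test_name"
        else
            MPIRUN_PATH=/usr/bin/mpirun.$flavour \
            NWCHEM_TARGET=LINUX64 \
            NWCHEM_EXECUTABLE=/usr/bin/nwchem.$flavour \
            NWCHEM_TOP=$PWD/.. \
                ./runtests.mpi.unix procs 1 "$test_name" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 qa_run_both_mpi ch5n_nbo` prints both commands. The `|| return 1` stops after the first failing flavour only if runtests.mpi.unix reports failure through its exit status, which is an assumption here.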
Expected behavior
Precision in energy calculations should be reproducible such that tests pass.
Additional context
Debian CI tests only run a subset of the available tests. Running a random selection of other tests shows that ch4-scf-dft-prop.nw also fails, evidently due to issue #776 (the COSMO dielectric constant is set to 78.40 instead of the 3.90 given in the input file). The same applies to ch4-dft-scf-prop.