CABLE offline spatial runs are not reproducible #463

Open
SeanBryan51 opened this issue Nov 6, 2024 · 0 comments
Labels
bug Something isn't working

@SeanBryan51 (Collaborator)

Currently, for MPI configurations (run via `benchcab spatial`), running the same version of CABLE against itself sometimes does not produce bitwise-identical output.

Benchcab currently tests 4 different science configurations for every given CABLE version, labelled S0, S1, S2 and S3. Sometimes one or more of these configurations reproduce bitwise; however, it is unlikely that all configurations reproduce reliably.

Where differences occur, many variables have relative differences greater than 10% throughout the time series.
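To put a number on the "greater than 10%" observation for a single variable, a helper along these lines could be used. This is a minimal sketch, not part of benchcab or nccmp: `max_rel_diff` is a hypothetical name, it assumes both files already contain one value per line, in the same order and of the same length (for example, one variable dumped from each realisation's `cable_out.nc`; that extraction step is not shown), and it defines relative difference as |a - b| / max(|a|, |b|), skipping pairs where both values are zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the maximum relative difference between two
# series of values. Assumes one numeric value per line in each file, in the
# same order. Relative difference is taken as |a - b| / max(|a|, |b|);
# pairs where both values are zero are skipped (an assumed convention).
max_rel_diff() {
  paste "$1" "$2" | awk '
    function abs(x) { return x < 0 ? -x : x }
    BEGIN { max = 0 }
    {
      a = $1; b = $2
      denom = abs(a) > abs(b) ? abs(a) : abs(b)
      if (denom == 0) next          # both values zero: no relative difference
      rel = abs(a - b) / denom
      if (rel > max) max = rel      # keep the largest value seen so far
    }
    END { printf "%.4f\n", max }'
}
```

For example, with one file containing 1, 2, 0 and the other 1, 2.5, 0, it prints `0.2000`; any result above `0.1000` corresponds to a relative difference greater than 10%.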

My guess is that uninitialised memory access somewhere (e.g. #395, #396, #397) is causing non-deterministic behaviour. Currently, the MPI executable crashes when run under DDT with balanced memory debugging settings enabled.

Steps to reproduce (Gadi):

CABLE version used: main c125ede
Benchcab version used: 4.1.0

1. Clone the bench_example repository into /scratch:
2. Change directory into bench_example and set the configuration file as follows:

   ```shell
   cat << EOF > config.yaml
   realisations:
     - repo:
         git:
           branch: main
     - repo:
         git:
           branch: main
       name: main-2

   modules: [
     intel-compiler/2021.1.1,
     netcdf/4.7.4,
     openmpi/4.1.0
   ]
   EOF
   ```

3. Load hh5 modules and run benchcab spatial:

   ```shell
   module load conda/analysis3-24.04
   benchcab spatial
   ```

4. Wait for CABLE jobs to finish.
5. Load nccmp and compare outputs:

   ```shell
   module load nccmp
   nccmp -d runs/spatial/tasks/crujra_access_R*_S0/archive/output000/cable_out.nc
   nccmp -d runs/spatial/tasks/crujra_access_R*_S1/archive/output000/cable_out.nc
   nccmp -d runs/spatial/tasks/crujra_access_R*_S2/archive/output000/cable_out.nc
   nccmp -d runs/spatial/tasks/crujra_access_R*_S3/archive/output000/cable_out.nc
   ```
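The comparison step can be wrapped in a loop that reports, per configuration, whether the two realisations match. This is a sketch under assumptions: `check_reproducible` is a hypothetical helper (not part of benchcab), it is meant to be run from the bench_example directory after the jobs finish, it expects each glob to match exactly two `cable_out.nc` files (one per realisation), and it assumes `nccmp -d` exits with a non-zero status when it finds differences.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which science configurations reproduce bitwise
# between the two realisations. Assumes it runs from the bench_example
# directory and that nccmp exits non-zero when the files differ.
check_reproducible() {
  local config
  local -a files
  for config in S0 S1 S2 S3; do
    # Without nullglob, an unmatched glob stays as one literal word,
    # so the count check below also catches missing outputs.
    files=(runs/spatial/tasks/crujra_access_R*_"${config}"/archive/output000/cable_out.nc)
    if [ "${#files[@]}" -ne 2 ]; then
      echo "${config}: could not find exactly two output files"
      continue
    fi
    # Discard nccmp's report; only its exit status is used here.
    if nccmp -d "${files[0]}" "${files[1]}" > /dev/null 2>&1; then
      echo "${config}: bitwise identical"
    else
      echo "${config}: differs"
    fi
  done
}

check_reproducible
```

Rerunning `benchcab spatial` followed by this loop a few times would show which of S0 to S3 fail to reproduce on a given attempt.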