Add 0.1 degree model configurations #16

Closed
4 tasks
aidanheerdegen opened this issue Feb 26, 2024 · 20 comments

@aidanheerdegen
Member

aidanheerdegen commented Feb 26, 2024

Currently only the 1 degree RYF and IAF global configurations are available in this repo.

The full ACCESS-OM2 release also requires the 0.1 degree configurations, with both IAF and RYF atmospheric forcing, each as physics-only and BGC variants.

This corresponds to four configurations:

  • release-01deg_jra55_ryf (repo)
  • release-01deg_jra55_iaf (repo)
  • release-01deg_jra55_ryf_bgc (repo)
  • release-01deg_jra55_iaf_bgc (repo)

The release-1deg_jra55_ryf branch is the template to follow for RYF configurations. Similarly, the release-1deg_jra55_iaf branch is the template to follow for IAF configurations.

Steps to follow to add a COSIMA configuration to this repo:

  1. Clone this repo, e.g. gh repo clone ACCESS-NRI/access-om2-configs
  2. Clone configuration repo, e.g. gh repo clone COSIMA/01deg_jra55_ryf
  3. cd access-om2-configs
  4. Add the COSIMA repo as a remote and fetch it, e.g. git remote add 01deg_jra55_ryf ../01deg_jra55_ryf followed by git fetch 01deg_jra55_ryf
  5. Create branch from config repo, e.g. git checkout -b release-01deg_jra55_ryf 01deg_jra55_ryf/master
  6. Modify configuration to only use paths from vk83 and qv56
  7. Add configuration to GitHub repository, e.g. git push -f origin release-01deg_jra55_ryf
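
The steps above can be sketched as a small dry-run helper that only builds the commands for one configuration and does not run anything. This is a sketch under assumptions: the helper name plan_commands is hypothetical, the git fetch step and the origin remote name reflect a typical setup after gh repo clone, and it assumes the COSIMA repo's master branch is the right starting point (the BGC variants may live on a different branch). The manual step of editing the configuration is left as a comment.

```python
import shlex


def plan_commands(cosima_repo, branch="master", target="ACCESS-NRI/access-om2-configs"):
    """Return the gh/git commands for importing one COSIMA configuration.

    Nothing is executed here; the caller can print the commands or run them.
    """
    release_branch = f"release-{cosima_repo}"
    workdir = target.split("/")[-1]
    return [
        ["gh", "repo", "clone", target],
        ["gh", "repo", "clone", f"COSIMA/{cosima_repo}"],
        ["cd", workdir],
        ["git", "remote", "add", cosima_repo, f"../{cosima_repo}"],
        # Assumption: fetch so that <remote>/<branch> exists locally.
        ["git", "fetch", cosima_repo],
        ["git", "checkout", "-b", release_branch, f"{cosima_repo}/{branch}"],
        # (edit the configuration here so it only uses vk83 and qv56 paths)
        # Assumption: "origin" is the remote set up by `gh repo clone`.
        ["git", "push", "-f", "origin", release_branch],
    ]


if __name__ == "__main__":
    for cmd in plan_commands("01deg_jra55_ryf"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the force push a deliberate manual action.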
@aidanheerdegen
Member Author

Steps to modify a configuration:

  1. Change the run length to something short for testing purposes (maybe 1 day for 0.1 degree)
  2. Create a ground-truth from the COSIMA configuration to compare against. Make sure you have diagnostic output at a high enough frequency that fields are output for the shortened run length. Good fields to choose are prognostic variables like salt, temp and velocities u, v.
  3. Create a tools subdirectory and git mv scripts to this location. Update config.yaml as required
  4. Modify the configuration by changing paths to executables to those in vk83 (see 1 degree for paths, and note there are different CICE executables for the different resolutions)
  5. Turn off runlog to prevent unnecessary git commits during testing
  6. Run short test run and check for reproducibility
  7. Modify input paths to point to the new structure in vk83 (see 1 degree config)
  8. Run short test run and check for reproducibility
  9. Update README
  10. Change run length back, and turn runlog back on
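
Since the configuration should only use paths from vk83 and qv56, a quick scan of the configuration text can flag anything left over from the COSIMA setup. A minimal sketch, assuming paths of interest look like /g/data/<project>/...; the function name and the sample paths are made up for illustration, and /scratch or relative paths are not checked.

```python
import re

ALLOWED_PROJECTS = {"vk83", "qv56"}
# Matches /g/data/<project> optionally followed by /rest/of/path,
# capturing the project code.
GDATA_PATH = re.compile(r"/g/data/([A-Za-z0-9]+)(?:/\S*)?")


def disallowed_paths(text, allowed=ALLOWED_PROJECTS):
    """Return every /g/data path in `text` whose project is not allowed."""
    return [m.group(0) for m in GDATA_PATH.finditer(text) if m.group(1) not in allowed]


if __name__ == "__main__":
    # Hypothetical config fragment, not taken from a real config.yaml.
    sample = (
        "exe: /g/data/vk83/apps/example/bin/model.x\n"
        "input: /g/data/ik11/inputs/example\n"
    )
    for path in disallowed_paths(sample):
        print("not in vk83/qv56:", path)
```

To check a real configuration, read config.yaml (and any namelists or manifests that carry paths) and pass the text to disallowed_paths.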

Reproducibility Testing

There is a general discussion here

ACCESS-NRI/model-config-tests#83

Specifically there is some code for extracting MOM/FMS checksums buried in the CI checking code, which isn't easy to call currently, but shows how it is done

https://github.com/ACCESS-NRI/access-om2-configs/blob/main/test/models/accessom2.py#L43-L91
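
As a rough standalone illustration of the idea (not the CI code itself), checksums can be pulled out of two access-om2.out logs and compared field by field. This sketch assumes checksum lines have the form "[chksum] <field> <integer>"; the function names and the sample values are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line format: "[chksum] <field name> <signed integer>"
CHKSUM = re.compile(r"^\[chksum\]\s+(.+?)\s+(-?\d+)$")


def extract_checksums(lines):
    """Collect checksums per field, in the order they appear."""
    checksums = defaultdict(list)
    for line in lines:
        match = CHKSUM.match(line.strip())
        if match:
            checksums[match.group(1)].append(match.group(2))
    return dict(checksums)


def differing_fields(reference, candidate):
    """Return the sorted field names whose checksum lists differ."""
    fields = sorted(set(reference) | set(candidate))
    return [f for f in fields if reference.get(f) != candidate.get(f)]


if __name__ == "__main__":
    # Made-up log lines standing in for two access-om2.out files.
    kgo = ["[chksum] ht   -2390360641069121536", "[chksum] hu   6389284661071183872"]
    new = ["[chksum] ht   -2390360641069121536", "[chksum] hu   6389284661071183873"]
    print(differing_fields(extract_checksums(kgo), extract_checksums(new)))  # prints ['hu']
```

For real logs, open each access-om2.out and pass the file object to extract_checksums. An empty result for both logs should be treated as "nothing to compare", not as a match.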

@aidanheerdegen
Member Author

This GitHub comparison shows the changes I made for the 1 degree RYF configuration (and I noticed there are some mistakes that need fixing!)

224b125...3f4cd6e

@aidanheerdegen
Member Author

For comparing model outputs to check for bit reproducibility I recommend a few different tools:

  • nccmp (/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/bin/nccmp)

nccmp is silent when files are equivalent, otherwise it will echo to STDERR whether metadata or a specific variable differs. By default, comparing stops after the first difference.

  • ncdiff (/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/bin/ncdiff)

ncdiff subtracts one netCDF file from another

  • ncview (/apps/ncview/2.1.7/bin/ncview)

A simple X Window based netCDF file viewer. When subtracting one file from another it is a good idea to make sure the files contain actual data. If they're empty they will be identical, so the comparison can wrongly suggest the runs are bit reproducible.

@aidanheerdegen
Member Author

aidanheerdegen commented Feb 28, 2024

Hi @minghangli-uni, great work, thanks.

A couple of things I noticed (that weren't covered above):

  1. restart_freq should be set to a date-based restart string.

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-01deg_jra55_ryf/config.yaml#L74

Perhaps @aekiss would have some useful input into what frequency of restarts is appropriate, but in this experiment (/g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091) it is 5 yearly. So the appropriate setting would be

restart_freq: '5Y'
  2. runlog should be true, or just delete that line as it defaults to true and we don't really want people to turn it off
  3. We should add some default sync options, but leave the details up to users. I realise I didn't do this for 1 degree configurations either.

The documentation for sync is available here: https://payu.readthedocs.io/en/latest/config.html#postprocessing

I'd suggest we start with something like

sync:
    enable: False # set the path and change to True to automatically copy experiment outputs to another location
    path: # Set to a location on /g/data or use rsync syntax to specify a remote server and path
    exclude:
        - '*.nc.*'
        - 'iceh.????-??-??.nc'

But best to consult @aekiss to see if all these exclude patterns are required by default

https://github.com/COSIMA/01deg_jra55_ryf/blob/master/sync_data.sh#L20C10-L20C121

@aidanheerdegen
Member Author

I've added sync options to the 1 degree configurations, and put them at the top as this is likely something users will have to change

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-1deg_jra55_ryf/config.yaml#L15-L22

@minghangli-uni
Collaborator

To check checksum reproducibility, here are the output file from a Known Good Output (KGO) run generated with the COSIMA configuration and the output file from the current configuration:

  1. /scratch/tm70/ml0072/access-om2/archive/01deg_jra55_ryf-ae35d563/output000/access-om2.out
  2. /scratch/tm70/ml0072/access-om2/archive/access-om2-configs-release-01deg_jra55_ryf-86474517/output000/access-om2.out

@minghangli-uni
Collaborator

minghangli-uni commented Feb 28, 2024

To compare model outputs, I used the tools recommended above.

When using nccmp, files written in different netCDF formats are reported as different, for example:

DIFFER : FILE FORMATS : NC_FORMAT_NETCDF4_CLASSIC <> NC_FORMAT_NETCDF4

To treat this as a warning rather than a difference, include the flag -w format:

nccmp -d -s -w format $file_1_access_om2_configs.nc $file_1_cosima.nc

Also for 3D variables, it's advisable to refrain from running nccmp or ncdiff on login nodes. Instead, it's best practice to submit a PBS job for such tasks to avoid resource contention and potential disruptions for other users.

Finally, I made sure that the diagnostics contain non-empty data.
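
For scripting the comparison (for example inside a PBS job), the nccmp call can be wrapped in a couple of small functions. This is only a sketch: the function names are hypothetical, nccmp is assumed to be on PATH, and files_match assumes that an exit status of 0 means no differences were found.

```python
import subprocess


def nccmp_command(file_a, file_b):
    """Build the nccmp call: compare data (-d), report identical files (-s),
    and only warn about file format differences (-w format)."""
    return ["nccmp", "-d", "-s", "-w", "format", file_a, file_b]


def files_match(file_a, file_b):
    """Run nccmp on two files; assumes exit status 0 means they match."""
    result = subprocess.run(
        nccmp_command(file_a, file_b), capture_output=True, text=True
    )
    if result.returncode != 0:
        # nccmp reports differences on STDERR
        print(result.stderr.strip())
    return result.returncode == 0
```

A match only means something if the files actually contain data, so it is still worth opening one of them in ncview first.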

@aidanheerdegen
Member Author

@aekiss can you confirm there aren't separate BGC executables required for the BGC configs?

@aekiss
Contributor

aekiss commented Feb 29, 2024

re sync exclusions - see comments below and https://github.com/COSIMA/01deg_jra55_iaf/blob/01deg_jra55v140_iaf_cycle4/sync_data.sh#L90-L111

sync:
    enable: False # set the path and change to True to automatically copy experiment outputs to another location
    path: # Set to a location on /g/data or use rsync syntax to specify a remote server and path
    exclude:
        - '*.nc.*'   # don't sync uncollated MOM outputs (eg if collation is slow or fails)
        - 'iceh.????-??-??.nc'  # don't sync individual daily cice outputs - these are concatenated by sync_data.sh
        - 'iceh*.????-??-??-?????.nc'  # don't sync individual sub-daily cice outputs - these are concatenated eg by https://github.com/COSIMA/01deg_jra55_iaf/blob/01deg_jra55v140_iaf_cycle4/concat_ice_6hourlies.sh
        - '*-DELETE'  # deletable files from daily ice concatenation
        - '*-IN-PROGRESS'  # temp files from daily ice concatenation
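
To see which files these exclude patterns would skip, they can be tried against a few file names. A minimal sketch, assuming Python's fnmatch behaves like rsync's matching for these simple * and ? patterns (rsync has extra rules, e.g. for / and **, that are ignored here); the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

EXCLUDE = [
    "*.nc.*",                     # uncollated MOM outputs
    "iceh.????-??-??.nc",         # individual daily cice outputs
    "iceh*.????-??-??-?????.nc",  # individual sub-daily cice outputs
    "*-DELETE",                   # deletable files from ice concatenation
    "*-IN-PROGRESS",              # temp files from ice concatenation
]


def is_excluded(filename, patterns=EXCLUDE):
    """True if any exclude pattern matches the bare file name."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical file names covering each pattern plus two files to keep.
    for name in [
        "ocean_daily.nc.0003",
        "iceh.1900-01-01.nc",
        "iceh_03h.1900-01-01-10800.nc",
        "iceh.1900-01.nc",
        "ocean.nc",
    ]:
        print(f"{name}: {'excluded' if is_excluded(name) else 'synced'}")
```

In this sketch the monthly-style name iceh.1900-01.nc and the collated ocean.nc are not matched by any pattern, so they would still be synced.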

@aekiss
Contributor

aekiss commented Feb 29, 2024

> @aekiss can you confirm there aren't separate BGC executables required for the BGC configs?

No, MOM master+bgc exes are different from master in all OM2 configs.

e.g. compare these:
https://github.com/COSIMA/01deg_jra55_iaf/blob/master/config.yaml#L40
https://github.com/COSIMA/01deg_jra55_iaf/blob/master%2Bbgc/config.yaml#L40

@aekiss
Contributor

aekiss commented Feb 29, 2024

> Perhaps @aekiss would have some useful input into what frequency of restarts is appropriate

I've been saving annual restarts for IAF at 0.1°. Not sure what was done at other resolutions.

@aidanheerdegen
Member Author

> No, MOM master+bgc exes are different from master in all OM2 configs.

I couldn't find where the build logic differs. Where is there any build logic that differentiates between ACCESS-OM2 and ACCESS-OM2-BGC? I thought I recall you deciding to do away with separate builds, and only have the difference in the configurations themselves. No?

@aidanheerdegen
Member Author

> I've been saving annual restarts for IAF at 0.1°. Not sure what was done at other resolutions.

The RYF is being saved every 5 years, at least lately

$ ls -ld /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1*
drwxr-s---+ 5 amh157 ik11 4096 Feb 22  2021 /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1015
drwxr-s---+ 5 amh157 ik11 4096 Feb 24  2021 /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1035
drwxr-s---+ 5 amh157 ik11 4096 Feb 27  2021 /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1055
drwxr-s---+ 5 amh157 ik11 4096 Mar  1  2021 /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1075
drwxr-s---+ 5 amh157 ik11 4096 Apr  9  2021 /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1079
drwxr-s---+ 5 amh157 ik11 4096 Jun  6  2021 /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1095
drwxr-s---+ 5 amh157 ik11 4096 Jun  9  2021 /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1115
$ cat /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/restart1*/ocean/ocean_solo.res 
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  2155     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  2160     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  2165     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  2170     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  2171     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  2175     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  2180     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
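
The restart interval can also be read off programmatically from the ocean_solo.res contents. A small sketch, assuming each record has a "Current model time" line with the year as its first field, as in the listing above; the function names are hypothetical.

```python
def current_model_years(text):
    """Extract the year from each 'Current model time' line."""
    years = []
    for line in text.splitlines():
        if "Current model time" in line:
            years.append(int(line.split()[0]))
    return years


def restart_intervals(years):
    """Differences in years between consecutive restarts."""
    return [later - earlier for earlier, later in zip(years, years[1:])]


if __name__ == "__main__":
    # Two records copied from the listing above.
    sample = (
        "  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second\n"
        "  2155     1     1     0     0     0        Current model time: year, month, day, hour, minute, second\n"
        "  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second\n"
        "  2160     1     1     0     0     0        Current model time: year, month, day, hour, minute, second\n"
    )
    print(current_model_years(sample))  # prints [2155, 2160]
    # Years of all seven restarts listed above.
    print(restart_intervals([2155, 2160, 2165, 2170, 2171, 2175, 2180]))  # prints [5, 5, 5, 1, 4, 5]
```

The intervals are mostly 5 years, with one extra restart saved at model year 2171.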

@aidanheerdegen
Member Author

aidanheerdegen commented Feb 29, 2024

> - iceh.????-??-??.nc  # don't sync individual daily cice outputs - these are concatenated by sync_data.sh

Thanks.

But this has made me realise we don't have any functionality for concatenating the daily ice files

https://github.com/COSIMA/01deg_jra55_ryf/blob/master/sync_data.sh#L87-L108

That would need cdo nco installed in the payu conda environment in vk83

@aekiss
Contributor

aekiss commented Feb 29, 2024

FYI we also concatenated sub-daily cice outputs, eg https://github.com/COSIMA/01deg_jra55_iaf/blob/01deg_jra55v140_iaf_cycle4/concat_ice_6hourlies.sh
which is why we also exclude iceh*.????-??-??-?????.nc

@aekiss
Contributor

aekiss commented Feb 29, 2024

> > No, MOM master+bgc exes are different from master in all OM2 configs.
>
> I couldn't find where the build logic differs. Where is there any build logic that differentiates between ACCESS-OM2 and ACCESS-OM2-BGC? I thought I recall you deciding to do away with separate builds, and only have the difference in the configurations themselves. No?

No, they're built differently, in some way that I can't quite remember right now - see https://github.com/COSIMA/access-om2/blob/master/install.sh#L10-L11 and https://github.com/COSIMA/access-om2/wiki/Getting-started#building-the-models

@aidanheerdegen
Member Author

aidanheerdegen commented Feb 29, 2024

Progress on the BGC configurations is largely blocked until the build variant referenced above is created and deployed.

Though it should be possible to copy the missing BGC inputs from ik11 into the new structure in vk83.

minghangli-uni pushed a commit that referenced this issue Apr 8, 2024
This was referenced Apr 10, 2024
aidanheerdegen added the type:config Model configurations label Apr 11, 2024
@aidanheerdegen
Member Author

These configurations need to be updated to the ACCESS-OM2 2024.03.0 deployment, which includes restart reproducibility.

The binary paths are in the spack.location file, but the relevant ones are excerpted below:

==> In environment access-om2-2024_03_0
==> Root specs
access-om2@git.2024.03.0=2024.03.0

==> Installed packages
-- linux-rocky8-x86_64 / intel@19.0.5.281 -----------------------
cice5@git.2023.10.19=2023.10.19         /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/cice5-git.2023.10.19=2023.10.19-v3zncpqjj2gyseudbwiudolcjq3k3leo
libaccessom2@git.2023.10.26=2023.10.26  /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/libaccessom2-git.2023.10.26=2023.10.26-ltfg7jcn6t4cefotvj3kjnyu5nru26xo
mom5@git.2023.11.09=2023.11.09          /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/mom5-git.2023.11.09=2023.11.09-qji4nlmr6utrribaiyhewe4je6mifguz

@aidanheerdegen
Member Author

Nice work @minghangli-uni! This is completed.
