Symlink patch & CICE netcdf4 parallel #24

Merged

@anton-seaice anton-seaice commented Dec 19, 2023

As part of COSIMA/access-om3#81, we identified a bug in how Open MPI handles symlinks during parallel reads (open-mpi/ompi#12141). The fix will be included in Open MPI 4.1.7, but this may not be released until the end of March 2024. This bug blocks the use of netcdf4 in CICE.

In the interim, we can work around this bug using the two scripts in this PR. One replaces the symlink to the initial conditions with a copy of the initial conditions file; the other replaces the symlinks to restart files with hardlinks to those files. (We can't rely on hardlinking to the initial conditions file because it may not be on the same file system as the one the model is run from.)
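The two replacement strategies described above could be sketched roughly as follows. This is a minimal illustration only, not the scripts from this PR: the function names are hypothetical, and it assumes a POSIX file system where `os.replace` swaps out the symlink itself rather than following it.

```python
import os
import shutil
from pathlib import Path


def replace_symlink_with_copy(link: Path) -> None:
    """Swap a symlink for a full copy of its target (works across file systems)."""
    if not link.is_symlink():
        return  # already a regular file, nothing to do
    target = link.resolve(strict=True)  # raises if the link is dangling
    tmp = link.with_name(link.name + ".tmp")  # hypothetical temporary name
    shutil.copy2(target, tmp)  # copy data and metadata
    os.replace(tmp, link)  # rename over the symlink itself


def replace_symlink_with_hardlink(link: Path) -> None:
    """Swap a symlink for a hardlink to its target (same file system only)."""
    if not link.is_symlink():
        return
    target = link.resolve(strict=True)
    tmp = link.with_name(link.name + ".tmp")
    os.link(target, tmp)  # raises OSError if target is on another file system
    os.replace(tmp, link)
```

The copy variant would suit the initial conditions file, which may live on a different file system; the hardlink variant avoids duplicating large restart files when they share a file system with the run directory.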

This allows turning on netcdf4 support in CICE (in nuopc.runconfig).

In the current config, the reported ReadWrite time for one month is 50-60 seconds. With this change, the reported time is ~13 seconds, which corresponds to a similar reduction in total time (150 to 110 seconds).

@anton-seaice

@dougiesquire Are you happy to review this?

@dougiesquire dougiesquire left a comment

I wonder if we should do the copies from the Payu driver itself, rather than as userscripts? I think that would be more robust (e.g. it would still work if we change the name of the initial condition file or the format of the restart files), and it has the benefit that we won't have to duplicate changes across configuration repos. Thoughts?

@anton-seaice

Yeah - it would be more robust. But we only need this (as a patch) until March, when we can update the OpenMPI version and remove these scripts again. Using user scripts seemed faster to implement in the interim (rather than waiting for Payu releases).

@dougiesquire

So are you planning to add (then remove) these user scripts to every OM3 config that uses CICE? If this is only till March, another option is to just wait till then to update the config to use netcdf4. Does anyone need netcdf4 before then? I don't feel strongly, I'm just checking before we make the same change across multiple configurations that we have to remember to undo in a few months.

@anton-seaice

I think we only add them to the CICE-MOM configs, and don't bother with the WW configs?

Turning on parallel IO has a performance impact (which was significant in OM2), and that affects the profiling work Micael is doing.

@anton-seaice anton-seaice changed the title from "Symlink patch" to "Symlink patch & CICE netcdf4 parallel" Jan 8, 2024
@anton-seaice anton-seaice merged commit efa43a9 into ACCESS-NRI:1deg_jra55do_ryf Jan 9, 2024
@anton-seaice anton-seaice deleted the symlink_patch branch January 9, 2024 23:37
anton-seaice added a commit that referenced this pull request Jan 10, 2024
* symlink patch for netcdf4 in cice

* reset runconfig from testing
@micaeljtoliveira micaeljtoliveira added the cice6 (Related to CICE6) and 1deg_jra55do_ryf (1deg_jra55do_ryf configuration) labels Jan 25, 2024
@micaeljtoliveira micaeljtoliveira added this to the 0.3.x milestone Jan 25, 2024