Symlink patch & CICE netcdf4 parallel #24
@dougiesquire Are you happy to review this?
I wonder if we should do the copies from the Payu driver itself, rather than as userscripts? I think that would be more robust (e.g. it would still work if we changed the name of the initial condition file or the format of the restart files), and it has the benefit that we wouldn't have to duplicate changes across configuration repos. Thoughts?
Yeah, it would be more robust. But we only need this (as a patch) until March, when we can update the OpenMPI version and remove these scripts again. Using userscripts seemed faster to implement in the interim than waiting for a Payu release.
So are you planning to add (then remove) these userscripts to every OM3 config that uses CICE? If this is only until March, another option is to just wait until then to update the configs to use netcdf4. Does anyone need netcdf4 before then? I don't feel strongly, I'm just checking before we make the same change across multiple configurations that we'll have to remember to undo in a few months.
I think only the CICE-MOM configs, and not bother with the WW configs? Turning on parallel IO has a performance impact (which was significant in OM2), and that affects the profiling work Micael is doing.
* symlink patch for netcdf4 in cice
* reset runconfig from testing
As part of COSIMA/access-om3#81, we identified a bug in OpenMPI's handling of symlinks during parallel reads (open-mpi/ompi#12141). The fix will be included in OpenMPI 4.1.7, but that release may not be out until the end of March 2024. This bug blocks the use of netcdf4 in CICE.
In the interim, we can work around this bug using the two scripts in this PR, which replace the symlink to the initial conditions with a copy of the initial conditions file and replace the symlinks to the restart files with hardlinks. (We can't rely on hardlinking the initial conditions file, because it may not be on the same file system as the directory the model is run from.)
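A minimal Python sketch of the two replacements described above, for illustration only. It assumes the userscript runs against a work directory whose inputs are symlinks; the function names are hypothetical, and the actual scripts in this PR may be written differently (or in another language). Each replacement is written to a temporary name first and then swapped in with `os.replace`, so an interrupted copy never leaves the link deleted. `os.link` raises `OSError` when the target is on a different file system, which is why the initial conditions are copied rather than hardlinked.

```python
import os
import shutil
from pathlib import Path


def replace_symlink_with_copy(link: Path) -> None:
    """Replace a symlink with a full copy of the file it points to.

    Hypothetical helper, intended for the initial conditions file, which
    may live on a different file system from the run directory.
    """
    if not link.is_symlink():
        return  # already a regular file: nothing to do
    target = link.resolve(strict=True)  # real file behind the link
    tmp = link.with_name(link.name + ".tmp")
    shutil.copy2(target, tmp)  # copy data and metadata beside the link
    os.replace(tmp, link)      # rename over the symlink (not its target)


def replace_symlink_with_hardlink(link: Path) -> None:
    """Replace a symlink with a hard link to the same file.

    Hypothetical helper, intended for restart files. Only works when the
    link and its target are on the same file system; otherwise os.link
    raises OSError (EXDEV).
    """
    if not link.is_symlink():
        return
    target = link.resolve(strict=True)
    tmp = link.with_name(link.name + ".tmp")
    os.link(target, tmp)   # second directory entry for the same inode
    os.replace(tmp, link)  # swap the hard link in place of the symlink
```

After either call the path is a regular file, so readers of it no longer go through a symlink; the hardlinked restart shares storage with the original, while the copied initial conditions file does not.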
This allows turning on netcdf4 support in CICE (in nuopc.runconfig).
In the current config, the reported ReadWrite time for one month is 50-60 seconds. With this change, the reported time is ~13 seconds, and total run time drops by a similar amount (from 150 to 110 seconds).
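A rough check of the (rounded) timings above. The saving in ReadWrite time is about the same size as the saving in total time:

$$
\begin{aligned}
\Delta t_{\text{ReadWrite}} &\approx (50 \text{ to } 60) - 13 = 37 \text{ to } 47\ \text{s} \\
\Delta t_{\text{total}} &\approx 150 - 110 = 40\ \text{s} \quad (40/150 \approx 27\%)
\end{aligned}
$$

This is consistent with the IO change accounting for most of the speed-up.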