Rearrange input directories #2
Here is a proposed directory structure
It is viewable/editable here. I have just used the version (date) as the one in the COECMS. @MartinDix can you take a look and tell me what you think? I have pretty much ignored the fact that this is just the pre-industrial, and am waiting for another config to see what files are common and where we might need configuration-specific trees. Note I've done away with the historical "CABLE-AUX" structure, which would entail commensurate changes to paths in the config. |
On second thoughts, it probably makes sense to have |
I don't think the split between aerosol/chemistry is useful. ESM1.5 only has enough chemistry for the aerosol scheme so they're essentially the same thing. OCFF (organic carbon from fossil fuel) is an aerosol field. Ndep is nitrogen deposition, so probably part of land/biogeochem. The model shouldn't be using qrclim.slt, qrclim.smow (soil temperature and moisture climatologies) because it's starting from a complete restart file. Similarly with the ice monthly_sstsss.nc. I'll double check this. ozone probably sits better in forcing than in aerosol. It's also independent of the ocean model resolution. The various txt and csv files are also resolution independent. Do you want to separate out the fields that depend on the land sea mask and so the ocean model resolution? stash is logically part of the atmosphere (resolution independent again). |
Sounds like a good idea for clarity and to make it easier for those who want to alter the land/sea mask, and also avoid unnecessary duplication when creating a new configuration with a different resolution
I wasn't sure if Does Here is the updated organisation
You should be able to access and edit a copy of this using this link: |
You can just cut n' paste this into https://tree.nathanfriend.io/
|
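For anyone who wants to regenerate an outline like this from an existing input directory, here is a minimal sketch. It assumes the tree tool accepts plain indented text with one entry per line; the function name `indented_outline` and the two-space indent are illustrative choices, not something taken from this thread.

```python
import os


def indented_outline(root, indent="  "):
    """Return the directory tree under `root` as indented text, one entry per line."""
    lines = [os.path.basename(os.path.normpath(root))]

    def walk(path, depth):
        # Sort entries so the outline is stable between runs.
        for name in sorted(os.listdir(path)):
            lines.append(indent * depth + name)
            full = os.path.join(path, name)
            # Descend into real directories only, so symlink loops are avoided.
            if os.path.isdir(full) and not os.path.islink(full):
                walk(full, depth + 1)

    walk(root, 1)
    return "\n".join(lines)
```

Calling `print(indented_outline("/path/to/input"))` (path hypothetical) gives text that can be pasted into an indented-outline tree viewer.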
Hi Aidan and Martin, I just have a few questions about the proposed structure:
Relatedly, what would be sensible for categorising the coupling files which use all of the grids? Would it make sense to note down both resolutions here, e.g.
Would highlighting the land-sea mask dependence in the directory tree be preferable to just noting it somewhere in the documentation?
Let me know if I might have misunderstood/misinterpreted anything here. |
Good point! This shows my ocean-bias, and yes I think it would be better to use something that makes sense from an atmosphere point of view.
Again, a good suggestion. is global twice redundant? Wondering out loud if we'd ever have a remapping file from
Very probably. I didn't know this was the case, which shows the value of highlighting it.
Another excellent point. I'd say anything that should be changed should also be categorised as such. |
I've put together a copy of the input directory with an updated structure, located currently in The tree for this new directory is:
Comments
Questions There are several things I'm unsure about:
Apologies for all the questions. Let me know if you have any suggestions or changes to make. Once everything is worked out, I'll make a branch of the configuration which uses the new structure. |
It is only the paths to the files, which might mean something to some people. Ideally we'd find out what that meaning might be as people don't last forever. That may be a later endeavour however, so I don't see any problem with keeping the file around, or putting the relevant path in a README in the sub-dir where the file is now located and note that this was the original location where the file was sourced from.
Probably best to hash out in that issue what should be omitted and then action it here.
The ESM1.5 configuration is using the old FMS mosaic format. IIRC I converted the COSIMA 1 degree grid to the more modern format to match the 0.25 and 0.1 grids, but I can't find any mention of it. Damn. Ideally we should switch to using the same grid format as the ACCESS-OM2 configurations, though I think the grid itself is different due to cell stretching around the equator in the coupled models, but correct me if I'm wrong. @anton-seaice has been playing around with grids quite a bit, so he or @dougiesquire may be able to comment on this.
It's a good point. The ocean and ice share a grid, and the atmosphere sends all fields through the ice model IIRC. So yes it is a bit weird. Could use a name like |
The ACCESS-OM 1 degree grid also has some (latitudinal?) refinement around the equator. It's probably the same, but it might not be. There is no guarantee the variable formats used by the CICE4 in ESM1.5 and the AUSCOM-CICE5 fork are the same, of course (again, quite likely they are). |
Maybe not... I wasn't even aware that there was an "old FMS mosaic format". My understanding of the "new" (ACCESS-OM2) format is:
It looks like in the "old FMS mosaic format", all the info that could be spread across this nested structure is contained in the one |
Also, a couple of unsolicited comments on the above structure:
Please feel free to ignore both |
IIRC yes. This is where we really miss the mentat abilities of Russ Fiedler when it comes to MOM5. |
Many of these files are common across pre-industrial and historical. E.g. for the atmosphere everything except forcing and aerosol directories. Everything in the coupler and ice directories is also common. Everything in ocean too, except perhaps biogeochemistry? |
I've tried modifying the tree based on the above ideas.
I've added in an
I've added in a common directory and moved these files across. It looks like the pre-industrial and historical configurations use different
From what I can tell, the copies of The modified tree based on the above is shown below. Let me know your thoughts/if you have any suggestions!
|
Awesome, thanks @blimlim this looks great. My only issue with Maybe it isn't worth overthinking too much, but if there is anything obvious I'd be keen to incorporate it now. e.g. do we put this all under (Note: we could use I like the idea of |
That sounds like a good idea
would If the tree is looking ok to everyone, I'll rearrange my copy of the input directory to match and test it out. |
Nice suggestion. Yep, sounds good.
Sounds like a good idea. |
When I was digging into the CICE calendar I noticed that the CICE code is looking for this file:
#ifdef AusCOM
idate_save = idate !save for late re-set in case 'restart' is used for jobnum=1
if (runtype == 'initial') then
nrec = month - 1 !month is from calendar
if (nrec == 0) nrec = 12
call get_time0_sstsss(trim(inputdir)//'/monthly_sstsss.nc', nrec)
endif
!the read in sst/sss determines the initial ice state (in init_state)
Removing this file from the input directory leads to the PI configuration crashing, in
(calendar) idate = 1010101
(get_time0_sstsss) file doesnt exist: INPUT/monthly_sstsss.nc
(END)
@MartinDix is it worth digging more into why it's requiring this file? For now I'll leave it in the restructured directory. |
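The record selection in the quoted snippet can be restated in Python as a minimal sketch. It assumes `month` runs from 1 to 12 and that the file holds one record per month, which the file name suggests but this thread does not confirm. The function name is hypothetical.

```python
def sst_record_for_initial_run(month):
    """Mirror the quoted logic: use the record for the month before `month` (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    nrec = month - 1
    if nrec == 0:
        nrec = 12  # January wraps around to the December record
    return nrec
```

So a run starting in January would read record 12, and a run starting in any other month would read the record for the preceding month.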
I am moderately sure that file is just used to set the initial ice extent. i.e. where the SST is below -1.8C in that file, the model will initialise with sea-ice present. This only occurs during initial spin-up (when |
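As a rough illustration of the thresholding described above (not the actual CICE code), a mask of initial ice cover could be derived from an SST field like this. The -1.8 value comes from the comment above; the function name and the plain nested-list grid are assumptions made for the sketch.

```python
FREEZING_THRESHOLD_C = -1.8  # threshold quoted in the comment above, in degrees Celsius


def initial_ice_mask(sst_celsius):
    """Return a grid of booleans: True where SST is below the threshold."""
    return [[t < FREEZING_THRESHOLD_C for t in row] for row in sst_celsius]
```

A cell at exactly -1.8 is treated as ice-free here because the comment says "below"; the real model may handle the boundary differently.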
Ah ok thanks. Just to check, is this meant to be happening in the default PI configuration? If I try and remove the file the model doesn't run. |
It's needed in all configurations when a new experiment is started (i.e. when there isn't an ice restart file). The initial conditions for sea-ice in these circumstances are not very critical: their impact on the overall simulation would resolve within a year to be consistent with the other model components. The ocean model would take much longer to stabilise/spin-up. |
The restructured directory is temporarily located in There are still a couple things to clean up (e.g. updating the The md5 hashes compared to the inputs in
diff /g/data/access/payu/access-esm/input/pre-industrial/atmosphere/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_wtlnds.csv /g/data/vk83/experiments/inputs/access-esm1p5/pre-industrial/atmosphere/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_wtlnds.csv
15c15
< 12,0,newpft,,,,,,,,,,,,,,,,,,
---
> 12,0,not used,,,,,,,,,,,,,,,,,,
34c34
< 12,2,0.5,1.8,32.30769231,2,0.5,0.022,1.439682005,40,5,0.04,0.23,0.824,0.137,5,222.22,0.2,0.009919764,,
---
> 12,5.5,0.5,1.8,0,2,0.5,0,1,1,1,0.04,0.23,0.824,0.137,5,222.22,0.2,0.02,,
The one in
|
|
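For reference, a hash comparison between two copies of an input tree could look something like the sketch below. It assumes both trees share the same relative layout, as the two paths in the `diff` command above do. The function names are illustrative, and this is not necessarily how the check above was run.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_hashes(root):
    """Map each file's path relative to `root` to its md5 digest."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            hashes[os.path.relpath(full, root)] = md5sum(full)
    return hashes


def compare_trees(old_root, new_root):
    """Report files whose content differs, plus files present on one side only."""
    old, new = tree_hashes(old_root), tree_hashes(new_root)
    common = old.keys() & new.keys()
    return {
        "changed": sorted(p for p in common if old[p] != new[p]),
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
    }
```

For a restructured tree where files have moved, matching on the digest rather than the relative path would be the more useful comparison.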
@blimlim Can you define short simulations? CASA only runs daily. Only the pftlookup file that is listed in the cable.nml will be used in the simulation anyway. |
Ah good point! The above simulations were 2 months long, and the figure shows the difference in the second month's mean surface temperatures. The |
@blimlim the differences seem to indicate the |
@MartinDix just checking whether
|
I don't think we need the extra level of the modern directory in the tree. Something like this should be enough historical Spectral files and vertlevs are going to be shared across everything, paleo and modern. |
Just wanted to check whether it would be clear enough to people which configurations use the We could otherwise go in the other direction and have more levels, e.g.
Let me know which would be preferable! |
I think I prefer your suggestion, though the suffixes on shared are probably unnecessary because it's implicit in the paths. How about this?
|
Are there any |
Sorry I think I originally misunderstood the question! Do you mean files that are used by the For managing file versions it might be easier if we avoid copies of files in different locations, so perhaps it would be best to have the |
On the same topic, I've been checking whether any other files currently under From David's example of setting up a miocene simulation, it looks like the following files weren't changed from the pre-industrial versions: Atmosphere:
Ocean:
@MartinDix and @ccarouge, do you know if any of the above atmosphere and land files would be suitable for the For the oceans, I think |
I think Regarding whether the same |
@blimlim I'll ask more about the land files. Actually, I'm not sure they are used. It's possible the code needs to find files with this name though even if not using the data that is inside. |
Here is what Pearse Buchanan (CSIRO), who is developing WOMBAT, says:
|
Thanks @dougiesquire and @ccarouge, I'm not 100% on which paleo configurations we're planning on having, but based on Pearse's comments it sounds like the single
Just tested this out. The model runs without any issues and the first month of output looks identical with and without the I've made a new copy of the file and added it in to the directory |
I finally remembered to ask about the land files. The following files identified by @blimlim are needed for an ESM1.5 simulation:
I haven't queried the whole paleoclimate community, but it looks like these files are rarely modified. It does not look like they are part of the standard modifications for a paleo simulation. Obviously, there could be user-specific experiments that require modification but that's true for any input file. |
Thanks @ccarouge for looking into that! I'll move those files into the I think we're almost there! Before locking in a final structure, I'd like to get everyone's thoughts on what the convention/rule for the
I'm guessing that regardless of the conventions we choose, we'll run into unexpected complications eventually, though hopefully they will be informative for setting up ESM1.6. |
@blimlim these are good points. Given that we can imagine future situations that complicate both options, I'd suggest choosing the option that is cleanest and simplest for the configurations we currently know about. To me, that seems like option 1. Probably unhelpfully, there is also a third option to remove the |
Instead of If we have 2 configs using file A for bgc params and 2 configs using file B for bgc params, we put both under share and then we need some naming convention to differentiate the bgc params. I would assume using a naming convention for the file that reflects the differences in origin (data or processing) rather than one based on the configurations that use it. Edit: I ran out of time when I posted this. I don't know what requirements we want to meet. If they are:
Then I don't think we can avoid some file reorganisation when new configurations are put in place. It won't be necessary every time, but it will be sometimes, because there is no way we can know right now all the configurations we will support. Actually we might know for ESM1.5 and CM2 but not for versions 1.6 and 3. |
Thanks @dougiesquire and @ccarouge, those are good points and that's a good question about what the requirements actually are. I'm unsure whether it's more important to aim for
With option 1, As @ccarouge mentioned, some file reorganisation will likely be necessary when new configurations are added. A downside here is that each experiment collects its input files from paths listed in its
If we add a new configuration, and our directory "rules" make us rearrange the existing inputs, then any user's existing experiments using those inputs will break and they'll have to manually update the paths in their experiments. If this happened regularly... hopefully it wouldn't get too annoying. With option 2, I think the main downside is it adds difficulty for versioning files. E.g. if we have the same Perhaps another option would be to have just one real copy of each file located with the first configuration that used it, and then use symlinks for other configurations using that same file. If that file was updated, you could search for all the symlinks to it to see which other ones also needed the update. I guess this is still a bit messy though and leaves plenty of room for error. |
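The "search for all the symlinks to it" step could be done with something like the following minimal sketch. The function name and arguments are hypothetical, and it only finds links located under the directory it is asked to search.

```python
import os


def symlinks_to(target, search_root):
    """List symlinks under `search_root` that resolve to `target`."""
    real_target = os.path.realpath(target)
    matches = []
    for dirpath, dirnames, filenames in os.walk(search_root):
        # os.walk lists symlinks to directories in dirnames and does not follow them.
        for name in dirnames + filenames:
            candidate = os.path.join(dirpath, name)
            if os.path.islink(candidate) and os.path.realpath(candidate) == real_target:
                matches.append(candidate)
    return sorted(matches)
```

This only helps if every configuration lives under a common root that can be searched, which is part of why the approach leaves room for error.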
We should definitely not let this be a blocker. So choose a no-regrets option with a clear rationale. No-regrets in this context would be flexible in the future, minimise copies, but absolutely cannot break historical configurations. I'm not sure it is necessarily bad for paleo simulations to reference files in Clearly physical constants or data that is immutable over time should be in a top level To some extent it is useful to think about future scenarios/configs, but not too much. We can deal with them as they arise and fine-tune approaches. I think we can minimise copies by just referencing things in place (as noted above), and/or using soft or hard links. I actually don't have a massive problem with the occasional full copy if absolutely necessary (as long as the files in question aren't stupidly big). We can tell which files are identical from the md5 hashes. Does that answer all your questions @blimlim? Do you need a sign-off on final structure? Or are you happy to proceed from here? |
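To make the md5 point concrete, identical files within one input tree could be grouped with a sketch like this. The function name is an assumption, and nothing here depends on the final directory layout.

```python
import hashlib
import os
from collections import defaultdict


def identical_files(root, chunk_size=1 << 20):
    """Group files under `root` by md5 digest and return groups with more than one member."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as f:
                # Read in chunks so large input files are not loaded whole.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(os.path.relpath(full, root))
    return sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
```

Each returned group is a candidate for keeping one real copy and referencing or linking it from the other locations.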
Ok thanks @aidanheerdegen I think that makes sense! |
Yes, I think so. The Does that make it clear or more opaque? |
Thank you, that clears it up! I'll finalise the structure and send the paths through to @CodeGat. |
Closing as I think this has been completed. |
The ACCESS-ESM1.5 input directories inherit a structure from the COECMS experiments
They need to be rearranged to permit versioning and organised by function.
For example, the ACCESS-OM2 configurations have a top level that separates by function
and within each functional grouping it is split by extent and resolution, and then further by versions
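A minimal sketch of how a path under that kind of layout is composed, assuming the levels nest as function, then extent and resolution, then version. Every name in the example call is hypothetical and not taken from the actual ACCESS-OM2 inputs.

```python
from pathlib import PurePosixPath


def input_path(root, function, extent_resolution, version, filename):
    """Compose root/function/extent_resolution/version/filename."""
    return PurePosixPath(root) / function / extent_resolution / version / filename


# Hypothetical example values, for illustration only.
example = input_path("inputs", "ocean", "global.1deg", "2024.01.01", "grid.nc")
```

Keeping the version as its own directory level is what lets a configuration pin an exact copy of a file while newer versions are added alongside it.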