
Rearrange input directories #2

Closed
aidanheerdegen opened this issue Apr 22, 2024 · 48 comments
Labels
priority:blocker type:release Required for next release

Comments

@aidanheerdegen
Member

The ACCESS-ESM1.5 input directories inherit their structure from the COECMS experiments:


$ tree -L 3 /g/data/vk83/experiments/inputs/access-esm1p5/
/g/data/vk83/experiments/inputs/access-esm1p5/
├── common
│   └── ocean
│       ├── basin_mask.nc
│       ├── geothermal_heating.nc
│       ├── grid_spec.nc
│       ├── roughness_amp.nc
│       ├── ssw_atten_depth.nc
│       └── tideamp.nc
└── pre-industrial
    ├── atmosphere
    │   ├── BC_hi_1850_ESM1.anc
    │   ├── Bio_1850_ESM1.anc
    │   ├── biogenic_351sm.N96L38
    │   ├── CABLE-AUX-1.4
    │   ├── cable_vegfunc_N96.anc
    │   ├── DMS_conc.N96
    │   ├── Ndep_1850_ESM1.anc
    │   ├── OCFF_1850_ESM1.anc
    │   ├── ozone_1850_ESM1.anc
    │   ├── qrclim.slt
    │   ├── qrclim.smow
    │   ├── qrparm.mask
    │   ├── qrparm.soil_igbp_vg
    │   ├── scycl_1850_ESM1_v4.anc
    │   ├── SOURCES
    │   ├── spec3a_lw_hadgem1_6on
    │   ├── spec3a_sw_hadgem1_6on
    │   ├── stasets
    │   ├── STASHmaster
    │   ├── sulpc_oxidants_N96_L38
    │   ├── vertlevs_G3
    │   └── volcts_18502000ave.dat
    ├── coupler
    │   ├── areas.nc
    │   ├── cf_name_table.txt
    │   ├── grids.nc
    │   ├── masks.nc
    │   ├── rmp_cice_to_um1t_CONSERV_FRACNNEI.nc
    │   ├── rmp_cice_to_um1u_CONSERV_FRACNNEI.nc
    │   ├── rmp_cice_to_um1v_CONSERV_FRACNNEI.nc
    │   ├── rmp_um1t_to_cice_CONSERV_DESTAREA.nc
    │   ├── rmp_um1t_to_cice_CONSERV_FRACNNEI.nc
    │   ├── rmp_um1u_to_cice_CONSERV_FRACNNEI.nc
    │   └── rmp_um1v_to_cice_CONSERV_FRACNNEI.nc
    ├── ice
    │   ├── grid.nc
    │   ├── kmt.nc
    │   └── monthly_sstsss.nc
    ├── ocean
    │   ├── bgc_param.nc
    │   ├── cfc_auscom.nc
    │   ├── co2_obs.nc
    │   ├── dust.nc
    │   ├── ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc
    │   ├── ocmip2_fice_monthly_om1p5_bc.nc
    │   ├── ocmip2_press_monthly_om1p5_bc.nc
    │   ├── ocmip2_siple_co2_atm_am2_bc-1-9999.nc
    │   └── ocmip2_xkw_monthly_om1p5_bc.nc
    └── restart
        ├── atmosphere
        ├── coupler
        ├── ice
        └── ocean

15 directories, 48 files

They need to be rearranged to permit versioning and organised by function.

For example, the ACCESS-OM2 configurations have a top level that separates inputs by function:

$ tree -L 2 /g/data/vk83/experiments/inputs/access-om2/
/g/data/vk83/experiments/inputs/access-om2/
├── CHANGELOG
├── ice
│   ├── grids
│   ├── initial_conditions
│   └── initial_conditions_biogeochemistry
├── ocean
│   ├── biogeochemistry
│   ├── chlorophyll
│   ├── grids
│   ├── initial_conditions
│   ├── processor_masks
│   ├── surface_salt_restoring
│   └── tides
├── README.txt
└── remapping_weights
    └── JRA55

and within each functional grouping, inputs are split by extent and resolution, and then further by version:

$ tree -L 5 /g/data/vk83/experiments/inputs/access-om2/ocean/grids/bathymetry/
/g/data/vk83/experiments/inputs/access-om2/ocean/grids/bathymetry/
├── global.01deg
│   └── 2020.05.30
│       ├── ocean_mask.nc
│       └── topog.nc
├── global.025deg
│   ├── 2020.11.02
│   │   ├── ocean_mask.nc
│   │   └── topog.nc
│   └── 2023.05.15
│       ├── ocean_mask.nc
│       ├── README
│       └── topog.nc
└── global.1deg
    └── 2020.10.22
        ├── ocean_mask.nc
        └── topog.nc
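To make the scheme concrete, here is a minimal Python sketch of how a path in this layout could be assembled from its parts: function, sub-category, an `<extent>.<resolution>` grid label and a `YYYY.MM.DD` version directory. The helper `input_path` and its argument names are hypothetical and only illustrate the convention; they are not part of any existing tooling.

```python
from pathlib import PurePosixPath


def input_path(root, component, category, extent, resolution, version, filename):
    """Build <root>/<component>/<category>/<extent>.<resolution>/<version>/<filename>.

    Hypothetical helper: `category` may contain slashes for nested
    groupings such as "grids/bathymetry".
    """
    grid = f"{extent}.{resolution}"  # e.g. "global.025deg"
    return PurePosixPath(root) / component / category / grid / version / filename


root = "/g/data/vk83/experiments/inputs/access-om2"
print(input_path(root, "ocean", "grids/bathymetry", "global", "025deg",
                 "2023.05.15", "topog.nc"))
```

With the OM2 example above this yields `.../access-om2/ocean/grids/bathymetry/global.025deg/2023.05.15/topog.nc`. Keeping the grid label and version as separate path levels is what lets a new version sit alongside the old one without renaming files.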
@aidanheerdegen aidanheerdegen added priority:blocker type:release Required for next release labels Apr 29, 2024
@aidanheerdegen
Member Author

aidanheerdegen commented May 17, 2024

Here is a proposed directory structure:

.
├── /g/data/vk83/experiments/inputs/access-esm1p5/
├── ocean/
│   ├── biogeochemistry/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── bgc_param.nc
│   │           ├── cfc_auscom.nc
│   │           ├── co2_obs.nc
│   │           ├── dust.nc
│   │           ├── ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc
│   │           ├── ocmip2_fice_monthly_om1p5_bc.nc
│   │           ├── ocmip2_press_monthly_om1p5_bc.nc
│   │           ├── ocmip2_siple_co2_atm_am2_bc-1-9999.nc
│   │           └── ocmip2_xkw_monthly_om1p5_bc.nc
│   ├── grids/
│   │   └── mosaic/
│   │       └── global.1deg/
│   │           └── 2020.05.19/
│   │               └── grid_spec.nc
│   ├── shortwave_penetration/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           └── ssw_atten_depth.nc
│   └── tides/
│       └── global.1deg/
│           └── 2020.05.19/
│               ├── roughness_amp.nc
│               └── tideamp.nc
├── atmosphere/
│   ├── land/
│   │   ├── biogeochem/
│   │   │   └── global.1deg/
│   │   │       └── 2020.05.19/
│   │   │           ├── modis_phenology_csiro.txt
│   │   │           └── pftlookup_csiro_v16_17tiles_wtlnds.csv
│   │   ├── biogeophys/
│   │   │   └── global.1deg/
│   │   │       └── 2020.05.19/
│   │   │           ├── def_soil_params.txt
│   │   │           └── def_veg_params.txt
│   │   ├── climatology/
│   │   │   └── global.1deg/
│   │   │       └── 2020.05.19/
│   │   │           ├── qrclim.slt
│   │   │           └── qrclim.smow
│   │   ├── soiltype/
│   │   │   └── global.1deg/
│   │   │       └── 2020.05.19/
│   │   │           └── qrparm.soil_igbp_vg
│   │   └── vegetation/
│   │       └── global.1deg/
│   │           └── 2020.05.19/
│   │               └── cable_vegfunc_N96.anc
│   ├── aerosol/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── ozone_1850_ESM1.anc
│   │           ├── BC_hi_1850_ESM1.anc
│   │           ├── Bio_1850_ESM1.anc
│   │           ├── biogenic_351sm.N96L38
│   │           └── scycl_1850_ESM1_v4.anc
│   ├── chemistry/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── DMS_conc.N96
│   │           └── sulpc_oxidants_N96_L38
│   ├── grids/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── qrparm.mask
│   │           └── vertlevs_G3
│   ├── spectral/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── spec3a_lw_hadgem1_6on
│   │           └── spec3a_sw_hadgem1_6on
│   ├── forcing/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           └── volcts_18502000ave.dat
│   └── unknown/
│       ├── Ndep_1850_ESM1.anc
│       └── OCFF_1850_ESM1.anc
├── coupler/
│   ├── grids/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── areas.nc
│   │           ├── grids.nc
│   │           └── masks.nc
│   └── remapping_weights/
│       └── global.1deg/
│           └── 2020.05.19/
│               ├── rmp_cice_to_um1t_CONSERV_FRACNNEI.nc
│               ├── rmp_cice_to_um1u_CONSERV_FRACNNEI.nc
│               ├── rmp_cice_to_um1v_CONSERV_FRACNNEI.nc
│               ├── rmp_um1t_to_cice_CONSERV_DESTAREA.nc
│               ├── rmp_um1t_to_cice_CONSERV_FRACNNEI.nc
│               ├── rmp_um1u_to_cice_CONSERV_FRACNNEI.nc
│               └── rmp_um1v_to_cice_CONSERV_FRACNNEI.nc
├── ice/
│   ├── grids/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── grid.nc
│   │           └── kmt.nc
│   └── climatology/
│       └── global.1deg/
│           └── 2020.05.19/
│               └── monthly_sstsss.nc
└── stash/
    ├── stasets
    └── STASHmaster

It is viewable/editable here

I have just used the version (date) from the original COECMS access-esm directory tree, but I know these files are generally much older than that. I also haven't checked all those dates thoroughly and just assumed they were all the same, but I would check them when rearranging.
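Because version directories are named by date in `YYYY.MM.DD` form, they sort chronologically and can be validated by parsing them as dates. Below is a minimal sketch, assuming every version directory follows this naming. `parse_version` and `latest_version` are hypothetical helpers. Strict parsing also catches malformed names such as a stray trailing space.

```python
from datetime import datetime


def parse_version(name):
    """Return the date encoded in a YYYY.MM.DD version directory name.

    Raises ValueError if the name does not follow that convention,
    including names with trailing whitespace.
    """
    return datetime.strptime(name, "%Y.%m.%d").date()


def latest_version(names):
    """Pick the most recent version directory, ignoring non-version entries."""
    dated = []
    for name in names:
        try:
            dated.append((parse_version(name), name))
        except ValueError:
            continue  # e.g. a README sitting next to the version directories
    if not dated:
        raise ValueError("no YYYY.MM.DD version directories found")
    return max(dated)[1]


print(latest_version(["2020.11.02", "2023.05.15", "README"]))  # 2023.05.15
```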

@MartinDix, can you take a look and tell me what you think? I have pretty much ignored the fact that this is just the pre-industrial configuration, and am waiting for another configuration to see which files are common and where we might need configuration-specific trees.

Note I've done away with the historical "CABLE-AUX" structure, which would entail commensurate changes to paths in the config.
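Dropping the CABLE-AUX layout means that every old path referenced in the configuration must be rewritten to point at its new location. The sketch below does this in one pass. It assumes that the configuration files are plain text and that someone maintains a mapping from old to new relative paths by hand. The mapping entries are illustrative only. The old `CABLE-AUX-1.4/core/biogeophys/` sub-path is inferred from the original CABLE-AUX tree, and the new locations come from the proposal above. `rewrite_paths` is a hypothetical helper.

```python
import re

# Illustrative old -> new relative paths (assumed; not checked against the config).
PATH_MAP = {
    "atmosphere/CABLE-AUX-1.4/core/biogeophys/def_soil_params.txt":
        "atmosphere/land/biogeophys/global.1deg/2020.05.19/def_soil_params.txt",
    "atmosphere/CABLE-AUX-1.4/core/biogeophys/def_veg_params.txt":
        "atmosphere/land/biogeophys/global.1deg/2020.05.19/def_veg_params.txt",
}


def rewrite_paths(text, path_map):
    """Replace every old path in `text` with its new location in one pass.

    Longer keys are tried first, so a path that is a prefix of another
    cannot clobber the longer match.
    """
    if not path_map:
        return text
    keys = sorted(path_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: path_map[m.group(0)], text)
```

A single regex pass avoids the problem of sequential `str.replace` calls rewriting text that an earlier replacement has already produced.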

@aidanheerdegen
Member Author

On second thoughts, it probably makes sense to have pre-industrial in the top level and do the same for other configs, and then try and figure out the commonalities ...

@MartinDix

MartinDix commented May 20, 2024

I don't think the split between aerosol/chemistry is useful. ESM1.5 only has enough chemistry for the aerosol scheme so they're essentially the same thing. OCFF (organic carbon from fossil fuel) is an aerosol field. Ndep is nitrogen deposition, so probably part of land/biogeochem.

The model shouldn't be using qrclim.slt, qrclim.smow (soil temperature and moisture climatologies) because it's starting from a complete restart file. Similarly with the ice monthly_sstsss.nc. I'll double check this.

Ozone probably sits better in forcing than in aerosol. It's also independent of the ocean model resolution. The various txt and csv files are also resolution independent. Do you want to separate out the fields that depend on the land sea mask and so the ocean model resolution?

stash is logically part of the atmosphere (resolution independent again).

@aidanheerdegen
Copy link
Member Author

Do you want to separate out the fields that depend on the land sea mask and so the ocean model resolution?

Sounds like a good idea, both for clarity and to make it easier for those who want to alter the land/sea mask. It would also avoid unnecessary duplication when creating a new configuration with a different resolution.

stash is logically part of the atmosphere (resolution independent again).

I wasn't sure if STASHmaster was reference data that was pretty much global for all configurations with a roughly similar UM version. Which I guess means it should be under atmosphere as you suggest. Will change. I've not put resolution_independent as it seems redundant, but we can for consistency/clarity.

Does spectral make sense as a category? Couldn't think of a better name or another category to plonk it in.

Here is the updated organisation:

.
├── /g/data/vk83/experiments/inputs/access-esm1p5/
├── ocean/
│   ├── biogeochemistry/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── bgc_param.nc
│   │           ├── cfc_auscom.nc
│   │           ├── co2_obs.nc
│   │           ├── dust.nc
│   │           ├── ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc
│   │           ├── ocmip2_fice_monthly_om1p5_bc.nc
│   │           ├── ocmip2_press_monthly_om1p5_bc.nc
│   │           ├── ocmip2_siple_co2_atm_am2_bc-1-9999.nc
│   │           └── ocmip2_xkw_monthly_om1p5_bc.nc
│   ├── grids/
│   │   └── mosaic/
│   │       └── global.1deg/
│   │           └── 2020.05.19/
│   │               └── grid_spec.nc
│   ├── shortwave_penetration/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           └── ssw_atten_depth.nc
│   └── tides/
│       └── global.1deg/
│           └── 2020.05.19/
│               ├── roughness_amp.nc
│               └── tideamp.nc
├── atmosphere/
│   ├── land/
│   │   ├── biogeochem/
│   │   │   ├── resolution_independent/
│   │   │   │   └── 2020.05.19/
│   │   │   │       ├── modis_phenology_csiro.txt
│   │   │   │       └── pftlookup_csiro_v16_17tiles_wtlnds.csv
│   │   │   └── global.1deg/
│   │   │       └── 2020.05.19/
│   │   │           └── Ndep_1850_ESM1.anc
│   │   ├── biogeophys/
│   │   │   └── resolution_independent/
│   │   │       └── 2020.05.19/
│   │   │           ├── def_soil_params.txt
│   │   │           └── def_veg_params.txt
│   │   ├── soiltype/
│   │   │   └── global.1deg/
│   │   │       └── 2020.05.19/
│   │   │           └── qrparm.soil_igbp_vg
│   │   └── vegetation/
│   │       └── global.1deg/
│   │           └── 2020.05.19/
│   │               └── cable_vegfunc_N96.anc
│   ├── aerosol/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── BC_hi_1850_ESM1.anc
│   │           ├── Bio_1850_ESM1.anc
│   │           ├── biogenic_351sm.N96L38
│   │           ├── scycl_1850_ESM1_v4.anc
│   │           ├── DMS_conc.N96
│   │           ├── sulpc_oxidants_N96_L38
│   │           └── OCFF_1850_ESM1.anc
│   ├── grids/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── qrparm.mask
│   │           └── vertlevs_G3
│   ├── spectral/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── spec3a_lw_hadgem1_6on
│   │           └── spec3a_sw_hadgem1_6on
│   ├── forcing/
│   │   ├── resolution_independent/
│   │   │   └── 2020.05.19/
│   │   │       └── ozone_1850_ESM1.anc
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           └── volcts_18502000ave.dat
│   └── stash/
│       └── 2020.05.19/
│           ├── stasets
│           └── STASHmaster
├── coupler/
│   ├── grids/
│   │   └── global.1deg/
│   │       └── 2020.05.19/
│   │           ├── areas.nc
│   │           ├── grids.nc
│   │           └── masks.nc
│   └── remapping_weights/
│       └── global.1deg/
│       └── 2020.05.19/
│               ├── rmp_cice_to_um1t_CONSERV_FRACNNEI.nc
│               ├── rmp_cice_to_um1u_CONSERV_FRACNNEI.nc
│               ├── rmp_cice_to_um1v_CONSERV_FRACNNEI.nc
│               ├── rmp_um1t_to_cice_CONSERV_DESTAREA.nc
│               ├── rmp_um1t_to_cice_CONSERV_FRACNNEI.nc
│               ├── rmp_um1u_to_cice_CONSERV_FRACNNEI.nc
│               └── rmp_um1v_to_cice_CONSERV_FRACNNEI.nc
├── ice/
│   └── grids/
│       └── global.1deg/
│           └── 2020.05.19/
│               ├── grid.nc
│               └── kmt.nc
└── /g/data/vk83/experiments/restarts/access-esm1p5/
    ├── atmosphere
    ├── coupler
    └── ice

You should be able to access and edit a copy of this using this link:

https://shorturl.at/YVx4v

@aidanheerdegen
Member Author

aidanheerdegen commented Jun 4, 2024

You can just cut n' paste this into https://tree.nathanfriend.io/


/g/data/vk83/experiments/inputs/access-esm1p5/
ocean
  biogeochemistry
    global.1deg
      2020.05.19
        bgc_param.nc
        cfc_auscom.nc
        co2_obs.nc
        dust.nc
        ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc
        ocmip2_fice_monthly_om1p5_bc.nc
        ocmip2_press_monthly_om1p5_bc.nc
        ocmip2_siple_co2_atm_am2_bc-1-9999.nc
        ocmip2_xkw_monthly_om1p5_bc.nc
  grids
    mosaic
      global.1deg
        2020.05.19
          grid_spec.nc
  shortwave_penetration
    global.1deg
      2020.05.19
        ssw_atten_depth.nc
  tides
    global.1deg
      2020.05.19
        roughness_amp.nc
        tideamp.nc
atmosphere
  land
    biogeochem
      resolution_independent
        2020.05.19
          modis_phenology_csiro.txt
          pftlookup_csiro_v16_17tiles_wtlnds.csv
      global.1deg
        2020.05.19
          Ndep_1850_ESM1.anc
    biogeophys
      resolution_independent
        2020.05.19
          def_soil_params.txt
          def_veg_params.txt
    soiltype
      global.1deg
        2020.05.19
          qrparm.soil_igbp_vg
    vegetation
      global.1deg
        2020.05.19
          cable_vegfunc_N96.anc
  aerosol
    global.1deg
      2020.05.19
        BC_hi_1850_ESM1.anc
        Bio_1850_ESM1.anc
        biogenic_351sm.N96L38
        scycl_1850_ESM1_v4.anc
        DMS_conc.N96
        sulpc_oxidants_N96_L38
        OCFF_1850_ESM1.anc
  grids
    global.1deg
      2020.05.19
        qrparm.mask
        vertlevs_G3
  spectral
    global.1deg
      2020.05.19
        spec3a_lw_hadgem1_6on
        spec3a_sw_hadgem1_6on
  forcing
    resolution_independent
      2020.05.19
        ozone_1850_ESM1.anc
    global.1deg
      2020.05.19
        volcts_18502000ave.dat
  stash
    2020.05.19
      stasets
      STASHmaster
coupler
  grids
    global.1deg
      2020.05.19
        areas.nc
        grids.nc
        masks.nc
  remapping_weights
    global.1deg
      2020.05.19
        rmp_cice_to_um1t_CONSERV_FRACNNEI.nc
        rmp_cice_to_um1u_CONSERV_FRACNNEI.nc
        rmp_cice_to_um1v_CONSERV_FRACNNEI.nc
        rmp_um1t_to_cice_CONSERV_DESTAREA.nc
        rmp_um1t_to_cice_CONSERV_FRACNNEI.nc
        rmp_um1u_to_cice_CONSERV_FRACNNEI.nc
        rmp_um1v_to_cice_CONSERV_FRACNNEI.nc
ice
  grids
    global.1deg
      2020.05.19
        grid.nc
        kmt.nc

/g/data/vk83/experiments/restarts/access-esm1p5/
  atmosphere
  coupler
  ice

@blimlim

blimlim commented Jun 4, 2024

Hi Aidan and Martin, I just have a few questions about the proposed structure:

  1. To clarify the idea behind the resolution categories - is the idea to separate files based on the resolution of their related model? E.g would we like to use global.N96 for the atmosphere inputs rather than global.1deg? Or instead, is the idea to highlight which files depend specifically on the ocean model's resolution?

Relatedly, what would be sensible for categorising the coupling files which use all of the grids? Would it make sense to note down both resolutions here, e.g. A.global.N96_O.global.1deg?

  2. If we want to additionally separate out input files which depend on the land sea mask, would the idea be to add another level in the directory tree, e.g.:
└── vegetation/
    └── global.1deg/
        ├── lsm_dependent/
        │   └── 2020.05.19/
        │       └── cable_vegfunc_N96.anc
        └── lsm_independent/
            └── ...

Would highlighting the land-sea mask dependence in the directory tree be preferable to just noting it somewhere in the documentation?

  3. Following on from 2, would our intention be to highlight the files which absolutely must be changed when the land sea mask changes (e.g. I'm guessing cable_vegfunc_N96.anc fits this category), or instead files which in theory should be changed alongside the land sea mask, but with which the model will still run if they are left unchanged? I wonder whether something like DMS_conc.N96 would fall under this second category:

[image not preserved]

Let me know if I might have misunderstood/misinterpreted anything here.

@aidanheerdegen
Copy link
Member Author

  1. would we like to use global.N96 for the atmosphere inputs rather than global.1deg?

Good point! This shows my ocean-bias, and yes I think it would be better to use something that makes sense from an atmosphere point of view.

Relatedly, what would be sensible for categorising the coupling files which use all of the grids? Would it make sense to note down both resolutions here, e.g. A.global.N96_O.global.1deg

Again, a good suggestion. Is "global" twice redundant? Wondering out loud whether we'd ever have a remapping file from a global to a regional grid, or vice versa.
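One way to encode both component grids in one label, while writing the shared extent only once, is sketched below. It reproduces the `global.o.1deg_a.N96` form that the restructured copy later in this thread uses. The helpers `coupled_grid_label` and `split_coupled_label` are hypothetical.

```python
def coupled_grid_label(extent, ocean_res, atmos_res):
    """Label for files that depend on both the ocean and atmosphere grids.

    The shared extent is written once, followed by the ocean ("o.") and
    atmosphere ("a.") resolutions, e.g. "global.o.1deg_a.N96".
    """
    return f"{extent}.o.{ocean_res}_a.{atmos_res}"


def split_coupled_label(label):
    """Inverse of coupled_grid_label: return (extent, ocean_res, atmos_res)."""
    extent, rest = label.split(".o.", 1)
    ocean_res, atmos_res = rest.split("_a.", 1)
    return extent, ocean_res, atmos_res
```

This form assumes both grids share an extent. If a remapping ever went between a global and a regional grid, each component would need its own extent, as in the `A.global.N96_O.global.1deg` suggestion.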

Would highlighting the land-sea mask dependence in the directory tree be preferable to just noting it somewhere in the documentation?

Very probably. I didn't know this was the case, which shows the value of highlighting it.

or instead files which in theory should be changed alongside the land sea mask, but the model will still run even if they are unchanged

Another excellent point. I'd say anything that should be changed should also be categorised as such.

@blimlim

blimlim commented Jun 14, 2024

I've put together a copy of the input directory with an updated structure, located currently in /g/data/tm70/sw6175/esm1p5-input-restructure/restructured-inputs/pre-industrial.

The tree for this new directory is:

└── pre-industrial
    ├── atmosphere
    │   ├── aerosol
    │   │   └── global.N96
    │   │       └── 2020.05.19
    │   │           ├── BC_hi_1850_ESM1.anc
    │   │           ├── Bio_1850_ESM1.anc
    │   │           ├── biogenic_351sm.N96L38
    │   │           ├── DMS_conc.N96
    │   │           ├── OCFF_1850_ESM1.anc
    │   │           ├── scycl_1850_ESM1_v4.anc
    │   │           └── sulpc_oxidants_N96_L38
    │   ├── forcing
    │   │   ├── global.N96
    │   │   │   └── 2020.05.19
    │   │   │       └── ozone_1850_ESM1.anc
    │   │   └── resolution_independent
    │   │       └── 2020.05.19
    │   │           └── volcts_18502000ave.dat
    │   ├── grids
    │   │   └── global.N96
    │   │       └── 2020.05.19
    │   │           ├── qrparm.mask
    │   │           └── vertlevs_G3
    │   ├── land
    │   │   ├── biogeochem
    │   │   │   ├── global.N96
    │   │   │   │   └── 2020.05.19
    │   │   │   │       └── Ndep_1850_ESM1.anc
    │   │   │   └── resolution_independent
    │   │   │       └── 2020.05.19
    │   │   │           ├── modis_phenology_csiro.txt
    │   │   │           ├── pftlookup_csiro_v16_17tiles.csv
    │   │   │           ├── pftlookup_csiro_v16_17tiles_spinup.csv
    │   │   │           ├── pftlookup_csiro_v16_17tiles_wtlnds.csv
    │   │   │           └── poolcnpInTumbarumba.csv
    │   │   ├── biogeophys
    │   │   │   └── resolution_independent
    │   │   │       └── 2020.05.19
    │   │   │           ├── def_soil_params.txt
    │   │   │           └── def_veg_params.txt
    │   │   ├── soiltype
    │   │   │   └── global.N96
    │   │   │       └── 2020.05.19
    │   │   │           └── qrparm.soil_igbp_vg
    │   │   └── vegetation
    │   │       └── global.N96
    │   │           └── 2020.05.19
    │   │               └── cable_vegfunc_N96.anc
    │   ├── SOURCES
    │   ├── spectral
    │   │   └── resolution_independent
    │   │       └── 2020.05.19
    │   │           ├── spec3a_lw_hadgem1_6on
    │   │           └── spec3a_sw_hadgem1_6on
    │   └── stash
    │       └── 2020.05.19
    │           ├── stasets
    │           └── STASHmaster
    ├── coupler
    │   ├── cf_name_table.txt
    │   ├── grids
    │   │   └── global.o.1deg_a.N96
    │   │       └── 2020.05.19
    │   │           ├── areas.nc
    │   │           ├── grids.nc
    │   │           └── masks.nc
    │   └── remapping_weights
    │       └── global.o.1deg_a.N96
    │           └── 2020.05.19
    │               ├── rmp_cice_to_um1t_CONSERV_FRACNNEI.nc
    │               ├── rmp_cice_to_um1u_CONSERV_FRACNNEI.nc
    │               ├── rmp_cice_to_um1v_CONSERV_FRACNNEI.nc
    │               ├── rmp_um1t_to_cice_CONSERV_DESTAREA.nc
    │               ├── rmp_um1t_to_cice_CONSERV_FRACNNEI.nc
    │               ├── rmp_um1u_to_cice_CONSERV_FRACNNEI.nc
    │               └── rmp_um1v_to_cice_CONSERV_FRACNNEI.nc
    ├── ice
    │   └── grids
    │       └── global.1deg
    │           └── 2020.05.19
    │               ├── grid.nc
    │               └── kmt.nc
    ├── ocean
    │   ├── basin_mask.nc
    │   ├── biogeochemistry
    │   │   └── global.1deg
    │   │       └── 2020.05.19
    │   │           ├── bgc_param.nc
    │   │           ├── cfc_auscom.nc
    │   │           ├── co2_obs.nc
    │   │           ├── dust.nc
    │   │           ├── ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc
    │   │           ├── ocmip2_fice_monthly_om1p5_bc.nc
    │   │           ├── ocmip2_press_monthly_om1p5_bc.nc
    │   │           ├── ocmip2_siple_co2_atm_am2_bc-1-9999.nc
    │   │           └── ocmip2_xkw_monthly_om1p5_bc.nc
    │   ├── geothermal_heating.nc
    │   ├── grids
    │   │   └── mosaic
    │   │       └── global.1deg
    │   │           └── 2020.05.19
    │   │               └── grid_spec.nc
    │   ├── shortwave_penetration
    │   │   └── global.1deg
    │   │       └── 2020.05.19
    │   │           └── ssw_atten_depth.nc
    │   └── tides
    │       └── global.1deg
    │           └── 2020.05.19
    │               ├── roughness_amp.nc
    │               └── tideamp.nc
    └── restart
        ├── atmosphere
        ├── coupler
        ├── ice
        └── ocean
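A copy like this can be produced from a mapping of old to new relative paths. Below is a minimal sketch, assuming such a mapping has been written by hand. `restructure` is a hypothetical helper. It uses `shutil.copy2` because that function preserves file modification times, which matters when those times are later used to date the inputs. It refuses to overwrite existing files so that a version is never silently replaced.

```python
import shutil
from pathlib import Path


def restructure(src_root, dst_root, mapping):
    """Copy each file mapping[old] -> new from src_root into dst_root.

    Parent directories are created as needed. Existing destination files
    are never overwritten. Returns the list of destination paths written.
    """
    written = []
    for old, new in mapping.items():
        src = Path(src_root) / old
        dst = Path(dst_root) / new
        if dst.exists():
            raise FileExistsError(dst)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies data plus metadata such as mtime
        written.append(dst)
    return written
```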

Comments

  • The tree follows the previous structure by @aidanheerdegen, though with changes to fix up the resolution dependence for a few files.
  • Version dates: I've searched for earlier copies of each of the inputs in the /g/data/access directory, and listed the modification dates and paths of any identical files in the collapsed table below. The files likely go back much further than the listed dates (e.g. tideamp.nc matches /g/data/access/projects/access/data/ACCESS_CMIP5/mom4/tides_auscom_20080605.nc, which might date back to 2008). Martin noted that we're unlikely to find the original creation dates for most files, and that 2020.05.19 is a reasonable date for us to use here.
| Model | Section | Filename | Earliest found date | Date notes |
|---|---|---|---|---|
| ocean | biogeochemistry | bgc_param.nc | | |
| ocean | biogeochemistry | cfc_auscom.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/cfc_auscom_20110418.nc |
| ocean | biogeochemistry | co2_obs.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/co2_obs.nc |
| ocean | biogeochemistry | dust.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/dust.nc |
| ocean | biogeochemistry | ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc |
| ocean | biogeochemistry | ocmip2_fice_monthly_om1p5_bc.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/ocmip2_fice_monthly_om1p5_bc.nc |
| ocean | biogeochemistry | ocmip2_press_monthly_om1p5_bc.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/ocmip2_press_monthly_om1p5_bc.nc |
| ocean | biogeochemistry | ocmip2_siple_co2_atm_am2_bc-1-9999.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/ocmip2_siple_co2_atm_am2_bc-1-9999.nc |
| ocean | biogeochemistry | ocmip2_xkw_monthly_om1p5_bc.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/bgc_y0001/ocmip2_xkw_monthly_om1p5_bc.nc |
| ocean | grids | grid_spec.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/mom4/grid_spec.auscom.20110618.nc |
| ocean | shortwave penetration | ssw_atten_depth.nc | | |
| ocean | tides | roughness_amp.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/mom4/roughness_auscom_20080605_roughness_amp.nc |
| ocean | tides | tideamp.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/mom4/tides_auscom_20080605.nc |
| atmosphere | land - biogeochem | modis_phenology_csiro.txt | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeochem/modis_phenology_csiro.txt |
| atmosphere | land - biogeochem | pftlookup_csiro_v16_17tiles_wtlnds.csv | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_wtlnds.csv |
| atmosphere | land - biogeochem | | 2020.01.13 | 2018.12.19 /g/data/p66/txz599/data/ancil/CMIP6/Ndep_1850_ESM1.anc |
| atmosphere | land - biogeophys | def_soil_params.txt | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeophys/def_soil_params.txt |
| atmosphere | land - biogeophys | def_veg_params.txt | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeophys/def_veg_params.txt |
| atmosphere | land - soiltype | qrparm.soil_igbp_vg | 2020.05.19 | The file that SOURCES points to doesn't seem to exist, so I have just used /g/data/access/payu/access-esm/input/pre-industrial/atmosphere/qrparm.soil_igbp_vg |
| atmosphere | land - vegetation | cable_vegfunc_N96.anc | 2019.12.22 | /g/data/access/projects/access/data/ancil/access_v2/cable_vegfunc_N96.anc |
| atmosphere | aerosol | BC_hi_1850_ESM1.anc | 2018.12.19 | /g/data1a/p66/txz599/data/ancil/CMIP6/BC_hi_1850_ESM1.anc |
| atmosphere | aerosol | Bio_1850_ESM1.anc | 2018.12.19 | /g/data1a/p66/txz599/data/ancil/CMIP6/Bio_1850_ESM1.anc |
| atmosphere | aerosol | biogenic_351sm.N96L38 | 2020.01.21 | /g/data/access/projects/access/data/ancil/HadGEM3_cal365/biogenic_351sm.N96L38 |
| atmosphere | aerosol | scycl_1850_ESM1_v4.anc | 2018.12.19 | /g/data1a/p66/txz599/data/ancil/CMIP6/scycl_1850_ESM1_v4.anc |
| atmosphere | aerosol | DMS_conc.N96 | 2020.01.21 | /g/data/access/projects/access/data/ancil/HadGEM3_cal365/DMS_conc.N96 |
| atmosphere | aerosol | sulpc_oxidants_N96_L38 | 2020.01.21 | /g/data/access/projects/access/data/ancil/HadGEM3_cal365/sulpc_oxidants_N96_L38 |
| atmosphere | aerosol | OCFF_1850_ESM1.anc | 2018.12.19 | /g/data1a/p66/txz599/data/ancil/CMIP6/OCFF_1850_ESM1.anc |
| atmosphere | grids | qrparm.mask | 2019.11.25 | /g/data/access/projects/access/data/ancil/access_v2/qrparm.mask |
| atmosphere | grids | vertlevs_G3 | 2019.11.22 | /g/data/access/projects/access/umdir/vn7.3/ctldata/vert/vertlevs_G3 |
| atmosphere | spectral | spec3a_lw_hadgem1_6on | 2020.01.21 | /g/data/access/projects/access/data/ancil/HadGEM3_cal365/spec3a_lw_hadgem1_6on |
| atmosphere | spectral | spec3a_sw_hadgem1_6on | 2020.01.21 | /g/data/access/projects/access/data/ancil/HadGEM3_cal365/spec3a_sw_hadgem1_6on |
| atmosphere | forcing | ozone_1850_ESM1.anc | 2018.12.19 | /g/data/p66/txz599/data/ancil/CMIP6/ozone_1850_ESM1.anc |
| atmosphere | forcing | volcts_18502000ave.dat | 2020.01.21 | /g/data/access/projects/access/data/ancil/CMIP5/volcts_18502000ave.dat |
| atmosphere | stash | stasets | 2019.11.22 | /g/data/access/projects/access/umdir/vn7.3/ctldata/stasets |
| atmosphere | stash | STASHmaster | 2019.11.22 | /g/data/access/projects/access/umdir/vn7.3/ctldata/STASHmaster |
| coupler | grids | areas.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3/oasis3_areas_20101208.nc |
| coupler | grids | grids.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3/oasis3_grids_20101208.nc |
| coupler | grids | masks.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3/oasis3_masks_20101208.nc |
| coupler | remapping_weights | rmp_cice_to_um1t_CONSERV_FRACNNEI.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3-mct/rmp_cice_to_um1t_CONSERV_FRACNNEI.nc |
| coupler | remapping_weights | rmp_cice_to_um1u_CONSERV_FRACNNEI.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3-mct/rmp_cice_to_um1u_CONSERV_FRACNNEI.nc |
| coupler | remapping_weights | rmp_cice_to_um1v_CONSERV_FRACNNEI.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3-mct/rmp_cice_to_um1v_CONSERV_FRACNNEI.nc |
| coupler | remapping_weights | rmp_um1t_to_cice_CONSERV_DESTAREA.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3-mct/rmp_um1t_to_cice_CONSERV_DESTAREA.nc |
| coupler | remapping_weights | rmp_um1t_to_cice_CONSERV_FRACNNEI.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3-mct/rmp_um1t_to_cice_CONSERV_FRACNNEI.nc |
| coupler | remapping_weights | rmp_um1u_to_cice_CONSERV_FRACNNEI.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3-mct/rmp_um1u_to_cice_CONSERV_FRACNNEI.nc |
| coupler | remapping_weights | rmp_um1v_to_cice_CONSERV_FRACNNEI.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/oasis3-mct/rmp_um1v_to_cice_CONSERV_FRACNNEI.nc |
| ice | grids | grid.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/cice/cice_grid_20101208.nc |
| ice | grids | kmt.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/cice/cice_kmt_20101208.nc |

Files not in the GitHub directory tree which are in the input directory:

| Model | Section | Filename | Earliest found date | Date notes |
|---|---|---|---|---|
| atmosphere | land - biogeochem | pftlookup_csiro_v16_17tiles_spinup.csv | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_spinup.csv |
| atmosphere | land - biogeochem | poolcnpInTumbarumba.csv | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeochem/poolcnpInTumbarumba.csv |
| atmosphere | land - biogeochem | pftlookup_csiro_v16_17tiles.csv | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles.csv |
| ocean | ? | basin_mask.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/mom4/basin_mask_20111103.nc |
| ocean | ? | geothermal_heating.nc | 2020.01.13 | /g/data/access/projects/access/data/ACCESS_CMIP5/mom4/geothermal_heating_auscom_20080605.nc |
  • Land-sea mask dependence: following discussion with Martin, we've decided to leave the land-sea mask dependence out of the directory structure for now. I think one reason was that the mask is not changed frequently, though @MartinDix, correct me if I've missed anything here.
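The search for identical earlier copies can be scripted by comparing file contents rather than names, since several files were renamed along the way (e.g. `tides_auscom_20080605.nc` became `tideamp.nc`). Below is a minimal sketch, assuming the candidate directories are readable. `find_identical` is a hypothetical helper, and SHA-256 is an arbitrary choice of hash. Each match is reported with its modification date in the same `YYYY.MM.DD` form as the table, earliest first. If the target lies inside `search_root`, it will match itself.

```python
import hashlib
import os
from datetime import datetime


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical(target, search_root):
    """List (YYYY.MM.DD mtime, path) for files under search_root whose
    contents match target, earliest first."""
    target_size = os.path.getsize(target)
    target_hash = sha256sum(target)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) != target_size:
                    continue  # cheap size check before hashing
                if sha256sum(path) != target_hash:
                    continue
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # unreadable or vanished file
            date = datetime.fromtimestamp(mtime).strftime("%Y.%m.%d")
            matches.append((date, path))
    return sorted(matches)  # YYYY.MM.DD strings sort chronologically
```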

Questions

There are several things I'm unsure about:

  • The atmosphere directory originally contained a SOURCES file, listing where the atmospheric inputs had been copied from. Do we want to keep this somewhere, and perhaps update it to include the sources I found for the ocean and other data (some files I couldn't find anything for though), or just leave it out?
  • Just to double-check: a few files weren't in the original /g/data/vk83/experiments/inputs/access-esm1p5/pre-industrial directory, but were in .../inputs/access-esm1p5/common/ (namely the ocean files basin_mask.nc, geothermal_heating.nc, grid_spec.nc, roughness_amp.nc, ssw_atten_depth.nc, and tideamp.nc). I've copied them from the common directory – I just wanted to confirm that these are the correct versions to use.
  • A few files were present in the input directory but not in the previously proposed file trees, namely basin_mask.nc and geothermal_heating.nc in the ocean directory, and cf_name_table.txt in the coupler directory. @aidanheerdegen, do you know if the pre-industrial configuration needs these files? They are listed in the inputs.yaml manifest file, but geothermal heating appears to be switched off in the ocean namelist. edit: Just saw this previous issue - if we're happy to remove these files I'll omit them from the new directory.
  • Is our categorisation of the ocean grid_spec.nc file under ocean/grids/mosaic/ correct? It matches the classification for the OM2 file of the same name, however the pre-industrial and om2 grid_spec.nc files are quite different. I've included ncdumps of each in the dropdowns below. The pre-industrial file doesn't mention any mosaics, and so I'm unsure whether our classification is correct.

pre-industrial/ocean/grids/mosaic/global.1deg/2020.05.19/grid_spec.nc:


netcdf grid_spec {
dimensions:
	zt = 50 ;
	zb = 50 ;
	grid_x_T = 360 ;
	grid_y_T = 300 ;
	grid_x_C = 360 ;
	grid_y_C = 300 ;
	vertex = 4 ;
	i_atmXocn = 111817 ;
	i_atmXlnd = 5285 ;
	i_lndXocn = 111817 ;
	xba = 145 ;
	yba = 91 ;
	xta = 144 ;
	yta = 90 ;
	xbl = 145 ;
	ybl = 91 ;
	xtl = 144 ;
	ytl = 90 ;
	xto = 360 ;
	yto = 300 ;
variables:
	float zt(zt) ;
		zt:long_name = "zt" ;
		zt:units = "meters" ;
		zt:cartesian_axis = "z" ;
		zt:positive = "down" ;
	float zb(zb) ;
		zb:long_name = "zb" ;
		zb:units = "meters" ;
		zb:cartesian_axis = "z" ;
		zb:positive = "down" ;
	float grid_x_T(grid_x_T) ;
		grid_x_T:long_name = "Nominal Longitude of T-cell center" ;
		grid_x_T:units = "degree_east" ;
		grid_x_T:cartesian_axis = "X" ;
	float grid_y_T(grid_y_T) ;
		grid_y_T:long_name = "Nominal Latitude of T-cell center" ;
		grid_y_T:units = "degree_north" ;
		grid_y_T:cartesian_axis = "Y" ;
	float grid_x_C(grid_x_C) ;
		grid_x_C:long_name = "Nominal Longitude of C-cell center" ;
		grid_x_C:units = "degree_east" ;
		grid_x_C:cartesian_axis = "X" ;
	float grid_y_C(grid_y_C) ;
		grid_y_C:long_name = "Nominal Latitude of C-cell center" ;
		grid_y_C:units = "degree_north" ;
		grid_y_C:cartesian_axis = "Y" ;
	float vertex(vertex) ;
		vertex:long_name = "Vertex position from southwest couterclockwise" ;
		vertex:units = "none" ;
	double x_T(grid_y_T, grid_x_T) ;
		x_T:long_name = "Geographic longitude of T_cell centers" ;
		x_T:units = "degree_east" ;
	double y_T(grid_y_T, grid_x_T) ;
		y_T:long_name = "Geographic latitude of T_cell centers" ;
		y_T:units = "degree_north" ;
	double x_vert_T(vertex, grid_y_T, grid_x_T) ;
		x_vert_T:long_name = "Geographic longitude of T_cell vertices begin southwest counterclockwise" ;
		x_vert_T:units = "degree_east" ;
	double y_vert_T(vertex, grid_y_T, grid_x_T) ;
		y_vert_T:long_name = "Geographic latitude of T_cell vertices begin southwest counterclockwise" ;
		y_vert_T:units = "degree_north" ;
	double area_T(grid_y_T, grid_x_T) ;
		area_T:long_name = "Area of T_cell" ;
		area_T:units = "m2" ;
	double angle_T(grid_y_T, grid_x_T) ;
		angle_T:long_name = "Angle clockwise between logical and geographic east of T_cell" ;
		angle_T:units = "degree" ;
	double ds_00_02_T(grid_y_T, grid_x_T) ;
		ds_00_02_T:long_name = "Length of western face of T_cell" ;
		ds_00_02_T:units = "m" ;
	double ds_20_22_T(grid_y_T, grid_x_T) ;
		ds_20_22_T:long_name = "Length of eastern face of T_cell" ;
		ds_20_22_T:units = "m" ;
	double ds_02_22_T(grid_y_T, grid_x_T) ;
		ds_02_22_T:long_name = "Length of northern face of T_cell" ;
		ds_02_22_T:units = "m" ;
	double ds_00_20_T(grid_y_T, grid_x_T) ;
		ds_00_20_T:long_name = "Length of southern face of T_cell" ;
		ds_00_20_T:units = "m" ;
	double ds_00_01_T(grid_y_T, grid_x_T) ;
		ds_00_01_T:long_name = "Distance from southwest corner to western face center of T_cell" ;
		ds_00_01_T:units = "m" ;
	double ds_01_02_T(grid_y_T, grid_x_T) ;
		ds_01_02_T:long_name = "Distance from northwest corner to western face center of T_cell" ;
		ds_01_02_T:units = "m" ;
	double ds_02_12_T(grid_y_T, grid_x_T) ;
		ds_02_12_T:long_name = "Distance from northwest corner to northern face center of T_cell" ;
		ds_02_12_T:units = "m" ;
	double ds_12_22_T(grid_y_T, grid_x_T) ;
		ds_12_22_T:long_name = "Distance from northeast corner to northern face center of T_cell" ;
		ds_12_22_T:units = "m" ;
	double ds_21_22_T(grid_y_T, grid_x_T) ;
		ds_21_22_T:long_name = "Distance from northeast corner to eastern face center of T_cell" ;
		ds_21_22_T:units = "m" ;
	double ds_20_21_T(grid_y_T, grid_x_T) ;
		ds_20_21_T:long_name = "Distance from southeast corner to eastern face center of T_cell" ;
		ds_20_21_T:units = "m" ;
	double ds_10_20_T(grid_y_T, grid_x_T) ;
		ds_10_20_T:long_name = "Distance from southeast corner to southern face center of T_cell" ;
		ds_10_20_T:units = "m" ;
	double ds_00_10_T(grid_y_T, grid_x_T) ;
		ds_00_10_T:long_name = "Distance from southwest corner to southern face center of T_cell" ;
		ds_00_10_T:units = "m" ;
	double ds_01_11_T(grid_y_T, grid_x_T) ;
		ds_01_11_T:long_name = "Distance from center to western face of T_cell" ;
		ds_01_11_T:units = "m" ;
	double ds_11_12_T(grid_y_T, grid_x_T) ;
		ds_11_12_T:long_name = "Distance from center to northern face of T_cell" ;
		ds_11_12_T:units = "m" ;
	double ds_11_21_T(grid_y_T, grid_x_T) ;
		ds_11_21_T:long_name = "Distance from center to eastern face of T_cell" ;
		ds_11_21_T:units = "m" ;
	double ds_10_11_T(grid_y_T, grid_x_T) ;
		ds_10_11_T:long_name = "Distance from center to southern face of T_cell" ;
		ds_10_11_T:units = "m" ;
	double ds_01_21_T(grid_y_T, grid_x_T) ;
		ds_01_21_T:long_name = "width of T_cell" ;
		ds_01_21_T:units = "m" ;
	double ds_10_12_T(grid_y_T, grid_x_T) ;
		ds_10_12_T:long_name = "height of T_cell" ;
		ds_10_12_T:units = "m" ;
	double x_E(grid_y_T, grid_x_C) ;
		x_E:long_name = "Geographic longitude of E_cell centers" ;
		x_E:units = "degree_east" ;
	double y_E(grid_y_T, grid_x_C) ;
		y_E:long_name = "Geographic latitude of E_cell centers" ;
		y_E:units = "degree_north" ;
	double x_vert_E(vertex, grid_y_T, grid_x_C) ;
		x_vert_E:long_name = "Geographic longitude of E_cell vertices begin southwest counterclockwise" ;
		x_vert_E:units = "degree_east" ;
	double y_vert_E(vertex, grid_y_T, grid_x_C) ;
		y_vert_E:long_name = "Geographic latitude of E_cell vertices begin southwest counterclockwise" ;
		y_vert_E:units = "degree_north" ;
	double area_E(grid_y_T, grid_x_C) ;
		area_E:long_name = "Area of E_cell" ;
		area_E:units = "m2" ;
	double angle_E(grid_y_T, grid_x_C) ;
		angle_E:long_name = "Angle clockwise between logical and geographic east of E_cell" ;
		angle_E:units = "degree" ;
	double ds_00_02_E(grid_y_T, grid_x_C) ;
		ds_00_02_E:long_name = "Length of western face of E_cell" ;
		ds_00_02_E:units = "m" ;
	double ds_20_22_E(grid_y_T, grid_x_C) ;
		ds_20_22_E:long_name = "Length of eastern face of E_cell" ;
		ds_20_22_E:units = "m" ;
	double ds_02_22_E(grid_y_T, grid_x_C) ;
		ds_02_22_E:long_name = "Length of northern face of E_cell" ;
		ds_02_22_E:units = "m" ;
	double ds_00_20_E(grid_y_T, grid_x_C) ;
		ds_00_20_E:long_name = "Length of southern face of E_cell" ;
		ds_00_20_E:units = "m" ;
	double ds_00_01_E(grid_y_T, grid_x_C) ;
		ds_00_01_E:long_name = "Distance from southwest corner to western face center of E_cell" ;
		ds_00_01_E:units = "m" ;
	double ds_01_02_E(grid_y_T, grid_x_C) ;
		ds_01_02_E:long_name = "Distance from northwest corner to western face center of E_cell" ;
		ds_01_02_E:units = "m" ;
	double ds_02_12_E(grid_y_T, grid_x_C) ;
		ds_02_12_E:long_name = "Distance from northwest corner to northern face center of E_cell" ;
		ds_02_12_E:units = "m" ;
	double ds_12_22_E(grid_y_T, grid_x_C) ;
		ds_12_22_E:long_name = "Distance from northeast corner to northern face center of E_cell" ;
		ds_12_22_E:units = "m" ;
	double ds_21_22_E(grid_y_T, grid_x_C) ;
		ds_21_22_E:long_name = "Distance from northeast corner to eastern face center of E_cell" ;
		ds_21_22_E:units = "m" ;
	double ds_20_21_E(grid_y_T, grid_x_C) ;
		ds_20_21_E:long_name = "Distance from southeast corner to eastern face center of E_cell" ;
		ds_20_21_E:units = "m" ;
	double ds_10_20_E(grid_y_T, grid_x_C) ;
		ds_10_20_E:long_name = "Distance from southeast corner to southern face center of E_cell" ;
		ds_10_20_E:units = "m" ;
	double ds_00_10_E(grid_y_T, grid_x_C) ;
		ds_00_10_E:long_name = "Distance from southwest corner to southern face center of E_cell" ;
		ds_00_10_E:units = "m" ;
	double ds_01_11_E(grid_y_T, grid_x_C) ;
		ds_01_11_E:long_name = "Distance from center to western face of E_cell" ;
		ds_01_11_E:units = "m" ;
	double ds_11_12_E(grid_y_T, grid_x_C) ;
		ds_11_12_E:long_name = "Distance from center to northern face of E_cell" ;
		ds_11_12_E:units = "m" ;
	double ds_11_21_E(grid_y_T, grid_x_C) ;
		ds_11_21_E:long_name = "Distance from center to eastern face of E_cell" ;
		ds_11_21_E:units = "m" ;
	double ds_10_11_E(grid_y_T, grid_x_C) ;
		ds_10_11_E:long_name = "Distance from center to southern face of E_cell" ;
		ds_10_11_E:units = "m" ;
	double ds_01_21_E(grid_y_T, grid_x_C) ;
		ds_01_21_E:long_name = "width of E_cell" ;
		ds_01_21_E:units = "m" ;
	double ds_10_12_E(grid_y_T, grid_x_C) ;
		ds_10_12_E:long_name = "height of E_cell" ;
		ds_10_12_E:units = "m" ;
	double x_N(grid_y_C, grid_x_T) ;
		x_N:long_name = "Geographic longitude of N_cell centers" ;
		x_N:units = "degree_east" ;
	double y_N(grid_y_C, grid_x_T) ;
		y_N:long_name = "Geographic latitude of N_cell centers" ;
		y_N:units = "degree_north" ;
	double x_vert_N(vertex, grid_y_C, grid_x_T) ;
		x_vert_N:long_name = "Geographic longitude of N_cell vertices begin southwest counterclockwise" ;
		x_vert_N:units = "degree_east" ;
	double y_vert_N(vertex, grid_y_C, grid_x_T) ;
		y_vert_N:long_name = "Geographic latitude of N_cell vertices begin southwest counterclockwise" ;
		y_vert_N:units = "degree_north" ;
	double area_N(grid_y_C, grid_x_T) ;
		area_N:long_name = "Area of N_cell" ;
		area_N:units = "m2" ;
	double angle_N(grid_y_C, grid_x_T) ;
		angle_N:long_name = "Angle clockwise between logical and geographic east of N_cell" ;
		angle_N:units = "degree" ;
	double ds_00_02_N(grid_y_C, grid_x_T) ;
		ds_00_02_N:long_name = "Length of western face of N_cell" ;
		ds_00_02_N:units = "m" ;
	double ds_20_22_N(grid_y_C, grid_x_T) ;
		ds_20_22_N:long_name = "Length of eastern face of N_cell" ;
		ds_20_22_N:units = "m" ;
	double ds_02_22_N(grid_y_C, grid_x_T) ;
		ds_02_22_N:long_name = "Length of northern face of N_cell" ;
		ds_02_22_N:units = "m" ;
	double ds_00_20_N(grid_y_C, grid_x_T) ;
		ds_00_20_N:long_name = "Length of southern face of N_cell" ;
		ds_00_20_N:units = "m" ;
	double ds_00_01_N(grid_y_C, grid_x_T) ;
		ds_00_01_N:long_name = "Distance from southwest corner to western face center of N_cell" ;
		ds_00_01_N:units = "m" ;
	double ds_01_02_N(grid_y_C, grid_x_T) ;
		ds_01_02_N:long_name = "Distance from northwest corner to western face center of N_cell" ;
		ds_01_02_N:units = "m" ;
	double ds_02_12_N(grid_y_C, grid_x_T) ;
		ds_02_12_N:long_name = "Distance from northwest corner to northern face center of N_cell" ;
		ds_02_12_N:units = "m" ;
	double ds_12_22_N(grid_y_C, grid_x_T) ;
		ds_12_22_N:long_name = "Distance from northeast corner to northern face center of N_cell" ;
		ds_12_22_N:units = "m" ;
	double ds_21_22_N(grid_y_C, grid_x_T) ;
		ds_21_22_N:long_name = "Distance from northeast corner to eastern face center of N_cell" ;
		ds_21_22_N:units = "m" ;
	double ds_20_21_N(grid_y_C, grid_x_T) ;
		ds_20_21_N:long_name = "Distance from southeast corner to eastern face center of N_cell" ;
		ds_20_21_N:units = "m" ;
	double ds_10_20_N(grid_y_C, grid_x_T) ;
		ds_10_20_N:long_name = "Distance from southeast corner to southern face center of N_cell" ;
		ds_10_20_N:units = "m" ;
	double ds_00_10_N(grid_y_C, grid_x_T) ;
		ds_00_10_N:long_name = "Distance from southwest corner to southern face center of N_cell" ;
		ds_00_10_N:units = "m" ;
	double ds_01_11_N(grid_y_C, grid_x_T) ;
		ds_01_11_N:long_name = "Distance from center to western face of N_cell" ;
		ds_01_11_N:units = "m" ;
	double ds_11_12_N(grid_y_C, grid_x_T) ;
		ds_11_12_N:long_name = "Distance from center to northern face of N_cell" ;
		ds_11_12_N:units = "m" ;
	double ds_11_21_N(grid_y_C, grid_x_T) ;
		ds_11_21_N:long_name = "Distance from center to eastern face of N_cell" ;
		ds_11_21_N:units = "m" ;
	double ds_10_11_N(grid_y_C, grid_x_T) ;
		ds_10_11_N:long_name = "Distance from center to southern face of N_cell" ;
		ds_10_11_N:units = "m" ;
	double ds_01_21_N(grid_y_C, grid_x_T) ;
		ds_01_21_N:long_name = "width of N_cell" ;
		ds_01_21_N:units = "m" ;
	double ds_10_12_N(grid_y_C, grid_x_T) ;
		ds_10_12_N:long_name = "height of N_cell" ;
		ds_10_12_N:units = "m" ;
	double x_C(grid_y_C, grid_x_C) ;
		x_C:long_name = "Geographic longitude of C_cell centers" ;
		x_C:units = "degree_east" ;
	double y_C(grid_y_C, grid_x_C) ;
		y_C:long_name = "Geographic latitude of C_cell centers" ;
		y_C:units = "degree_north" ;
	double x_vert_C(vertex, grid_y_C, grid_x_C) ;
		x_vert_C:long_name = "Geographic longitude of C_cell vertices begin southwest counterclockwise" ;
		x_vert_C:units = "degree_east" ;
	double y_vert_C(vertex, grid_y_C, grid_x_C) ;
		y_vert_C:long_name = "Geographic latitude of C_cell vertices begin southwest counterclockwise" ;
		y_vert_C:units = "degree_north" ;
	double area_C(grid_y_C, grid_x_C) ;
		area_C:long_name = "Area of C_cell" ;
		area_C:units = "m2" ;
	double angle_C(grid_y_C, grid_x_C) ;
		angle_C:long_name = "Angle clockwise between logical and geographic east of C_cell" ;
		angle_C:units = "degree" ;
	double ds_00_02_C(grid_y_C, grid_x_C) ;
		ds_00_02_C:long_name = "Length of western face of C_cell" ;
		ds_00_02_C:units = "m" ;
	double ds_20_22_C(grid_y_C, grid_x_C) ;
		ds_20_22_C:long_name = "Length of eastern face of C_cell" ;
		ds_20_22_C:units = "m" ;
	double ds_02_22_C(grid_y_C, grid_x_C) ;
		ds_02_22_C:long_name = "Length of northern face of C_cell" ;
		ds_02_22_C:units = "m" ;
	double ds_00_20_C(grid_y_C, grid_x_C) ;
		ds_00_20_C:long_name = "Length of southern face of C_cell" ;
		ds_00_20_C:units = "m" ;
	double ds_00_01_C(grid_y_C, grid_x_C) ;
		ds_00_01_C:long_name = "Distance from southwest corner to western face center of C_cell" ;
		ds_00_01_C:units = "m" ;
	double ds_01_02_C(grid_y_C, grid_x_C) ;
		ds_01_02_C:long_name = "Distance from northwest corner to western face center of C_cell" ;
		ds_01_02_C:units = "m" ;
	double ds_02_12_C(grid_y_C, grid_x_C) ;
		ds_02_12_C:long_name = "Distance from northwest corner to northern face center of C_cell" ;
		ds_02_12_C:units = "m" ;
	double ds_12_22_C(grid_y_C, grid_x_C) ;
		ds_12_22_C:long_name = "Distance from northeast corner to northern face center of C_cell" ;
		ds_12_22_C:units = "m" ;
	double ds_21_22_C(grid_y_C, grid_x_C) ;
		ds_21_22_C:long_name = "Distance from northeast corner to eastern face center of C_cell" ;
		ds_21_22_C:units = "m" ;
	double ds_20_21_C(grid_y_C, grid_x_C) ;
		ds_20_21_C:long_name = "Distance from southeast corner to eastern face center of C_cell" ;
		ds_20_21_C:units = "m" ;
	double ds_10_20_C(grid_y_C, grid_x_C) ;
		ds_10_20_C:long_name = "Distance from southeast corner to southern face center of C_cell" ;
		ds_10_20_C:units = "m" ;
	double ds_00_10_C(grid_y_C, grid_x_C) ;
		ds_00_10_C:long_name = "Distance from southwest corner to southern face center of C_cell" ;
		ds_00_10_C:units = "m" ;
	double ds_01_11_C(grid_y_C, grid_x_C) ;
		ds_01_11_C:long_name = "Distance from center to western face of C_cell" ;
		ds_01_11_C:units = "m" ;
	double ds_11_12_C(grid_y_C, grid_x_C) ;
		ds_11_12_C:long_name = "Distance from center to northern face of C_cell" ;
		ds_11_12_C:units = "m" ;
	double ds_11_21_C(grid_y_C, grid_x_C) ;
		ds_11_21_C:long_name = "Distance from center to eastern face of C_cell" ;
		ds_11_21_C:units = "m" ;
	double ds_10_11_C(grid_y_C, grid_x_C) ;
		ds_10_11_C:long_name = "Distance from center to southern face of C_cell" ;
		ds_10_11_C:units = "m" ;
	double ds_01_21_C(grid_y_C, grid_x_C) ;
		ds_01_21_C:long_name = "width of C_cell" ;
		ds_01_21_C:units = "m" ;
	double ds_10_12_C(grid_y_C, grid_x_C) ;
		ds_10_12_C:long_name = "height of C_cell" ;
		ds_10_12_C:units = "m" ;
	double depth_t(grid_y_T, grid_x_T) ;
		depth_t:long_name = "topographic depth of T-cell" ;
		depth_t:units = "meters" ;
	double num_levels(grid_y_T, grid_x_T) ;
		num_levels:long_name = "number of vertical T-cells" ;
		num_levels:units = "none" ;
	double wet(grid_y_T, grid_x_T) ;
		wet:long_name = "land/sea flag (0=land) for T-cell" ;
		wet:units = "none" ;
	double AREA_ATMxOCN(i_atmXocn) ;
	double DI_ATMxOCN(i_atmXocn) ;
	double DJ_ATMxOCN(i_atmXocn) ;
	int I_ATM_ATMxOCN(i_atmXocn) ;
	int J_ATM_ATMxOCN(i_atmXocn) ;
	int I_OCN_ATMxOCN(i_atmXocn) ;
	int J_OCN_ATMxOCN(i_atmXocn) ;
	double AREA_ATMxLND(i_atmXlnd) ;
	double DI_ATMxLND(i_atmXlnd) ;
	double DJ_ATMxLND(i_atmXlnd) ;
	int I_ATM_ATMxLND(i_atmXlnd) ;
	int J_ATM_ATMxLND(i_atmXlnd) ;
	int I_LND_ATMxLND(i_atmXlnd) ;
	int J_LND_ATMxLND(i_atmXlnd) ;
	double AREA_LNDxOCN(i_lndXocn) ;
	double DI_LNDxOCN(i_lndXocn) ;
	double DJ_LNDxOCN(i_lndXocn) ;
	int I_LND_LNDxOCN(i_lndXocn) ;
	int J_LND_LNDxOCN(i_lndXocn) ;
	int I_OCN_LNDxOCN(i_lndXocn) ;
	int J_OCN_LNDxOCN(i_lndXocn) ;
	double xba(xba) ;
	double yba(yba) ;
	double xta(xta) ;
	double yta(yta) ;
	double AREA_ATM(yta, xta) ;
	double xbl(xbl) ;
	double ybl(ybl) ;
	double xtl(xtl) ;
	double ytl(ytl) ;
	double AREA_LND(ytl, xtl) ;
	double AREA_LND_CELL(ytl, xtl) ;
	double xto(xto) ;
	double yto(yto) ;
	double AREA_OCN(yto, xto) ;

// global attributes:
		:filename = "edit_grid.20110618.nc" ;
		:xname = "longitude" ;
		:yname = "latitude" ;
		:vertex_convention = "SWCCW" ;
		:join_lat = 65.f ;
		:y_boundary_type = "fold_north_edge" ;
		:x_boundary_type = "cyclic" ;
		:topography = "from_file" ;
		:input_file = "/short/p66/sjm599/AusCOM/input/mom4/OCCAM_p1degree.nc" ;
		:input_field = "topo" ;
		:fill_isolated_cells = "y" ;
		:fill_first_row = "y" ;
		:deepen_shallow = "y" ;
		:adjust_topo = "y" ;
		:filter_topog = "y" ;
		:num_filter_pass = 1.f ;
}

access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/grid_spec.nc:


netcdf grid_spec {
dimensions:
	string = 255 ;
	nfile_aXo = 1 ;
	nfile_aXl = 1 ;
	nfile_lXo = 1 ;
variables:
	char atm_mosaic_dir(string) ;
		atm_mosaic_dir:standard_name = "directory_storing_atmosphere_mosaic" ;
	char atm_mosaic_file(string) ;
		atm_mosaic_file:standard_name = "atmosphere_mosaic_file_name" ;
	char atm_mosaic(string) ;
		atm_mosaic:standard_name = "atmosphere_mosaic_name" ;
	char lnd_mosaic_dir(string) ;
		lnd_mosaic_dir:standard_name = "directory_storing_land_mosaic" ;
	char lnd_mosaic_file(string) ;
		lnd_mosaic_file:standard_name = "land_mosaic_file_name" ;
	char lnd_mosaic(string) ;
		lnd_mosaic:standard_name = "land_mosaic_name" ;
	char ocn_mosaic_dir(string) ;
		ocn_mosaic_dir:standard_name = "directory_storing_ocean_mosaic" ;
	char ocn_mosaic_file(string) ;
		ocn_mosaic_file:standard_name = "ocean_mosaic_file_name" ;
	char ocn_mosaic(string) ;
		ocn_mosaic:standard_name = "ocean_mosaic_name" ;
	char ocn_topog_dir(string) ;
		ocn_topog_dir:standard_name = "directory_storing_ocean_topog" ;
	char ocn_topog_file(string) ;
		ocn_topog_file:standard_name = "ocean_topog_file_name" ;
	char aXo_file(nfile_aXo, string) ;
		aXo_file:standard_name = "atmXocn_exchange_grid_file" ;
	char aXl_file(nfile_aXl, string) ;
		aXl_file:standard_name = "atmXlnd_exchange_grid_file" ;
	char lXo_file(nfile_lXo, string) ;
		lXo_file:standard_name = "lndXocn_exchange_grid_file" ;
}

  • I placed the atmosphere vertical levels file vertlevs_G3 in a global.N96 directory, as it specifies information about the resolution. Since it's just a text file, though, I'm unsure whether it really is resolution dependent. @MartinDix would you be able to clarify the proper classification?

  • Is it reasonable to classify the resolutions for the coupling files as global.o.1deg_a.N96? I noticed that most of the files mention the cice rather than the ocean model and so am wondering whether I'm incorrect to prioritise the ocean grid in the labelling?

Apologies for all the questions. Let me know if you have any suggestions or changes to make. Once everything is worked out, I'll make a branch of the configuration which uses the new structure.

@aidanheerdegen
Member Author

The atmosphere directory originally contained a SOURCES file, listing where the atmospheric inputs had been copied from. Do we want to keep this somewhere, and perhaps update it to include the sources I found for the ocean and other data (some files I couldn't find anything for though), or just leave it out?

The file only lists paths to the files, which might mean something to some people. Ideally we'd find out what that meaning is, as people don't last forever. That may be a later endeavour, however, so I don't see any problem with keeping the file around, or with putting the relevant path in a README in the sub-directory where the file is now located, noting that this was the original location the file was sourced from.
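If the README route is taken, the source paths could be carried over mechanically. Below is a minimal sketch, assuming the original locations are kept as whitespace-separated rows like the manifest table earlier in this thread (component, category, file name, date, original path), where the category may itself contain spaces, as in `land - biogeochem`. The `SourceRecord` type, the function names and the README wording are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class SourceRecord(NamedTuple):
    """One manifest row: where an input file was originally copied from."""
    component: str
    category: str
    filename: str
    date: str
    source_path: str


def parse_manifest_line(line: str) -> SourceRecord:
    """Split one whitespace-separated manifest row.

    The category may contain spaces (e.g. "land - biogeochem"), so the fixed
    fields are peeled off both ends and whatever is left is the category.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    component = fields[0]
    category = " ".join(fields[1:-3])
    filename, date, source_path = fields[-3:]
    return SourceRecord(component, category, filename, date, source_path)


def group_by_directory(records) -> dict[tuple[str, str], list[SourceRecord]]:
    """Group records by (component, category), i.e. one group per README."""
    groups: dict[tuple[str, str], list[SourceRecord]] = defaultdict(list)
    for rec in records:
        groups[(rec.component, rec.category)].append(rec)
    return dict(groups)


def readme_text(records) -> str:
    """Render README text noting the original location of each file."""
    lines = ["Files in this directory were originally sourced from:", ""]
    for rec in sorted(records, key=lambda r: r.filename):
        lines.append(f"{rec.filename}: {rec.source_path} (manifest date {rec.date})")
    return "\n".join(lines) + "\n"
```

Grouping by (component, category) gives one README per destination sub-directory; mapping those pairs onto the new directory names would still need to be done by hand.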

Just saw this previous issue - if we're happy to remove these files I'll omit them from the new directory.

Probably best to hash out in that issue what should be omitted and then action it here.

Is our categorisation of the ocean grid_spec.nc file under ocean/grids/mosaic/ correct? It matches the classification for the OM2 file of the same name, however the pre-industrial and om2 grid_spec.nc files are quite different.

The ESM1.5 configuration is using the old FMS mosaic format. IIRC I converted the COSIMA 1 degree grid to the more modern format to match the 0.25 and 0.1 grids, but I can't find any mention of it. Damn.

Ideally we should switch to using the same grid format as the ACCESS-OM2 configurations, though I think the grid itself is different due to cell stretching around the equator in the coupled models, but correct me if I'm wrong.

@anton-seaice has been playing around with grids quite a bit, so he or @dougiesquire may be able to comment on this.

Is it reasonable to classify the resolutions for the coupling files as global.o.1deg_a.N96? I noticed that most of the files mention the cice rather than the ocean model and so am wondering whether I'm incorrect to prioritise the ocean grid in the labelling?

It's a good point. The ocean and ice share a grid, and IIRC the atmosphere sends all fields through the ice model. So yes, it is a bit weird. We could use a name like global.oi.1deg_a.N96. In any case, does it make sense to single out one model when the coupling is between two models, and that is already indicated in the names of the files themselves?

@anton-seaice

Ideally we should switch to using the same grid format as the ACCESS-OM2 configurations, though I think the grid itself is different due to cell stretching around the equator in the coupled models, but correct me if I'm wrong.

The ACCESS-OM 1 degree grid also has some (latitudinal?) refinement around the equator; it's probably the same, but it might not be. Of course, there is no guarantee that the variable formats used by the CICE4 in ESM1.5 and by the AUSCOM-CICE5 fork are the same (again, quite likely they are).

@dougiesquire

dougiesquire commented Jun 18, 2024

or @dougiesquire may be able to comment on this.

Maybe not... I wasn't even aware that there was an "old FMS mosaic format".

My understanding of the "new" (ACCESS-OM2) format is:

  • the grid_spec.nc file has the locations of the mosaic files for the various components. For ACCESS-OM2 only the ocn_* variables are used and the ocean mosaic file is ocean_mosaic.nc.
  • the ocean_mosaic.nc file has the location of the grid tile files that make up the mosaic and defines how they fit together. In ACCESS-OM2 there is only one grid tile (ocean_hgrid.nc)
  • the ocean_hgrid.nc file defines the grid
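To make the contrast with the pre-industrial file concrete, here is a minimal sketch that classifies a `grid_spec.nc` by the variables it contains: the pre-industrial file defines the grid inline (`x_T`, `y_T`, `x_vert_T`, `area_T`, ...), whereas the ACCESS-OM2 file only holds pointers such as `ocn_mosaic_dir` and `ocn_mosaic_file`. The marker sets are assumptions picked from the two ncdumps earlier in this thread, and the function name is hypothetical.

```python
# Variables seen only in the "old" single-file format, where the grid is defined inline.
OLD_FORMAT_MARKERS = frozenset({"x_T", "y_T", "x_vert_T", "area_T"})
# Variables seen only in the "new" mosaic format, which points at other files.
NEW_FORMAT_MARKERS = frozenset({"ocn_mosaic_dir", "ocn_mosaic_file"})


def grid_spec_format(variable_names) -> str:
    """Return "old", "mosaic", or "unknown" for an iterable of variable names."""
    names = set(variable_names)
    has_old = OLD_FORMAT_MARKERS <= names  # all old-format markers present
    has_new = NEW_FORMAT_MARKERS <= names  # all mosaic pointers present
    if has_old and not has_new:
        return "old"
    if has_new and not has_old:
        return "mosaic"
    # Neither or both: don't guess.
    return "unknown"
```

With the netCDF4 Python package installed, something like `grid_spec_format(netCDF4.Dataset(path).variables)` should work, since iterating the `variables` dict yields the variable names.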

It looks like in the "old FMS mosaic format", all the info that could be spread across this nested structure is contained in the one grid_spec.nc file? So the ocean grid is defined by the x_* and y_* variables in grid_spec.nc? Is that correct @aidanheerdegen? If so, then it looks to me like the ocean grids could be the same between ACCESS-OM2 and ACCESS-ESM1.5, but I can confirm tomorrow.
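One way that check could be run, as a sketch: take `x_T`/`y_T` (dimensions `grid_y_T` = 300 by `grid_x_T` = 360 in the ncdump above) from the ESM1.5 `grid_spec.nc`, take `x`/`y` from the ACCESS-OM2 `ocean_hgrid.nc`, reduce the latter to T-cell centres, and compare. The reduction assumes the FMS supergrid layout, where the hgrid holds (2*ny+1) by (2*nx+1) points with cell centres at odd indices; that should be checked against the real file. The helper names and tolerance are hypothetical, and plain nested lists keep the sketch dependency-free (with NumPy the comparison is roughly `numpy.allclose(a, b, rtol=0, atol=1e-6)`).

```python
import math


def supergrid_t_centres(super_coord):
    """Pick T-cell centres out of a supergrid array of shape (2*ny+1, 2*nx+1).

    Assumes cell corners sit at even indices and cell centres at odd
    indices in both directions.
    """
    return [row[1::2] for row in super_coord[1::2]]


def grids_match(a, b, abs_tol=1e-6):
    """True if two 2-D coordinate arrays have the same shape and agree within abs_tol.

    Note: longitudes that differ only by a 360 degree wrap are reported
    as a mismatch; normalise them first if that matters.
    """
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        if not all(math.isclose(u, v, rel_tol=0.0, abs_tol=abs_tol)
                   for u, v in zip(row_a, row_b)):
            return False
    return True
```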

@dougiesquire

Also, a couple of unsolicited comments on the above structure:

  • for the ACCESS-OM3 inputs we put things that can be used across a given hierarchy level in a share directory (e.g. instead of resolution_independent)
  • I personally find the formatting of global.o.1deg_a.N96 hard to parse at a glance. I presume this is encoding "1 deg ocean, N96 atmosphere" and to me something like global.o_1deg.a_N96 is clearer. I might be alone in that.

Please feel free to ignore both.
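Part of the appeal of `global.o_1deg.a_N96` is that it can be split mechanically: dots separate fields, and within each field an underscore separates the component codes from the resolution they share. A minimal sketch of such a parser, assuming single-letter codes `a`, `o` and `i` for atmosphere, ocean and ice (so the `oi` variant also expands to both ocean and ice). The function and the code table are hypothetical; the older `global.o.1deg_a.N96` form is rejected because its dots do double duty.

```python
from __future__ import annotations

# Hypothetical single-letter component codes.
COMPONENT_CODES = {"a": "atmosphere", "o": "ocean", "i": "ice"}


def parse_resolution_label(label: str) -> tuple[str, dict[str, str]]:
    """Split e.g. "global.oi_1deg.a_N96" into ("global", {component: resolution}).

    The first dot-separated field is the domain. In each later field the text
    before the first underscore is a run of component codes and the text after
    it is the resolution those components share.
    """
    domain, *fields = label.split(".")
    grids: dict[str, str] = {}
    for field in fields:
        codes, sep, resolution = field.partition("_")
        unknown = set(codes) - set(COMPONENT_CODES)
        if not sep or not codes or not resolution or unknown:
            raise ValueError(f"cannot parse field {field!r} in label {label!r}")
        for code in codes:
            grids[COMPONENT_CODES[code]] = resolution
    return domain, grids
```

The sketch assumes resolutions contain no dots; a resolution written with a decimal point would need a different separator.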

@aidanheerdegen
Member Author

It looks like in the "old FMS mosaic format", all the info that could be spread across this nested structure is contained in the one grid_spec.nc file? So the ocean grid is defined by the x_* and y_* variables in grid_spec.nc? Is that correct @aidanheerdegen?

IIRC yes. This is where we really miss the mentat abilities of Russ Fiedler when it comes to MOM5.

penguian pushed a commit that referenced this issue Jun 21, 2024
@MartinDix

Many of these files are common across pre-industrial and historical. E.g. for the atmosphere, everything except the forcing and aerosol directories.

Everything in the coupler and ice directories is also common. Everything in ocean too, except perhaps biogeochemistry?

@blimlim

blimlim commented Jun 21, 2024

I've tried modifying the tree based on the above ideas.

  • and to me something like global.o_1deg.a_N96 is clearer.
    I agree, I think that looks better.

I've added in an unused directory for the files discussed here – I'm thinking it might be good to hold onto them in case they are needed later, but let me know if it would be better to remove them.

Many of these files are common across pre-industrial and historical

I've added in a common directory and moved these files across. It looks like the pre-industrial and historical configurations use different Ndep files under atmosphere/land/biogeochem/ (e.g. Ndep_1850_ESM1.anc for pre-industrial and Ndep_1849_2015.anc for historical), but otherwise the land inputs appear to be shared – just wanted to confirm that this sounds correct.

Everything in ocean too, except perhaps biogeochemistry?

From what I can tell, the copies of bgc_param.nc, dust.nc, and ocmip2_press_monthly_om1p5_bc.nc accessed by the coe historical experiment are identical to the copies in /g/data/vk83/experiments/inputs/access-esm1p5/pre-industrial/ocean, so I've put them into the common directory. The other bgc inputs don't seem to be used, so I've moved them into the unused section.
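Comparing content hashes is one way to run this kind of "identical copy" check across whole directories. A minimal sketch using only the standard library; the function names are hypothetical, and `filecmp.cmp(a, b, shallow=False)` would also work. Hashing has the small advantage that the digests could be recorded alongside the files for later comparison.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large NetCDF inputs don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_files(dir_a: Path, dir_b: Path) -> tuple[list[str], list[str]]:
    """Compare same-named files in two directories.

    Returns (identical, different): sorted names present in both directories
    whose contents match, and those whose contents differ. Files present in
    only one directory are ignored.
    """
    names_a = {p.name for p in dir_a.iterdir() if p.is_file()}
    names_b = {p.name for p in dir_b.iterdir() if p.is_file()}
    identical, different = [], []
    for name in sorted(names_a & names_b):
        if sha256_of(dir_a / name) == sha256_of(dir_b / name):
            identical.append(name)
        else:
            different.append(name)
    return identical, different
```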

The modified tree based on the above is shown below. Let me know your thoughts/if you have any suggestions!


.
├── common/
│   ├── atmosphere/
│   │   ├── grids/
│   │   │   ├── global.N96/
│   │   │   │   └── 2020.05.19/
│   │   │   │       └── qrparm.mask
│   │   │   └── resolution_independent/
│   │   │       └── vertlevs_G3
│   │   ├── land/
│   │   │   ├── biogeochem/
│   │   │   │   └── resolution_independent/
│   │   │   │       └── 2020.05.19/
│   │   │   │           ├── modis_phenology_csiro.txt
│   │   │   │           └── pftlookup_csiro_v16_17tiles_wtlnds.csv
│   │   │   ├── biogeophys/
│   │   │   │   └── resolution_independent/
│   │   │   │       └── 2020.05.19/
│   │   │   │           ├── def_soil_params.txt
│   │   │   │           └── def_veg_params.txt
│   │   │   ├── soiltype/
│   │   │   │   └── global.N96/
│   │   │   │       └── 2020.05.19/
│   │   │   │           └── qrparm.soil_igbp_vg
│   │   │   └── vegetation/
│   │   │       └── global.N96/
│   │   │           └── 2020.05.19/
│   │   │               └── cable_vegfunc_N96.anc
│   │   ├── stash/
│   │   │   └── 2020.05.19/
│   │   │       ├── stasets
│   │   │       └── STASHmaster
│   │   └── spectral/
│   │       └── resolution_independent/
│   │           └── 2020.05.19/
│   │               ├── spec3a_lw_hadgem1_6on
│   │               └── spec3a_sw_hadgem1_6on
│   ├── coupler/
│   │   ├── grids/
│   │   │   └── global.oi_1deg.a_N96/
│   │   │       └── 2020.05.19/
│   │   │           ├── areas.nc
│   │   │           ├── grids.nc
│   │   │           └── masks.nc
│   │   └── remapping_weights/
│   │       └── global.oi_1deg.a_N96/
│   │           └── 2020.05.19/
│   │               ├── rmp_cice_to_um1t_CONSERV_FRACNNEI.nc
│   │               ├── rmp_cice_to_um1u_CONSERV_FRACNNEI.nc
│   │               ├── rmp_cice_to_um1v_CONSERV_FRACNNEI.nc
│   │               ├── rmp_um1t_to_cice_CONSERV_DESTAREA.nc
│   │               ├── rmp_um1t_to_cice_CONSERV_FRACNNEI.nc
│   │               ├── rmp_um1u_to_cice_CONSERV_FRACNNEI.nc
│   │               └── rmp_um1v_to_cice_CONSERV_FRACNNEI.nc
│   ├── ice/
│   │   └── grids/
│   │       └── global.1deg/
│   │           └── 2020.05.19/
│   │               ├── grid.nc
│   │               └── kmt.nc
│   └── ocean/
│       ├── biogeochemistry/
│       │   └── global.1deg/
│       │       └── 2020.05.19/
│       │           ├── bgc_param.nc
│       │           ├── dust.nc
│       │           └── ocmip2_press_monthly_om1p5_bc.nc
│       └── grids/
│           ├── mosaic/
│           │   └── global.1deg/
│           │       └── 2020.05.19/
│           │           └── grid_spec.nc
│           ├── shortwave_penetration/
│           │   └── global.1deg/
│           │       └── 2020.05.19/
│           │           └── ssw_atten_depth.nc
│           └── tides/
│               └── global.1deg/
│                   └── 2020.05.19/
│                       ├── roughness_amp.nc
│                       └── tideamp.nc
├── pre-industrial/
│   ├── atmosphere/
│   │   ├── aerosol/
│   │   │   └── global.N96/
│   │   │       └── 2020.05.19/
│   │   │           ├── BC_hi_1850_ESM1.anc
│   │   │           ├── Bio_1850_ESM1.anc
│   │   │           ├── biogenic_351sm.N96L38
│   │   │           ├── DMS_conc.N96
│   │   │           ├── OCFF_1850_ESM1.anc
│   │   │           ├── scycl_1850_ESM1_v4.anc
│   │   │           └── sulpc_oxidants_N96_L38
│   │   ├── forcing/
│   │   │   ├── global.N96/
│   │   │   │   └── 2020.05.19/
│   │   │   │       └── ozone_1850_ESM1.anc
│   │   │   └── resolution_independent/
│   │   │       └── 2020.05.19/
│   │   │           └── volcts_18502000ave.dat
│   │   └── land/
│   │       └── biogeochem/
│   │           └── global.N96/
│   │               └── 2020.05.19/
│   │                   └── Ndep_1850_ESM1.anc
│   ├── ocean
│   └── restart/
│       ├── atmosphere
│       ├── coupler
│       ├── ice
│       └── ocean
└── unused/
    ├── atmosphere/
    │   ├── climatology/
    │   │   └── global.N96/
    │   │       └── 2020.05.19/
    │   │           ├── qrclim.slt
    │   │           └── qrclim.smow
    │   └── land/
    │       └── biogeochem/
    │           └── resolution_independent/
    │               └── 2020.05.19/
    │                   ├── pftlookup_csiro_v16_17tiles.csv
    │                   ├── pftlookup_csiro_v16_17tiles_spinup.csv
    │                   └── poolcnpInTumbarumba.csv
    ├── coupler/
    │   └── uncategorised/
    │       └── 2020.05.19/
    │           └── cf_name_table.txt
    ├── ice/
    │   └── climatology/
    │       └── global.1deg/
    │           └── 2020.05.19/
    │               └── monthly_sstsss.nc
    └── ocean/
        └── uncategorised/
            ├── global.1deg/
            │   └── 2020.05.19/
            │       ├── basin_mask.nc
            │       └── geothermal_heating.nc
            └── biogeochemistry/
                └── global.1deg/
                    └── 2020.05.19/
                        ├── cfc_auscom.nc
                        ├── co2_obs.nc
                        ├── ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc
                        ├── ocmip2_fice_monthly_om1p5_bc.nc
                        ├── ocmip2_siple_co2_atm_am2_bc-1-9999.nc
                        └── ocmip2_xkw_monthly_om1p5_bc.nc


@aidanheerdegen
Member Author

aidanheerdegen commented Jun 21, 2024

Awesome, thanks @blimlim this looks great.

My only issue with common is that I'm guessing this is only common to present-day, or close to it. What will this look like when we have paleoclimate configurations?

Maybe it isn't worth overthinking too much, but if there is anything obvious I'd be keen to incorporate it now. e.g. do we put this all under present, and have a separate hierarchy for miocene and last-glacial-maximum?

(Note: we could use present-day which is more obvious, but perhaps misleading because we're also talking about pre-industrial)

I like the idea of unused. It makes it obvious the file has been omitted.

@blimlim

blimlim commented Jun 24, 2024

That sounds like a good idea

> (Note: we could use present-day which is more obvious, but perhaps misleading because we're also talking about pre-industrial)

would modern be accurate for the pre-industrial, historical, and maybe also the CMIP scenario configurations?

If the tree is looking ok to everyone, I'll rearrange my copy of the input directory to match and test it out.

@aidanheerdegen
Member Author

> would modern be accurate for the pre-industrial, historical, and maybe also the CMIP scenario configurations?

Nice suggestion. Yep, sounds good.

> If the tree is looking ok to everyone, I'll rearrange my copy of the input directory to match and test it out.

Sounds like a good idea.

@blimlim

blimlim commented Jun 25, 2024

Similarly with the ice monthly_sstsss.nc.

When I was digging into the CICE calendar I noticed that the CICE code is looking for this file:

#ifdef AusCOM
      idate_save = idate  !save for late re-set in case 'restart' is used for jobnum=1

      if (runtype == 'initial') then 
        nrec = month - 1        !month is from calendar
        if (nrec == 0) nrec = 12 
        call get_time0_sstsss(trim(inputdir)//'/monthly_sstsss.nc', nrec)
      endif
      !the read in sst/sss determines the initial ice state (in init_state)

Removing this file from the input directory causes the PI configuration to crash, with the following in iceout085:

 (calendar)  idate =      1010101
 (get_time0_sstsss) file doesnt exist: INPUT/monthly_sstsss.nc

@MartinDix is it worth digging more into why it's requiring this file? For now I'll leave it in the restructured directory.
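As a sketch for reasoning about which month of `monthly_sstsss.nc` an initial run reads, the record selection in the quoted AusCOM block can be mirrored in Python. This assumes the file holds 12 monthly records numbered 1 to 12; `sstsss_record` is a hypothetical name, not part of CICE, and the 'continue' case follows @anton-seaice's explanation in this thread.

```python
from typing import Optional


def sstsss_record(runtype: str, month: int) -> Optional[int]:
    """Record of monthly_sstsss.nc read at start-up (hypothetical mirror of CICE).

    Only runtype == 'initial' reads the file, and it takes the previous
    month's record, wrapping January (month 1) back to December (record 12).
    """
    if runtype != "initial":
        # Continuing runs start from the ice restart, so the file is not read.
        return None
    nrec = month - 1
    if nrec == 0:
        nrec = 12
    return nrec


print(sstsss_record("initial", 1))   # 12
print(sstsss_record("continue", 1))  # None
```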

@anton-seaice

I am moderately sure that file is just used to set the initial ice extent, i.e. where the SST is below -1.8C in that file, the model will initialise with sea ice present. This only occurs during initial spin-up (when runtype = 'initial'); later runs of the same experiment use runtype = 'continue', so the file is no longer needed.
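The initialisation described here amounts to thresholding the SST field. A minimal sketch, assuming the -1.8C threshold quoted above and a 2-D SST field given as nested lists; the actual test inside CICE's `init_state` may differ, and `initial_ice_mask` is a hypothetical helper.

```python
from typing import List

# Threshold quoted in this thread (degrees C). Assumption: CICE's own test may differ.
FREEZING_SST_C = -1.8


def initial_ice_mask(sst: List[List[float]],
                     threshold: float = FREEZING_SST_C) -> List[List[bool]]:
    """Flag grid points that would start with sea ice: SST strictly below threshold."""
    return [[value < threshold for value in row] for row in sst]


print(initial_ice_mask([[-1.9, -1.5], [2.0, -1.85]]))  # [[True, False], [False, True]]
```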

@blimlim

blimlim commented Jun 25, 2024

Ah ok thanks. Just to check, is this meant to be happening in the default PI configuration? If I try and remove the file the model doesn't run.

@anton-seaice

> Ah ok thanks. Just to check, is this meant to be happening in the default PI configuration? If I try and remove the file the model doesn't run.

It's needed in all configurations when a new experiment is started (i.e. when there isn't an ice restart file). The initial conditions for sea ice in these circumstances are not very critical: their impact on the overall simulation would resolve within a year to be consistent with the other model components. The ocean model would take much longer to stabilise/spin up.

@blimlim

blimlim commented Jun 26, 2024

The restructured directory is temporarily located in /g/data/tm70/sw6175/esm1p5-input-restructure/restructured-inputs with tree shown at the bottom.

There are still a couple things to clean up (e.g. updating the SOURCES file), but let me know if you notice anything else to change.

Compared with the inputs in /g/data/access/payu/access-esm/input/pre-industrial/, the md5 hash changed for one file, pftlookup_csiro_v16_17tiles_wtlnds.csv. The copies currently in /g/data/vk83 and /g/data/access/payu are slightly different:

diff /g/data/access/payu/access-esm/input/pre-industrial/atmosphere/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_wtlnds.csv /g/data/vk83/experiments/inputs/access-esm1p5/pre-industrial/atmosphere/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_wtlnds.csv 
15c15
< 12,0,newpft,,,,,,,,,,,,,,,,,,
---
> 12,0,not used,,,,,,,,,,,,,,,,,,
34c34
< 12,2,0.5,1.8,32.30769231,2,0.5,0.022,1.439682005,40,5,0.04,0.23,0.824,0.137,5,222.22,0.2,0.009919764,,
---
> 12,5.5,0.5,1.8,0,2,0.5,0,1,1,1,0.04,0.23,0.824,0.137,5,222.22,0.2,0.02,,

The one in vk83 is identical to the older one in /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_wtlnds.csv, so it looks like the coe version must have changed at some point. Short simulations using each file look identical though, and so I'm guessing this data isn't used, as hinted above... If anyone with more knowledge about CABLE is able to confirm this, that would be great!

[Figure: difference in mean surface temperature between the short simulations using each pftlookup file]


└── modern
    ├── historical
    │   ├── atmosphere
    │   ├── coupler
    │   ├── ice
    │   ├── ocean
    │   └── restart
    │       ├── atmosphere
    │       ├── coupler
    │       ├── ice
    │       └── ocean
    ├── pre-industrial
    │   ├── atmosphere
    │   │   ├── aerosol
    │   │   │   └── global.N96
    │   │   │       └── 2020.05.19
    │   │   │           ├── BC_hi_1850_ESM1.anc
    │   │   │           ├── Bio_1850_ESM1.anc
    │   │   │           ├── biogenic_351sm.N96L38
    │   │   │           ├── DMS_conc.N96
    │   │   │           ├── OCFF_1850_ESM1.anc
    │   │   │           ├── scycl_1850_ESM1_v4.anc
    │   │   │           └── sulpc_oxidants_N96_L38
    │   │   ├── forcing
    │   │   │   ├── global.N96
    │   │   │   │   └── 2020.05.19
    │   │   │   │       └── ozone_1850_ESM1.anc
    │   │   │   └── resolution_independent
    │   │   │       └── 2020.05.19
    │   │   │           └── volcts_18502000ave.dat
    │   │   ├── land
    │   │   │   └── biogeochemistry
    │   │   │       └── global.N96
    │   │   │           └── 2020.05.19
    │   │   │               └── Ndep_1850_ESM1.anc
    │   │   └── SOURCES
    │   ├── coupler
    │   ├── ice
    │   ├── ocean
    │   └── restart
    │       ├── atmosphere
    │       ├── coupler
    │       ├── ice
    │       └── ocean
    ├── shared-modern
    │   ├── atmosphere
    │   │   ├── grids
    │   │   │   ├── global.N96
    │   │   │   │   └── 2020.05.19
    │   │   │   │       └── qrparm.mask
    │   │   │   └── resolution_independent
    │   │   │       └── 2020.05.19
    │   │   │           └── vertlevs_G3
    │   │   ├── land
    │   │   │   ├── biogeochemistry
    │   │   │   │   └── resolution_independent
    │   │   │   │       └── 2020.05.19
    │   │   │   │           ├── modis_phenology_csiro.txt
    │   │   │   │           └── pftlookup_csiro_v16_17tiles_wtlnds.csv
    │   │   │   ├── biogeophysics
    │   │   │   │   └── resolution_independent
    │   │   │   │       └── 2020.05.19
    │   │   │   │           ├── def_soil_params.txt
    │   │   │   │           └── def_veg_params.txt
    │   │   │   ├── soiltype
    │   │   │   │   └── global.N96
    │   │   │   │       └── 2020.05.19
    │   │   │   │           └── qrparm.soil_igbp_vg
    │   │   │   └── vegetation
    │   │   │       └── global.N96
    │   │   │           └── 2020.05.19
    │   │   │               └── cable_vegfunc_N96.anc
    │   │   ├── spectral
    │   │   │   └── resolution_independent
    │   │   │       └── 2020.05.19
    │   │   │           ├── spec3a_lw_hadgem1_6on
    │   │   │           └── spec3a_sw_hadgem1_6on
    │   │   └── stash
    │   │       └── 2020.05.19
    │   │           ├── stasets
    │   │           └── STASHmaster
    │   ├── coupler
    │   │   ├── grids
    │   │   │   └── global.oi_1deg.a_N96
    │   │   │       └── 2020.05.19
    │   │   │           ├── areas.nc
    │   │   │           ├── grids.nc
    │   │   │           └── masks.nc
    │   │   └── remapping_weights
    │   │       └── global.oi_1deg.a_N96
    │   │           └── 2020.05.19
    │   │               ├── rmp_cice_to_um1t_CONSERV_FRACNNEI.nc
    │   │               ├── rmp_cice_to_um1u_CONSERV_FRACNNEI.nc
    │   │               ├── rmp_cice_to_um1v_CONSERV_FRACNNEI.nc
    │   │               ├── rmp_um1t_to_cice_CONSERV_DESTAREA.nc
    │   │               ├── rmp_um1t_to_cice_CONSERV_FRACNNEI.nc
    │   │               ├── rmp_um1u_to_cice_CONSERV_FRACNNEI.nc
    │   │               └── rmp_um1v_to_cice_CONSERV_FRACNNEI.nc
    │   ├── ice
    │   │   ├── climatology
    │   │   │   └── global.1deg
    │   │   │       └── 2020.05.19
    │   │   │           └── monthly_sstsss.nc
    │   │   └── grids
    │   │       └── global.1deg
    │   │           └── 2020.05.19
    │   │               ├── grid.nc
    │   │               └── kmt.nc
    │   └── ocean
    │       ├── biogeochemistry
    │       │   └── global.1deg
    │       │       └── 2020.05.19
    │       │           ├── bgc_param.nc
    │       │           ├── dust.nc
    │       │           └── ocmip2_press_monthly_om1p5_bc.nc
    │       ├── grids
    │       │   └── mosaic
    │       │       └── global.1deg
    │       │           └── 2020.05.19
    │       │               └── grid_spec.nc
    │       ├── shortwave_penetration
    │       │   └── global.1deg
    │       │       └── 2020.05.19
    │       │           └── ssw_atten_depth.nc
    │       └── tides
    │           └── global.1deg
    │               └── 2020.05.19
    │                   ├── roughness_amp.nc
    │                   └── tideamp.nc
    └── unused
        ├── atmosphere
        │   ├── climatology
        │   │   └── global.N96
        │   │       └── 2020.05.19
        │   │           ├── qrclim.slt
        │   │           └── qrclim.smow
        │   └── land
        │       └── biogeochemistry
        │           └── resolution_independent
        │               └── 2020.05.19
        │                   ├── pftlookup_csiro_v16_17tiles.csv
        │                   ├── pftlookup_csiro_v16_17tiles_spinup.csv
        │                   └── poolcnpInTumbarumba.csv
        ├── coupler
        │   └── uncategorised
        │       └── 2020.05.19
        │           └── cf_name_table.txt
        ├── ice
        └── ocean
            ├── biogeochemistry
            │   └── global.1deg
            │       └── 2020.05.19
            │           ├── cfc_auscom.nc
            │           ├── co2_obs.nc
            │           ├── ocmip2_abiotic_c14_atm_hist_om3_bc-1-9999.nc
            │           ├── ocmip2_fice_monthly_om1p5_bc.nc
            │           ├── ocmip2_siple_co2_atm_am2_bc-1-9999.nc
            │           └── ocmip2_xkw_monthly_om1p5_bc.nc
            └── uncategorised
                └── global.1deg
                    └── 2020.05.19
                        ├── basin_mask.nc
                        └── geothermal_heating.nc


@aidanheerdegen
Member Author

> The one in vk83 is identical to the older one in /g/data/access/projects/access/data/ACCESS_CMIP5/CABLE-AUX-1.4/core/biogeochem/pftlookup_csiro_v16_17tiles_wtlnds.csv, so it looks like the coe version must have changed at some point. Short simulations using each file look identical though, and so I'm guessing this data isn't used, as hinted above... If anyone with more knowledge about CABLE is able to confirm this, that would be great!

@ccarouge ?

@ccarouge
Member

@blimlim Can you define short simulations? CASA only runs daily. Only the pftlookup file that is listed in the cable.nml will be used in the simulation anyway.

@blimlim

blimlim commented Jun 28, 2024

Ah good point! The above simulations were 2 months long, and the figure shows the difference in the second month's mean surface temperatures. The pftlookup_csiro_v16_17tiles_wtlnds.csv is the one currently listed in the pre-industrial cable.nml file, and so it would be good to know whether the differences between the coe version and vk83 versions are important.

@ccarouge
Member

@blimlim the differences seem to indicate the wtlnds version is using a normally empty PFT index to carry a PFT that describes wetlands (land folks seem to have an aversion to vowels). It is definitely significant. There is an ancillary that contains a map of PFTs for the globe (no idea of the name of that file for ESM1.5). If that file contains points with values of 12, then the values for the wetlands are used and picking up the correct values would definitely make a big difference over wetlands.
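Whether the wetlands row matters therefore comes down to whether any grid point in the PFT map carries index 12. A minimal sketch, assuming the map has already been read into a nested list of integer PFT indices (reading the actual UM ancillary, whose name isn't known here, is not shown); `count_pft_points` is a hypothetical helper.

```python
from typing import Iterable, Sequence

# PFT index carrying wetlands in the wtlnds lookup table, per the comment above.
WETLAND_PFT = 12


def count_pft_points(pft_map: Iterable[Sequence[int]], pft: int = WETLAND_PFT) -> int:
    """Count grid points in a PFT map assigned to the given PFT index."""
    return sum(1 for row in pft_map for value in row if value == pft)


# A non-zero count would mean the wetland parameters are actually used.
print(count_pft_points([[1, 12], [12, 3]]))  # 2
```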

@blimlim

blimlim commented Jul 3, 2024

@MartinDix just checking whether DMS_conc.N96, biogenic_351sm.N96L38, and sulpc_oxidants_N96_L38 should be moved out of pre-industrial/atmosphere and into shared-modern/atmosphere? It looks like the historical configuration is pointed to them too, e.g. ihist for the historical configuration contains:

 DMSCONC = 'DMSCONC : $ANCIL_ATMOS/DMS_conc.N96                                             ',
 ARCLBIOG        = 'ARCLBIOG : $ANCIL_ATMOS/biogenic_351sm.N96L38                                   ',
 CHEMOXID        = 'CHEMOXID : $ANCIL_ATMOS/sulpc_oxidants_N96_L38                                  ',
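To check which configurations point at a given ancillary, the filenames can be pulled out of namelist entries like those above. A minimal sketch, assuming every entry has the `KEY = 'KEY : $VAR/filename'` shape shown; `ancillary_files` and the regular expression are illustrative, not an existing tool.

```python
import re
from typing import Dict, Iterable

# Matches entries such as:  DMSCONC = 'DMSCONC : $ANCIL_ATMOS/DMS_conc.N96   ',
ENTRY = re.compile(r"^\s*(\w+)\s*=\s*'\s*\w+\s*:\s*(\$\w+)/(\S+)\s*'")


def ancillary_files(lines: Iterable[str]) -> Dict[str, str]:
    """Map each namelist key to the ancillary filename it points at."""
    found = {}
    for line in lines:
        match = ENTRY.match(line)
        if match:
            key, _env_var, filename = match.groups()
            found[key] = filename
    return found
```

Running it over each configuration's namelists and comparing the resulting filename sets would show which ancillaries are shared between, e.g., pre-industrial and historical.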

@MartinDix

I don't think we need the extra level of the modern directory in the tree. Something like this should be enough:

historical
shared-modern
pre-industrial
paleo-miocene
paleo-LGM

Spectral files and vertlevs are going to be shared across everything, paleo and modern.

@blimlim

blimlim commented Jul 10, 2024

Just wanted to check whether it would be clear enough to people which configurations use the shared-modern files if we remove the modern level. If people would generally understand this, I think this should work.

We could otherwise go in the other direction and have more levels, e.g.

inputs/
    ├── shared-all (contains vertlevs and spectral files)
    ├── modern/
    │   ├── shared-modern
    │   ├── pre-industrial
    │   └── historical
    └── paleo/
        ├── miocene
        └── LGM

Let me know which would be preferable!

@MartinDix

I think I prefer your suggestion, though the suffixes on shared are probably unnecessary because it's implicit in the paths.

How about this?

inputs/
    ├── shared (contains vertlevs and spectral files)
    ├── modern/
    │   ├── shared
    │   ├── pre-industrial
    │   └── historical
    └── paleo/
        ├── miocene
        └── LGM
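Under this layout, each file's location follows a `<scope>/<realm>/<category>/<grid>/<version>/<file>` pattern, as in the restructured tree earlier in the thread. A minimal sketch, assuming that pattern holds everywhere (the `stash` directory has no grid level, so `grid` is optional); `input_path` and the `scope` parameter are hypothetical names, not payu features.

```python
from pathlib import PurePosixPath
from typing import Optional


def input_path(root: str, scope: str, realm: str, category: str,
               version: str, filename: str,
               grid: Optional[str] = None) -> PurePosixPath:
    """Build <root>/<scope>/<realm>/<category>/[<grid>/]<version>/<filename>.

    `scope` is a path under the inputs root such as "share",
    "modern/share" or "modern/pre-industrial".
    """
    parts = [root, scope, realm, category]
    if grid is not None:
        parts.append(grid)
    parts += [version, filename]
    return PurePosixPath(*parts)


print(input_path("inputs", "modern/share", "atmosphere", "stash",
                 "2020.05.19", "STASHmaster"))
# inputs/modern/share/atmosphere/stash/2020.05.19/STASHmaster
```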

@ccarouge
Member

Are there any paleo runs that use the same ancillaries as modern runs? If so, would we have a copy of the file or would the paleo/ run point to a modern/ location?

@blimlim

blimlim commented Jul 12, 2024

Good question, I think anything that's used by both the paleo and modern runs will go into the shared folder so that there will only be one copy of those files.

Sorry I think I originally misunderstood the question! Do you mean files that are used by the modern runs as well as some but not all of the paleo simulations? I'm not sure what the best way to handle this would be... we could put them in the shared folder too, but I'm not sure whether that should be reserved for files that are used by everything across a given level.

For managing file versions it might be easier if we avoid copies of files in different locations, so perhaps it would be best to have the paleo simulation point to a modern/ location... Do you or anyone have any preferences on this?

@blimlim

blimlim commented Jul 12, 2024

On the same topic, I've been checking whether any other files currently under modern would be shared by all the paleo configurations too.

From David's example of setting up a miocene simulation, it looks like the following files weren't changed from the pre-industrial versions:

Atmosphere:

  • Everything in the stash directory
  • Land: def_soil_params.txt
  • Land: def_veg_params.txt
  • Land: modis_phenology_csiro.txt
  • Land: pftlookup_csiro_v16_17tiles_wtlnds.csv
  • Aerosol: biogenic_351sm.N96L38
  • Aerosol: DMS_conc.N96
  • Aerosol: sulpc_oxidants_N96_L38

Ocean:

  • biogeochemistry: bgc_param.nc
  • tides: tideamp.nc
  • tides: roughness_amp.nc

@MartinDix and @ccarouge, do you know if any of the above atmosphere and land files would be suitable for the shared modern/paleo directory? I'm guessing it might be best to leave the gridded aerosol files under modern as they appear linked to the geography. E.g. from the sulpc_oxidants_N96_L38 file:
[Figure: field from sulpc_oxidants_N96_L38]

For the ocean, I think tideamp.nc and roughness_amp.nc look dependent on the geography too and so I'll leave them under modern for now. @dougiesquire, do you know whether the same bgc_param.nc file would be used across modern and paleo configurations? All the fields appear to have a constant value, apart from the single field jack which has land points masked out:
[Figure: the jack field in bgc_param.nc, with land points masked out]
I couldn't find this field being used in the model though.

@dougiesquire

dougiesquire commented Jul 12, 2024

> @dougiesquire, do you know whether the same bgc_param.nc file would be used across modern and paleo configurations? All the fields appear to have a constant value, apart from the single field jack which has land points masked out:

I think jack must be a leftover of a previous version of WOMBAT - I've certainly not come across it before and I agree it's not being used anywhere in the code. It could be worth removing it from bgc_param.nc and checking that the model still runs. If it does, it probably then makes sense to leave jack out of bgc_param.nc.

Regarding whether the same bgc_param.nc file would be used across modern and paleo configurations, I'm not sure sorry. "Retuning" of these parameters may be required? I'll check with someone who knows more and get back to you.

@ccarouge
Member

@blimlim I'll ask more about the land files. Actually, I'm not sure they are used. It's possible the code needs to find files with this name though even if not using the data that is inside.

@dougiesquire

> Regarding whether the same bgc_param.nc file would be used across modern and paleo configurations, I'm not sure sorry. "Retuning" of these parameters may be required? I'll check with someone who knows more and get back to you.

Here is what Pearse Buchanan (CSIRO), who is developing WOMBAT, says:

> I'd say the same parameters should be used unless there is a very good reason otherwise
> Like if you were wanting to simulate ocean biogeochemistry before the evolution of diatoms, then you might consider changing some of the constants involved in phytoplankton growth
> But if you are talking about more recent history (< 20 million years ago), then using the same parameters is probably the best assumption

@blimlim

blimlim commented Jul 14, 2024

Thanks @dougiesquire and @ccarouge,

I'm not 100% sure which paleo configurations we're planning on having, but based on Pearse's comments it sounds like the single bgc_param.nc will be the same across them.

> It could be worth removing it from bgc_param.nc and checking that the model still runs.

Just tested this out. The model runs without any issues and the first month of output looks identical with and without the jack variable. The following is the difference in monthly mean o2, averaged over the vertical coordinate:
[Figure: difference in monthly mean o2, averaged over the vertical coordinate, with and without jack]

I've made a new copy of the file and added it to the directory.

@ccarouge
Member

I finally remembered to ask about the land files. The following files identified by @blimlim are needed for an ESM1.5 simulation:

  • Land: def_soil_params.txt
  • Land: def_veg_params.txt
  • Land: modis_phenology_csiro.txt
  • Land: pftlookup_csiro_v16_17tiles_wtlnds.csv

I haven't queried the whole paleoclimate community, but it looks like these files are rarely modified. It does not look like they are part of the standard modifications for a paleo simulation. Obviously, there could be user-specific experiments that require modification but that's true for any input file.
In conclusion, it looks like they are good candidates to sit under inputs/shared instead of separated between modern and paleo.

@blimlim

blimlim commented Jul 17, 2024

Thanks @ccarouge for looking into that! I'll move those files into the share directory (I've swapped the name from shared to share to make them more consistent with the OM directory names).

I think we're almost there! Before locking in a final structure, I'd like to get everyone's thoughts on what the convention/rule for the share directories at each level of the structure should be. It's a bit tricky and I don't think we'll be able to get a perfect solution that neatly covers every situation, and so it would be great to get a few different ideas and opinions. The two options I can think of are:

  1. The share directories at each level contain inputs used by every configuration at that level. E.g. modern/share contains files used by every modern simulation, while inputs/share contains files used by all configurations.

    A downside here is that if eventually a new configuration is added which doesn't use one of the share files, someone would have to decide whether to adhere to the convention and move the unused file out of share and update the paths in all the configurations using it, or otherwise to break the convention as it's easier and wouldn't mess with anyone's existing config.yaml files. Just as a made up example, if one day we decided to add a very deep paleo simulation with different ocean bgc parameters, then the bgc_param.nc file currently sitting in inputs/share would no longer apply for every simulation, and this policy would require it to be moved.

    Going back to @ccarouge's earlier question, this convention also doesn't cover how to handle files which are shared by some but not all configurations at a given level. E.g. if the same land mask was used by all modern simulations, as well as paleo/last-glacial-maximum but not paleo/miocene, where should this file be stored? I'm a bit hesitant to copy files to different locations as it might make versioning difficult, but would it be ok otherwise for the last-glacial-maximum simulation's configuration files to point to the modern inputs?

  2. The share directories contain files that are shared by multiple but not necessarily all configurations at a given level.

    This would solve the above problems but introduce new ones when different groups of configurations share different files. E.g. if we added several paleo configurations, we might have paleo/configuration_A and paleo/configuration_B using one land mask, while paleo/configuration_C and paleo/configuration_D use another. It would be hard to include both land masks in the paleo/share directory. In this situation we could add another level to the directory structure to group configurations A and B together etc, though in a worst case scenario, configurations B and C might share a different file making that grouping a bit awkward.

I'm guessing that regardless of the conventions we choose, we'll run into unexpected complications eventually, though hopefully they will be informative for setting up ESM1.6.
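Option 1 can be stated as a rule that either places a file or reports that it has no valid home. A minimal sketch, assuming a two-level hierarchy and the hypothetical configuration list below (paths under `inputs/`); `share_home` is an illustrative helper. The `None` case is the land-mask example above: a file used by all modern configurations plus `paleo/last-glacial-maximum`, but not `paleo/miocene`.

```python
from typing import Optional, Sequence, Set

# Hypothetical configurations, written as paths under inputs/.
CONFIGS = (
    "modern/pre-industrial",
    "modern/historical",
    "paleo/miocene",
    "paleo/last-glacial-maximum",
)


def share_home(users: Set[str], configs: Sequence[str] = CONFIGS) -> Optional[str]:
    """Directory where option 1 would place a file used by exactly `users`.

    - every configuration uses it       -> "share"
    - a single configuration uses it    -> that configuration's directory
    - exactly the configs under a level -> "<level>/share"
    - anything else                     -> None (no home under option 1)
    """
    if users == set(configs):
        return "share"
    if len(users) == 1:
        return next(iter(users))
    for level in sorted({config.split("/")[0] for config in configs}):
        under_level = {c for c in configs if c.startswith(level + "/")}
        if users == under_level:
            return f"{level}/share"
    return None
```

Option 2 would instead accept any multi-configuration user set, which is where the grouping problem for configurations A to D comes from.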

@dougiesquire

@blimlim these are good points. Given that we can imagine future situations that complicate both options, I'd suggest choosing the option that is cleanest and simplest for the configurations we currently know about. To me, that seems like option 1.

Probably unhelpfully, there is also a third option to remove the share directories altogether. That scales well to new, unimagined configs but means duplicated files 🤮 .

@ccarouge
Member

ccarouge commented Jul 18, 2024

Instead of share/, you can call it several/ 😆 So the convention becomes it contains files used by several configurations at this level.
And we can simply do that without changing the directory name to a silly name by changing the convention we want to apply to the share/ directory. After all, we are setting the rules! We can say share/ is for files used by several configurations at this level. Then, we eliminate the problems of option 1.

If we have 2 config using file A for bgc params and 2 configs using file B for bgc params, we put both under share and then we need some naming convention to differentiate the bgc params. I would assume using a naming convention for the file that reflects the differences in origin (data or processing) instead of based on the configurations that use it.

Edit: I ran out of time when I posted this. I don't know what requirements we want to meet. If they are:

  • no file duplicates
  • files organised per configurations

Then I don't think we can avoid some file reorganisation when new configurations come in. It won't be necessary every time, but it will be some of the time, because there is no way we can know right now all the configurations we will support. We might know them for ESM1.5 and CM2, but not for versions 1.6 and 3.

@blimlim

blimlim commented Jul 22, 2024

Thanks @dougiesquire and @ccarouge, those are good points and that's a good question about what the requirements actually are. I'm unsure whether it's more important to aim for

  1. no duplicates – by using some form of share directories, or
  2. avoiding reorganising files when new configurations are added, e.g. by duplicating files across configurations and not having share directories.

With option 1, as @ccarouge mentioned, some file reorganisation will likely be necessary when new configurations are added. A downside here is that each experiment collects its input files from paths listed in its config.yaml file, e.g.:

      exe: /g/data/access/payu/access-esm/bin/coe/um7.3x
      input:
        - /g/data/tm70/sw6175/esm1p5-input-restructure/restructured-inputs/modern/pre-industrial/atmosphere/aerosol/global.N96/2020.05.19/OCFF_1850_ESM1.anc
        - /g/data/tm70/sw6175/esm1p5-input-restructure/restructured-inputs/modern/pre-industrial/atmosphere/aerosol/global.N96/2020.05.19/BC_hi_1850_ESM1.anc

If we add a new configuration, and our directory "rules" make us rearrange the existing inputs, then any user's existing experiments using those inputs will break and they'll have to manually update the paths in their experiments. If this happened regularly... hopefully it wouldn't get too annoying.
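After a rearrangement, broken experiments could be detected by checking that every listed input still exists. A minimal sketch, assuming input entries are YAML list items holding absolute paths as in the excerpt above. It scans lines with a regular expression rather than parsing the YAML, so it would also pick up any other absolute-path list item. `missing_inputs` is hypothetical, and `exists` is injectable so the check can be tried without touching `/g/data`.

```python
import os
import re
from typing import Callable, List

# YAML list items that are absolute paths, e.g. "        - /g/data/..."
INPUT_ITEM = re.compile(r"^\s*-\s+(/\S+)\s*$")


def missing_inputs(config_text: str,
                   exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return absolute paths listed as YAML items that do not exist."""
    paths = []
    for line in config_text.splitlines():
        match = INPUT_ITEM.match(line)
        if match:
            paths.append(match.group(1))
    return [path for path in paths if not exists(path)]
```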

With option 2, I think the main downside is it adds difficulty for versioning files. E.g. if we have the same bgc_param.nc file across many configurations, and it ends up having an error fixed, someone will have to copy the updated file across all the relevant configurations. It might not be obvious which configurations should receive the updated copy: for this file, the filename is hardcoded in the model, and so there might be different files all with the same name, only some of which should be updated.

Perhaps another option would be to have just one real copy of each file located with the first configuration that used it, and then use symlinks for other configurations using that same file. If that file was updated, you could search for all the symlinks to it to see which other ones also needed the update. I guess this is still a bit messy though and leaves plenty of room for error.
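The symlink search described here can be done with a directory walk. A minimal sketch, assuming the links resolve with `os.path.realpath` (relative or absolute targets both work); `links_to` is a hypothetical helper.

```python
import os
from typing import List


def links_to(target: str, search_root: str) -> List[str]:
    """List symlinks under search_root that resolve to the same path as target."""
    real_target = os.path.realpath(target)
    matches = []
    # os.walk does not follow directory symlinks by default; links to files
    # appear in filenames and links to directories in dirnames.
    for dirpath, dirnames, filenames in os.walk(search_root):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and os.path.realpath(path) == real_target:
                matches.append(path)
    return sorted(matches)
```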

@aidanheerdegen
Member Author

We should definitely not let this be a blocker. So choose a no-regrets option with a clear rationale.

No-regrets in this context would be flexible in the future, minimise copies, but absolutely cannot break historical configurations.

I'm not sure it is necessarily bad for paleo simulations to reference files in modern/share (or whatever the name is). In some cases this shows clearly that there is no better knowledge so the default is to use the modern values.

Clearly physical constants or data that is immutable over time should be in a top level share directory. This actually gives users useful information about what they should, and should not, attempt to change.

To some extent it is useful to think about future scenarios/configs, but not too much. We can deal with them as they arise and fine-tune approaches.

I think we can minimise copies by just referencing things in place (as noted above), and/or using soft or hard links. I actually don't have a massive problem with the occasional full copy if absolutely necessary (as long as the files in question aren't stupidly big). We can tell which files are identical from the md5 hashes.
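Identical files can be found by grouping on MD5 hashes. A minimal sketch using Python's standard `hashlib`; `identical_files` is a hypothetical helper, and symlinks are skipped so that links do not show up as duplicates of their targets.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large NetCDF inputs aren't loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_files(root: str) -> Dict[str, List[str]]:
    """Group non-symlink files under root by MD5, keeping groups of two or more."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                groups[md5sum(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```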

Does that answer all your questions @blimlim? Do you need a sign-off on final structure? Or are you happy to proceed from here?

@blimlim

blimlim commented Jul 31, 2024

Ok thanks @aidanheerdegen I think that makes sense!
Just want to make sure if I'm understanding correctly – the share directory at a given level would contain files that we'd expect to be used by every configuration across that level, including potential new ones, whereas files that are only used by some configurations at a given level would either be copied across them, or otherwise live in one configuration and be referenced in place, with the exact procedure to be worked out when we get there?

@aidanheerdegen
Member Author

> the share directory at a given level would contain files that we'd expect to be used by every configuration across that level, including potential new ones, whereas files that are only used by some configurations at a given level would either be copied across them, or otherwise live in one configuration and be referenced in place, with the exact procedure to be worked out when we get there?

Yes, I think so. The share directory would contain files we would expect in most cases to be utilised by configurations at that same sub-directory level. Exceptions can be made, for example a specific paleo run that has altered land/sea mask (and associated differences). Then those files would reside within that specific configuration if we didn't expect them to be used elsewhere. If we get a situation where we need to share between a sub-set of configurations we can work it out.

Does that make it clear or more opaque?

@blimlim

blimlim commented Jul 31, 2024

Thank you, that clears it up! I'll finalise the structure and send the paths through to @CodeGat.

@aidanheerdegen
Member Author

Closing as I think this has been completed.

Projects
None yet
Development

No branches or pull requests

6 participants