Include frequency in history/diagnostics output filenames #191

Open
anton-seaice opened this issue Jul 17, 2024 · 19 comments

Comments

@anton-seaice
Contributor

anton-seaice commented Jul 17, 2024

@dougiesquire has suggested we include the frequency of output in the filenames of history/diagnostic output. For example, instead of

access-om3.cice.h.1900-01-01.nc

it could be

access-om3.cice.h.1900-01-01.day.nc

CMIP7 uses these names: fx, subhr, hr, day, mon, yr, dec

This has a few advantages:

  • It's clear to the user what the frequency in the file should be
  • For example, if daily data is in a monthly file, the name would be access-om3.cice.h.1900-01.day.nc, which is clearly different to access-om3.cice.h.1900-01.month.nc
  • It might be more reliable for the intake catalog to parse?

At the same time we could harmonise how the dates are written?

e.g. MOM uses "_" whereas CICE uses "-" (access-om3.mom6.h.native_1945_05.nc)
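
To make the proposal concrete, here is a minimal sketch of how a tool might parse such names. It assumes a layout of prefix, date, frequency, then ".nc", accepts either "-" or "_" inside the date, and rewrites the date with "-". The pattern and the function name parse_output_filename are hypothetical, not an agreed format.

```python
import re

# Frequency labels quoted in this issue as the CMIP7 names.
CMIP_FREQUENCIES = ("fx", "subhr", "hr", "day", "mon", "yr", "dec")

# Hypothetical pattern: <prefix><sep><date>.<frequency>.nc, where the date
# parts may be joined by "-" (CICE style) or "_" (MOM style).
FILENAME_RE = re.compile(
    r"^(?P<prefix>.+?)[._]"
    r"(?P<date>\d{4}(?:[-_]\d{2}){0,2})"
    r"\.(?P<freq>" + "|".join(CMIP_FREQUENCIES) + r")\.nc$"
)


def parse_output_filename(name):
    """Return (prefix, date, frequency), or None if the name does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    # Harmonise the date separator to "-" regardless of which model wrote it.
    date = match.group("date").replace("_", "-")
    return match.group("prefix"), date, match.group("freq")


print(parse_output_filename("access-om3.cice.h.1900-01-01.day.nc"))
```

A name without a frequency label, such as access-om3.cice.h.1900-01-01.nc, returns None under this sketch, so a catalogue builder could flag it rather than guess the frequency.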

@anton-seaice
Contributor Author

Is there a CMIP7 format for these filenames we should go straight to?

@dougiesquire
Collaborator

dougiesquire commented Jul 18, 2024

In MOM, the diag_table provides the user total control over what the output filenames are. So to "impose" a format we could write it into @aekiss's make_diag_table tool (and possibly add this to om3-scripts?). It's far more convenient to use this tool than to manually edit the diag_table anyway.

@dougiesquire
Collaborator

dougiesquire commented Jul 18, 2024

Is there a CMIP7 format for these filenames we should go straight to?

I think this would require always having a single file per variable, which I don't think suits our use case (e.g. the ice data)?

@aekiss
Contributor

aekiss commented Jul 18, 2024

The standard ACCESS-OM2 configs use one file per MOM5 variable, with a standard filename format that was carefully designed to be self-explanatory. As @dougie mentioned, the required diag_table is generated by make_diag_table.py to avoid all the tedious and error-prone boilerplate of doing it manually. Note that make_diag_table.py can also put multiple variables in each file if preferred (this is done with scalars in ACCESS-OM2).

It would be nice to have something like make_diag_table.py for CICE output, but I'm not sure how it could be done, since CICE doesn't use diag_table. But if we're going to define a new output filename format we should at least make it similar to what we do for MOM6.

@ofa001

ofa001 commented Jul 18, 2024

I was trying to tag Arnold Sullivan into this, but he isn't linked to this GitHub, so I will have to send him an email. Yes @aekiss, you did switch to single files for ACCESS-OM2, but we would prefer it back as a single file for the existing CMIP6 post-processing software package. I guess the software is still under discussion in the evaluation group, but it will be at that stage that the standard CMIP7 file names will be set up, which, like the CMIP6 ones, will be very long to use.

CICE has an entirely different way of setting up its output fields. We are talking about updating ACCESS-ESM1.6 to the CICE6.5 diagnostics (using some of the software from those routines) so we can use the same post-processing, but that's still to be actioned.

I will contact Arnold about the discussion.

@anton-seaice anton-seaice changed the title Include frequency in history/diagnostics output Include frequency in history/diagnostics output filenames Jul 18, 2024
@aekiss
Contributor

aekiss commented Jul 18, 2024

OK thanks for the heads-up re. the possibility of single files. At this stage we are working out defaults for standard configs for development and eventually users of ACCESS-OM3. Things may need to be set up differently in the CM3 and ESM3 CMIP7 production configs.

@anton-seaice
Contributor Author

@kieranricardo @MartinDix - is it worth trying to harmonize naming conventions with the UM / CABLE?

@ars599v2

We try to push very hard for the community to use the CMORised data, not raw data.

CMIP table_id already has "frequency" tabs.

It is a good approach if the model output can be directly converted to CMIP format (cmorise). If we had APP5 to handle it, then that would be great. The primary purpose is to submit the data, right? It is not just for our local community to use.

Test: if we can use the new format to calculate the CMIP standard variable by using the new APP5 or other cmorise tool, then that is a great approach:

msftyrho,yes,ty_trans_rho ty_trans_rho_gm,"meridionalOverturning(var,'full')",kg s-1,dropX basin gridlat,,both,ocean,
msftmrho,yes,ty_trans_rho ty_trans_rho_gm,"meridionalOverturning(var,'full')",kg s-1,dropX basin,,both,ocean,
msftyz,yes,ty_trans ty_trans_gm ty_trans_submeso,"meridionalOverturning(var,'full')",kg s-1,dropX basin gridlat,,both,ocean,
msftmz,yes,ty_trans ty_trans_gm ty_trans_submeso,"meridionalOverturning(var,'full')",kg s-1,dropX basin,,both,ocean,

sisnthick,yes,sisnthick,,m,,,CM2,seaIce,
sispeed,yes,sispeed,,m/s,,,CM2,seaIce,
sistrxdtop,yes,sistrxdtop,,N m-2,,down,CM2,seaIce,
sistrydtop,yes,sistrydtop,,N m-2,,down,CM2,seaIce,
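
As a rough illustration of the test described above, the sketch below reads rows like these with Python's csv module and flags which CMIP variables map one-to-one onto a single model variable with no extra calculation. The column names are guesses made for this example (the thread does not name them), and read_mappings and is_one_to_one are hypothetical helpers, not part of APP.

```python
import csv
import io

# Two rows copied from the comment above.
ROWS = """\
msftyrho,yes,ty_trans_rho ty_trans_rho_gm,"meridionalOverturning(var,'full')",kg s-1,dropX basin gridlat,,both,ocean,
sisnthick,yes,sisnthick,,m,,,CM2,seaIce,
"""

# Assumed column meanings; these labels are guesses, not taken from APP.
COLUMNS = ("cmip_name", "enabled", "model_vars", "calculation", "units",
           "axes_modifier", "positive", "access_version", "realm", "notes")


def read_mappings(text):
    """Parse mapping rows into dicts, splitting the space-separated inputs."""
    mappings = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines between groups of rows
        row = dict(zip(COLUMNS, fields))
        row["model_vars"] = row["model_vars"].split()
        mappings.append(row)
    return mappings


def is_one_to_one(row):
    """True when one model variable maps to the CMIP name with no calculation."""
    return len(row["model_vars"]) == 1 and row["calculation"] == ""


for row in read_mappings(ROWS):
    print(row["cmip_name"], row["model_vars"], is_one_to_one(row))
```

Rows that are not one-to-one, such as the overturning diagnostics, are the ones that would still need a post-processing step whatever the output filename looks like.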

@ofa001

ofa001 commented Jul 18, 2024

I think @aekiss is correct: it's probably going to need different approaches for different communities. The COSIMA community is still using 'the cookbook', though perhaps now through the intake catalogue, as @dougiesquire prefers, if the data is set up in that format, whilst the CMIP7 data needs to be cmorized. The intake catalogue can handle cmorized data. I guess this will all be discussed more in the evaluation working group and in the wider community at the ACCESS-NRI workshop.

@anton-seaice
Contributor Author

CMIP table_id already has "frequency" tabs.

In theory, we can hope that data is accessed through the intake catalogue, or through other tools (we know this isn't true, of course). Through the metadata in the source data, those tools will provide the frequency to the user. So to the end user the filename shouldn't matter. I think we are setting the filename as a convenience to developers (and maybe it's used by intake).

sisnthick,yes,sisnthick,,m,,,CM2,seaIce,
sispeed,yes,sispeed,,m/s,,,CM2,seaIce,
sistrxdtop,yes,sistrxdtop,,N m-2,,down,CM2,seaIce,
sistrydtop,yes,sistrydtop,,N m-2,,down,CM2,seaIce,

CICE6 has an option to enable output using the CMIP variable names, so I think this will work for sea ice output in OM3, but I haven't tested it.

@aekiss
Contributor

aekiss commented Jul 22, 2024

@ars599v2

Many thanks @anton-seaice @aekiss,

I have no further questions. I believe that the MOM output with extra frequency can still be handled using the APP (CMORise package).

@aekiss, I think MOM6 CMOR-format outputs should just be for one-to-one variables, e.g., temp -> thetao, temp[:, :, :, 0] -> tos. For specific cases like hdfs, heat budget analysis, or different basins, we would need APP4 to handle it.

It would be great to ask the MED team (Romain) to double-check this. But at this stage, the ocean output in frequency format is fine.

@aekiss
Contributor

aekiss commented Jul 22, 2024

ping @rbeucher

@rbeucher

That would be great. Anything that can alleviate the need for loading multiple files would be good

@ars599v2

@rbeucher once they introduce the new frequency for the ocean output, we then need to change APP4 and maybe MOPPER

            elif realm == 'ocean':
                if 'scalar' in axes_modifier:
                    file_structure='/ocn/ocean_scalar.nc-*'
                elif freq == 'mon':
                    file_structure='/ocn/ocean_month.nc-*'
                elif freq == 'yr':
                    if access_version.find('OM2') != -1:
                        if axes_modifier.find('mon2yr') != -1:
                            file_structure='/ocn/ocean_month.nc-*'
                        else:
                            file_structure='/ocn/ocean_budget.nc-*'
                            if exptoprocess == '025deg_jra55_iaf_omip2_cycle6':
                                axes_modifier='{} mon2yr'.format(axes_modifier)
                    else:
                        file_structure='/ocn/ocean_month.nc-*'
                elif freq == 'fx':
                    #if access_version.find('OM2') != -1:
                    #    file_structure='/ocn/ocean_grid.nc-*'
                    #else:
                    file_structure='/ocn/ocean_month.nc-*'
                elif freq == 'day':
                    file_structure='/ocn/ocean_daily.nc-*'
                else:
                    #Unknown ocean frequency
                    file_structure=None
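
For comparison, here is a minimal sketch of how that lookup could shrink if the frequency label were part of every ocean filename. It assumes files end in ".<frequency>.nc" under an ocn/ directory, as in the example proposed at the top of this issue; ocean_file_pattern and select_files are hypothetical names and are not taken from APP4 or MOPPER.

```python
from fnmatch import fnmatch

# Frequency labels quoted in this issue as the CMIP7 names.
CMIP_FREQUENCIES = ("fx", "subhr", "hr", "day", "mon", "yr", "dec")


def ocean_file_pattern(freq):
    """Return a glob pattern for ocean files of one frequency, or None."""
    if freq not in CMIP_FREQUENCIES:
        return None  # unknown ocean frequency, like the final else branch
    # Assumed layout: <anything>.<frequency>.nc inside an ocn/ directory.
    return f"/ocn/*.{freq}.nc"


def select_files(paths, freq):
    """Keep only the paths whose name carries the requested frequency."""
    pattern = ocean_file_pattern(freq)
    if pattern is None:
        return []
    return [p for p in paths if fnmatch(p, "*" + pattern)]


paths = [
    "archive/ocn/access-om3.mom6.h.native_1945_05.mon.nc",
    "archive/ocn/access-om3.mom6.h.native_1945_05.day.nc",
]
print(select_files(paths, "mon"))
```

Special cases in the existing code, such as deriving yearly values from monthly files with mon2yr, would still need their own handling; the sketch only covers the direct frequency-to-file lookup.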

@aekiss
Contributor

aekiss commented Jul 30, 2024

Linking to related discussion: #190 (comment)

@rbeucher

@aidanheerdegen following our discussion this morning, I think we can change APP4 and MOPPER.

@dougiesquire
Collaborator

I think we've finalised a format - see here

@aekiss
Contributor

aekiss commented Jul 31, 2024

almost there - see here


6 participants