Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clearly separate fixes and CMOR checks #2157

Merged
merged 46 commits into from
Sep 28, 2023
Merged

Clearly separate fixes and CMOR checks #2157

merged 46 commits into from
Sep 28, 2023

Conversation

schlunma
Copy link
Contributor

@schlunma schlunma commented Jul 31, 2023

Description

Current situation

At the moment, automatic fixes to datasets are done in the CMORCheck class via the automatic_fixes keyword. While this weird interconnection is not only conceptually awkward (why would we fix and check at the same time?), this also brings along performance issues since CMOR checks are done twice or even more often (#2154) for a given dataset. Especially for datasets with lots of individual files, this can add up (e.g., I found a 14% decrease in run time for an ICON recipe).

Changes in this PR

This PR disentangles automatic fixes and CMOR checks in a fully backwards-compatible way (apart from the deprecations; see below).

All automatic fixes have been removed from the CMORCheck class and the automatic_fix keyword deprecated (see below). Moreover, a new class GenericFix has been introduced that now takes care of ALL automatic fixes, which is added as a fix for all dataset. The CMOR checks within these fix functions are also deprecated (see below).

Deprecations


Closes #2154


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@schlunma schlunma added the cmor Related to the CMOR standard label Jul 31, 2023
@schlunma schlunma added this to the v2.10.0 milestone Jul 31, 2023
@schlunma schlunma self-assigned this Jul 31, 2023
@codecov
Copy link

codecov bot commented Jul 31, 2023

Codecov Report

Merging #2157 (99c823c) into main (10bd776) will increase coverage by 0.17%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2157      +/-   ##
==========================================
+ Coverage   93.14%   93.31%   +0.17%     
==========================================
  Files         237      238       +1     
  Lines       12831    13035     +204     
==========================================
+ Hits        11951    12164     +213     
+ Misses        880      871       -9     
Files Coverage Δ
esmvalcore/cmor/_fixes/cmip6/iitm_esm.py 100.00% <100.00%> (ø)
esmvalcore/cmor/_fixes/cmip6/kace_1_0_g.py 100.00% <100.00%> (ø)
esmvalcore/cmor/_fixes/fix.py 99.72% <100.00%> (+1.13%) ⬆️
esmvalcore/cmor/_utils.py 100.00% <100.00%> (ø)
esmvalcore/cmor/check.py 98.96% <100.00%> (+1.35%) ⬆️
esmvalcore/cmor/fix.py 100.00% <100.00%> (ø)
esmvalcore/cmor/table.py 94.72% <100.00%> (+0.03%) ⬆️
esmvalcore/preprocessor/_time.py 97.72% <100.00%> (ø)

@schlunma schlunma marked this pull request as ready for review July 31, 2023 13:37
@schlunma schlunma changed the title Run CMOR checks once per datasets Clearly separate fixes and CMOR checks Aug 3, 2023
@schlunma
Copy link
Contributor Author

Thanks @bouweandela for the review! 🚀 Apart from renaming AutomaticFix I think I addressed all your comments. Please let me know what you think.

@bouweandela
Copy link
Member

Thanks for making the changes!

@schlunma
Copy link
Contributor Author

schlunma commented Sep 27, 2023

Unfortunately I found an issue with this new implementation. Previously, the fix_metadata of the AutomaticFix class has been applied to a single cube that has been filtered by short_name. Now, to be aligned with the other fix_metadata functions, it operates on cubes.

Since some datasets will always give you multiple cubes when loaded (e.g., the ones with a generic level coordinate will have additional cubes with coordinates like ps when you load them, even if they are fully CMOR-compliant), we get lots of wrong warnings with the new implementation (e.g., about wrong standard_names that are fixed). One could speculate that this is a bug in iris, but that's the current situation.

An easy way to fix this is to build the _get_single_cube function into AutomaticFix.fix_metadata. Any other ideas @bouweandela? I think this new implementation is much nicer than the old want, so I don't want to revert it.

@bouweandela
Copy link
Member

An easy way to fix this is to build the _get_single_cube function into AutomaticFix.fix_metadata.

Sounds fine to me, that is quite similar to the current implementation

esmvalcore/cmor/fix.py Outdated Show resolved Hide resolved
@bouweandela
Copy link
Member

Thanks @schlunma, LGTM!

@valeriupredoi
Copy link
Contributor

Superb work, gents - I can merge now if you guys ready-ready 🍺

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely work 👍

@valeriupredoi valeriupredoi merged commit ac31d53 into main Sep 28, 2023
@valeriupredoi valeriupredoi deleted the cmor_checks_once branch September 28, 2023 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmor Related to the CMOR standard deprecated feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CMOR checks are done twice or even more often
3 participants