-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Refactor]: CDAT Migration Phase 3: testing and documentation update #846
Conversation
- Delete outdated run scripts
@chengzhuzhang The following variables failed in ['/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex5_model_to_obs/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_9.4_ISCCP-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex5_model_to_obs/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_9.4_ISCCP-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex5_model_to_obs/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_ISCCP-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex5_model_to_obs/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_ISCCP-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex5_model_to_obs/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU9.4_ISCCP-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex5_model_to_obs/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU9.4_ISCCP-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex6_zonal_mean_2d_and_lat_lon_demo/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_9.4_ISCCP-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex6_zonal_mean_2d_and_lat_lon_demo/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_9.4_ISCCP-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex6_zonal_mean_2d_and_lat_lon_demo/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_ISCCP-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex6_zonal_mean_2d_and_lat_lon_demo/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU1.3_ISCCP-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex6_zonal_mean_2d_and_lat_lon_demo/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU9.4_ISCCP-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/examples-dev/ex6_zonal_mean_2d_and_lat_lon_demo/lat_lon/Cloud ISCCP/ISCCPCOSP-CLDTOT_TAU9.4_ISCCP-ANN-global_test.nc'] Root CauseI found that the time attributes for this file is bad, which breaks decoding ( Checking time attributes (with
|
ds = self._open_climo_dataset(filepath) |
ds = self._open_climo_dataset(filepath) |
e3sm_diags/e3sm_diags/driver/utils/dataset_xr.py
Lines 490 to 503 in 9be8cea
# Time coordinates are decoded because there might be cases where | |
# a multi-file climatology dataset has different units between files | |
# but raw encoded time values overlap. Decoding with Xarray allows | |
# concatenation of datasets with this issue (e.g., `area_cycle_zonal_mean` | |
# set with the MERRA2_Aerosols climatology datasets). | |
# NOTE: This GitHub issue explains why the "coords" and "compat" args | |
# are defined as they are below: https://github.com/xCDAT/xcdat/issues/641 | |
args = { | |
"paths": filepath, | |
"decode_times": True, | |
"add_bounds": ["X", "Y"], | |
"coords": "minimal", | |
"compat": "override", | |
} |
To correct myself, xCDAT can handle a missing "calendar" attribute by defaulting it to
Root cause of this issue:Related dataset: This issue is that this dataset uses a time scalar float of 150.5, which represents the middle of the month. xCDAT's decoding logic uses the MVCE based on xCDAT logicfrom datetime import datetime
from dateutil import parser
from dateutil import relativedelta as rd
import numpy as np
flat_offsets = np.array([150.5])
ref_date = "1983-06"
units_type = "months"
ref_datetime: datetime = parser.parse(ref_date, default=datetime(2000, 1, 1))
# ValueError: Non-integer years and months are ambiguous and not currently supported.
times = np.array(
[
ref_datetime + rd.relativedelta(**{units_type: offset})
for offset in flat_offsets
],
dtype="object",
) Possible workarounds
|
Hi Tom, I think the fractional month could be ambiguous and that's why |
Hey @chengzhuzhang, the file with the fixed time coordinate that needs to be copied from NERSC to LCRC is I created a backup of this file in the same directory on NERSC just in case. The backup has |
Findings for the "Test run script with model_vs_model run" task
Why is
|
def get_ref_climo_dataset( | |
self, var_key: str, season: ClimoFreq, ds_test: xr.Dataset | |
): | |
"""Get the reference climatology dataset for the variable and season. | |
If the reference climatatology does not exist or could not be found, it | |
will be considered a model-only run. For this case the test dataset | |
is returned as a default value and subsequent metrics calculations will | |
only be performed on the original test dataset. | |
Parameters | |
---------- | |
var_key : str | |
The key of the variable. | |
season : CLIMO_FREQ | |
The climatology frequency. | |
ds_test : xr.Dataset | |
The test dataset, which is returned if the reference climatology | |
does not exist or could not be found. | |
Returns | |
------- | |
xr.Dataset | |
The reference climatology if it exists or a copy of the test dataset | |
if it does not exist. | |
Raises | |
------ | |
RuntimeError | |
If `self.data_type` is not "ref". | |
""" | |
# TODO: This logic was carried over from legacy implementation. It | |
# can probably be improved on by setting `ds_ref = None` and not | |
# performing unnecessary operations on `ds_ref` for model-only runs, | |
# since it is the same as `ds_test`. In addition, returning ds_test | |
# makes it difficult for debugging. | |
if self.data_type == "ref": | |
try: | |
ds_ref = self.get_climo_dataset(var_key, season) | |
self.model_only = False | |
except (RuntimeError, IOError): | |
ds_ref = ds_test.copy() | |
self.model_only = True | |
logger.info("Cannot process reference data, analyzing test data only.") | |
else: | |
raise RuntimeError( | |
"`Dataset._get_ref_dataset` only works with " | |
f"`self.data_type == 'ref'`, not {self.data_type}." | |
) | |
return ds_ref |
dataset.py
has get_climo_variable()
and only lat_lon
set has the try and except for model-only runs. This means all other sets will fail, leading to less files being produced compared to dev.
e3sm_diags/e3sm_diags/driver/utils/dataset.py
Lines 120 to 173 in 9be8cea
def get_climo_variable(self, var, season, extra_vars=[], *args, **kwargs): """ For a given season, get the variable and any extra variables and run the climatology on them. These variables can either be from the test data or reference data. """ self.var = var self.extra_vars = extra_vars if not self.var: raise RuntimeError("Variable is invalid.") if not season: raise RuntimeError("Season is invalid.") # We need to make two decisions: # 1) Are the files being used reference or test data? # - This is done with self.ref and self.test. # 2) Are the files being used climo or timeseries files? # - This is done with the ref_timeseries_input and test_timeseries_input parameters. if self.ref and self.is_timeseries(): # Get the reference variable from timeseries files. data_path = self.parameters.reference_data_path timeseries_vars = self._get_timeseries_var(data_path, *args, **kwargs) # Run climo on the variables. variables = [self.climo_fcn(v, season) for v in timeseries_vars] elif self.test and self.is_timeseries(): # Get the test variable from timeseries files. data_path = self.parameters.test_data_path timeseries_vars = self._get_timeseries_var(data_path, *args, **kwargs) # Run climo on the variables. variables = [self.climo_fcn(v, season) for v in timeseries_vars] elif self.ref: # Get the reference variable from climo files. filename = self.get_ref_filename_climo(season) variables = self._get_climo_var(filename, *args, **kwargs) elif self.test: # Get the test variable from climo files. filename = self.get_test_filename_climo(season) variables = self._get_climo_var(filename, *args, **kwargs) else: msg = "Error when determining what kind (ref or test) " msg += "of variable to get and where to get it from " msg += "(climo or timeseries files)." raise RuntimeError(msg) # Needed so we can do: # v1 = Dataset.get_variable('v1', season) # and also: # v1, v2, v3 = Dataset.get_variable('v1', season, extra_vars=['v2', 'v3']) return variables[0] if len(variables) == 1 else variables lat_lon
code:e3sm_diags/e3sm_diags/driver/lat_lon_driver.py
Lines 149 to 156 in 2b303df
mv1 = test_data.get_climo_variable(var, season) try: mv2 = ref_data.get_climo_variable(var, season) except (RuntimeError, IOError): mv2 = mv1 logger.info("Can not process reference data, analyse test data only") parameter.model_only = True
Options
- Update all sets to use
get_climo_dataset()
and only addtry and except
tolat_lon
, then removeget_ref_climo_dataset()
-- cleanest solution - Update logic on
get_ref_climo_dataset()
to perform model-only run forlat_lon
if reference climo file is not found -- easiest solution - Keep things as is if we want to perform model-only runs -- note, this might lead to slower overall runtime compared to main since more sets and variables are being processed
Hey @chengzhuzhang, can you provide your thoughts to my above comment? |
Hi @tomvothecoder I'm voting option 1 as the best. It can be a good opportunity to also address this relevant issue? #823 . |
- Refactor lat_lon driver to split up functions used for model-only run - remove `_get_ref_climo_dataset` from `dataset_xr.py`
@tomvothecoder I have copied the data to LCRC input data server. We can run mache to sync data to all machines once new e3sm_unified version is deployed. |
- Fix area being dropped in dataset class
I found a bug with that file where the variable values were replaced with 0.0 for some reason. I recreated the file on NERSC. Can you point me to the path and I will copy it over again? |
I can't overwrite the file due to permissions and different group access. -rw-r--r-- 1 ac.zhang40 cels 2238458 Sep 9 17:12 ISCCPCOSP_ANN_climo.nc
-rwxr-xr-x 1 ac.zhang40 E3SM 2162972 Nov 5 2019 ISCCPCOSP_ANN_climo.nc.bak
-rwxr-xr-x 1 ac.zhang40 E3SM 2162972 Nov 5 2019 ISCCPCOSP_DJF_climo.nc
-rwxr-xr-x 1 ac.zhang40 E3SM 2162972 Nov 5 2019 ISCCPCOSP_JJA_climo.nc
-rwxr-xr-x 1 ac.zhang40 E3SM 2162972 Nov 5 2019 ISCCPCOSP_MAM_climo.nc
-rwxr-xr-x 1 ac.zhang40 E3SM 2162972 Nov 5 2019 ISCCPCOSP_SON_climo.nc Whenever you have time, can you move the latest fixed version of this file from NERSC? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression test results
examples
scripts -- all run, within rtolmodel_vs_model
-- 124/128 variables within rtol, 4/128 not (but okay due to bug onmain
-> refer to PR descriptionmodel_vs_obs
-- 1213/1215 variables within rtol, 2/1215 not within rtol due to <=8/64800 elements mismatching (0.04% and 1.2% max rel diff) -- not a concern
Summary of Changes
e3sm_diags/driver/
annual_cycle_zonal_mean_driver.py
,cosp_histogram_driver.py
,meridional_mean_2d_driver.py
,polar_driver.py
,zonal_mean_2d_driver.py
,zonal_mean_xy_driver.py
-- replacedget_ref_climo_dataset()
withget_climo_dataset()
lat_lon_driver.py
- Add
_run_diags_2d_model_only()
and_run_diags_3d_model_only()
functions (whends_ref
reference dataset isNone
) - Add
_get_ref_climo_dataset()
-- now returnsNone
if reference dataset is not passed instead of settingds_ref = ds_test
- Refactor
_create_metrics_dict()
- Use
DEFAULT_METRICS_VALUE
(999.99) as default value for missing metrics to support viewer, otherwise it will break ifNone
- Use
- Add
driver/utils/dataset_xr.py
- Remove
get_ref_climo_dataset()
since it is only used bylat_lon_driver.py
- Fix
_subset_vars_and_load()
to keep"area"
variable
Run scripts
- Delete all outdated
run_....py
scripts - Rename
run_v_2_9_0_all_sets_E3SM_machines.py
torun_all_sets_E3SM_machine.py
-- the filename can be version agonistic and updated based on the latest version
Update complete_run.py
- Port over functions from
run_v2_6_0_all_sets.py
needed in this module
@@ -1,12 +1,15 @@ | |||
from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A file of interest for review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misc. findings
This one is done! |
- interacts effectively with the PCMDI's metrics package and the ARM | ||
diagnostics package through a unifying framework: `Community | ||
Diagnostics Package (CDP) <https://github.com/CDAT/cdp>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e3sm_diags no longer uses CDP
Merging this PR based on my review comment above. |
Refer to the PR for more information because the changelog is massive. Update build workflow to run on `cdat-migration-fy24` branch CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743) - Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files - Fix some lingering unit tests failure - Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml` - Add `xskillscore` to `ci.yml` - Fix `pre-commit` issues CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744) - Add Makefile that simplifies common development commands (building and installing, testing, etc.) - Write unit tests to cover all new code for utility functions - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py` - Metrics comparison for `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs - Test run with 3D variables (`_run_3d_diags()`) - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still - Fix subsetting syntax bug using ilev - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()` - Fix failing integration tests pass in CI/CD - Refactor `test_diags.py` -- replace unittest with pytest - Refactor `test_all_sets.py` -- replace unittest with pytest - Test climatology datasets -- tested with 3d variables using `test_all_sets.py` CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746) - Move driver type annotations to `type_annotations.py` - Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py` - Update `_save_data_metrics_and_plots` args to accept `plot_func` callable - Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False` - Move `parameter` arg to the top in `lat_lon_plot.plot` - Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754) Update regression test notebook to show validation of all vars Add `subset_and_align_datasets()` to regrid.py (#776) Add template run scripts CDAT Migration Phase: Refactor `cosp_histogram` set (#748) - Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py` - `formulas_cosp.py` (new file) - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py` - `derivations.py` - Cleaned up portions of `DERIVED_VARIABLES` dictionary - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716) - Remove unnecessary `convert_units()` function calls - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP` - `utils.py` - Delete deprecated, CDAT-based `cosp_histogram` functions - `dataset_xr.py` - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate - Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`) - `io.py` - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing) - Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774) Refactor 654 zonal mean xy (#752) Co-authored-by: Tom Vo <[email protected]> CDAT Migration - Update run script output directory to NERSC public webserver (#793) [PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788) CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794) CDAT Migration: Refactor `polar` set (#749) Co-authored-by: Tom Vo <[email protected]> Align order of calls to `_set_param_output_attrs` CDAT Migration: Refactor `meridional_mean_2d` set (#795) CDAT Migration: Refactor `aerosol_budget` (#800) Add `acme.py` changes from PR #712 (#814) * Add `acme.py` changes from PR #712 * Replace unnecessary lambda call Refactor area_mean_time_series and add ccb slice flag feature (#750) Co-authored-by: Tom Vo <[email protected]> [Refactor]: Validate fix in PR #750 for #759 (#815) CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819) CDAT Migration: Refactor annual_cycle_zonal_mean set (#798) * Refactor `annual_cycle_zonal_mean` set * Address PR review comments * Add lat lon regression testing * Add debugging scripts * Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords - Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers * Fix unit tests * Remove old plotter * Add script to debug decode_times=True and ncclimo file * Update plotter time values to month integers * Fix slow `.load()` and multiprocessing issue - Due to incorrectly updating `keep_bnds` logic - Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar` * Update `_encode_time_coords()` docstring * Add AODVIS debug script * update AODVIS obs datasets; regression test results --------- Co-authored-by: Tom Vo <[email protected]> CDAT Migration Phase 2: Refactor `qbo` set (#826) CDAT Migration Phase 2: Refactor tc_analysis set (#829) * start tc_analysis_refactor * update driver * update plotting * Clean up plotter - Remove unused variables - Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts - Reorder functions for top-down readability * Remove unused notebook --------- Co-authored-by: tomvothecoder <[email protected]> CDAT Migration Phase 2: Refactor `enso_diags` set (#832) CDAT Migration Phase 2: Refactor `streamflow` set (#837) [Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841) [Refactor]: CDAT Migration Phase 3: testing and documentation update (#846) CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860) CDAT Migration Phase 2: Refactor arm_diags set (#842) Add performance benchmark material (#864) Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865) CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875) CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876) Add support for time series datasets via glob and fix `enso_diags` set (#866) Add fix for checking `is_time_series()` property based on `data_type` attr (#881) CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882) CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883) CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887) CDAT Migration: Prepare branch for merge to `main` (#885) [Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891) CDAT Migration Phase 2: Refactor core utilities and `lat_lon` set (#677) Refer to the PR for more information because the changelog is massive. Update build workflow to run on `cdat-migration-fy24` branch CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743) - Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files - Fix some lingering unit tests failure - Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml` - Add `xskillscore` to `ci.yml` - Fix `pre-commit` issues CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744) - Add Makefile that simplifies common development commands (building and installing, testing, etc.) - Write unit tests to cover all new code for utility functions - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py` - Metrics comparison for `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs - Test run with 3D variables (`_run_3d_diags()`) - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still - Fix subsetting syntax bug using ilev - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()` - Fix failing integration tests pass in CI/CD - Refactor `test_diags.py` -- replace unittest with pytest - Refactor `test_all_sets.py` -- replace unittest with pytest - Test climatology datasets -- tested with 3d variables using `test_all_sets.py` CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746) - Move driver type annotations to `type_annotations.py` - Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py` - Update `_save_data_metrics_and_plots` args to accept `plot_func` callable - Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False` - Move `parameter` arg to the top in `lat_lon_plot.plot` - Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774) CDAT Migration Phase 2: Refactor `qbo` set (#826)
Refer to the PR for more information because the changelog is massive. Update build workflow to run on `cdat-migration-fy24` branch CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743) - Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files - Fix some lingering unit tests failure - Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml` - Add `xskillscore` to `ci.yml` - Fix `pre-commit` issues CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744) - Add Makefile that simplifies common development commands (building and installing, testing, etc.) - Write unit tests to cover all new code for utility functions - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py` - Metrics comparison for `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs - Test run with 3D variables (`_run_3d_diags()`) - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still - Fix subsetting syntax bug using ilev - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()` - Fix failing integration tests pass in CI/CD - Refactor `test_diags.py` -- replace unittest with pytest - Refactor `test_all_sets.py` -- replace unittest with pytest - Test climatology datasets -- tested with 3d variables using `test_all_sets.py` CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746) - Move driver type annotations to `type_annotations.py` - Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py` - Update `_save_data_metrics_and_plots` args to accept `plot_func` callable - Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False` - Move `parameter` arg to the top in `lat_lon_plot.plot` - Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754) Update regression test notebook to show validation of all vars Add `subset_and_align_datasets()` to regrid.py (#776) Add template run scripts CDAT Migration Phase: Refactor `cosp_histogram` set (#748) - Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py` - `formulas_cosp.py` (new file) - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py` - `derivations.py` - Cleaned up portions of `DERIVED_VARIABLES` dictionary - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716) - Remove unnecessary `convert_units()` function calls - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP` - `utils.py` - Delete deprecated, CDAT-based `cosp_histogram` functions - `dataset_xr.py` - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate - Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`) - `io.py` - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing) - Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774) Refactor 654 zonal mean xy (#752) Co-authored-by: Tom Vo <[email protected]> CDAT Migration - Update run script output directory to NERSC public webserver (#793) [PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788) CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794) CDAT Migration: Refactor `polar` set (#749) Co-authored-by: Tom Vo <[email protected]> Align order of calls to `_set_param_output_attrs` CDAT Migration: Refactor `meridional_mean_2d` set (#795) CDAT Migration: Refactor `aerosol_budget` (#800) Add `acme.py` changes from PR #712 (#814) * Add `acme.py` changes from PR #712 * Replace unnecessary lambda call Refactor area_mean_time_series and add ccb slice flag feature (#750) Co-authored-by: Tom Vo <[email protected]> [Refactor]: Validate fix in PR #750 for #759 (#815) CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819) CDAT Migration: Refactor annual_cycle_zonal_mean set (#798) * Refactor `annual_cycle_zonal_mean` set * Address PR review comments * Add lat lon regression testing * Add debugging scripts * Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords - Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers * Fix unit tests * Remove old plotter * Add script to debug decode_times=True and ncclimo file * Update plotter time values to month integers * Fix slow `.load()` and multiprocessing issue - Due to incorrectly updating `keep_bnds` logic - Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar` * Update `_encode_time_coords()` docstring * Add AODVIS debug script * update AODVIS obs datasets; regression test results --------- Co-authored-by: Tom Vo <[email protected]> CDAT Migration Phase 2: Refactor `qbo` set (#826) CDAT Migration Phase 2: Refactor tc_analysis set (#829) * start tc_analysis_refactor * update driver * update plotting * Clean up plotter - Remove unused variables - Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts - Reorder functions for top-down readability * Remove unused notebook --------- Co-authored-by: tomvothecoder <[email protected]> CDAT Migration Phase 2: Refactor `enso_diags` set (#832) CDAT Migration Phase 2: Refactor `streamflow` set (#837) [Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841) [Refactor]: CDAT Migration Phase 3: testing and documentation update (#846) CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860) CDAT Migration Phase 2: Refactor arm_diags set (#842) Add performance benchmark material (#864) Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865) CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875) CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876) Add support for time series datasets via glob and fix `enso_diags` set (#866) Add fix for checking `is_time_series()` property based on `data_type` attr (#881) CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882) CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883) CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887) CDAT Migration: Prepare branch for merge to `main` (#885) [Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891) CDAT Migration Phase 2: Refactor core utilities and `lat_lon` set (#677) Refer to the PR for more information because the changelog is massive. Update build workflow to run on `cdat-migration-fy24` branch CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743) - Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files - Fix some lingering unit tests failure - Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml` - Add `xskillscore` to `ci.yml` - Fix `pre-commit` issues CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744) - Add Makefile that simplifies common development commands (building and installing, testing, etc.) - Write unit tests to cover all new code for utility functions - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py` - Metrics comparison for `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs - Test run with 3D variables (`_run_3d_diags()`) - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still - Fix subsetting syntax bug using ilev - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()` - Fix failing integration tests pass in CI/CD - Refactor `test_diags.py` -- replace unittest with pytest - Refactor `test_all_sets.py` -- replace unittest with pytest - Test climatology datasets -- tested with 3d variables using `test_all_sets.py` CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746) - Move driver type annotations to `type_annotations.py` - Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py` - Update `_save_data_metrics_and_plots` args to accept `plot_func` callable - Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False` - Move `parameter` arg to the top in `lat_lon_plot.plot` - Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774) CDAT Migration Phase 2: Refactor `qbo` set (#826)
Refer to the PR for more information because the changelog is massive. Update build workflow to run on `cdat-migration-fy24` branch CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743) - Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files - Fix some lingering unit tests failure - Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml` - Add `xskillscore` to `ci.yml` - Fix `pre-commit` issues CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744) - Add Makefile that simplifies common development commands (building and installing, testing, etc.) - Write unit tests to cover all new code for utility functions - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py` - Metrics comparison for `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs - Test run with 3D variables (`_run_3d_diags()`) - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still - Fix subsetting syntax bug using ilev - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()` - Fix failing integration tests pass in CI/CD - Refactor `test_diags.py` -- replace unittest with pytest - Refactor `test_all_sets.py` -- replace unittest with pytest - Test climatology datasets -- tested with 3d variables using `test_all_sets.py` CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746) - Move driver type annotations to `type_annotations.py` - Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py` - Update `_save_data_metrics_and_plots` args to accept `plot_func` callable - Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False` - Move `parameter` arg to the top in `lat_lon_plot.plot` - Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754) Update regression test notebook to show validation of all vars Add `subset_and_align_datasets()` to regrid.py (#776) Add template run scripts CDAT Migration Phase: Refactor `cosp_histogram` set (#748) - Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py` - `formulas_cosp.py` (new file) - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py` - `derivations.py` - Cleaned up portions of `DERIVED_VARIABLES` dictionary - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716) - Remove unnecessary `convert_units()` function calls - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP` - `utils.py` - Delete deprecated, CDAT-based `cosp_histogram` functions - `dataset_xr.py` - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate - Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`) - `io.py` - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing) - Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774) Refactor 654 zonal mean xy (#752) Co-authored-by: Tom Vo <[email protected]> CDAT Migration - Update run script output directory to NERSC public webserver (#793) [PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788) CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794) CDAT Migration: Refactor `polar` set (#749) Co-authored-by: Tom Vo <[email protected]> Align order of calls to `_set_param_output_attrs` CDAT Migration: Refactor `meridional_mean_2d` set (#795) CDAT Migration: Refactor `aerosol_budget` (#800) Add `acme.py` changes from PR #712 (#814) * Add `acme.py` changes from PR #712 * Replace unnecessary lambda call Refactor area_mean_time_series and add ccb slice flag feature (#750) Co-authored-by: Tom Vo <[email protected]> [Refactor]: Validate fix in PR #750 for #759 (#815) CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819) CDAT Migration: Refactor annual_cycle_zonal_mean set (#798) * Refactor `annual_cycle_zonal_mean` set * Address PR review comments * Add lat lon regression testing * Add debugging scripts * Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords - Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers * Fix unit tests * Remove old plotter * Add script to debug decode_times=True and ncclimo file * Update plotter time values to month integers * Fix slow `.load()` and multiprocessing issue - Due to incorrectly updating `keep_bnds` logic - Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar` * Update `_encode_time_coords()` docstring * Add AODVIS debug script * update AODVIS obs datasets; regression test results --------- Co-authored-by: Tom Vo <[email protected]> CDAT Migration Phase 2: Refactor `qbo` set (#826) CDAT Migration Phase 2: Refactor tc_analysis set (#829) * start tc_analysis_refactor * update driver * update plotting * Clean up plotter - Remove unused variables - Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts - Reorder functions for top-down readability * Remove unused notebook --------- Co-authored-by: tomvothecoder <[email protected]> CDAT Migration Phase 2: Refactor `enso_diags` set (#832) CDAT Migration Phase 2: Refactor `streamflow` set (#837) [Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841) [Refactor]: CDAT Migration Phase 3: testing and documentation update (#846) CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860) CDAT Migration Phase 2: Refactor arm_diags set (#842) Add performance benchmark material (#864) Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865) CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875) CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876) Add support for time series datasets via glob and fix `enso_diags` set (#866) Add fix for checking `is_time_series()` property based on `data_type` attr (#881) CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882) CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883) CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887) CDAT Migration: Prepare branch for merge to `main` (#885) [Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891) CDAT Migration Phase 2: Refactor core utilities and `lat_lon` set (#677) Refer to the PR for more information because the changelog is massive. Update build workflow to run on `cdat-migration-fy24` branch CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743) - Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files - Fix some lingering unit tests failure - Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml` - Add `xskillscore` to `ci.yml` - Fix `pre-commit` issues CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744) - Add Makefile that simplifies common development commands (building and installing, testing, etc.) - Write unit tests to cover all new code for utility functions - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py` - Metrics comparison for `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs - Test run with 3D variables (`_run_3d_diags()`) - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still - Fix subsetting syntax bug using ilev - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()` - Fix failing integration tests pass in CI/CD - Refactor `test_diags.py` -- replace unittest with pytest - Refactor `test_all_sets.py` -- replace unittest with pytest - Test climatology datasets -- tested with 3d variables using `test_all_sets.py` CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746) - Move driver type annotations to `type_annotations.py` - Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py` - Update `_save_data_metrics_and_plots` args to accept `plot_func` callable - Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False` - Move `parameter` arg to the top in `lat_lon_plot.plot` - Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774) CDAT Migration Phase 2: Refactor `qbo` set (#826)
Description
1. Examples
In the root
examples/
directory, there are example run scripts that use the legacy way of configuring and running E3SM diagnostics. They should all be refactored to reflect the latest changes.Tasks
examples
scripts work and produce close/same results asmain
ex1
ex2
ex3
ex4
ex5
--a few variables failed due to bugFIXEDex6
--a few variables failed due to bugFIXEDex7
ex5
andex6
variable (comment)/global/cfs/cdirs/e3sm/e3sm_diags/obs_for_e3sm_diags/climatology/ISCCPCOSP/ISCCPCOSP_ANN_climo.nc
2. Run script with
model_vs_model
Tasks
model_vs_model
runget_ref_climo_dataset()
in all sets exceptlat_lon
lat_lon_driver.py
to handle model-only runs more cleanly, also fixes issue where model-only plots would incorrectly generate obs plot and diff plot (which CDAT code does not do)main
([Bug]: Silent bug inadjust_prs_val_units()
conditional whereif prs_val0:
will beFalse
ifprs_val0
is 0 #797)main
(refer to [Bug]:model_vs_model
lat_lon
variables not generated onmain
but generated oncdat-migration-fy24
#854)model_vs_obs
with above change and perform regression3. Documentation
Tasks
/tutorials
as needed -- none needed/tutorials/2024
as needed -- none neededrun_...py
scripts (only keeprun_v2_9_0_all_sets_E3SM_machines.py
run_v2_9_0_all_sets_E3SM_machines.py
torun_all_sets_E3SM_machines.py
Checklist
If applicable: