[Feature]: add time axis/calendar creation to temporal library #364

Closed
durack1 opened this issue Oct 13, 2022 · 9 comments
Labels
type: enhancement New enhancement request

Comments

@durack1
Collaborator

durack1 commented Oct 13, 2022

Is your feature request related to a problem?

I have previously written a function to create a time axis (durolib/makeCalendar) using the cdtime library. It seems like this would be a useful feature that doesn't appear to be currently included in xcdat/temporal.py.

Just checking this is in fact the case, or does someone have an uncommitted function available, somewhere? Or alternatively a library that they use to achieve this goal?

Describe the solution you'd like

A new function that replicates durolib/makeCalendar

Describe alternatives you've considered

Google?

Additional context

From a quick pass, it doesn't appear that the cftime library includes this functionality.

@durack1 durack1 added the type: enhancement New enhancement request label Oct 13, 2022
@pochedls
Collaborator

I am not sure where this falls on the spectrum of too-specific-utility to generally-helpful-climate utility, but I'm thinking it probably would be worth including this kind of functionality in xcdat.

I had a need for a similar utility recently: I was reading in timesteps from files in which each file was one hourly timestep. After concatenating them into a DataArray, I needed to create a time axis. I did not create a function like this (I needed to be resilient to the possibility of missing time steps), but this would work for lots of datasets in a similar use case. @durack1 – is this the kind of use you're thinking of?

@durack1 - would you be willing to write + unit test this function? I think it could be added to .temporal as a stand-alone feature (as you suggest).

@chengzhuzhang / @lee1043 / @tomvothecoder – do you think this is a worthwhile utility or is this not the right scope / too liable to bloat making xcdat difficult to maintain?
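To make the missing-timestep concern above concrete, here is a minimal, dependency-free sketch. It assumes hourly timestamps have already been parsed from the files; the helper names `build_axis` and `find_missing_steps` are hypothetical and are not part of xcdat or xarray.

```python
from datetime import datetime, timedelta


def build_axis(start, end, step=timedelta(hours=1)):
    """Return every expected timestamp from start to end (inclusive)."""
    axis = []
    current = start
    while current <= end:
        axis.append(current)
        current += step
    return axis


def find_missing_steps(observed, step=timedelta(hours=1)):
    """Return expected timestamps that are absent from the observed ones."""
    expected = build_axis(min(observed), max(observed), step)
    seen = set(observed)
    return [t for t in expected if t not in seen]


# Hypothetical example: the 02:00 file is missing.
observed = [
    datetime(2000, 1, 1, 0),
    datetime(2000, 1, 1, 1),
    datetime(2000, 1, 1, 3),
    datetime(2000, 1, 1, 4),
]
missing = find_missing_steps(observed)
print(missing)
```

A regular axis built this way can only be assigned directly to concatenated data when `missing` is empty; otherwise the gaps need to be filled or the axis built from the observed timestamps.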

@chengzhuzhang
Collaborator

Hi @durack1 and @pochedls
Thank you for bringing this up. I can see how this utility can be helpful in cases where a time axis needs to be constructed, as Steve described. I too don't have a good idea of how we could fit it into xcdat. First of all, this function depends on the cdutil and cdtime libraries, which are not included in xcdat in its current form. I think there are two items we can discuss: first, what functionality from both libraries is needed by xcdat and how can we port it over (are there other libraries we can utilize?); second, how can we integrate some higher-level applications/utilities into xcdat, given that we presumably don't need them tested as routinely and rigorously?

@lee1043
Collaborator

lee1043 commented Oct 14, 2022

@durack1 @pochedls @chengzhuzhang thanks for the discussion.

I believe it would be useful to have the time-axis-creating capability. Recently @msahn did similar work for temporal regridding using cdms, converting IMERG's 30-minute interval data to 3-hourly (which is also related to obs4MIPs, which @gleckler1 is leading). Such work requires creating a proper time axis. I guess something like this will likely come up again from xcdat users, including those who will contribute to obs4MIPs.

@gleckler1 any comment welcome.

@chengzhuzhang
Collaborator

Just came across this cftime_range function from xarray. https://docs.xarray.dev/en/stable/generated/xarray.cftime_range.html
Not sure if this function would be potentially useful as an alternative?

@tomvothecoder
Collaborator

tomvothecoder commented Oct 18, 2022

I believe it would be useful to have the time-axis-creating capability. Recently @msahn did similar work for temporal regridding using cdms, converting IMERG's 30-minute interval data to 3-hourly (which is also related to obs4MIPs, which @gleckler1 is leading). Such work requires creating a proper time axis. I guess something like this will likely come up again from xcdat users, including those who will contribute to obs4MIPs.

Hey @lee1043, so the IMERG datasets have time coordinates with 30 min intervals that need to be resampled to 3 hourly?

If so, you can resample the time coordinates directly with xr.Dataset.resample(). This method should maintain the metadata so you won't need to create a new time axis xr.DataArray for the 3 hourly coordinates.

ds.resample(time="3H")

Here's a general guide for resampling: https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations

Just came across this cftime_range function from xarray. docs.xarray.dev/en/stable/generated/xarray.cftime_range.html
Not sure if this function would be potentially useful as an alternative?

Thanks for looking for alternative solutions @chengzhuzhang.

Can we list requirements for this function to see if cftime_range() or other APIs meet our requirements? It sounds like this is a common function that might already be implemented.

Here is an example for creating a time axis xr.DataArray from scratch using xr.cftime_range():

import xarray as xr

import xcdat as xc

# 1. Create the cftime coordinates
time_coords = xr.cftime_range(
    start="2000-01-01", end="2000-02-02", periods=None, freq="D"
)

# 2. Create a time DataArray to represent time axis
time = xr.DataArray(
    name="time",
    data=time_coords,
    dims="time",
    attrs={"axis": "T", "standard_name": "time", "bounds": "time_bnds"},
)

print(time)
"""
<xarray.DataArray 'time' (time: 33)>
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 2, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 3, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 4, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 1, 5, 0, 0, 0, 0, has_year_zero=False),
       ...]
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-02-02 00:00:00
Attributes:
    axis:           T
    standard_name:  time
"""

# 3. Add time coordinates to the Dataset and generate bounds
ds = xr.Dataset()
ds["time"] = time
ds = ds.bounds.add_bounds("T")

print(ds.time_bnds)

"""
<xarray.DataArray 'time_bnds' (time: 33, bnds: 2)>
array([[cftime.DatetimeGregorian(1999, 12, 31, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 1, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 1, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 2, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 2, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 3, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 3, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 4, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 4, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 5, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 5, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 6, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 6, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 7, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 7, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 8, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 8, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 9, 12, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 1, 9, 12, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 1, 10, 12, 0, 0, 0, has_year_zero=False)],
       ...]
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-02-02 00:00:00
Dimensions without coordinates: bnds
Attributes:
    xcdat_bounds:  True
"""
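The bounds printed above sit halfway between neighbouring time coordinates (for example, 1999-12-31 12:00 to 2000-01-01 12:00 for the 2000-01-01 coordinate). The following is a plain-Python sketch of that midpoint convention, assuming the end bounds are extrapolated from the nearest spacing. The function name `midpoint_bounds` is hypothetical; this is not xcdat's implementation, and xcdat's behaviour may differ between versions.

```python
from datetime import datetime


def midpoint_bounds(times):
    """Return (lower, upper) bounds placed halfway between adjacent times."""
    if len(times) < 2:
        raise ValueError("need at least two time coordinates")
    # Interior edges: midpoints between each pair of neighbouring coordinates.
    mids = [a + (b - a) / 2 for a, b in zip(times, times[1:])]
    # Outer edges: extrapolate using the first and last spacing.
    first = times[0] - (times[1] - times[0]) / 2
    last = times[-1] + (times[-1] - times[-2]) / 2
    edges = [first, *mids, last]
    return list(zip(edges[:-1], edges[1:]))


daily = [datetime(2000, 1, day) for day in (1, 2, 3)]
bounds = midpoint_bounds(daily)
for lower, upper in bounds:
    print(lower, upper)
```

With daily coordinates at midnight, each cell therefore spans noon to noon, which may or may not be the convention a given dataset expects.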

@lee1043
Collaborator

lee1043 commented Oct 18, 2022

@tomvothecoder thanks for the tip. Good to learn about xarray's resample function. Does it also adjust the time bounds to 3 hours? @msahn converted 30-minute interval data to 3-hourly data by averaging (or summing) the 30-minute timesteps in each 3-hour range. The resampled dataset should therefore have the right time bounds, not just ones sampled from the original time bounds. Do you know if xarray's resample function handles the time bounds too?

@msahn @gleckler1 tagging you two for potential relevance to the Obs4MIP data processing.

@tomvothecoder
Collaborator

tomvothecoder commented Oct 18, 2022

@tomvothecoder thanks for the tip. Good to learn about xarray's resample function. Does it also adjust the time bounds to 3 hours? @msahn converted 30-minute interval data to 3-hourly data by averaging (or summing) the 30-minute timesteps in each 3-hour range. The resampled dataset should therefore have the right time bounds, not just ones sampled from the original time bounds. Do you know if xarray's resample function handles the time bounds too?

@msahn @gleckler1 tagging you two for potential relevance to the Obs4MIP data processing.

Unfortunately, xarray's resample function does not resample time bounds; they usually get dropped after resampling (related issues: pydata/xarray#2231 and xarray-contrib/cf-xarray#10).

A possible workaround is to resample time coordinates, then generate new bounds using them. Does this work for this case?

Example:

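import xarray as xr
import xcdat  # noqa: F401 (importing xcdat registers the .bounds accessor used in step 3)

# 1. Create a dataset with 30min time coordinates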
ds = xr.Dataset()
ds["time"] = xr.DataArray(
    name="time",
    data=xr.cftime_range(start="2000-01-01", end="2000-01-02", periods=None, freq="30min"),
    dims="time",
    attrs={"axis": "T", "standard_name": "time", "bounds": "time_bnds"},
)

# 2. Resample time coordinates to 3 hourly (average)
ds_resample = ds.resample(time="3H").mean()

# 3. Generate bounds with resampled time coordinates
ds_resample = ds_resample.bounds.add_bounds(axis="T")
print(ds_resample.time_bnds)
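For the averaging case described above, the bounds can also be taken from the bin edges themselves rather than inferred from neighbouring coordinates. Below is a dependency-free sketch of that idea, assuming left-labelled bins aligned to midnight. The helper name `average_into_bins` is hypothetical, and this is not how xarray or xcdat implement resampling.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_into_bins(times, values, bin_hours=3):
    """Average values into fixed-width bins; return (lower, upper, mean) per bin."""
    width = timedelta(hours=bin_hours)
    bins = defaultdict(list)
    for t, v in zip(times, values):
        day_start = t.replace(hour=0, minute=0, second=0, microsecond=0)
        # Integer index of the bin within the day (timedelta // timedelta -> int).
        index = (t - day_start) // width
        bins[day_start + index * width].append(v)
    return [
        (lower, lower + width, sum(vals) / len(vals))
        for lower, vals in sorted(bins.items())
    ]


# Twelve 30-minute samples covering 00:00 to 05:30, with values 0..11.
start = datetime(2000, 1, 1)
times = [start + timedelta(minutes=30 * i) for i in range(12)]
values = list(range(12))

result = average_into_bins(times, values)
for lower, upper, mean in result:
    print(lower, upper, mean)
```

Each output row carries its own lower and upper bound, so the 3-hourly bounds describe the averaging window exactly instead of being reconstructed afterwards.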

 

@durack1
Collaborator Author

durack1 commented Oct 20, 2022

Just came across this cftime_range function from xarray. https://docs.xarray.dev/en/stable/generated/xarray.cftime_range.html
Not sure if this function would be potentially useful as an alternative?

Nice find @chengzhuzhang, I had been naively looking in https://unidata.github.io/cftime/api.html - maybe I should just centralize my searching within the xarray docs alone.

From a quick glance, this looks like exactly what I was chasing

@pochedls
Collaborator

It appears this feature exists in cftime_range.
