Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data #2666

Open
shoyer opened this issue Jan 11, 2019 · 6 comments

@shoyer
Member

shoyer commented Jan 11, 2019

This appears with the development version of pandas; see pandas-dev/pandas#24716 for details.

Example:

In [16]: df = pd.DataFrame({"A": pd.date_range('2000', periods=12, tz='US/Central')})

In [17]: df.to_xarray()
/Users/taugspurger/Envs/pandas-dev/lib/python3.7/site-packages/xarray/core/dataset.py:3111: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.
        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  data = np.asarray(series).reshape(shape)
Out[17]:
<xarray.Dataset>
Dimensions:  (index: 12)
Coordinates:
  * index    (index) int64 0 1 2 3 4 5 6 7 8 9 10 11
Data variables:
    A        (index) datetime64[ns] 2000-01-01T06:00:00 ... 2000-01-12T06:00:00
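
The two options named in the warning can be spelled out when converting a timezone-aware column by hand. This is a minimal sketch using only pandas and NumPy, not what xarray does internally; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": pd.date_range("2000", periods=12, tz="US/Central")})

# Option 1 (the announced future default): an object array whose elements
# are pandas.Timestamp values that keep their timezone.
as_object = np.asarray(df["A"], dtype=object)

# Option 2 (the old behavior): timezone-naive datetime64 values, expressed in UTC.
as_naive = np.asarray(df["A"], dtype="datetime64[ns]")

print(as_object.dtype, as_object[0])
print(as_naive.dtype, as_naive[0])
```

Passing the dtype explicitly states which of the two representations is wanted, which is what the warning text asks for.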
@shoyer
Member Author

shoyer commented Jan 11, 2019

I'm open to suggestions here, especially from users who use DatetimeTZ data in pandas.

As noted in pandas-dev/pandas#24716, I think the cleanest solution is probably to add a dtypes argument to from_dataframe, to allow users to specify their own desired dtypes for pandas -> numpy coercion.
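
To make the proposal a bit more concrete, here is a rough sketch of what a per-column `dtypes` mapping could do. `coerce_columns` is a hypothetical helper, not an existing xarray or pandas function, and the `dtypes` argument is only a proposal in this thread; the sketch relies solely on `Series.to_numpy(dtype=...)`.

```python
import numpy as np
import pandas as pd


def coerce_columns(df, dtypes=None):
    """Hypothetical helper: convert each column to a NumPy array,
    using an explicit dtype where the caller supplied one."""
    dtypes = dtypes or {}
    # Columns missing from `dtypes` fall back to pandas' default conversion.
    return {name: df[name].to_numpy(dtype=dtypes.get(name)) for name in df.columns}


df = pd.DataFrame(
    {
        "A": pd.date_range("2000", periods=3, tz="US/Central"),
        "B": [1.0, 2.0, 3.0],
    }
)

# Ask for timezone-naive datetime64 values for column "A" only.
arrays = coerce_columns(df, dtypes={"A": "datetime64[ns]"})
print({name: values.dtype for name, values in arrays.items()})
```

Passing `{"A": object}` instead would keep the timezone on each element, at the cost of an object-dtype array.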

@meistermeister

The current implementation caused some issues, since files from different sources suddenly can't import and sort together like they used to. I'm fine with a dtypes argument, but where will you pass it? This is arising in .sort_index() for me.

@TomAugspurger
Contributor

TomAugspurger commented Dec 30, 2019

Just FYI, we're potentially enforcing this deprecation in pandas-dev/pandas#30563 (which would be included in a pandas release in a week or two). Is that likely to cause problems for xarray users?

It's not clear to me what the desired behavior is (#3291 seems to want to preserve the tz, though it isn't clear they are willing to be forced into an object dtype array for it).

@TomAugspurger
Contributor

And there are a couple places that need updating, even with a dtypes argument to let the user specify things. We also hit this via Dataset.__setitem__

~/sandbox/xarray/xarray/core/dataset.py in __setitem__(self, key, value)
   1268             )
   1269
-> 1270         self.update({key: value})
   1271
   1272     def __delitem__(self, key: Hashable) -> None:

~/sandbox/xarray/xarray/core/dataset.py in update(self, other, inplace)
   3521         """
   3522         _check_inplace(inplace)
-> 3523         merge_result = dataset_update_method(self, other)
   3524         return self._replace(inplace=True, **merge_result._asdict())
   3525

~/sandbox/xarray/xarray/core/merge.py in dataset_update_method(dataset, other)
    862                     other[key] = value.drop_vars(coord_names)
    863
--> 864     return merge_core([dataset, other], priority_arg=1, indexes=dataset.indexes)

~/sandbox/xarray/xarray/core/merge.py in merge_core(objects, compat, join, priority_arg, explicit_coords, indexes, fill_value)
    550         coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    551     )
--> 552     collected = collect_variables_and_indexes(aligned)
    553
    554     prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)

~/sandbox/xarray/xarray/core/merge.py in collect_variables_and_indexes(list_of_mappings)
    275                 append_all(coords, indexes)
    276
--> 277             variable = as_variable(variable, name=name)
    278             if variable.dims == (name,):
    279                 variable = variable.to_index_variable()

~/sandbox/xarray/xarray/core/variable.py in as_variable(obj, name)
    105     elif isinstance(obj, tuple):
    106         try:
--> 107             obj = Variable(*obj)
    108         except (TypeError, ValueError) as error:
    109             # use .format() instead of % because it handles tuples consistently

~/sandbox/xarray/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
    306             unrecognized encoding items.
    307         """
--> 308         self._data = as_compatible_data(data, fastpath=fastpath)
    309         self._dims = self._parse_dimensions(dims)
    310         self._attrs = None

~/sandbox/xarray/xarray/core/variable.py in as_compatible_data(data, fastpath)
    229     if isinstance(data, np.ndarray):
    230         if data.dtype.kind == "O":
--> 231             data = _possibly_convert_objects(data)
    232         elif data.dtype.kind == "M":
    233             data = np.asarray(data, "datetime64[ns]")

~/sandbox/xarray/xarray/core/variable.py in _possibly_convert_objects(values)
    165     datetime64 and timedelta64, according to the pandas convention.
    166     """
--> 167     return np.asarray(pd.Series(values.ravel())).reshape(values.shape)
    168
    169

~/sandbox/numpy/numpy/core/_asarray.py in asarray(a, dtype, order)
     83
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86
     87

~/sandbox/pandas/pandas/core/series.py in __array__(self, dtype)
    730                 "To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
    731             )
--> 732             warnings.warn(msg, FutureWarning, stacklevel=3)
    733             dtype = "M8[ns]"
    734         return np.asarray(self.array, dtype)

@shoyer
Member Author

shoyer commented Dec 30, 2019

> Just FYI, we're potentially enforcing this deprecation in pandas-dev/pandas#30563 (which would be included in a pandas release in a week or two). Is that likely to cause problems for xarray users?

I don't think so. Xarray users have been seeing this warning for a while, so they should expect something will change.

Also, I don't think there are that many users using DatetimeTZ in xarray.

> And there are a couple places that need updating, even with a dtypes argument to let the user specify things. We also hit this via Dataset.__setitem__

I think this is basically the same change. Is there a full example of the behavior that you are worried about?

@TomAugspurger
Contributor

> I think this is basically the same change.

Ah, I was mistaken. I was thinking we needed to plumb a dtype argument all the way through there, but I don't think that's necessary. I may be able to submit a PR with a dtypes argument for from_dataframe tomorrow.
