Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataArray to_dict() without converting with numpy tolist() #1599

Closed
nicain opened this issue Sep 29, 2017 · 15 comments · Fixed by #7739
Closed

DataArray to_dict() without converting with numpy tolist() #1599

nicain opened this issue Sep 29, 2017 · 15 comments · Fixed by #7739

Comments

@nicain
Copy link

nicain commented Sep 29, 2017

Enhancement:
Following #432 and #917, I need to convert my DataArray to a dictionary, but do not need to fully-jsonize the data. In fact, I would much prefer to get the data without using "decode_numpy_dict_values" to convert numpy to lists using tolist.

Is this feature requested elsewhere, or was it intentionally not considered as a use-case? If not, happy to contribute code.

@shoyer
Copy link
Member

shoyer commented Sep 29, 2017

We could add a keyword argument to control this, e.g., numpy=True to indicate that data should be kept as NumPy arrays.

@nicain
Copy link
Author

nicain commented Sep 29, 2017

Yeah, my thoughts exactly. I am almost done with this in my fork... numpy might be confusing for those who will see the module name when they look at it. How does a kwarg:

def to_dict(self, tolist=True):

sound?

@shoyer
Copy link
Member

shoyer commented Sep 29, 2017

I'll let others chime in on the API design here but yes that sounds pretty reasonable, too.

@chrisbarber
Copy link

Could have a callable serializer kwarg that defaults to np.ndarray.tolist. I have a use case where I would pass in np.ndarray.tobytes for this. But then again, I could just use numpy=True or tolist=False and then walk the dict myself.

@nicain
Copy link
Author

nicain commented Sep 29, 2017

@chrisbarber Maybe a method to_byte_dict that (in implementation) takes the output from numpy=True and walks the dict with tobytes?

@chrisbarber
Copy link

@nicain, for sure. Probably best for the API's sake to stick to the simplicity of a flag.

@nicain
Copy link
Author

nicain commented Oct 2, 2017

@chrisbarber

Here is my example implementation: https://github.com/nicain/xarray/blob/fa86e3d38ebf4e641cafd963a5b69a77539b931d/xarray/core/dataarray.py#L1388

@shoyer Can you point me to where to edit documentation and unit test best practices?

@shoyer
Copy link
Member

shoyer commented Oct 2, 2017

We need to write a contributing guide!

But in brief:

See also: https://github.com/pydata/xarray/pull/1485/files

@nicain
Copy link
Author

nicain commented Oct 2, 2017

Sweet, this is exactly what I am used to! Ill give a shot to writing a few tests later today, and make a PR. Can you suggest some core devs I can ping for a PR review? Ill ask @chrisbarber (we work together) to review, but I'd like more eyeballs on it.

@shoyer
Copy link
Member

shoyer commented Oct 2, 2017

@nicain myself, @jhamman and/or @MaximilianR are good choices for reviewers, thanks!

@stale
Copy link

stale bot commented Sep 2, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Sep 2, 2019
@stale stale bot closed this as completed Oct 2, 2019
@dcherian dcherian reopened this Oct 3, 2019
@stale stale bot removed the stale label Oct 3, 2019
@jmccreight
Copy link
Contributor

I'd be interested in reviving this, this is exactly what I want to achieve. It's not clear if there was some reason this never went ahead. I looked around but didnt find anything. LMK if it there's some reason not to pursue it. THanks

@jhamman
Copy link
Member

jhamman commented Apr 7, 2023

@jmccreight - I don't think there is any specific reason this didn't get done. Still open to contributions here if you are interested.

@jmccreight
Copy link
Contributor

jmccreight commented Apr 7, 2023

The PR #7739 is available for review. @jhamman @dcherian would be my choices. i think this is pretty straight forward. I suppose the name of the kwarg being numpy_data is debatable. I based this on the discussion of numpy vs tolist above, preferring the former but acknowledging the comment that a package name as an arg is odd. Could do as_numpy or something slightly different.

@dcherian
Copy link
Contributor

dcherian commented Apr 11, 2023

Perhaps we should have array_to_list: bool instead. If False, we just preserve the underlying array type.

Then the user could do ds.as_numpy().to_dict(array_to_list=False) to always get numpy arrays as #7739

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants