-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dask gdf fixes #262
Dask gdf fixes #262
Conversation
(sorry for the hack, this was useful for std/var in dask-gdf)
These don't do anything currently, and are silently ignored
Can one of the admins verify this patch? |
LGTM |
if other == 2: | ||
return self * self | ||
else: | ||
return NotImplemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is janky. Is there a GPU implementation somewhere that we can use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrocklin This is a WIP in rapidsai/libgdf#94
add to whitelist |
84b14c4
to
d1bc16e
Compare
pygdf/series.py
Outdated
data.columns = ['x'] | ||
data = DataFrame.from_pandas(data) | ||
data = data['x'] | ||
data.name = name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a hacky workaround where we should address the issue(s) instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. I'll dive through and figure out how to construct a Series object manually in the morning. Any pointers would be welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently a pygdf.Series can be constructed from a pandas.Series just via normal __init__
which ends up hitting this clause: https://github.com/gpuopenanalytics/pygdf/blob/master/pygdf/columnops.py#L183
So the Pandas Series is casted to a numpy array. I think you could add a check for if it's an instance of a Pandas Series and set the name if it is and then call the columnops.as_column
function.
We were listing all locals in the file, so it doesn't seem to accomplish much. It does however require people editing this file to make two changes rather than one.
Changes look good to me thus far, some of the binaryops will require a bit of refactoring once new binaryops land in libgdf, but that's to be expected. |
If this seems mergable to you without added burden then I would not be against merging it. I think that most of this is defensible regardless of what happens. |
This PR includes a few small fixes that were useful in getting dask-gdf to inherit nicely from dask.dataframe. In doing so we found that the PyGDF code needed to match pandas slightly more in a few cases.
This is a work in progress. I would not merge this. So far it's up for discussion only.