Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dask gdf fixes #262

Merged
merged 18 commits into from
Sep 27, 2018
Merged

Dask gdf fixes #262

merged 18 commits into from
Sep 27, 2018

Conversation

mrocklin
Copy link
Collaborator

This PR includes a few small fixes that were useful in getting dask-gdf to inherit nicely from dask.dataframe. In doing so we found that the PyGDF code needed to match pandas slightly more in a few cases.

This is a work in progress. I would not merge this. So far it's up for discussion only.

(sorry for the hack, this was useful for std/var in dask-gdf)
These don't do anything currently, and are silently ignored
@GPUtester
Copy link
Collaborator

Can one of the admins verify this patch?

@mrocklin
Copy link
Collaborator Author

@scopatz
Copy link
Contributor

scopatz commented Sep 26, 2018

LGTM

if other == 2:
return self * self
else:
return NotImplemented
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is janky. Is there a GPU implementation somewhere that we can use?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrocklin This is a WIP in rapidsai/libgdf#94

@kkraus14 kkraus14 changed the title Dask gdf fixes [WIP] Dask gdf fixes Sep 26, 2018
@kkraus14 kkraus14 added 2 - In Progress Currently a work in progress proposal Change current process or code labels Sep 26, 2018
@kkraus14
Copy link
Collaborator

add to whitelist

pygdf/series.py Outdated
data.columns = ['x']
data = DataFrame.from_pandas(data)
data = data['x']
data.name = name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a hacky workaround where we should address the issue(s) instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I'll dive through and figure out how to construct a Series object manually in the morning. Any pointers would be welcome.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently a pygdf.Series can be constructed from a pandas.Series just via normal __init__ which ends up hitting this clause: https://github.com/gpuopenanalytics/pygdf/blob/master/pygdf/columnops.py#L183

So the Pandas Series is casted to a numpy array. I think you could add a check for if it's an instance of a Pandas Series and set the name if it is and then call the columnops.as_column function.

@kkraus14
Copy link
Collaborator

Changes look good to me thus far, some of the binaryops will require a bit of refactoring once new binaryops land in libgdf, but that's to be expected.

@mrocklin mrocklin changed the title [WIP] Dask gdf fixes Dask gdf fixes Sep 27, 2018
@mrocklin
Copy link
Collaborator Author

Changes look good to me thus far, some of the binaryops will require a bit of refactoring once new binaryops land in libgdf, but that's to be expected.

If this seems mergable to you without added burden then I would not be against merging it. I think that most of this is defensible regardless of what happens.

@kkraus14 kkraus14 merged commit 3775443 into rapidsai:master Sep 27, 2018
@mrocklin mrocklin deleted the dask-gdf-fixes branch September 27, 2018 21:56
@kkraus14 kkraus14 mentioned this pull request Oct 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress proposal Change current process or code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants