Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert cudf.Scalar usage to pylibcudf and pyarrow usage #17686

Merged
merged 12 commits into from
Jan 8, 2025

Conversation

mroeschke
Copy link
Contributor

@mroeschke mroeschke commented Jan 7, 2025

Description

A lot of cudf.Scalar usage is to eventually end up with a device scalar object (pylibcudf.Scalar) to pass to a pylibcudf routine. The conversion logic to get there can be achieved by converting to a pyarrow.Scalar and using pylibcudf.interop.from_arrow. This way we offload a lot of scalar-conversion-logic in cudf.Scalar to pyarrow.Scalar which can further be converted using the interop method.

This PR just tackles some straightforward cases of the above. Another PR will tackle the more involved cases

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 7, 2025
@mroeschke mroeschke self-assigned this Jan 7, 2025
@mroeschke mroeschke requested a review from a team as a code owner January 7, 2025 02:20
@mroeschke mroeschke requested review from wence- and Matt711 January 7, 2025 02:20
Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some questions/suggestions, but I'm approving because overall the changes look good to me. And you said you plan to follow-up this PR to handle more scalar conversion cases.

@@ -36,7 +37,7 @@ def gather(

@acquire_spill_lock()
def scatter(
sources: list[ColumnBase | cudf.Scalar],
sources: list[ColumnBase | ScalarLike],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? I think ScalarLike is an alias for Any.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it remains ScalarLike, should we handle the different scalars we can get?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed this when trying to going down the rabbit hole of using pylibcudf.Scalar here but it became more involved that I had thought. This was a leftover I forgot to revert

@@ -3255,7 +3255,7 @@ def duplicated(
)
distinct = libcudf.column.Column.from_pylibcudf(plc_column)
result = copying.scatter(
[cudf.Scalar(False, dtype=bool)],
[cudf.Scalar(False)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can make this plc.Scalar. And then fix scatter's implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I'll plan to tackle this in a follow up

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 2c385c4 into rapidsai:branch-25.02 Jan 8, 2025
106 checks passed
@mroeschke mroeschke deleted the ref/cudf_scalar branch January 8, 2025 21:00
rapids-bot bot pushed a commit that referenced this pull request Jan 16, 2025
I realized in #17686 that this refactoring might have sidestepped the caching benefits in the prior `cudf.Scalar` usage. This PR introduces `pa_scalar_to_plc_scalar` which caches the pyarrow to pylibcudf conversion in that PR.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Matthew Murray (https://github.com/Matt711)

URL: #17707
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants