Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handle cudf.pandas in from_pandas_edgelist #4525

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Jul 8, 2024

Optimistically use cupy, but fall back to numpy if necessary.

Also, bump lint versions.

CC @rlratzel

Optimistically use cupy, but fall back to numpy if necessary
@eriknw eriknw requested a review from a team as a code owner July 8, 2024 17:21
@github-actions github-actions bot added the python label Jul 8, 2024
Comment on lines 39 to 40
src_array = df[source].to_numpy()
dst_array = df[target].to_numpy()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy=False is currently the default here, but should we add copy=False to be clear about potential data movement? I wonder whether pandas will eventually follow numpy 2 semantics and use copy=None to only copy if necessary and copy=False to raise if a copy is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add copy=False to be clear about potential data movement?

That seems like a good idea to me, unless there's a reason you think why it might be better not to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknw eriknw added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change nx-cugraph labels Jul 8, 2024
Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines 39 to 40
src_array = df[source].to_numpy()
dst_array = df[target].to_numpy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add copy=False to be clear about potential data movement?

That seems like a good idea to me, unless there's a reason you think why it might be better not to.

@rlratzel
Copy link
Contributor

rlratzel commented Jul 8, 2024

/merge

@rapids-bot rapids-bot bot merged commit 355efc1 into rapidsai:branch-24.08 Jul 8, 2024
131 checks passed
Comment on lines 68 to 69
src_indices = cp.array(src_array)
dst_indices = cp.array(dst_array)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this may perform an extra/unnecessary copy if cp.asarray above performed a copy. Maybe we should figure out best practice for determining whether a copy was made or not and avoid do extra copies.

rapids-bot bot pushed a commit that referenced this pull request Jul 30, 2024
This continues #4525 (and [this comment](#4525 (comment))) to avoid copies and to be more optimal whether using pandas, cudf, or cudf.pandas. Notably, using `s.to_numpy` with cudf will return a _numpy_ array, but `cudf.pandas` may return a _cupy_ array (proxy).

Also, `s.to_numpy(copy=False)` ([from comment](#4525 (comment))) is not used, b/c cudf's `to_numpy` raises if `copy=False`. We get the behavior we want by not specifying `copy=`.

I don't know if this is the best way to determine whether a copy occurred or not, but this seems like a useful pattern to establish, because we want to make ingest more efficient.

CC @rlratzel

Authors:
  - Erik Welch (https://github.com/eriknw)
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4528
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change nx-cugraph python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants