Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Change ways to access ptr in Buffer #12587

Merged
merged 29 commits into from
Jan 26, 2023

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Jan 20, 2023

Description

This PR:
With the introduction of copy-on-write(#11718), and spilling, there are different ways each Buffer implementation expects the code to behave when a .ptr is accessed, i.e., in case of COW, a ptr can be accessed for read or write purposes, in case of spilling the ptr can be accessed with a spill lock and the buffer will be spillable after the spill lock goes out of scope during execution. For these reasons we introduced ptr, mutable_ptr and data_array_view, _data_array_view, mask_array_view, and _mask_array_view. With so many ways to access buffer ptr and array views, this has become quite difficult for devs to know when to use what unless you are fully familiar with the implementation details. It will also lead us to a lot of special case handling for Buffer, SpillableBuffer, and CopyonWriteBuffer. For this reason, we have decided to simplify fetching the pointer with a get_ptr(mode="read"/"write") API, fetching data & mask array views will also become methods that accept mode like data_array_view(mode="read"/"write") & mask_array_view(mode="read"/"write"). It is the expectation that when the caller passed "read", they don't tamper with the buffer or the memory it is pointing to. In the case of "write", they are good to mutate the memory it is pointing to.

Note that even with mode="read"/"write" the caller should still, if appropriate, acquire a spill lock for the duration of the access. If this is not done, and the buffer is a SpillableBuffer, it will permanently be marked as unspillable.

  • Introduces get_ptr() to replace ptr property.
  • Replaces data_array_view & mask_array_view methods with data_array_view(mode=r/w) & mask_array_view(mode=r/w)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. 4 - Needs cuDF (Python) Reviewer labels Jan 20, 2023
@galipremsagar galipremsagar requested a review from a team as a code owner January 20, 2023 14:08
@galipremsagar galipremsagar self-assigned this Jan 20, 2023
@galipremsagar galipremsagar requested review from wence- and vyasr January 20, 2023 14:08
@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change breaking Breaking change and removed non-breaking Non-breaking change labels Jan 20, 2023
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small queries.

python/cudf/cudf/_lib/column.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/column.pyx Show resolved Hide resolved
python/cudf/cudf/_lib/column.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/copying.pyx Show resolved Hide resolved
python/cudf/cudf/core/buffer/buffer.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/categorical.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/numerical.py Show resolved Hide resolved
python/cudf/cudf/core/df_protocol.py Show resolved Hide resolved
python/cudf/cudf/core/frame.py Show resolved Hide resolved
python/cudf/cudf/utils/queryutils.py Outdated Show resolved Hide resolved
Co-authored-by: Lawrence Mitchell <[email protected]>
@codecov
Copy link

codecov bot commented Jan 20, 2023

Codecov Report

Base: 86.58% // Head: 85.72% // Decreases project coverage by -0.86% ⚠️

Coverage data is based on head (810d921) compared to base (b6dccb3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #12587      +/-   ##
================================================
- Coverage         86.58%   85.72%   -0.86%     
================================================
  Files               155      155              
  Lines             24368    24911     +543     
================================================
+ Hits              21098    21356     +258     
- Misses             3270     3555     +285     
Impacted Files Coverage Δ
python/cudf/cudf/_version.py 1.41% <0.00%> (-98.59%) ⬇️
python/cudf/cudf/core/buffer/spill_manager.py 72.50% <0.00%> (-7.50%) ⬇️
python/cudf/cudf/utils/dtypes.py 77.85% <0.00%> (-1.61%) ⬇️
python/cudf/cudf/options.py 86.11% <0.00%> (-1.59%) ⬇️
python/cudf/cudf/core/single_column_frame.py 94.30% <0.00%> (-1.27%) ⬇️
python/cudf/cudf/io/json.py 91.04% <0.00%> (-1.02%) ⬇️
...ython/custreamz/custreamz/tests/test_dataframes.py 98.38% <0.00%> (-1.01%) ⬇️
python/dask_cudf/dask_cudf/io/csv.py 96.34% <0.00%> (-1.00%) ⬇️
python/dask_cudf/dask_cudf/io/parquet.py 91.81% <0.00%> (-0.59%) ⬇️
python/cudf/cudf/core/column/string.py 88.00% <0.00%> (-0.53%) ⬇️
... and 51 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@galipremsagar galipremsagar requested a review from wence- January 20, 2023 18:07
@galipremsagar
Copy link
Contributor Author

Some small queries.

Addressed 👍 This PR should be ready for another round of reviews.

madsbk
madsbk previously requested changes Jan 23, 2023
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good but I don't think we need the "internal_write" mode.

python/cudf/cudf/core/buffer/buffer.py Show resolved Hide resolved
python/cudf/cudf/core/buffer/spillable_buffer.py Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/library_design.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/library_design.md Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/column.pyx Outdated Show resolved Hide resolved
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there, thanks! Some more minor things I think.

python/cudf/cudf/core/buffer/buffer.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/buffer/buffer.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/categorical.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/column.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/column.py Show resolved Hide resolved
python/cudf/cudf/core/series.py Show resolved Hide resolved
python/cudf/cudf/core/window/rolling.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/window/rolling.py Show resolved Hide resolved
python/cudf/cudf/utils/applyutils.py Show resolved Hide resolved
python/strings_udf/strings_udf/_typing.py Show resolved Hide resolved
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Jan 25, 2023
"shape": (self.size,),
"strides": None,
"typestr": "|u1",
"version": 0,
}

@property
def _readonly_proxy_cai_obj(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I don't think we should add extra property here given that it's used only in one other function (do we expect more uses for it?)

If not, I think we can just use cuda_array_interface_wrapper in those places instead.

Copy link
Contributor Author

@galipremsagar galipremsagar Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only being used in data_array_view & mask_array_view, and one more place for copy-on-write scenarios: https://github.com/rapidsai/cudf/pull/12556/files#diff-e23cdfbd2c7e4be6f1d08ea3cec7e075e42851641068fe318985775f6b9efd99R214.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK -- let's leave as is now, and we can take stock in the merge of copy-on-write.

@galipremsagar galipremsagar removed the 5 - Ready to Merge Testing and reviews complete, ready to merge label Jan 25, 2023
Comment on lines 72 to 73
with acquire_spill_lock():
kernel.forall(len(data))(str_views, output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid setting this precedent; tests really shouldn't have to worry about spilling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline and removed this lock to keep things consistent where we are explicitly not testing for spilling ability.

@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Jan 25, 2023
@galipremsagar galipremsagar dismissed madsbk’s stale review January 26, 2023 03:50

Pending reviews have been addressed, dismissing to proceed with merging.

@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit f7d434d into rapidsai:branch-23.02 Jan 26, 2023
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this pull request Jan 26, 2023
This PR adapts to the breaking changes being introduced in: rapidsai/cudf#12587

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Mark Harris (https://github.com/harrism)
  - Lawrence Mitchell (https://github.com/wence-)
  - Michael Wang (https://github.com/isVoid)

URL: #871
ajschmidt8 pushed a commit to rapidsai/cuml that referenced this pull request Jan 26, 2023
This PR adapts to the breaking changes being introduced in: rapidsai/cudf#12587

Authors:
   - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
   - Lawrence Mitchell (https://github.com/wence-)
   - AJ Schmidt (https://github.com/ajschmidt8)
   - Dante Gama Dessavre (https://github.com/dantegd)
pentschev added a commit to pentschev/dask-cuda that referenced this pull request Jan 27, 2023
Fix test that reads directly from `cudf.Buffer` pointer to new
`get_ptr(mode="read")`, in accordance with changes from
rapidsai/cudf#12587 .
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Jan 27, 2023
Fix test that reads directly from `cudf.Buffer` pointer to new `get_ptr(mode="read")`, in accordance with changes from rapidsai/cudf#12587 .

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1094
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
This PR adapts to the breaking changes being introduced in: rapidsai/cudf#12587

Authors:
   - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
   - Lawrence Mitchell (https://github.com/wence-)
   - AJ Schmidt (https://github.com/ajschmidt8)
   - Dante Gama Dessavre (https://github.com/dantegd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants