Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support out-of-band buffers in Python pickling #5132

Merged
merged 4 commits into from
May 9, 2020

Conversation

jakirkham
Copy link
Member

When Python pickle's protocol 5 or greater is used, this change will support more efficient serialization of out-of-band buffers. This is analogous to Dask's custom serialization except for pickling. As such this is helpful in any Python serialization case where pickling is used. If an older pickling protocol is used, we simply proceed as before.

jakirkham added 4 commits May 7, 2020 13:31
This lets us get access to the `protocol` argument, which can be useful
if we want to take advantage of newer pickling protocols.
In Pickle's protocol 5, out-of-band buffers are supported, which avoids
unnecessary copies when serializing data. In other words, this similar
to Dask's custom serialization except for pickling and can be supported
by any library that can use this Python standard. The only requirement
is we wrap any bytes-like objects in `PickleBuffer`s in our
`__reduce_ex__` method, which is what we do here. If an older Pickle
protocol is in use, we simply skip this path and go about pickling the
NumPy array as we would have otherwise.
@jakirkham jakirkham requested a review from a team as a code owner May 7, 2020 20:35
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @shwina as this may have implications for pack / unpack

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs cuDF (Python) Reviewer labels May 7, 2020
@jakirkham
Copy link
Member Author

Are we considering host buffers with pack/unpack?

@kkraus14
Copy link
Collaborator

kkraus14 commented May 7, 2020

Are we considering host buffers with pack/unpack?

I think we'd want to have pack / unpack in the path for pickling a dataframe efficiently for example.

@jakirkham
Copy link
Member Author

Sure that makes sense. I have another idea on how we might do that, but it's probably a different PR. Can add a draft PR for us to look at if it's of interest.

@codecov
Copy link

codecov bot commented May 7, 2020

Codecov Report

Merging #5132 into branch-0.14 will decrease coverage by 0.03%.
The diff coverage is 86.86%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.14    #5132      +/-   ##
===============================================
- Coverage        88.47%   88.44%   -0.04%     
===============================================
  Files               54       55       +1     
  Lines            10276    10405     +129     
===============================================
+ Hits              9092     9203     +111     
- Misses            1184     1202      +18     
Impacted Files Coverage Δ
python/cudf/cudf/core/buffer.py 82.35% <80.00%> (-0.58%) ⬇️
python/dask_cudf/dask_cudf/io/opt_parquet.py 85.34% <85.34%> (ø)
...ython/dask_cudf/dask_cudf/io/tests/test_parquet.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/column/string.py 84.97% <0.00%> (-0.14%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17ad431...1e2d39a. Read the comment docs.

@jakirkham
Copy link
Member Author

Sure that makes sense. I have another idea on how we might do that, but it's probably a different PR. Can add a draft PR for us to look at if it's of interest.

Went ahead and placed this in PR ( #5139 ) for discussion.

@kkraus14
Copy link
Collaborator

kkraus14 commented May 8, 2020

rerun tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants