Support out-of-band buffers in Python pickling #5132
This lets us get access to the `protocol` argument, which can be useful if we want to take advantage of newer pickling protocols.
In pickle protocol 5, out-of-band buffers are supported, which avoids unnecessary copies when serializing data. In other words, this is similar to Dask's custom serialization, except that it works through pickling and can be supported by any library that uses this Python standard. The only requirement is that we wrap any bytes-like objects in `PickleBuffer`s in our `__reduce_ex__` method, which is what we do here. If an older pickle protocol is in use, we simply skip this path and pickle the NumPy array as we would have otherwise.
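For reference, here is a minimal, standard-library-only sketch of the pattern described above. It is not the code in this PR: `HostBuffer` is a hypothetical stand-in class, and the fallback for older protocols uses a plain `bytes` copy rather than the NumPy path mentioned above.

```python
import pickle
from pickle import PickleBuffer


class HostBuffer:
    """Hypothetical wrapper around a bytes-like object (illustration only)."""

    def __init__(self, data):
        self.data = data  # any object exposing the buffer protocol

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Wrapping in PickleBuffer lets the pickler hand the memory to
            # `buffer_callback` instead of copying it into the stream.
            return type(self), (PickleBuffer(self.data),)
        # Older protocols: fall back to an ordinary in-band copy.
        return type(self), (bytes(self.data),)


obj = HostBuffer(bytearray(b"x" * 1024))

# Protocol 5 with a callback: the payload is collected out-of-band,
# so `header` only holds the small metadata stream.
buffers = []
header = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(header, buffers=buffers)

# Protocol 4: everything, payload included, is serialized in-band.
legacy = pickle.dumps(obj, protocol=4)
restored_legacy = pickle.loads(legacy)
```

With the callback in place, the caller decides how to move `buffers` (for example over a transport that can send them without copying), while `header` stays small.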
cc @shwina as this may have implications for pack / unpack
Are we considering host buffers with pack/unpack?
I think we'd want to have
Sure, that makes sense. I have another idea on how we might do that, but it's probably a different PR. Can add a draft PR for us to look at if it's of interest.
Codecov Report
@@             Coverage Diff              @@
##           branch-0.14    #5132   +/-   ##
===============================================
- Coverage        88.47%   88.44%   -0.04%
===============================================
  Files               54       55      +1
  Lines            10276    10405    +129
===============================================
+ Hits              9092     9203    +111
- Misses            1184     1202     +18
Went ahead and placed this in a PR (#5139) for discussion.
rerun tests
When pickle protocol 5 or greater is used, this change supports more efficient serialization through out-of-band buffers. This is analogous to Dask's custom serialization, but for pickling, so it helps in any Python serialization case where pickling is used. If an older pickling protocol is used, we simply proceed as before.