-
-
Notifications
You must be signed in to change notification settings - Fork 689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: accommodate dask_loads for default serializers of dask #1942
ENH: accommodate dask_loads for default serializers of dask #1942
Conversation
@ZhiyLiu thanks! The |
@mrocklin @jakirkham how does this look to you? |
Yes. It passes on my local machine. Looking at the log. There is an error regarding to the compilation of C++ code, as follows:
Not sure this is relevant to the python code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the changes here. Is there a particular issue or traceback somewhere that explains what was breaking before? It might be simpler for ITK to implement standard pickle serialization/deserialization (I thought that it did this already).
try: | ||
import numpy as np | ||
from distributed.protocol import dask_deserialize | ||
from typing import Dict, List | ||
@dask_deserialize.register(NDArrayITKBase) | ||
def deserialize(header: Dict, frames: List[bytes]) -> NDArrayITKBase: | ||
loads = dask_deserialize.dispatch(np.ndarray) | ||
return NDArrayITKBase(loads(header, frames)) | ||
except ImportError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is typical to handle this as follows:
try: | |
import numpy as np | |
from distributed.protocol import dask_deserialize | |
from typing import Dict, List | |
@dask_deserialize.register(NDArrayITKBase) | |
def deserialize(header: Dict, frames: List[bytes]) -> NDArrayITKBase: | |
loads = dask_deserialize.dispatch(np.ndarray) | |
return NDArrayITKBase(loads(header, frames)) | |
except ImportError: | |
pass | |
try: | |
import numpy as np | |
from distributed.protocol import dask_deserialize | |
from typing import Dict, List | |
except ImportError: | |
pass | |
else: | |
@dask_deserialize.register(NDArrayITKBase) | |
def deserialize(header: Dict, frames: List[bytes]) -> NDArrayITKBase: | |
loads = dask_deserialize.dispatch(np.ndarray) | |
return NDArrayITKBase(loads(header, frames)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that registering a deserialization function would have much of an effect without also registering a serialization function.
pickled = pickle.dumps(ndarray_itk_base) | ||
reloaded = pickle.loads(pickled) | ||
equal = (reloaded == ndarray_itk_base).all() | ||
assert equal, 'Different results before and after pickle' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't engage the code that you've written above. Pickle is the standard python serialization technique. Dask's serialization is different. If you wanted to test it you would try the following:
from distributed.protocol import serialize, deserialize
header, frames = serialize(ndarray_itk_base)
obj = deserialize(header, frames)
assert ...
That should have been addressed in db4a897. The Python wrapping and tests succeeded in the related CI builds, so @ZhiyLiu you may want to rebase on |
@thewtex @mrocklin @jhlegarreta @jakirkham Thanks for your review and nice comments! This commit aims to solves an issue showed in this notebook. Briefly, the final cell in the notebook complains:
if no deserialization method is registered for NDArrayITKBase. Honestly, this kind usage of dask is not quite common. In the notebook I scatter the chunked data; on the contrary, a common scatter will scatter a numpy array. However, there will be no problem if the submitted function
returns a numpy array instead of NDArrayITKBase. This suggests that NDArrayITKBase objects cannot be used like numpy arrays in dask. There are a couple of solutions to this issue, some of which are commented out in the notebook, e.g., This commit is one of these solutions, that needs no special care from a user.Hope this makes sense. P.S. The test data involved can be obtained from OME-TIFF sample data. |
Sorry for the slow reply. Some thoughts on options for approaching efficient serialization with Dask. If one communicates objects that Dask already knows how to work with (like NumPy arrays), then one will get optimized serialization for free. This may involve a little (or maybe no) code in ITK to make sure NumPy arrays are handled reasonable well and handed to Dask when functions return. The benefit here seems useful even outside the Dask context. If one adds pickle protocol 5 support to objects (possibly extending to older Python's with Alternatively one can use Dask's custom serialization by following this example on how to extend to custom types. The usual strategy here is to put these registration bits in its own module. This will soften the Distributed dependency and provide a single point for registering these functions. The latter is useful as one needs to Any of these options seems fine. Possibly it's worth doing multiple or even all of them (this is what we did in RAPIDS rapidsai/cudf#5139 ). Just trying to show what's possible and what it would entail to do 🙂 |
33ffdd6
to
0a30c60
Compare
@ZhiyLiu looking good! @mrocklin @jakirkham thank you for the detailed reviews and pointers! From what I can see, with a squash this is ready to merge.
If I am missing something, please let me know; otherwise, I will merge this on Friday. |
Merged in 27fe268 |
PR Checklist
Refer to the ITK Software Guide for
further development details if necessary.
Dask loads method may complain TypeError: No dispatch for <class 'itkPyBufferPython.NDArrayITKBase'> due to the missing of a deserialization method for NDArrayITKBase. This commit tries to solve this issue by registering a deserialization method in dask for NDArrayITKBase.
Correspondingly, the test case verifies the pickling and unpickling of a NDArrayITKBase instance.