Deserialise data on demand #5900
Comments
Overall this seems like a good objective and valuable thing to do. Other things that are nice about keeping data serialized longer:
One concern about the proposed design:
This sounds like the on-disk bytes will be read fully into memory before being sent. I would really like to see a design where we can stream data from disk directly to the network without copying the entire thing into memory. This is a big problem, because if the sender is already approaching its memory limit, it currently has two bad options: refuse or throttle the transfer, or un-spill the data back into memory and risk going over the limit. A combination of these two is what we currently do when paused (limit to one outgoing transfer at a time, and un-spilling that key can still maybe kill the worker). But if sending spilled data didn't use any extra RAM, we wouldn't have to choose between bad options. We could just send the data, basically for free. This is why I don't like the idea of …
I recognize that comms may also require an additional streaming/file-like interface (and/or a per-comm …). A less-common corollary to think about is whether we should be able to receive data straight to disk as well, without holding the full pickled bytes in memory.
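To make the memory bound concrete, here is a minimal sketch of a chunked send path. The `comm` object, its async `write()` method, and the framing message are all hypothetical, not the actual distributed comms API; the point is that peak extra RAM is one chunk, regardless of the size of the spilled value:

```python
import os

CHUNK_SIZE = 4 * 2**20  # 4 MiB: peak extra RAM per transfer is bounded by this

async def stream_spilled_value(comm, path: str) -> None:
    # Announce the payload size so the receiver can pre-allocate, or
    # spool straight to its own disk (the framing shown here is made up).
    await comm.write({"op": "spilled-value", "nbytes": os.path.getsize(path)})
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            await comm.write(chunk)  # only one chunk is in userspace at a time
```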
As discussed in #4424, I don't think an async setitem is necessary. Its only benefit would be the ability to block a task from transitioning from state=running to state=memory when disk writes cannot keep up with the task completion rate, which is something that pause() already deals with using what is, IMHO, a more holistic approach than a simple fixed-size queue of writes, since it also considers unmanaged memory.
Sorry, I don't know to what extent this is off-topic given the discussion above; happy to move to a new issue if we'd like to pursue it further. I'll just say this small bit: if we do go down the road of sending files between workers (say, as a way of sending spilled data between workers), as of Python 3.8 zero-copy operations are used to transfer files (when available). In other words, the file data would not move through userspace but would be handled directly by the kernel, saving the Worker the memory cost of the transfer.
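For reference, a minimal sketch of the stdlib zero-copy path; the helper name is made up, and a raw blocking socket stands in for distributed's actual async comms:

```python
import socket

def send_file_zero_copy(sock: socket.socket, path: str) -> None:
    # socket.sendfile() delegates to os.sendfile() on platforms that
    # support it, so the bytes move kernel-to-kernel and never enter
    # this process's address space.
    with open(path, "rb") as f:
        sock.sendfile(f)
```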
New insights here #7351 (comment) suggest that this proposal may offer a poor cost/benefit ratio. Note that the data was generated on a cluster of 5 workers; we should rerun the test at a much larger scale to confirm it.
Issue
As of today, data is eagerly deserialised as soon as it is un-spilled and as soon as it is received from another worker.
This has the benefit that we know that all data in `Worker.data.fast` is already deserialised when we need it and that `Worker.data` can be treated as an opaque MutableMapping.
However, this design simplicity comes at a very steep cost:
Proposed design
Whenever a Worker acquires data in any way other than task completion, meaning `rebalance()`, `replicate()`, or `scatter(broadcast=True)` (soon to be reimplemented on top of the AMM), it should not unpickle the data immediately. Instead, it should keep it pickled. `Worker.data.fast` will contain a mix of pickled and unpickled data.
Whenever a task's `compute()` starts, it will receive a mix of pickled and unpickled inputs. It will unpickle the pickled inputs and put them back into `data.fast`, replacing the pickled versions.
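A minimal sketch of that compute-time step, assuming a hypothetical `Pickled` wrapper to mark still-serialised values and plain `pickle` in place of distributed's own serialisation:

```python
import pickle

class Pickled:
    """Hypothetical wrapper marking a value that has not been deserialised yet."""
    def __init__(self, blob: bytes):
        self.blob = blob

def resolve_inputs(fast: dict, keys):
    """Unpickle only the inputs a task actually needs, writing the live
    objects back so later tasks find them ready to use."""
    out = []
    for key in keys:
        value = fast[key]
        if isinstance(value, Pickled):
            value = pickle.loads(value.blob)
            fast[key] = value  # replace the pickled version in place
        out.append(value)
    return out
```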
Or if you prefer: the current two mappings `data.slow` and `data.fast` should be replaced by `data.slow`, `data.fast_pickled`, and `data.fast_unpickled`; `compute()` will always read and write from `data.fast_unpickled`, which will internally read and delete from the other two, whereas network receive and send will always read and write from `data.fast_pickled`, which will internally read and delete from the other mappings. This also carries the benefit that all pickle/unpickle work, be it due to network activity or spilling/unspilling, can now be encapsulated in a single module.
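To illustrate the proposed layout, here is a sketch of a facade over the two fast mappings, using plain dicts and plain `pickle` and omitting the `data.slow` spill side; the names and structure are assumptions for illustration, not the actual SpillBuffer:

```python
import pickle
from collections.abc import MutableMapping

class FastData(MutableMapping):
    """Hypothetical facade: compute() uses the mapping interface, which
    serves unpickled objects and migrates pickled ones on first access."""

    def __init__(self):
        self.fast_unpickled = {}  # live objects, ready for compute()
        self.fast_pickled = {}    # raw bytes from the network, untouched so far

    def __getitem__(self, key):
        if key not in self.fast_unpickled and key in self.fast_pickled:
            # Deserialise on demand; the key migrates across mappings
            self.fast_unpickled[key] = pickle.loads(self.fast_pickled.pop(key))
        return self.fast_unpickled[key]

    def __setitem__(self, key, value):
        self.fast_pickled.pop(key, None)  # a new value supersedes stale bytes
        self.fast_unpickled[key] = value

    def __delitem__(self, key):
        if key in self.fast_pickled:
            del self.fast_pickled[key]
        else:
            del self.fast_unpickled[key]

    def __iter__(self):
        yield from self.fast_unpickled
        yield from self.fast_pickled

    def __len__(self):
        return len(self.fast_unpickled) + len(self.fast_pickled)

    def get_pickled(self, key) -> bytes:
        # Network send path: reuse the stored bytes when possible,
        # pickling on the fly only if the value is already live.
        if key in self.fast_pickled:
            return self.fast_pickled[key]
        return pickle.dumps(self.fast_unpickled[key])
```

The network send path would call something like `get_pickled()`, which never forces deserialisation, while the compute path uses the normal mapping interface and pays the unpickling cost only on first access.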
Challenges