Disable compression for fast comms #7768
Conversation
crusaderky commented Apr 11, 2023 (edited)
- Closes #7655 (Compression slows down network comms)
- Disable compression by default for all network comms
- Disable compression (hardcoded) for network comms on localhost, regardless of settings
- Separate compression settings for spilling
- Reduce the compression threshold from 0.9 to 0.7: a buffer is kept uncompressed unless compression shrinks it to at most 70% of its original size, even when compression is enabled (see the config sketch below)
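For illustration, a minimal sketch of how these switches might be set from Python. `distributed.comm.compression` is the long-standing network setting; the spill key name is an assumption and may not match what this PR finally ships, so check the config reference added here:

import dask

# Hedged sketch of the knobs this PR is about. The comm key is the existing
# "distributed.comm.compression" setting; the spill key name below is an
# assumption and may differ from the one introduced by this PR.
dask.config.set({
    "distributed.comm.compression": False,                 # no compression on network comms
    "distributed.worker.memory.spill-compression": "lz4",  # assumed key: still compress spilled data
})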
Just one suggestion, but otherwise looks great. Would be handy to remove the optional compression codecs in favour of cramjam later 😜
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

26 files +1, 26 suites +1, 15h 47m 47s ⏱️ +1h 55m 3s

For more details on these failures and errors, see this check. Results for commit 0e6f7f0 ± comparison against base commit 57ae3e7. This pull request removes 28 and adds 48 tests. Note that renamed tests count towards both.

♻️ This comment has been updated with latest results.
Force-pushed from f4fe2ea to 6c30e8d
Force-pushed from 11b8ca7 to 56fb8c4
Force-pushed from 1b9d498 to 0a71058
The current state of this PR is that it is done, with one major flaw: […] At the moment I cannot find a decent-looking fix for it.
Closely related: #7774
Can you elaborate? I haven't tested anything for this, but when gathering over the scheduler the data passes through two connections (1. worker → scheduler, 2. scheduler → client).
You say 2.) is not compressed in this case. Why is that?
Even if this were true, the data transferred between workers is typically orders of magnitude more than the data submitted to the client. My guess is this should typically still be a net positive effect on wall time.
Because the data does not get deserialized and then serialized again when it goes through the scheduler. So it's serialized and compressed once, using the settings established during the worker<->scheduler comms, and then does not change afterwards.
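A rough conceptual sketch of that pass-through behaviour (illustrative only, not distributed's actual protocol code; pickle and zlib stand in for the real serializers and compressors):

import pickle
import zlib

def worker_serializes(data, compression="zlib"):
    # Compression is decided once, here, using the settings of the
    # worker<->scheduler comm.
    frames = [pickle.dumps(data)]
    header = {"compression": [compression] * len(frames)}
    frames = [zlib.compress(f) if compression else f for f in frames]
    return header, frames

def scheduler_forwards(header, frames):
    # The scheduler never deserializes the payload, so it never reconsiders
    # compression: the same header and frames go out on the scheduler<->client comm.
    return header, frames

def client_receives(header, frames):
    frames = [zlib.decompress(f) if c else f
              for c, f in zip(header["compression"], frames)]
    return pickle.loads(frames[0])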
Correct, it is not being deserialized (thankfully), but the serialization and compression are decoupled to a certain degree. Currently it is hard coded that we will not reconsider compression on a … IIUC this is controlled in this section here: distributed/protocol/core.py, lines 57 to 67 at 76bbfaf.
And we may just need to call …
min_size: int = 10_000,
sample_size: int = 10_000,
nsamples: int = 5,
min_ratio: float = 0.7,
Down from 0.9
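For context, these parameters drive the sampling heuristic discussed above: a few slices of the buffer are compressed first, and the whole buffer is compressed only if the estimated ratio beats min_ratio. A minimal sketch of that idea (not distributed's actual implementation; zlib stands in for the configured lz4/zstd codec):

import random
import zlib  # stand-in for the configured compressor (lz4/zstd in practice)

def maybe_compress_sketch(
    data: bytes,
    min_size: int = 10_000,
    sample_size: int = 10_000,
    nsamples: int = 5,
    min_ratio: float = 0.7,
):
    """Compress `data` only if sampling suggests it is worthwhile."""
    if len(data) < min_size:
        return None, data  # too small to be worth the CPU time

    # Compress a few random slices to estimate the achievable ratio
    samples = b"".join(
        data[start:start + sample_size]
        for start in (
            random.randrange(max(1, len(data) - sample_size)) for _ in range(nsamples)
        )
    )
    if len(zlib.compress(samples)) > min_ratio * len(samples):
        return None, data  # doesn't shrink to <=70% of original: keep uncompressed

    compressed = zlib.compress(data)
    if len(compressed) > min_ratio * len(data):
        return None, data
    return "zlib", compressed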
@@ -0,0 +1,52 @@
from __future__ import annotations
I want to move all compression-specific tests from test_protocol.py to here.
I'll do it in a follow-up PR in order to keep this code review simple.
After async conversation, I removed all the complex and conceptually flawed machinery around distinguishing client vs. worker vs. scheduler, and retained only a separate switch for network vs. spill. This PR is now ready for review.

A/B tests

All tests are vs. mainline git tip (lz4 used for both network and spill). This PR (uncompressed network, lz4 on spill) delivers as expected, with a boost in network performance and no degradation in spill performance. I also ran A/B tests for other configurations, for completeness' sake:
I also A/B tested a very early prototype of integration with cramjam-lz4, which shows a very mild speedup vs. regular lz4. This is not so surprising, as I did not implement the extra machinery for … Prototype here: https://github.com/crusaderky/distributed/tree/cramjam
Awesome, and interesting about cramjam, thanks for taking the time. :)
Hope this shouldn't be too awful. Suppose one could keep around a reusable buffer:

self.buf = cramjam.Buffer()
...
self.buf.seek(0)
nbytes = cramjam.lz4.compress_into(data, self.buf)
self.buf.seek(0)
return self.buf.read(nbytes)

So it'd grow to the max allocation, but be kept around to avoid any additional allocations.
@milesgranger I was actually talking about decompression only; I didn't really plan to use compress_into. Wouldn't we need one buffer per thread calling cramjam? I wouldn't be terribly happy with an extra 90 MiB [1] RAM usage per worker.

If you just call compress(), does cramjam need to enlarge the buffer multiple times? If so, is there a way to tell it to create an oversized buffer straight away and then shrink it once at the end instead?

[1] 64 MiB (…
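To make the "one buffer per thread" concern concrete, a minimal sketch assuming the reuse approach above were adopted (illustrative only, not code from the prototype): every thread calling into cramjam keeps its own output buffer, so the retained memory grows with the number of threads and with the largest payload each thread has seen.

import threading

import cramjam

_local = threading.local()

def _thread_buffer() -> cramjam.Buffer:
    # One reusable output buffer per thread; it grows to the largest payload
    # seen by that thread and is never released afterwards.
    buf = getattr(_local, "buf", None)
    if buf is None:
        buf = _local.buf = cramjam.Buffer()
    return buf

def compress(data: bytes) -> bytes:
    buf = _thread_buffer()
    buf.seek(0)
    nbytes = cramjam.lz4.compress_into(data, buf)
    buf.seek(0)
    return bytes(buf.read(nbytes))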
All …
Alternatively, you can use …
What's missing here?
Just the review.
Looked through as best I could, and it seems good to me, albeit I could very well have missed some subtle issues.
I noticed that the docs build is failing. EDIT: Looks like some dask config parsing / validation is failing.
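One hedged way to poke at that failure locally: load the YAML defaults bundled with distributed and validate them against the bundled schema (the file names distributed.yaml and distributed-schema.yaml are assumptions about the package layout; this is not the exact CI check).

import yaml
import jsonschema
from importlib import resources

# Load distributed's bundled config defaults and JSON schema (assumed file names)
with resources.files("distributed").joinpath("distributed.yaml").open() as f:
    defaults = yaml.safe_load(f)
with resources.files("distributed").joinpath("distributed-schema.yaml").open() as f:
    schema = yaml.safe_load(f)

# Raises jsonschema.ValidationError if the defaults don't match the schema
jsonschema.validate(defaults, schema)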
@jrbourbeau I'll have a look at it
👍 thanks @crusaderky! Unfortunately I've not been able to reproduce locally yet...
I encountered similar problems locally after we switched to pyproject.toml
Noting that the docs build has magically fixed itself ✨
Spill-to-disk compression was introduced in dask/distributed#7768, and Dask-CUDA should also allow modifying the default compression via Dask config. This change is required to support `distributed>=2023.5.0`.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #1190