Ensure that serialized data is measured correctly #7593
Conversation
```python
def test_sizeof_serialize(Wrap, Wrapped):
    size = 100_000
    ser_obj = Wrap(b"0" * size)
    assert size <= sizeof(ser_obj) < size * 1.05
```
Allowing 5% overhead is a bit generous, but this test doesn't really care if the wrapper adds more metadata or if Python object sizes change, etc.
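For context, a minimal sketch of how a `sizeof` registration for such a wrapper could look, so that `sizeof()` reports the payload plus a small, bounded overhead (which is what the assertion above checks). This is not necessarily what the PR implements: `Wrap` below is a hypothetical stand-in for the real `Serialize`/`Serialized` fixtures, and only the `dask.sizeof.sizeof` dispatch itself is taken from the real library.

```python
import sys

from dask.sizeof import sizeof


class Wrap:
    """Hypothetical stand-in for a serialization wrapper holding raw bytes."""

    def __init__(self, data: bytes):
        self.data = data


@sizeof.register(Wrap)
def _sizeof_wrap(obj):
    # The payload dominates; the wrapper itself only adds a fixed, small
    # footprint, so the measured size stays within a few percent of
    # len(obj.data), satisfying the 5% bound in the test above.
    return sys.getsizeof(obj) + sizeof(obj.data)
```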
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

26 files ±0, 26 suites ±0, 12h 50m 14s ⏱️ +33m 33s

For more details on these failures, see this check.

Results for commit 52eee9e. ± Comparison against base commit 700f14a.

♻️ This comment has been updated with latest results.
Apparently, there is something wrong with the offload TPE. I get some spurious errors
distributed/utils.py (outdated)

```diff
@@ -1390,7 +1390,6 @@ def is_valid_xml(text):
 _offload_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="Dask-Offload")
-weakref.finalize(_offload_executor, _offload_executor.shutdown)
```
This is just an attempt to fix the error I'm seeing.

Two reasons why I believe this should be removed regardless of whether this is a fix or not:

- All Python versions 3.8+ already ensure that worker threads terminate on interpreter shutdown. They explicitly handle collected executors, interpreter shutdown, and instance shutdown identically (a quick sanity check follows this list).
- Judging by the `finalize` docs, I'm not even sure this callback is ever triggered:

  > A finalizer will never invoke its callback during the later part of the interpreter shutdown when module globals are liable to have been replaced by None.

  Since `_offload_executor` is a module global, it has no chance of being GCed/finalized unless it has been replaced by None.
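A hedged sketch of the first point, assuming CPython 3.8+ behaviour (the executor name is illustrative): an executor that is never shut down explicitly and never finalized still lets the process exit cleanly, because the interpreter joins its worker threads at shutdown.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="Demo-Offload")
future = executor.submit(time.sleep, 0.1)
future.result()  # the submitted task completes
# No executor.shutdown() and no weakref.finalize here: on CPython 3.8+
# the interpreter waits for the worker thread before exiting.
```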
This turned out not to be responsible after all, but I still suggest removing this line.
> Judging by the `finalize` docs I'm not even sure if this callback is ever triggered

It's interesting; later in the docs it states:

> Note: It is important to ensure that func, args and kwargs do not own any references to obj, either directly or indirectly, since otherwise obj will never be garbage collected. In particular, func should not be a bound method of obj.

Notably the last part (here the callback is `_offload_executor.shutdown`, a bound method of the very object being finalized), which I suppose is part of why this line is unneeded. A small demo of that effect is below.
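A minimal sketch of that failure mode (the `Resource` class is hypothetical, illustrative only): because the finalizer's callback is a bound method, the finalizer itself holds a strong reference to the object, so the object can never be collected and the callback never fires through garbage collection.

```python
import gc
import weakref


class Resource:
    def close(self):
        print("closed")


obj = Resource()
# Anti-pattern from the docs note: obj.close is a bound method, so the
# finalizer keeps a strong reference to obj.
fin = weakref.finalize(obj, obj.close)

del obj
gc.collect()
print(fin.alive)  # True: obj is kept alive by its own finalizer,
                  # so "closed" is never printed via garbage collection.
```

(With the default `atexit=True`, any still-live finalizer is called once at interpreter exit, but never through normal GC.)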
yes, that note is interesting as well, i.e. this finalize is useless for many reasons
I opened #7639
Our code base is riddled with this pattern
This one line makes me very nervous. Everything you wrote makes perfect sense, but just in case you're wrong, could you move it to its own PR so that it doesn't end up in the release? It has the potential of leaving workers stuck on shutdown, and it also has the potential of behaving differently on different Python versions and on different OSs, so I believe some thorough testing is in order.
There are non-trivial failures, e.g.
force-pushed from 043b138 to 4fe360d
force-pushed from 7eb755e to 690ec48
```python
finally:
    threadpool.shutdown()
```
Somehow, this actually shut down the real offload threadpool, not just the mock, i.e. nothing in our test suite was using the offload threadpool after this test ran 🤯
I do not entirely understand why this is shutting down the actual threadpool but I don't care. I removed the mock and it works now
distributed/tests/test_worker.py (outdated)

```python
async def custom_worker_offload(func, *args):
    res = func(*args)
    if not istask(args) and istask(res):
        in_deserialize.set()
        await wait_in_deserialize.wait()
    return res
```

```python
while CountingThreadPool.counter == 0:
    await asyncio.sleep(0)
monkeypatch.setattr("distributed.worker.offload", custom_worker_offload)
```
The test logic is now slightly different but I believe more robust. We don't truly care about the offloading part, but rather that there is an await during deserialization. Therefore, I only patch the offload method. To ensure that we're truly in the task-spec deserialization I put in the `istask` guard above; a self-contained sketch of the pattern follows.
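Hedged, consolidated sketch of that pattern: the event names and the `istask` check mirror the diff above, while the surrounding scaffolding is illustrative rather than the actual test.

```python
import asyncio

from dask.core import istask

in_deserialize = asyncio.Event()
wait_in_deserialize = asyncio.Event()


async def custom_worker_offload(func, *args):
    # Run the work inline instead of offloading it to a thread.
    res = func(*args)
    if not istask(args) and istask(res):
        # Only true while the task spec itself is being deserialized:
        # the input is not a task yet, but the deserialized result is.
        in_deserialize.set()
        await wait_in_deserialize.wait()
    return res


# Inside the test this is installed via pytest's monkeypatch fixture:
# monkeypatch.setattr("distributed.worker.offload", custom_worker_offload)
```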
I think test failures are unrelated. @crusaderky care for another look?
FWIW benchmark results are available here https://github.com/coiled/coiled-runtime/actions/runs/4384325501 but they don't look particularly interesting. I don't think our tests are very sensitive to this, yet. See also coiled/benchmarks#696
@crusaderky next time, feel free to just push these minor changes
I've started an A/B test with uncompressible as well as compressible data, based on coiled/benchmarks#696:
Thanks for the fix @fjetter and review @crusaderky
Test failures seem unrelated, but the `distributed/shuffle/tests/test_shuffle.py::test_new_worker` failure in this build looks interesting. cc @hendrikmakait for visibility
I ran the A/B tests and I'm observing a very modest (5%), but consistent speedup in
Well, it's an easy win regardless. Even if it's not boosting performance significantly, having a healthier event loop is already a win for stability.
I have produced a meaningful A/B test, running on data that is 42% compressible and takes a substantial amount of time to compress.
It's important to note that, before this PR, `test_anom_mean` and `test_double_diff` were respectively 50% and 30% slower on compressible data than they were on uncompressible data, and `test_vorticity` was 67% slower on compressible data. I'll open a separate issue to discuss these findings.
Closes #7589
TODO: Tests are still missing / not done yet.