Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP simplify receiving frames in comms #3599

Closed
wants to merge 10 commits into from

Conversation

jakirkham
Copy link
Member

Simplifies the logic for receiving frames in tcp and ucx. Drops some legacy code on the tcp side. Also preallocates buffers to receive data into beforehand (and subselects the non-0-length buffers). Then hands off control to the underlying library receiving data to fill these buffers. Afterwards handles any sanity checks.

cc @quasiben @pentschev

@jakirkham jakirkham force-pushed the simp_write_recv_ops branch 7 times, most recently from 1820f98 to d69511e Compare March 20, 2020 06:44
Should make it easier to disambiguate things like `frame` and `frames`
as they are now `each_frame` and `frames`.
As of Tornado 5.0+, this feature is included in Tornado. The
`.read_bytes(...)` code path is a holdover from before Tornado got this
feature. Not to mention it is more performant than the
`.read_bytes(...)` code path. Given this, drop the `.read_bytes(...)`
case and always use `.read_into(...)`.
@jakirkham jakirkham force-pushed the simp_write_recv_ops branch from ad5a99b to 3ff1ce2 Compare March 20, 2020 06:53
@pentschev
Copy link
Member

LGTM @jakirkham , thanks for the work! I did some testing (not very heavy though) and it doesn't seem to create any issues. On the other hand it also doesn't seem to improve performance, but I'm guessing we will need some larger workflows to be tested to see that, which I didn't do.

@quasiben
Copy link
Member

Will this PR supersede #3453 ? If not, is it a step in that direction ?

This looks good to be. This PR also makes me really appreciate CI due to the tcp changes I am not as familiar with.

It would be good to get comments from @mrocklin or @TomAugspurger before merging in

@jakirkham
Copy link
Member Author

It's a preliminary step in that direction.

Agreed that's why I figured we could try the same thing in tcp and ucx cases.

Yeah if Matt and Tom have time to review, that would be welcome 🙂

FWIW I also tried to combining the async recv calls into an asyncio.gather call (instead of having a for-loop). However it seems in the tcp case Tornado isn't able to handle overlapping receives, so we see some test failures (maybe other people know how to make this work? 😉). In either case we may still consider it for the ucx case.

@jakirkham
Copy link
Member Author

cc @jrbourbeau (in case you have thoughts here as well 😉)

@jakirkham jakirkham force-pushed the simp_write_recv_ops branch from 3ff1ce2 to cbe0bf1 Compare March 20, 2020 16:54
@jakirkham
Copy link
Member Author

FWIW the UCX tests pass for me locally except for issue ( rapidsai/ucx-py#461 ), which I run into with or without these changes. The latter may be a hardware configuration though.

@mrocklin
Copy link
Member

FWIW I also tried to combining the async recv calls into an asyncio.gather call (instead of having a for-loop). However it seems in the tcp case Tornado isn't able to handle overlapping receives, so we see some test failures (maybe other people know how to make this work? wink). In either case we may still consider it for the ucx case

Yeah, this is actually something that we want to handle sequentially. We want to get one frame, and then only once that's done get the next one. Otherwise we have lots of coroutines all trying to pull bytes off a wire at the same time, which is unwise.

for each_frame in recv_frames:
each_length = len(each_frame)
n = await stream.read_into(each_frame)
assert n == each_length, (n, each_length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this change has a semantic difference in that previously we used to include empty frames but now we don't. Is this correct?

Is there a specific reason for this change? Does it affect performance in some way, or is it strictly cosmetic? If it's strictly cosmetic, and if there is a change to semantics (however minor) I'm tempted to avoid the change just because I wouldn't be surprised if it has some unforeseen effect.

Copy link
Member Author

@jakirkham jakirkham Mar 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true actually. We excluded them before as well. It's just now done outside of the for-loop. 🙂

The idea was to more-or-less pass control to Tornado or UCX-Py completely while receiving frames without all of the boilerplate in Dask between each receive. Tried asyncio.gather to push this even further, but Tornado doesn't seem to like that (seems to work with UCX-Py though 😉).

In any event I thought it would be a nice thing to do the same thing we are doing in UCX for TCP. At the end of the day, I don't have strong feelings about it (especially if there is push back). 🤷‍♂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other thoughts here, @mrocklin? 🙂

@jakirkham jakirkham changed the title Simplify receiving frames in comms TCP simplify receiving frames in comms Mar 27, 2020
@jakirkham
Copy link
Member Author

We've broken out the UCX part and integrated it separately with PR ( #3651 ). This is now the change for TCP only. Would be good to discuss again when you have time Matt, but it's fine to sit in the interim.

@mrocklin
Copy link
Member

mrocklin commented Mar 27, 2020 via email

@jakirkham
Copy link
Member Author

Other changes have gone in that more-or-less handle what this was set out to do. Closing...

@jakirkham jakirkham closed this May 7, 2020
@jakirkham jakirkham deleted the simp_write_recv_ops branch May 7, 2020 04:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants