Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sock): Use the updated cancellation cb interface. #1940

Merged
merged 3 commits into from
Oct 23, 2023

Conversation

royjacobson
Copy link
Contributor

Fixes #1939

}

ConnectionFlow(peer);

if (break_poll_id_ != UINT32_MAX) {
socket_->CancelPoll(break_poll_id_);
if (breaker_cb_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you do this, you must make sure CancelOnErrorCb() can be called multiple times. we had some issues with shutting down a connection and data race around this codepath. you can see I used to reset break_poll_id_ to avoid calling CancelPoll if OnBreakCb was called first.

Copy link
Collaborator

@romange romange Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we must ensure that OnBreakCb won't be called once cc_.reset(); is called. Currently, we have such cases but I am not sure why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the race with cc_.reset(); - how does it work with io_uring?

@royjacobson royjacobson force-pushed the cancellation_cb_epoll branch 4 times, most recently from 94eb02b to 8de145c Compare September 28, 2023 08:32
@royjacobson
Copy link
Contributor Author

This seems to still fail:
https://github.com/dragonflydb/dragonfly/actions/runs/6336485574/job/17209646367?pr=1940

But it just silently times out and I'm not sure if it's my fault or not.

@adiholden
Copy link
Collaborator

@royjacobson I see the regression tests finish with 3 minutes in other CIs and here we get 48 minutes. Does it pass on your laptop?

@royjacobson
Copy link
Contributor Author

@royjacobson I see the regression tests finish with 3 minutes in other CIs and here we get 48 minutes. Does it pass on your laptop?

Tested locally, await client.close() seems to block (?) on my laptop in the test_tls_insecure test. Unfortunately don't have time to investigate :/

@adiholden
Copy link
Collaborator

@royjacobson Thank you, I will take it from here. Best wishes

@dranikpg dranikpg marked this pull request as ready for review October 10, 2023 11:15
@dranikpg
Copy link
Contributor

Will add romange/helio#155 here once its merged to improve error diagnostics

@adiholden
Copy link
Collaborator

Hi @dranikpg is this ready? are you still waiting on helio?

@dranikpg dranikpg force-pushed the cancellation_cb_epoll branch from 9b9a076 to d23e1f7 Compare October 22, 2023 08:06
@dranikpg
Copy link
Contributor

@adiholden

Sure, I forgot to merge in helio then 😅 I've rebased just now on main which should contain the latest helio pull

@dranikpg dranikpg merged commit 313501d into main Oct 23, 2023
@dranikpg dranikpg deleted the cancellation_cb_epoll branch October 23, 2023 10:08
@dranikpg
Copy link
Contributor

dranikpg commented Nov 4, 2023

#1960

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Epoll sockets don't cancel blocking commands when they close
4 participants