-
-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: ping race condition #144
base: main
Are you sure you want to change the base?
Conversation
While you are correct that the setter for the This being said, the logic for handling "locking" in this code has at least three TOCTOU errors I've seen so far. (To be sure, I mean no insult to @0xTim or his efforts to make the code In any event, while your fix is definitely in the correct spirit (and that method should definitely be making that check, even if only as an assertion), I don't think it's actually solving the underlying issue, just hiding it. Off the top of my head, the fixes that are needed include:
I think that covers all the obvious issues I saw so far. I know this is a lot - if you aren't sure how to tackle it, I'll see if I can do at least some of it this weekend (assuming @0xTim doesn't beat me to it). |
@gwynne Thank you for the detailed response. I was aware that this PR doesn't really fix the cause. Unfortunately, I have little experience with Sendable, especially Let me know if I can help in any way. Thanks again for the fast response! |
@gwynne Quick update: with the changes in this PR the crash still occurs, just less frequently. I am down to 1 crash every 3-4 days with the same setup as described in my first post. So you are right that this is not the fix yet. |
I have been guessing in the dark for a while but this seems to fix the crash (see below) that is happening in certain cases. I cannot reproduce them other than extremely heavy load for hours or sometimes days.
In general, having ~2000 active Websocket connections with 10 second ping intervals each on 8 different servers generates 1-2 of those crashes per day.
I assumed it was a race condition, which was confirmed by a few people on the Discord as well. These changes seem to fix it, which means I haven't been able to see the crash since.
In general, being in the EventLoop is a good approach anyways, even if I am not sure why there are cases of race conditions.
One idea I had is that setting
pingInterval
triggers the schedule function, but is not guaranteed to be in the EventLoop. If channel is accessed at the same time from somewhere else the race condition might occur. It's a guess though.Original Crash Report: