Move timers to kernel thread #736
Comments
A kernel thread should traverse the counters and sleep until the timeout of the first updated APM elapses.
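The "sleep until the first timeout elapses" step can be sketched in plain userspace C. This is only an illustration: `struct apm_entry`, its `expires` field and `apm_next_sleep()` are hypothetical names, not taken from the project sources. In kernel code the returned value would presumably be handed to a sleeping primitive such as `schedule_timeout_interruptible()` inside the thread loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-APM entry; the real structure is not shown in the issue. */
struct apm_entry {
	uint64_t expires;	/* absolute deadline in ticks, 0 means "not armed" */
};

/*
 * Return how many ticks the thread may sleep before the earliest armed
 * entry expires: 0 if some entry is already due, max_sleep if nothing
 * is armed or every deadline is further away than max_sleep.
 */
uint64_t
apm_next_sleep(const struct apm_entry *e, size_t n, uint64_t now,
	       uint64_t max_sleep)
{
	uint64_t sleep = max_sleep;

	assert(e || !n);

	for (size_t i = 0; i < n; i++) {
		if (!e[i].expires)
			continue;		/* entry is not armed */
		if (e[i].expires <= now)
			return 0;		/* already due, do not sleep */
		if (e[i].expires - now < sleep)
			sleep = e[i].expires - now;
	}

	return sleep;
}
```

With entries expiring at ticks 150 and 120 and `now == 100`, the function returns 20, so the thread wakes up exactly when the earliest entry becomes due.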
The issue simply hangs a server with 30K server groups. The issue seems relatively complex. There are several thoughts about the requirements for the timer-replacing threads:
It's worth mentioning that comment for

Client keep-alive timer
Currently

Another opportunity is to modify the TCP keepalive timer to drop an inactive connection instead of sending a TCP probe. In this case the standard behaviour must be preserved and the new one introduced for client connections only. The pro of the approach is that there is only one timer in the system instead of two. The con is a larger kernel patch.

APM percentiles update
The reason for
This particular point is linked with issue #712 (Review APM & Ratio scheduler). If a "good" APM should use more complex algorithms, then with #712 in mind the whole analytics must be moved to user space and probably get cleaned (e.g. sampled) data from the kernel.

References
See the timer wheel design along with the appropriate code comment: there are already timer arrays with declining accuracy to batch the timer events along with timer slack.
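To make "timer arrays with declining accuracy" concrete, here is a simplified userspace sketch loosely modeled on the level layout of the Linux timer wheel. The `WHEEL_*` constants and function names are assumptions chosen for illustration (8 levels of 64 buckets, each level 8 times coarser than the previous one); the actual kernel values depend on configuration and the real code handles more edge cases.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define WHEEL_CLK_SHIFT	3			/* each level is 8x coarser */
#define WHEEL_LVL_BITS	6			/* 64 buckets per level */
#define WHEEL_LVL_SIZE	(1u << WHEEL_LVL_BITS)
#define WHEEL_DEPTH	8			/* number of levels */

/* Pick the wheel level for a timer that expires @delta ticks from now. */
unsigned int
wheel_level(uint64_t delta)
{
	for (unsigned int lvl = 0; lvl + 1 < WHEEL_DEPTH; lvl++) {
		/* First delta that no longer fits into level @lvl. */
		uint64_t next_start = (uint64_t)(WHEEL_LVL_SIZE - 1)
				      << (lvl * WHEEL_CLK_SHIFT);
		if (delta < next_start)
			return lvl;
	}
	return WHEEL_DEPTH - 1;
}

/* Bucket width of a level in ticks: 1, 8, 64, ... */
uint64_t
wheel_granularity(unsigned int lvl)
{
	assert(lvl < WHEEL_DEPTH);
	return 1ull << (lvl * WHEEL_CLK_SHIFT);
}

/* Round an absolute expiry up to the bucket boundary of its level. */
uint64_t
wheel_round_up(uint64_t expires, unsigned int lvl)
{
	uint64_t g = wheel_granularity(lvl);

	return (expires + g - 1) & ~(g - 1);
}
```

A timer 100 ticks away lands on level 1 and is rounded to an 8-tick boundary, so many timers with nearby deadlines collapse into one bucket and fire in a single batch. That batching is what keeps the cost of thousands of concurrently armed timers bounded.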
More timer issues

BH synchronizations
Note that #916 introduces multiple
Also keep this comment #916 (comment) in mind.

Server connections
Currently server connections, even for the initial attempt in

Keep alive timer on client connections
#1428 happened due to timer mismanagement: there are 4 possible concurrent events on a client socket:
So we had to introduce a
We can introduce a socket state (

Too many timer modifications
At the moment at least
Currently, we do some calculations on a timer basis, and these calculations can be relatively expensive. Meanwhile, the timer is serviced in the same softirq that processes network traffic, so it can lead to packet loss. A separate kernel thread should be introduced to manage such non-crucial logic (e.g. connection eviction and statistics). Another option and/or feature of the timer implementation is described in #736 (comment): a two-mode timer implementation balancing between softirq shots and timer interrupts.
It's worth mentioning that the Linux TCP code also uses timers, so additional research into the behavior of concurrently armed timers on thousands of connections (e.g. Slow DDoS) is required.
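The idea of moving non-crucial logic out of the timer path can be illustrated with a minimal deferral queue: the timer callback only records which connection needs attention, and a separate worker later pops the records and does the expensive part. This is a single-threaded userspace sketch; `struct defer_ring`, `defer_push()` and `defer_pop()` are hypothetical names, and a real kernel implementation would additionally need synchronization between the softirq producer and the thread consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_RING_SIZE	8	/* must be a power of two */

/* Fixed-size ring of connection ids waiting for deferred processing. */
struct defer_ring {
	int	items[DEFER_RING_SIZE];
	size_t	head;		/* total number of pushes */
	size_t	tail;		/* total number of pops */
};

/*
 * Meant to be called from the timer callback: O(1), no heavy work.
 * Returns false if the ring is full and the event must be retried later.
 */
bool
defer_push(struct defer_ring *r, int conn_id)
{
	assert(r);

	if (r->head - r->tail == DEFER_RING_SIZE)
		return false;
	r->items[r->head++ % DEFER_RING_SIZE] = conn_id;

	return true;
}

/*
 * Meant to be called from the worker thread: fetch the oldest deferred
 * connection id. Returns false if there is nothing to process.
 */
bool
defer_pop(struct defer_ring *r, int *conn_id)
{
	assert(r && conn_id);

	if (r->tail == r->head)
		return false;
	*conn_id = r->items[r->tail++ % DEFER_RING_SIZE];

	return true;
}
```

The worker would loop on `defer_pop()` and run eviction or statistics updates for each id, so the softirq that handles network traffic pays only for the constant-time push.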