Move timers to kernel thread #736

krizhanovsky · 2017-05-29T10:37:56Z

Currently, we do some calculations on a timer basis, and these calculations can be relatively expensive. Meanwhile, timers are serviced in the same softirq that processes network traffic, so this can lead to packet loss. A separate kernel thread should be introduced to manage such non-crucial logic (e.g. connection eviction and statistics). Another option and/or feature of the timer implementation is described in #736 (comment): a two-mode timer implementation balancing between softirq shots and timer interrupts.

It's worth mentioning that the Linux TCP code also uses timers, so additional research into the behavior of concurrently armed timers on thousands of connections (e.g. Slow DDoS) is required.

krizhanovsky · 2018-02-16T19:48:43Z

tfw_apm_add_srv() creates a timer for each server, so in the scenario of #76 with thousands of servers it creates thousands of timers. As a result, Tempesta FW eats CPU without any load when configured with 10K servers:

    26.64%  [tempesta_fw]  [k] tfw_apm_prcntl_tmfn
    16.04%  [kernel]       [k] __lock_acquire
     6.30%  [kernel]       [k] lock_release
     5.67%  [kernel]       [k] run_timer_softirq
     5.48%  [kernel]       [k] check_chain_key

A kernel thread should traverse the counters and sleep until the timeout of the earliest-updated APM elapses.
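The traverse-and-sleep loop could look roughly like the userspace sketch below. It is only an illustration under assumptions: `struct apm_entry`, `apm_scan()` and the millisecond clock are hypothetical names, not Tempesta FW code, and the counter increment stands in for the real percentile calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-server APM state; not the Tempesta FW structure. */
struct apm_entry {
	uint64_t next_update_ms; /* when the entry must be recalculated */
	unsigned updates;        /* how many times it was recalculated */
};

/*
 * One pass of the worker thread: recalculate every due entry, re-arm it,
 * and return how long the thread may sleep until the earliest deadline.
 */
static uint64_t
apm_scan(struct apm_entry *e, size_t n, uint64_t now_ms, uint64_t period_ms)
{
	uint64_t earliest = UINT64_MAX;

	assert(e && n > 0 && period_ms > 0);

	for (size_t i = 0; i < n; i++) {
		if (e[i].next_update_ms <= now_ms) {
			e[i].updates++; /* stands in for the real calculation */
			e[i].next_update_ms = now_ms + period_ms;
		}
		if (e[i].next_update_ms < earliest)
			earliest = e[i].next_update_ms;
	}
	/* Every deadline is now in the future, so the result is positive. */
	return earliest - now_ms;
}
```

In a kernel thread the returned delay would be converted to jiffies and passed to something like schedule_timeout_interruptible(), so a single sleeping thread replaces thousands of armed timers.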

krizhanovsky · 2018-02-20T02:27:18Z

This issue simply hangs a server with 30K server groups.

The issue seems relatively complex. Here are several thoughts about the requirements for the threads replacing the timers:

there should be per-CPU threads, just like ksoftirqd (see add_timer_on());
the threads must work with timer-like arrays, so a producer can schedule a job due in about 1 ms after having scheduled a job due in about 1 s;
it seems the threads can run the jobs at approximately the requested time, with more relaxed accuracy than Linux kernel timers provide;
the jobs must be aggressively batched, e.g. it makes sense to reestablish many connections at once (probably ones which have slightly different retry timeouts) rather than execute a reestablishing function for each connection;
the threads must actively and voluntarily yield the CPU to softirqs.
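The "timer-like arrays" and batching requirements can be illustrated with a single-level wheel of coarse slots. This is a minimal sketch, not the kernel timer wheel and not a proposed Tempesta FW API: `struct wheel`, `wheel_add()`, `wheel_tick()` and the slot constants are assumptions chosen so that both a 1 ms and a 1 s delay fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WHEEL_SLOTS 64 /* assumed size: covers delays below 64 * SLOT_MS */
#define SLOT_MS     16 /* assumed granularity, coarser than a kernel timer */

struct job {
	struct job *next;
};

struct wheel {
	struct job *slot[WHEEL_SLOTS];
	uint64_t cur; /* number of ticks processed so far */
};

/* Queue a job to run roughly delay_ms from now, rounded up to a slot. */
static void
wheel_add(struct wheel *w, struct job *j, uint64_t delay_ms)
{
	uint64_t ticks = (delay_ms + SLOT_MS - 1) / SLOT_MS;
	size_t idx;

	assert(ticks < WHEEL_SLOTS); /* longer delays need more levels */

	idx = (w->cur + ticks) % WHEEL_SLOTS;
	j->next = w->slot[idx];
	w->slot[idx] = j;
}

/*
 * Advance the wheel by one slot and return the whole batch of jobs that
 * fell into it, so the caller can process them at once (e.g. reconnect
 * many servers in one go) and yield the CPU between batches.
 */
static struct job *
wheel_tick(struct wheel *w)
{
	size_t idx = w->cur % WHEEL_SLOTS;
	struct job *batch = w->slot[idx];

	w->slot[idx] = NULL;
	w->cur++;
	return batch;
}
```

Jobs with slightly different timeouts (1 ms and 10 ms here) land in the same slot and are returned as one batch. The price is accuracy: a 1 s job fires after 63 ticks of 16 ms, i.e. about 1008 ms.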

It's worth mentioning that the comment for inet_csk_init_xmit_timers() suggests replacing the 3 timers with only one, with varying delays depending on the expected event.

Client keep-alive timer

Currently tfw_cli_conn_send() can be called from a softirq serving a server response on CPU X, and it modifies the timer on that CPU. Meanwhile, the timer was set up by tfw_cli_conn_alloc() on establishing a new client connection on CPU Y (the CPU native to the client connection). So it makes sense to replace the timer with per-CPU lists of client connections and make a CPU gate, like ss_send()/ss_do_send(), which reinserts a connection on transmission. The old connections can be evicted (closed) at the same time as the reinsertion, and also by a timer or a new eviction thread. The eviction thread should be lighter than a timer because it doesn't require mod_timer() and can exit quickly after just checking the timestamp of the client connection at the head of the queue.
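A minimal userspace sketch of such a per-CPU queue follows. All names (`struct conn`, `struct conn_queue`, `cq_touch()`, `cq_evict()`) are hypothetical, and the sketch assumes that the queue is touched only from its owning CPU and that timestamps never go backwards, so the list stays ordered by last transmission time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct conn {
	struct conn *prev, *next; /* both NULL while not queued */
	uint64_t last_tx_ms;
};

struct conn_queue {           /* one instance per CPU (assumption) */
	struct conn head;     /* sentinel: head.next is the oldest entry */
	uint64_t keepalive_ms;
};

static void
cq_init(struct conn_queue *q, uint64_t keepalive_ms)
{
	q->head.prev = q->head.next = &q->head;
	q->keepalive_ms = keepalive_ms;
}

static void
cq_unlink(struct conn *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->prev = c->next = NULL;
}

/* Called on connection setup and on each transmission: move to the tail. */
static void
cq_touch(struct conn_queue *q, struct conn *c, uint64_t now_ms)
{
	assert(q && c);

	if (c->next)
		cq_unlink(c);
	c->last_tx_ms = now_ms;
	c->prev = q->head.prev;
	c->next = &q->head;
	q->head.prev->next = c;
	q->head.prev = c;
}

/* Evict expired connections from the head; stop at the first live one. */
static size_t
cq_evict(struct conn_queue *q, uint64_t now_ms)
{
	size_t n = 0;

	while (q->head.next != &q->head) {
		struct conn *c = q->head.next;

		if (now_ms - c->last_tx_ms < q->keepalive_ms)
			break; /* everything behind the head is newer */
		cq_unlink(c); /* real code would close the connection here */
		n++;
	}
	return n;
}
```

When nothing has expired, `cq_evict()` returns after a single timestamp comparison at the head, which is the cheap exit path described above.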

Another option is to modify the TCP keepalive timer to drop an inactive connection instead of sending a TCP probe. In this case the standard behaviour must be preserved and the new one introduced for client connections only. The advantage of this approach is having only one timer in the system instead of two. The drawback is a larger kernel patch.

APM percentiles update

The reason tfw_apm_prcntl_tmfn() is at the top of the profile is its heavy calculations with many loops, not the timers themselves. If possible, tfw_apm_prcntl_tmfn() should be reworked to:

perform necessary computation only on the local CPU;
use fewer loops;
make computations on response time updates rather than by timeout.
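One way to picture these three points together is a small per-CPU histogram that is updated in O(1) on every response and read with a single short loop. This is only a sketch under assumptions, not the current APM algorithm: the power-of-two buckets, `struct rt_hist`, `rt_update()` and `rt_percentile()` are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 16 /* assumed: bucket b >= 1 covers [2^(b-1), 2^b - 1] ms */

/* Hypothetical per-CPU histogram, updated only by its owning CPU. */
struct rt_hist {
	uint32_t cnt[NBUCKETS];
	uint32_t total;
};

static unsigned
bucket_of(uint32_t rt_ms)
{
	unsigned b = 0;

	while (rt_ms && b < NBUCKETS - 1) {
		rt_ms >>= 1;
		b++;
	}
	return b; /* the last bucket is open-ended */
}

/* Done on each response time update: no timer involved. */
static void
rt_update(struct rt_hist *h, uint32_t rt_ms)
{
	h->cnt[bucket_of(rt_ms)]++;
	h->total++;
}

/* Upper bound (ms) of the bucket holding the p-th percentile. */
static uint32_t
rt_percentile(const struct rt_hist *h, unsigned p)
{
	uint64_t need, seen = 0;

	assert(p > 0 && p <= 100);

	if (!h->total)
		return 0;
	need = ((uint64_t)h->total * p + 99) / 100; /* ceil(total * p / 100) */
	for (unsigned b = 0; b < NBUCKETS; b++) {
		seen += h->cnt[b];
		if (seen >= need)
			return b ? (1u << b) - 1 : 0;
	}
	return UINT32_MAX; /* not reached: seen ends up equal to total */
}
```

The trade-off is precision: the result is only accurate to a bucket boundary, which may or may not be acceptable for the ratio scheduler.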

This particular point is linked with issue #712 (Review APM & Ratio scheduler). If a "good" APM should use more complex algorithms, then with #712 in mind the whole analytics must be moved to user space and probably get cleaned (e.g. sampled) data from the kernel.

References

See the timer wheel design along with the appropriate code comment - there are already timer arrays with declining accuracy to batch the timer events along with timer slack.

krizhanovsky · 2018-03-02T15:06:22Z

More timer issues

BH synchronizations

Note that #916 introduces multiple _bh() synchronization calls, e.g. #916 (comment). By moving some work from softirq (timer) context to kernel threads, the synchronization can be relaxed to remove the _bh()'s.

Also keep this comment #916 (comment) in mind.

Server connections

Currently server connections, even for the initial attempt in tfw_sock_srv_connect_srv(), are established through a timer. It seems this was done for the sake of a simpler implementation, but technically it looks dirty.

Keep alive timer on client connections

#1428 happened due to timer mismanagement: there are 4 possible concurrent events on a client socket:

keep alive timer, probably closing a corresponding client connection (the case for BUG at ttls.c:2243 #1428)
transmission on the socket (e.g. response forwarding)
connection_drop hook call, e.g. on connection close from the peer
connection closing from Tempesta side with appropriate TLS close notify alert transmission (the case for BUG at ttls.c:2243 #1428)

So we had to introduce a timer_lock in #1429 to prevent the races.

We can introduce a socket state (closed/active) and a timestamp for the KA timer. All the socket connections residing on a local CPU can be arranged in a queue by the KA timestamp. Each softirq shot just needs to check the queue head for elapsed timestamps in a busy-looping mode. If there is not enough traffic, then we should fall back to timer interrupts. I.e. it makes sense to introduce a timer implementation for Tempesta code which works in two modes: softirq busy looping and timer interrupts.

Too many timer modifications

At the moment at least clnt_sock.c and websocket.c call mod_timer() on each ingress packet. Given that the keep-alive timeout may normally be 1 hour, we do way too many timer updates. It could make sense to just let the timer expire and then check the connection's timestamp.
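The "let it expire and check the timestamp" idea can be sketched in userspace as follows. The names `struct ka_conn`, `ka_on_packet()` and `ka_on_timer()` are hypothetical, and `timer_at_ms` merely simulates an armed timer; in the kernel the re-arm would be a single mod_timer() call from the timer callback, and the timestamp would likely need READ_ONCE()/WRITE_ONCE() since it is written and read on different paths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection state; timer_at_ms simulates an armed timer. */
struct ka_conn {
	uint64_t last_rx_ms;  /* updated on every ingress packet */
	uint64_t timer_at_ms; /* when the timer fires next */
	int closed;
	unsigned rearms;      /* how many times the timer was re-armed */
};

/* Ingress path: only a timestamp store, no timer modification. */
static void
ka_on_packet(struct ka_conn *c, uint64_t now_ms)
{
	c->last_rx_ms = now_ms;
}

/*
 * Timer callback: close the connection if it was really idle for the
 * whole timeout, otherwise re-arm for the remaining time only.
 */
static void
ka_on_timer(struct ka_conn *c, uint64_t now_ms, uint64_t timeout_ms)
{
	assert(now_ms >= c->last_rx_ms);

	if (now_ms - c->last_rx_ms >= timeout_ms) {
		c->closed = 1;
		return;
	}
	c->timer_at_ms = c->last_rx_ms + timeout_ms;
	c->rearms++;
}
```

With a 1 hour timeout this costs at most one timer re-arm per expiration for an active connection, regardless of how many packets arrived in between.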

krizhanovsky added the performance label May 29, 2017

krizhanovsky added this to the 1.0 WebOS milestone May 29, 2017

keshonok mentioned this issue Jun 17, 2017

Review APM & Ratio scheduler #712

Open

krizhanovsky modified the milestones: backlog, 0.10 Kernel-User Space Transport Feb 16, 2018

krizhanovsky mentioned this issue Feb 20, 2018

Requests scheduling to massive farm of backend servers #76

Closed

krizhanovsky added the crucial label Feb 20, 2018

krizhanovsky modified the milestones: 0.10 Kernel-User Space Transport , 0.9 Web server Feb 20, 2018

This was referenced Feb 26, 2018

Fix #672: Add health monitoring for backend servers. #877

Merged

Fix #76 #886 #908 #911, many other fixes and optimizations #916

Merged

krizhanovsky modified the milestones: 1.3 Web server, 1.1 QUIC Aug 8, 2018

krizhanovsky modified the milestones: 1.1 QUIC, 1.0 Beta Sep 9, 2018

krizhanovsky modified the milestones: 1.0 Beta, 0.9 TDBv0.2 Feb 2, 2019

krizhanovsky modified the milestones: 0.9 TDBv0.2, 1.1 TDB (ML, QUIC, DoH etc.) Jun 26, 2019

krizhanovsky removed the crucial label Jun 26, 2019

krizhanovsky modified the milestones: 1.1 TBD (ML, QUIC, DoH etc.), 1.0 Stability - GA Jan 21, 2020

krizhanovsky mentioned this issue Aug 21, 2020

Delete keep-alive timer on connection_drop hook #1429

Merged

krizhanovsky modified the milestones: 1.1 TBD - ML, DoH and other features after 1.0, 1.2 TBD Jan 3, 2022

krizhanovsky mentioned this issue Apr 20, 2022

Websocket simple proxy protocol implementation #1595

Merged

krizhanovsky modified the milestones: 1.xx TBD, 1.x: TBD Apr 19, 2023
