Requests scheduling to massive farm of backend servers #76
Well, the … Of course, we can re-allocate the array as needed and so on. These two cases require different implementations and involve different optimizations, so I think we really need separate modules.
This is not different logic, these are just different cases, so they should be handled in the same code base. Probably you can just allocate an array for a small server set, or use a hash table or tree to handle thousands of servers. But the different containers should be processed by the same logic (see the sketch below). Otherwise, please give an example of logic which is fundamentally different for the two cases.
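To make the point concrete, here is a minimal C sketch of the single-code-base approach; all names here (`srv_set`, `srv_set_ops`) are invented for illustration and are not Tempesta FW types. The scheduling logic is written once against a small container interface, and the array or hash-table implementation is selected by the configured server count:

```c
#include <stddef.h>

struct srv;

/* Container operations: one implementation backed by a flat array for
 * small server sets, another backed by a hash table for thousands. */
struct srv_set_ops {
	struct srv *(*lookup)(void *impl, unsigned long key);
	struct srv *(*next)(void *impl, struct srv *pos);
};

struct srv_set {
	void *impl;			/* array or hash table */
	const struct srv_set_ops *ops;	/* chosen at config time by set size */
};

/* The scheduler logic is written once against struct srv_set and never
 * knows which container is underneath. */
static struct srv *
sched_pick(struct srv_set *set, unsigned long hash)
{
	return set->ops->lookup(set->impl, hash);
}
```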
The system should dynamically establish new connections to busy upstream servers and also dynamically shrink redundant connections (also applicable to the forward proxy case). UPD. It still makes sense to be able to change the number of connections to upstream servers. However, Tempesta FW will not support forward proxying. With wide HTTPS usage, forward proxying is limited to corporate networks and other small installations which do not process millions of requests per second; there is no ISP usage any more. So this is a completely different use case with a different environment and requirements. UPD 2. I created a new issue #710 for the functionality, so there is no need to implement it this time.
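A minimal sketch of such a grow/shrink policy; the thresholds, limits, and names are assumptions for illustration only, not Tempesta FW code:

```c
#define CONNS_MIN	4
#define CONNS_MAX	32768

/* Grow the per-server connection count when every connection is busy,
 * shrink it when most connections sit idle. */
static int
adjust_conn_count(int cur_conns, int busy_conns)
{
	if (busy_conns == cur_conns && cur_conns < CONNS_MAX)
		return cur_conns * 2;	/* saturated: double */
	if (busy_conns < cur_conns / 4 && cur_conns > CONNS_MIN)
		return cur_conns / 2;	/* mostly idle: halve */
	return cur_conns;
}
```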
As we've seen in our performance benchmarks, and as shown in third-party benchmarks, HTTP servers like Nginx or Apache HTTPD show quite low performance with 4 concurrent connections, so our current default of 4 server connections and maximum of 32 are just inadequate. I'd say 32 as the default number of connections, with VMs running Tempesta together with a user-space HTTP server in mind, and 32768 as the maximum (USHORT_MAX - 1024, which is 64512, is the maximum number of ephemeral ports). The main consequence of the issue is that all current scheduling algorithms must be reworked to support dynamically sized arrays. A naive solution could be to keep the scheduler data per CPU and establish a number of upstream connections equal to … The issue relates to #51 since that also updates the scheduler code.
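To illustrate the per-CPU idea, a hypothetical sketch (the names and layout are invented, not the actual scheduler API): each CPU keeps its own round-robin cursor over a dynamically sized connection array, so the hot path never bounces a shared cache line between CPUs.

```c
#include <linux/cache.h>
#include <linux/percpu.h>

struct srv_conn;

struct sched_cpu {
	unsigned int	rr;		/* per-CPU round-robin cursor */
} ____cacheline_aligned;

struct sched_srv {
	struct srv_conn	**conns;	/* reallocated on reconfiguration */
	unsigned int	n_conns;
	struct sched_cpu __percpu *cpu;
};

static struct srv_conn *
sched_next_conn(struct sched_srv *s)
{
	struct sched_cpu *c = this_cpu_ptr(s->cpu);

	/* Only this CPU's cursor is touched on the fast path. */
	return s->conns[c->rr++ % s->n_conns];
}
```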
While the 2-tier schedulers certainly should be modified to support dynamically sized arrays, the real performance issue is with the HTTP scheduler, which in practice must be able to process thousands of server groups. The problem is in … In the current milestone these constants should be eliminated in PRs #670 and #666. UPD. This comment is split off into a new issue #732, so it shouldn't be done in the context of #76.
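For scale in the number of server groups, the natural structure is an O(1) hash lookup from the request's virtual host to its group, rather than a linear walk over a fixed-size table. A sketch with invented names, using the kernel's generic hash table:

```c
#include <linux/hashtable.h>
#include <linux/jhash.h>

#define SG_HASH_BITS	12	/* 4096 buckets for thousands of groups */

static DEFINE_HASHTABLE(sg_hash, SG_HASH_BITS);

struct srv_group {
	struct hlist_node	node;
	u32			key;	/* hash of the virtual host name */
	/* ... */
};

static struct srv_group *
sg_lookup(const char *host, size_t len)
{
	u32 key = jhash(host, len, 0);
	struct srv_group *sg;

	hash_for_each_possible(sg_hash, sg, node, key)
		if (sg->key == key)	/* a real lookup would also compare the name */
			return sg;
	return NULL;
}
```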
All the requirements are already implemented or moved to separate issues/tasks.
It seems the issue is done, but we still have no results from the #680 test. Let's close it if the test shows that we are really able to efficiently handle 1M hosts.
Creating many backends, with 1 backend per server group, causes problems. Creating 16 interfaces with 64 ports per interface exposes the problem:
8x32: backends created with nginx, a single nginx instance per interface; the nginx config contains the ports used: 16384, 16375, etc. for each interface (an illustrative config sketch follows).
Testing: test_1M.py from vlts-680-1M.
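For reference, the per-interface backend setup could look like this; an illustrative nginx config, not the actual test harness files (the address and exact port list are made up):

```nginx
# One nginx instance per interface; each instance listens on a block of
# ports, so 16 interfaces x 64 ports emulate 1024 distinct backends.
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    server {
        listen 192.168.0.1:16384;
        listen 192.168.0.1:16385;
        # ... one listen directive per emulated backend port ...
        return 200 "ok\n";
    }
}
```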
I didn't notice …
After the fix 6d11ff1 …
After the fix 94b18ed the performance profile became: … However, reloading 10K server groups takes about 30 seconds, the same as for a full restart.
With the commit c58993a (also https://github.com/tempesta-tech/linux-4.9.35-tfw/commit/f20d5703592ce3078d3415edbc5b2703f614d9b7 for the kernel) I still cannot normally start Tempesta FW with 30K backends using the configuration from #680 (comment). (Surely it'd be better to use many IP addresses and ports to avoid lock contention on a single TCP socket.) The system hangs with soft lockups in softirq. Only the following patch allows Tempesta FW to start "normally":

```diff
diff --git a/tempesta_fw/apm.c b/tempesta_fw/apm.c
index b82a3ce..5f78ee1 100644
--- a/tempesta_fw/apm.c
+++ b/tempesta_fw/apm.c
@@ -1034,9 +1034,10 @@ tfw_apm_add_srv(TfwServer *srv)
/* Start the timer for the percentile calculation. */
set_bit(TFW_APM_DATA_F_REARM, &data->flags);
+ goto AK_DBG;
setup_timer(&data->timer, tfw_apm_prcntl_tmfn, (unsigned long)data);
mod_timer(&data->timer, jiffies + TFW_APM_TIMER_INTVL);
-
+AK_DBG:
srv->apmref = data;
return 0;
diff --git a/tempesta_fw/sock_srv.c b/tempesta_fw/sock_srv.c
index dc9e0ba..3b4e361 100644
--- a/tempesta_fw/sock_srv.c
+++ b/tempesta_fw/sock_srv.c
@@ -227,7 +227,12 @@ tfw_sock_srv_connect_try_later(TfwSrvConn *srv_conn)
/* Don't rearm the reconnection timer if we're about to shutdown. */
if (unlikely(!ss_active()))
return;
-
+{
+ static unsigned long delta = 0;
+ timeout = 1000 + delta;
+ delta += 10;
+ goto AK_DBG_end;
+}
if (srv_conn->recns < ARRAY_SIZE(tfw_srv_tmo_vals)) {
if (srv_conn->recns)
TFW_DBG_ADDR("Cannot establish connection",
@@ -249,7 +254,7 @@ tfw_sock_srv_connect_try_later(TfwSrvConn *srv_conn)
timeout = tfw_srv_tmo_vals[ARRAY_SIZE(tfw_srv_tmo_vals) - 1];
}
srv_conn->recns++;
-
+AK_DBG_end:
mod_timer(&srv_conn->timer, jiffies + msecs_to_jiffies(timeout));
}
@@ -2119,7 +2124,7 @@ static TfwCfgSpec tfw_srv_group_specs[] = {
},
{
.name = "server_connect_retries",
- .deflt = "10",
+ .deflt = "1", // AK_DBG "10",
.handler = tfw_cfgop_in_conn_retries,
.spec_ext = &(TfwCfgSpecInt) {
.range = { 0, INT_MAX },
```

The reason is #736: …
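To spell out what the `AK_DBG` hunk in `sock_srv.c` above effectively does: it staggers the reconnect timers instead of letting them all fire at once. A standalone sketch of the same idea (names invented for illustration):

```c
/*
 * Spread the initial reconnect timers: base 1000 ms plus 10 ms per
 * connection, so 30K connections rearm over a ~300 s window instead of
 * bursting in the same jiffy and soft-locking softirq.
 */
static unsigned long
staggered_reconnect_timeout_ms(unsigned int conn_idx)
{
	return 1000UL + 10UL * conn_idx;
}
```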
Currently `TFW_SCHED_MAX_SERVERS` is defined as 64 backend servers at maximum, which is not enough for virtualized environments (virtual hosting or clouds), so it should be extended to at least 64K, and all scheduling algorithms must also be updated accordingly to process such a number of backend servers. We still do not expect too many servers per site group, among which client requests are scheduled, but we expect a lot of independent sites.
CRUCIAL NOTE: it's quite atypical to have 64K servers in the same server group. A virtualized environment means many small sites behind Tempesta FW, i.e. the point of the issue is the HTTP scheduler, which must schedule a request among thousands of server groups. In this case most server groups can have only one server. Meanwhile, there could be really powerful installations with hundreds of upstream servers. Thus the 2-tier schedulers (ratio, hash etc.) still must have dynamic arrays for connections and servers, but probably we don't need to introduce special data structures able to efficiently handle thousands of servers in the same server group.
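A sketch of the requested direction (illustrative names, not the final API): size the server array at configuration time instead of bounding it by the fixed `TFW_SCHED_MAX_SERVERS` constant.

```c
#include <linux/errno.h>
#include <linux/slab.h>

struct srv;

struct sched_group {
	struct srv	**servers;	/* sized for the configured count */
	unsigned int	n_servers;
};

/* Allocate for exactly the configured number of servers: 1 for a small
 * virtual host, up to ~64K for a large installation. */
static int
sched_group_alloc(struct sched_group *sg, unsigned int n)
{
	sg->servers = kcalloc(n, sizeof(*sg->servers), GFP_KERNEL);
	if (!sg->servers)
		return -ENOMEM;
	sg->n_servers = n;
	return 0;
}
```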