-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http_server: Fix keep-alive for HTTP/1.1 requests #8761
Conversation
This patch moves the initialisation of a HTTP/1.1 stream from the session initialisation to the session ingestion step. Previously, using keep-alive on HTTP/1.1 would panic on the second request, as it would attempt to use components like the header map that are destroyed after each stream, but only initialised at the start of the session, when the HTTP protocol autodetection was run. Signed-off-by: Andrew Titmuss <[email protected]>
This patch stops the forced connection-close for requests that aren't HTTP/2 on the in_http plugin Signed-off-by: Andrew Titmuss <[email protected]>
This patch stops the forced connection-close for requests that aren't HTTP/2 on the in_elasticsearch plugin Signed-off-by: Andrew Titmuss <[email protected]>
This patch stops the forced connection-close for requests that aren't HTTP/2 on the in_opentelemetry plugin Signed-off-by: Andrew Titmuss <[email protected]>
This patch stops the forced connection-close for requests that aren't HTTP/2 on the in_splunk plugin Signed-off-by: Andrew Titmuss <[email protected]>
Valgrind output below Ran the following curl command twice while fluent-bit was running curl -v --http1.1 -d '{"hello":"world"}' -H 'Content-Type: application/json' 'http://localhost:9880/?[0-2]' $ valgrind --leak-check=full ./bin/fluent-bit -i http -o stdout
==582166== Memcheck, a memory error detector
==582166== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==582166== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==582166== Command: ./bin/fluent-bit -i http -o stdout
==582166==
Fluent Bit v3.0.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
___________.__ __ __________.__ __ ________
\_ _____/| | __ __ ____ _____/ |_ \______ \__|/ |_ ___ _\_____ \
| __) | | | | \_/ __ \ / \ __\ | | _/ \ __\ \ \/ / _(__ <
| \ | |_| | /\ ___/| | \ | | | \ || | \ / / \
\___ / |____/____/ \___ >___| /__| |______ /__||__| \_/ /______ /
\/ \/ \/ \/ \/
[2024/04/25 21:10:25] [ info] [fluent bit] version=3.0.3, commit=e47718ad23, pid=582166
[2024/04/25 21:10:25] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/04/25 21:10:25] [ info] [cmetrics] version=0.8.0
[2024/04/25 21:10:25] [ info] [ctraces ] version=0.5.0
[2024/04/25 21:10:25] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/25 21:10:25] [ info] [input:http:http.0] initializing
[2024/04/25 21:10:25] [ info] [input:http:http.0] storage_strategy='memory' (memory only)
[2024/04/25 21:10:25] [ info] [sp] stream processor started
[0] http.0: [[1714043453.692853667, {}], {"hello"=>"world"}]
[1] http.0: [[1714043453.723406781, {}], {"hello"=>"world"}]
[2] http.0: [[1714043453.724659850, {}], {"hello"=>"world"}]
[0] http.0: [[1714043461.957414623, {}], {"hello"=>"world"}]
[1] http.0: [[1714043461.958202834, {}], {"hello"=>"world"}]
[2] http.0: [[1714043461.958626946, {}], {"hello"=>"world"}]
[2024/04/25 21:11:03] [engine] caught signal (SIGINT)
[2024/04/25 21:11:03] [ warn] [engine] service will shutdown in max 5 seconds
[2024/04/25 21:11:04] [ info] [engine] service has stopped (0 pending tasks)
[2024/04/25 21:11:04] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2024/04/25 21:11:04] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==582166==
==582166== HEAP SUMMARY:
==582166== in use at exit: 0 bytes in 0 blocks
==582166== total heap usage: 2,055 allocs, 2,055 frees, 2,205,819 bytes allocated
==582166==
==582166== All heap blocks were freed -- no leaks are possible
==582166==
==582166== For lists of detected and suppressed errors, rerun with: -s
==582166== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) |
Debug output below Ran the same curl request as the Valgrind check above $ ./bin/fluent-bit -vv -i http -o stdout
Fluent Bit v3.0.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
___________.__ __ __________.__ __ ________
\_ _____/| | __ __ ____ _____/ |_ \______ \__|/ |_ ___ _\_____ \
| __) | | | | \_/ __ \ / \ __\ | | _/ \ __\ \ \/ / _(__ <
| \ | |_| | /\ ___/| | \ | | | \ || | \ / / \
\___ / |____/____/ \___ >___| /__| |______ /__||__| \_/ /______ /
\/ \/ \/ \/ \/
[2024/04/25 21:15:48] [ info] Configuration:
[2024/04/25 21:15:48] [ info] flush time | 1.000000 seconds
[2024/04/25 21:15:48] [ info] grace | 5 seconds
[2024/04/25 21:15:48] [ info] daemon | 0
[2024/04/25 21:15:48] [ info] ___________
[2024/04/25 21:15:48] [ info] inputs:
[2024/04/25 21:15:48] [ info] http
[2024/04/25 21:15:48] [ info] ___________
[2024/04/25 21:15:48] [ info] filters:
[2024/04/25 21:15:48] [ info] ___________
[2024/04/25 21:15:48] [ info] outputs:
[2024/04/25 21:15:48] [ info] stdout.0
[2024/04/25 21:15:48] [ info] ___________
[2024/04/25 21:15:48] [ info] collectors:
[2024/04/25 21:15:48] [ info] [fluent bit] version=3.0.3, commit=e47718ad23, pid=583141
[2024/04/25 21:15:48] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2024/04/25 21:15:48] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/04/25 21:15:48] [ info] [cmetrics] version=0.8.0
[2024/04/25 21:15:48] [ info] [ctraces ] version=0.5.0
[2024/04/25 21:15:48] [ info] [input:http:http.0] initializing
[2024/04/25 21:15:48] [ info] [input:http:http.0] storage_strategy='memory' (memory only)
[2024/04/25 21:15:48] [debug] [http:http.0] created event channels: read=21 write=22
[2024/04/25 21:15:48] [debug] [downstream] listening on 0.0.0.0:9880
[2024/04/25 21:15:48] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/04/25 21:15:48] [ info] [sp] stream processor started
[2024/04/25 21:15:48] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/25 21:15:53] [trace] [io] connection OK
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] ret=149
[2024/04/25 21:15:53] [trace] [input chunk] update output instances with new chunk size diff=38, records=1, input=http.0
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_write] trying 43 bytes
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_write] ret=43 total=43/43
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] ret=149
[2024/04/25 21:15:53] [trace] [input chunk] update output instances with new chunk size diff=38, records=1, input=http.0
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_write] trying 43 bytes
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_write] ret=43 total=43/43
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] ret=149
[2024/04/25 21:15:53] [trace] [input chunk] update output instances with new chunk size diff=38, records=1, input=http.0
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_write] trying 43 bytes
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_write] ret=43 total=43/43
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:53] [trace] [io coro=(nil)] [net_read] ret=0
[2024/04/25 21:15:53] [trace] [downstream] destroy connection #42 to tcp://127.0.0.1:49556
[2024/04/25 21:15:54] [trace] [task 0x70ba88025220] created (id=0)
[2024/04/25 21:15:54] [debug] [task] created task=0x70ba88025220 id=0 OK
[2024/04/25 21:15:54] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] http.0: [[1714043753.605323373, {}], {"hello"=>"world"}]
[1] http.0: [[1714043753.605581054, {}], {"hello"=>"world"}]
[2] http.0: [[1714043753.605736594, {}], {"hello"=>"world"}]
[2024/04/25 21:15:54] [debug] [out flush] cb_destroy coro_id=0
[2024/04/25 21:15:54] [trace] [coro] destroy coroutine=0x70ba80001050 data=0x70ba80001070
[2024/04/25 21:15:54] [trace] �[93m[engine] [task event]�[0m task_id=0 out_id=0 return=OK
[2024/04/25 21:15:54] [debug] [task] destroy task=0x70ba88025220 (task_id=0)
[2024/04/25 21:15:55] [trace] [io] connection OK
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] ret=149
[2024/04/25 21:15:55] [trace] [input chunk] update output instances with new chunk size diff=38, records=1, input=http.0
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_write] trying 43 bytes
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_write] ret=43 total=43/43
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] ret=149
[2024/04/25 21:15:55] [trace] [input chunk] update output instances with new chunk size diff=38, records=1, input=http.0
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_write] trying 43 bytes
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_write] ret=43 total=43/43
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] ret=149
[2024/04/25 21:15:55] [trace] [input chunk] update output instances with new chunk size diff=38, records=1, input=http.0
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_write] trying 43 bytes
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_write] ret=43 total=43/43
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2024/04/25 21:15:55] [trace] [io coro=(nil)] [net_read] ret=0
[2024/04/25 21:15:55] [trace] [downstream] destroy connection #42 to tcp://127.0.0.1:49570
[2024/04/25 21:15:56] [trace] [task 0x70ba88025ec0] created (id=0)
[2024/04/25 21:15:56] [debug] [task] created task=0x70ba88025ec0 id=0 OK
[0] http.0: [[1714043755.870506601, {}], {"hello"=>"world"}]
[1] http.0: [[1714043755.871465391, {}], {"hello"=>"world"}]
[2] http.0: [[1714043755.871616262, {}], {"hello"=>"world"}]
[2024/04/25 21:15:56] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2024/04/25 21:15:56] [debug] [out flush] cb_destroy coro_id=1
[2024/04/25 21:15:56] [trace] [coro] destroy coroutine=0x70ba80008370 data=0x70ba80008390
[2024/04/25 21:15:56] [trace] �[93m[engine] [task event]�[0m task_id=0 out_id=0 return=OK
[2024/04/25 21:15:56] [debug] [task] destroy task=0x70ba88025ec0 (task_id=0)
[2024/04/25 21:15:59] [engine] caught signal (SIGINT)
[2024/04/25 21:15:59] [trace] [engine] flush enqueued data
[2024/04/25 21:15:59] [ warn] [engine] service will shutdown in max 5 seconds
[2024/04/25 21:16:00] [ info] [engine] service has stopped (0 pending tasks)
[2024/04/25 21:16:00] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2024/04/25 21:16:00] [ info] [output:stdout:stdout.0] thread worker #0 stopped |
And finally for reference, here's the curl output for one execution of above $ curl -v --http1.1 -d '{"hello":"world"}' -H 'Content-Type: application/json' 'http://localhost:9880/?[0-2]'
* Host localhost:9880 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
* Trying [::1]:9880...
* connect to ::1 port 9880 from ::1 port 40718 failed: Connection refused
* Trying 127.0.0.1:9880...
* Connected to localhost (127.0.0.1) port 9880
> POST /?0 HTTP/1.1
> Host: localhost:9880
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 17
>
* upload completely sent off: 17 bytes
< HTTP/1.1 201 Created
< content-length: 0
<
* Connection #0 to host localhost left intact
* Found bundle for host: 0x643f13fd0220 [serially]
* Re-using existing connection with host localhost
> POST /?1 HTTP/1.1
> Host: localhost:9880
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 17
>
* upload completely sent off: 17 bytes
< HTTP/1.1 201 Created
< content-length: 0
<
* Connection #0 to host localhost left intact
* Found bundle for host: 0x643f13fd0220 [serially]
* Re-using existing connection with host localhost
> POST /?2 HTTP/1.1
> Host: localhost:9880
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 17
>
* upload completely sent off: 17 bytes
< HTTP/1.1 201 Created
< content-length: 0
<
* Connection #0 to host localhost left intact |
Not sure why that macOS test failed - concerning since the specific failure relates to |
As for the performance difference with this change, I'm seeing about a 25% improvement with this setup:
upstream fluent {
server 127.0.0.1:9880;
keepalive 16;
}
server {
listen 80;
location / {
proxy_pass http://fluent;
proxy_http_version 1.1;
proxy_set_header "Connection" "";
}
}
import http from 'k6/http';
export const options = {
vus: 100,
duration: '30s'
};
export default function() {
const url = 'http://localhost';
const payload = '{"hello":"world"}';
const params = {
headers: {
'Content-Type': 'application/json'
}
};
http.post(url, payload, params);
}
Before: $ k6 run script.js
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
data_received..................: 163 MB 5.4 MB/s
data_sent......................: 165 MB 5.5 MB/s
http_req_blocked...............: avg=2.45µs min=250ns med=801ns max=37.58ms p(90)=1.21µs p(95)=1.38µs
http_req_connecting............: avg=488ns min=0s med=0s max=31.97ms p(90)=0s p(95)=0s
http_req_duration..............: avg=2.59ms min=55.39µs med=2.6ms max=37.08ms p(90)=2.9ms p(95)=3.14ms
{ expected_response:true }...: avg=2.67ms min=83.12µs med=2.6ms max=37.08ms p(90)=2.91ms p(95)=3.16ms
http_req_failed................: 3.08% ✓ 35217 ✗ 1104829
http_req_receiving.............: avg=10.31µs min=3.2µs med=8.18µs max=5.81ms p(90)=12.71µs p(95)=14.5µs
http_req_sending...............: avg=5.64µs min=1.69µs med=4.46µs max=34.6ms p(90)=6.68µs p(95)=7.43µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=2.58ms min=42.59µs med=2.59ms max=21.72ms p(90)=2.88ms p(95)=3.12ms
http_reqs......................: 1140046 37996.734938/s
iteration_duration.............: avg=2.62ms min=68.5µs med=2.62ms max=40.52ms p(90)=2.93ms p(95)=3.18ms
iterations.....................: 1140046 37996.734938/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 1140046 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s After: $ k6 run script.js
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
data_received..................: 180 MB 6.0 MB/s
data_sent......................: 208 MB 6.9 MB/s
http_req_blocked...............: avg=2.25µs min=250ns med=601ns max=48.18ms p(90)=952ns p(95)=1.14µs
http_req_connecting............: avg=522ns min=0s med=0s max=43.31ms p(90)=0s p(95)=0s
http_req_duration..............: avg=2.06ms min=58.63µs med=1.88ms max=99.62ms p(90)=3.35ms p(95)=3.71ms
{ expected_response:true }...: avg=2.06ms min=58.63µs med=1.88ms max=99.62ms p(90)=3.35ms p(95)=3.71ms
http_req_failed................: 0.00% ✓ 0 ✗ 1431309
http_req_receiving.............: avg=8.6µs min=2.91µs med=6.28µs max=6.4ms p(90)=9.27µs p(95)=11.15µs
http_req_sending...............: avg=4.84µs min=1.51µs med=3.53µs max=44.83ms p(90)=5.32µs p(95)=6.18µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=2.05ms min=46.14µs med=1.87ms max=99.6ms p(90)=3.33ms p(95)=3.7ms
http_reqs......................: 1431309 47707.976513/s
iteration_duration.............: avg=2.09ms min=81.63µs med=1.9ms max=99.65ms p(90)=3.37ms p(95)=3.73ms
iterations.....................: 1431309 47707.976513/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 1431309 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s |
You are correct about the second request causing an issue but in the case of HTTP/1.x the stream instance is supposed to live through the entire session so I think these patches would be a better fit :
This ensures that the request instance is properly initialized at all times while also removing a leak that could be triggered by prematurely closed connections. |
This patch addresses feedback from @leonardo-albertovich, reducing how many components are being recreated on each new request for a single session. Now it only recreates the request object, rather than creating a new stream for each request of a session. It replaces my original fix in 0ef074a Signed-off-by: Andrew Titmuss <[email protected]>
Makes sense, and works on my end too. Updated the PR with your patch! |
Signed-off-by: Andrew Titmuss <[email protected]>
This patch moves the initialisation of a HTTP/1.1 stream from the session initialisation to the session ingestion step.
Previously, using keep-alive on HTTP/1.1 would panic on the second request, as it would attempt to use components like the header map that are destroyed after each stream, but only initialised at the start of the session, when the HTTP protocol autodetection was run.
With this fix, HTTP/1.1 keep-alive is now enabled for the HTTP, OpenTelemetry, Elasticsearch and Splunk inputs
Fixes #8758
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.