Missing HTTP status codes for cancelled search requests #73849

jporter-dev · 2020-07-30T17:54:56Z

Kibana version: 7.8.1

Elasticsearch version: 7.8.1

Server OS version: Ubuntu 16.04.6 LTS

Browser version: Firefox 79.0b9

Browser OS version: macOS Catalina 10.15.6

Original install method (e.g. download page, yum, from source, etc.): apt

Describe the bug: Some responses from /internal/search/es respond with empty status code, header, and body. Presumably from cancelled queries.

Steps to reproduce:

Load up a dashboard and query/filter
Remove filters
Re-add filters
Observe network tab with blank status codes, zero headers, and no response body.

Expected behavior:
Network log should contain proper queries to the ES API with corresponding responses. Cancelled queries should respond with a proper HTTP status code (204 seems appropriate).

Screenshots (if relevant):

Errors in browser console (if relevant):
Failed to load resource: the server responded with a status of 502 (Proxy Error) - this only occurs when Kibana is reverse proxied.

Any additional context:
This is primarily an issue when reverse proxying Kibana. Apache fusses when it receives an upstream response that doesn't have a valid header or is missing the response code.

A temporary remediation for this with Apache is setting ProxyBadHeader Ignore in the server config. Obviously this is not the ideal solution, since the cancelled inflight requests should return a valid HTTP status code despite responding with no content.

Edit: The ProxyBadHeader fix doesn't fully resolve the problem. Not 100% why but I speculate that the upstream server is both sending empty responses and terminating requests early.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-07-30T18:47:48Z

Pinging @elastic/kibana-security (Team:Security)

elasticmachine · 2020-07-30T19:39:12Z

Pinging @elastic/kibana-app-arch (Team:AppArch)

BBQigniter · 2020-08-18T09:21:48Z

we see a similar problem - i tried the ProxyBadHeader Ignore suggestion but it didn't help. I also tried to set in the apache's ProxyPass line the addtional parameters retry=1 acquire=3000 timeout=600 KeepAlive=On

It happens for example every few clicks on the "view single document"-button at a message. sometimes it even says that there is no document with that id - a refresh then works. sometimes we also get following error-message "proxy error" with following log-output

_construct@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:466674
Wrapper@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:466064
_createSuper/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:464852
HttpFetchError@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:467831
_callee3$@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1292397
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969217
s/o._invoke</<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:968971
_/</e[t]@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969574
asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1285920
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
promise callback*asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286038
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
promise callback*asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286038
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
_asyncToGenerator/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286386
_asyncToGenerator/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286143
fetchResponse@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1292730
_callee$@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1288895
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969217
s/o._invoke</<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:968971
_/</e[t]@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969574
asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1285920
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
promise callback*asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286038
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
_asyncToGenerator/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286386
_asyncToGenerator/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286143
_callee2$/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1289593
_callee2$@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1288382
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969217
s/o._invoke</<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:968971
_/</e[t]@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969574
asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1285920
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
_asyncToGenerator/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286386
_asyncToGenerator/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286143
Fetch/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1289716
search@https://test-kibana.REMOVED/32141/bundles/plugin/data/data.plugin.js:6:47756
search/</<@https://test-kibana.REMOVED/32141/bundles/plugin/dataEnhanced/dataEnhanced.plugin.js:1:145473
__kbnSharedDeps__</d</t.prototype._tryNext@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:391171
__kbnSharedDeps__</d</t.prototype._next@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:391073
__kbnSharedDeps__</u</t.prototype.next@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:21:41156
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:1151083
__kbnSharedDeps__</o</t.prototype._execute@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:409061
__kbnSharedDeps__</o</t.prototype.execute@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:408877
__kbnSharedDeps__</o</t.prototype.flush@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:405111
setInterval handler*__kbnSharedDeps__</o</t.prototype.requestAsyncId@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:408580
__kbnSharedDeps__</o</t.prototype.schedule@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:408478
__kbnSharedDeps__</r</e.prototype.schedule@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:920108
__kbnSharedDeps__</o</t.prototype.schedule@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:404983
s/<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:1150975

the apache access and error logs look like

# access_log
10.115.50.90 - my.user [18/Aug/2020:10:21:31 +0200] "POST /api/index_management/indices/reload HTTP/1.1" 502 341 "https://test-kibana.REMOVED/app/kibana" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0" 689 536

# error_log
[Tue Aug 18 10:21:31.566807 2020] [proxy_http:error] [pid 26954:tid 139893653104384] (104)Connection reset by peer: [client 10.115.50.90:0] AH01102: error reading status line from remote server REPLACED.NAME:5601, referer https://test-kibana.REMOVED/app/kibana
[Tue Aug 18 10:21:31.566936 2020] [proxy:error] [pid 26954:tid 139893653104384] [client 10.115.50.90:0] AH00898: Error reading from remote server returned by /api/index_management/indices/reload, referer https://test-kibana.REMOVED/app/kibana

It also happens connecting directly to Kibana without using the in-between apache reverse-proxy

Happens with latest Firefox 79.0 and Chrome 84.0.4147.125 on windows 10

With ELK 6.8 we did not have those problems.

jporter-dev · 2020-08-18T16:39:17Z

@BBQigniter - I have been seeing the errors even after adding the ProxyBadHeader setting. I think there are two things happening here - some responses are blank with the missing header (ProxyBadHeader should resolve those), and some responses are being terminated prematurely by Kibana (connection reset by peer) - I think these are giving the remaining errors.

The errors are less common when setting ProxyBadHeader, but it could also be a placebo. This is a pretty glaring issue since errors popping up while using Kibana are not at all ideal.

BBQigniter · 2020-08-19T06:39:07Z

yes, those errors are super annoying and thankfully we only have those currently on our test-cluster where I'm testing the upgrade procedure.

Just upgraded whole cluster to version 7.9 - same errors :(

lizozom · 2020-08-23T11:50:41Z

@joshporter1 In terms of the search feature, unnecessary requests (for example when running a new search before the previous one had finished) get canceled, from the browser, using an AbortController.

It is typical for a request canceled in flight, to show an empty response body and no status code (status code 0).

Here is a command line example fetching and cancelling a request to some random online API in Chrome.
You can see that the body is empty and the request is marked ask canceled:

I was surprised however to find out that Firefox doesn't mark the request as canceled, but it's still the same scenario:

So anyway, the empty responses are part of a normal workflow. I wonder how we can work around this in a reverse proxy scenario however.

jporter-dev · 2020-08-23T17:04:36Z

@lizozom I kind of figured that's what was happening, the fact that Firefox wasn't showing it as such is a tad confusing. However, Kibana still shows errors in other browsers such as Chrome and Brave since the reverse proxy server returns 502's on some of the aborted requests.

I'm hoping to find a good solution to solve this at the reverse proxy layer but I've combed through the docs a few times and haven't found anything useful.

lizozom · 2020-08-24T10:16:31Z

Can you share some of the resources you've been looking into? I'm curious to try and do this setup, to see if I can reproduce the issue.

jporter-dev · 2020-08-24T14:10:04Z

Sure thing, I've mostly been digging through the mod_proxy docs at the options available for the Apache reverse proxy.

The ProxyBadHeader option is the only one that I found that had any impact, setting it to Ignore resolved the invalid header error message, but we are still left with the connection reset by peer message.

BBQigniter · 2020-08-25T07:46:35Z

Like I wrote in my first comment, I saw "proxy errors" and or "no document with that id" also while pointing my browser directly to Kibana without going through the reverse proxy. It only happens not that often and is hardly reproducable.

The next thing I sometimes experience is, that clicking on links etc. sometimes does not work correctly like in this gif.

This also happens with and without reverse proxy usage. If this happens I have to refresh the browser tab in Chrome or Firefox. Kibana 7.9

lizozom · 2020-08-25T11:03:57Z

@BBQigniter can you please explain what the gif should be showing?

BBQigniter · 2020-08-25T11:25:10Z

@lizozom sure. sry, if it's not clear

I opened Firefox and browsed to our test-kibana
I added the _id column to show only that in "Discover"
I expanded one of the events
I clicked on "View single document" and expect to show only that info
It showed me again "Discover" and also reset the the chosen column
After refreshing the page - everything works again as expected.

BBQigniter · 2020-08-25T13:45:31Z

I now found this wonderful blog https://newweaver.com/the-mystery-of-the-502-kibana-errors/

This guy describes EXACTLY the same problem!

/usr/share/kibana/src/core/server/http/http_tools.js function createServer starting at line 109

function createServer(serverOptions, listenerOptions) {
  const server = new _hapi.Server(serverOptions);
  server.listener.keepAliveTimeout = listenerOptions.keepaliveTimeout;
/**
 * fix proposed from https://newweaver.com/the-mystery-of-the-502-kibana-errors/
 */
  server.listener.headersTimeout = listenerOptions.keepaliveTimeout + 1000;
  server.listener.setTimeout(listenerOptions.socketTimeout);
  server.listener.on('timeout', socket => {
    socket.destroy();
  });
  server.listener.on('clientError', (err, socket) => {
    if (socket.writable) {
      socket.end(Buffer.from('HTTP/1.1 400 Bad Request\r\n\r\n', 'ascii'));
    } else {
      socket.destroy(err);
    }
  });
  return server;
}

and restarted Kibana. I now have that "fix" for a few minutes and didn't get any proxy errors for some time. Hopefully that's it.

jporter-dev · 2020-08-25T13:56:06Z

@BBQigniter thanks for that link - please keep me updated on if it does the trick!

BBQigniter · 2020-08-26T06:34:00Z

@joshporter1 it seems the fix really helps, no "proxy error" issues yet while clicking around. Only issues like in the gif and later comment described sometimes still happen.

in the linked blog the guy also mentions #44062 so if elastic updates the underlying nodejs to >v12.17.0 it should be fixed without the need to add the line to the function

lizozom · 2020-08-26T11:57:01Z

We constantly update nodejs versions, so I think its safe to assume this will be resolved soon 🙏

lizozom · 2020-10-05T16:54:55Z

@joshporter1 any updates on this? Can this be closed?

jporter-dev · 2020-10-05T17:06:04Z

@lizozom I applied the workaround "fix" but have not updated or removed the code to verify if it was solved in a future version. So as far as I can tell, it shouldn't be closed unless it was truly resolved without requiring users to alter the core code.

If the underlying nodejs version was upgraded then the issue should probably be re-evaluated.

kobelb · 2020-10-13T18:37:30Z

As far as I can tell, this bug is being caused by nodejs/node#27363. Once Kibana is upgraded to Node > 12.9.0, this will be resolved. There are also potential workarounds that are discussed later.

Node.js has three timeouts that apply to HTTP requests:

Idle socket timeout: how long the TCP socket itself can be idle, this is a sliding timeout.
Headers timeout: how long the client has to send their headers, this is a fixed timeout that is supposed to be restarted for every HTTP request.
Keepalive timeout: how long after a HTTP request has finished that the client has to begin sending TCP keepalives before the socket is destroyed, this is a sliding timeout.

There's a bug in Node.js < 12.9.0 where when multiple HTTP requests occur over the same TCP socket when keepalives are used, that the headers timeout is not reset correctly. This is what the headers timeout is supposed to control:

But with the bug, this is how it's used:

I've been able to replicate this behavior against a test Node.js HTTP server, and Kibana itself using the default timeouts and a basic test client, which is included below. This effectively means that Kibana is pretty much unusable currently behind a reverse-proxy that reuses TCP sockets.

const http = require('http');

const agent = new http.Agent({
    keepAlive: true,
    keepAliveMsecs: 30000
});

let n = 0;

const request = () => {
    const req = http.request({
        protocol: 'http:',
        host: 'localhost',
        port: 5601,
        path: '/',
        method: 'GET',
        agent,
    }, function (res) {
        let data = '';
        res.on('data', (chunk) => {
            data += chunk;{}
        });
        res.on('end', () => {
            console.log(`Done: [${res.statusCode}] - ${data}`);
            if (++n == 1) {
                setTimeout(request, 58000);
            }
        })
    });

    req.on('socket', (socket) => {
        socket.write("GET / HTTP/1.1\r\n");
        setTimeout(() => {
            socket.write(`user-agent: ${Math.random() * 1000}\r\n`);
        }, 1000);
        setTimeout(() => {
            socket.write(`user-agent: ${Math.random() * 1000}\r\n`);
        }, 2000);
        setTimeout(() => {
            socket.write('\r\n')
            req.end();
        }, 3000);
    });
    
    req.on('error', (error) => {
        console.error('error', error);
    });
};

request();

Potential workarounds

I don't really like any of these, but I'm interested in hearing your opinions @restrry.

Allow the end-user to configure the headers timeout

This would allow the end-user to work-around this bug when Kibana is behind a reverse-proxy, and set it to a value higher than the server.keepaliveTimeout. This is the work-around suggested in this issue, and in the blog-post. This leads to a nondeterministic behavior, where the first HTTP request which occurs over a TCP socket has a longer period of time to send their headers than subsequent HTTP requests. This is rather awkward because once we upgrade to Node 12.9.0, this behavior will change and headers timeouts will be used consistently and appropriately everywhere, potentially causing the value that was set by the users in the kibana.yml to need to be adjusted.

Automatically configure the headers timeout to be larger than the keepalive timeout

This would still have the nondeterministic behavior as above, but we at least don't end up violating end-users expectations when we do get to upgrade to Node 12.9.0. This will potentially lead to the headers timeout being set to a much larger value than it would be by default, minimizing the protection that it provides against the [slowloris attack](The headers timeout was originally added to prevent against the slowloris attack so we end up removing this protection. ).

Disable the headers timeout

It's possible to disable the headers timeout by setting this to 0. This removes the protection entirely against the slowloris attack. Removing this protection for users who aren't reusing TCP keepalives and reusing sockets is a potential change in expectation if they were relying on Kibana itself providing this protection.

mshustov · 2020-10-18T15:22:59Z

I read through the nodejs thread nodejs/node#27363 and it seems that:

Disable the headers timeout is not an option due to security reasons.
Allow the end-user to configure the headers timeout requires the end-users to identify the problem and find a recommended solution in the Kibana docs. Furthermore, we are going to update the nodejs version to the fixed v12.19 in Kibana v7.11, but we will have to maintain the headersTimeout setting till v8.0 for BWC reasons.
Automatically configure the headers timeout to be larger than the keepalive timeout seems to be an optimal solution: Kibana works out-of-the-box, and we can remove an internal workaround as soon as update nodejs version in the upcoming release. Regarding This will potentially lead to the headers timeout being set to a much larger value than it would be by default, minimizing the protection that it provides against the slowloris attack. From the official docs headersTimeout is 40sec (increased to 60sec in v12), while keepAlive timeout for Kibana is 120sec. The existing difference is not on the level of the order of magnitude to say we disable the protection against slowlorais attack completely.

@kobelb WDYT about this?

const headersTimeout = listenerOptions.keepaliveTimeout + 2*server.listener.headersTimeout;
server.listener.headersTimeout = headers Timeout;

kobelb · 2020-10-19T19:48:16Z

@restrry I think your suggestion is the "least bad" option that we have, unfortunately... I'll take it :)

mshustov · 2020-11-16T12:42:14Z

@lizozom Could you confirm this problem has been fixed by #61587 ?

lizozom · 2020-11-23T10:08:43Z

IMO yes, but I'd let @kobelb confirm as he did most of the research

kobelb · 2020-11-23T18:07:46Z

Everything is looking good from my perspective. I just tested out master and I'm seeing these timeouts working properly with TCP keepalives.

bhavyarm added Feature:Security/Authentication Platform Security - Authentication Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! labels Jul 30, 2020

bhavyarm added the bug Fixes for quality problems that affect the customer experience label Jul 30, 2020

legrego added Team:AppArch and removed Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! labels Jul 30, 2020

legrego removed the Feature:Security/Authentication Platform Security - Authentication label Jul 30, 2020

lukeelmers added Feature:Search Querying infrastructure in Kibana regression labels Jul 30, 2020

lizozom mentioned this issue Oct 5, 2020

Upgrade Node.js to version 12 #61587

Merged

joshdover assigned mshustov Oct 13, 2020

mshustov mentioned this issue Oct 18, 2020

Nodejs Regression #44062

Closed

mshustov mentioned this issue Oct 20, 2020

[KP] Fix Headers timeout issue #81140

Merged

2 tasks

mshustov closed this as completed Nov 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing HTTP status codes for cancelled search requests #73849

Missing HTTP status codes for cancelled search requests #73849

jporter-dev commented Jul 30, 2020 •

edited

Loading

elasticmachine commented Jul 30, 2020

elasticmachine commented Jul 30, 2020

BBQigniter commented Aug 18, 2020

jporter-dev commented Aug 18, 2020

BBQigniter commented Aug 19, 2020

lizozom commented Aug 23, 2020 •

edited

Loading

jporter-dev commented Aug 23, 2020

lizozom commented Aug 24, 2020

jporter-dev commented Aug 24, 2020

BBQigniter commented Aug 25, 2020

lizozom commented Aug 25, 2020

BBQigniter commented Aug 25, 2020 •

edited

Loading

BBQigniter commented Aug 25, 2020 •

edited

Loading

jporter-dev commented Aug 25, 2020

BBQigniter commented Aug 26, 2020

lizozom commented Aug 26, 2020 •

edited

Loading

lizozom commented Oct 5, 2020

jporter-dev commented Oct 5, 2020

kobelb commented Oct 13, 2020 •

edited

Loading

mshustov commented Oct 18, 2020 •

edited

Loading

kobelb commented Oct 19, 2020

mshustov commented Nov 16, 2020

lizozom commented Nov 23, 2020

kobelb commented Nov 23, 2020

Missing HTTP status codes for cancelled search requests #73849

Missing HTTP status codes for cancelled search requests #73849

Comments

jporter-dev commented Jul 30, 2020 • edited Loading

elasticmachine commented Jul 30, 2020

elasticmachine commented Jul 30, 2020

BBQigniter commented Aug 18, 2020

jporter-dev commented Aug 18, 2020

BBQigniter commented Aug 19, 2020

lizozom commented Aug 23, 2020 • edited Loading

jporter-dev commented Aug 23, 2020

lizozom commented Aug 24, 2020

jporter-dev commented Aug 24, 2020

BBQigniter commented Aug 25, 2020

lizozom commented Aug 25, 2020

BBQigniter commented Aug 25, 2020 • edited Loading

BBQigniter commented Aug 25, 2020 • edited Loading

jporter-dev commented Aug 25, 2020

BBQigniter commented Aug 26, 2020

lizozom commented Aug 26, 2020 • edited Loading

lizozom commented Oct 5, 2020

jporter-dev commented Oct 5, 2020

kobelb commented Oct 13, 2020 • edited Loading

Potential workarounds

Allow the end-user to configure the headers timeout

Automatically configure the headers timeout to be larger than the keepalive timeout

Disable the headers timeout

mshustov commented Oct 18, 2020 • edited Loading

kobelb commented Oct 19, 2020

mshustov commented Nov 16, 2020

lizozom commented Nov 23, 2020

kobelb commented Nov 23, 2020

jporter-dev commented Jul 30, 2020 •

edited

Loading

lizozom commented Aug 23, 2020 •

edited

Loading

BBQigniter commented Aug 25, 2020 •

edited

Loading

BBQigniter commented Aug 25, 2020 •

edited

Loading

lizozom commented Aug 26, 2020 •

edited

Loading

kobelb commented Oct 13, 2020 •

edited

Loading

mshustov commented Oct 18, 2020 •

edited

Loading