Provide time synchronization check between Kibana and ES #203700
Labels
enhancement
New value added to drive a business result
Team:Core
Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
We had a recent ResponseOps SDH where (we believe) a Kibana clock was out of sync with an ES clock. Looking at the Kibana logs, we saw the following messages, which were the big tip-off:
This is a message from TSDB, which maintains a "window" of time in which a document may be added (based on its associated time field). There's some discussion of the window here: elastic/integrations#7345
In this case it appears Fleet wanted to add some metrics with a timestamp of `06:03:27Z` (date presumably generated from Kibana), and the higher end of the TSDB window is `05:42:54Z` (presumably time generated by ES). Off by about 20 minutes. We believe in this case the Kibana clock was probably around 20m ahead of the ES clock. Fleet code here: https://github.com/pmuellr/kibana/blob/e7b63115bef5f56c6f6a3e913f120ff258f65dcf/x-pack/plugins/fleet/server/services/metrics/fleet_metrics_task.ts#L139-L175

This message was an extremely valuable clue, because there typically aren't any direct clues when clocks get skewed. We just see unusual behavior, often in task manager. For example, https://github.com/elastic/sdh-kibana/issues/3309
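For reference, the gap between the two times quoted above can be worked out with a few lines of TypeScript. This is only illustrative arithmetic: it assumes both times fall on the same UTC day, and the helper name `secondsOfDay` is made up for this sketch.

```typescript
// Seconds since midnight for an "HH:MM:SSZ" time-of-day string.
// Hypothetical helper, written only for this calculation.
function secondsOfDay(t: string): number {
  const [h, m, s] = t.replace(/Z$/, '').split(':').map(Number);
  return h * 3600 + m * 60 + s;
}

const docTimestamp = '06:03:27Z'; // timestamp on the Fleet metrics document
const windowUpper = '05:42:54Z';  // higher end of the TSDB window

// Assumes both values are on the same UTC day.
const diffSeconds = secondsOfDay(docTimestamp) - secondsOfDay(windowUpper);
const diffText = `${Math.floor(diffSeconds / 60)}m ${diffSeconds % 60}s`;

console.log(diffText); // 20m 33s
```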
So, thinking we should add some kind of check, presumably in core. I think the idea would be to check on an interval, maybe 5m, and do some kind of simple query to ES that would return its idea of the current date. Perhaps using the `date` header in the ES response, or running the ES|QL command `ROW now=NOW()`. Maybe both, and make sure they are "close enough"? Then compare to Kibana's time.

We've seen differences of up to 7 seconds cause issues, so it's a really tight window, in terms of having to take into account ES and Kibana latency in processing the ES query and response.
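One possible way to account for the latency is sketched below. It is not an existing Kibana API: `ClockSample`, `estimateSkewMs`, `skewUncertaintyMs` and `isSkewed` are hypothetical names. The sketch assumes ES reads its clock roughly halfway through the round trip, and that the ES time has only one-second resolution (as an HTTP `date` header would). How the ES time is actually obtained (header or `ROW now=NOW()`) is left out.

```typescript
// One timed request to ES. All names here are hypothetical.
interface ClockSample {
  sentAtMs: number;     // Kibana clock just before the request was sent
  receivedAtMs: number; // Kibana clock just after the response arrived
  esTimeMs: number;     // ES's idea of "now", e.g. Date.parse() of a `date` header
}

// Estimated skew in ms (positive means Kibana is ahead of ES).
// Assumption: ES sampled its clock about halfway through the round trip.
function estimateSkewMs(s: ClockSample): number {
  const midpointMs = s.sentAtMs + (s.receivedAtMs - s.sentAtMs) / 2;
  return midpointMs - s.esTimeMs;
}

// Error bound on the estimate: half the round trip, plus the resolution
// of the ES time source (1000 ms assumed for a second-granularity header).
function skewUncertaintyMs(s: ClockSample, resolutionMs: number = 1000): number {
  return (s.receivedAtMs - s.sentAtMs) / 2 + resolutionMs;
}

// Report skew only when it exceeds the threshold even after subtracting
// the error bound, to avoid false alarms caused by slow round trips.
function isSkewed(s: ClockSample, thresholdMs: number): boolean {
  return Math.abs(estimateSkewMs(s)) - skewUncertaintyMs(s) > thresholdMs;
}

// Made-up sample: a 200 ms round trip, with ES reporting a time about 10 s behind.
const sample: ClockSample = { sentAtMs: 1000000, receivedAtMs: 1000200, esTimeMs: 990000 };
console.log(estimateSkewMs(sample));    // 10100
console.log(skewUncertaintyMs(sample)); // 1100
console.log(isSkewed(sample, 5000));    // true (5000 ms is an arbitrary example threshold)
```

Subtracting the error bound makes the check conservative: a skew of a few seconds on a slow connection goes unreported rather than producing a noisy warning. Whether that trade-off is right for a 7-second window would need tuning.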
In the end, I don't think we want to do anything but log this condition, as I believe most of Kibana will operate well with skewed clocks (though perhaps with some unexpected ES responses, depending on where dates are generated and then evaluated). But it would be invaluable in being able to identify this as a system configuration issue.