Provide time synchronization check between Kibana and ES #203700

Open
pmuellr opened this issue Dec 10, 2024 · 2 comments
Labels
enhancement (New value added to drive a business result), Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)

Comments

@pmuellr
Member

pmuellr commented Dec 10, 2024

We had a recent ResponseOps SDH where (we believe) a Kibana clock was out of sync with an ES clock. Looking at the Kibana logs, we saw the following message, which was the big tip-off:

Error occurred while publishing Fleet metrics: ResponseError: illegal_argument_exception
  Root causes:
    illegal_argument_exception: the document timestamp      [2024-12-05T06:03:27.000Z] is outside of ranges
    of currently writable indices [[2024-08-25T00:54:49.000Z,2024-12-05T05:42:54.000Z]]

This is a message from TSDB, which maintains a "window" of time within which a document may be added (based on its associated time field). There's some discussion of the window here: elastic/integrations#7345

In this case it appears Fleet wanted to add some metrics with a timestamp of 06:03:27Z (date presumably generated by Kibana), while the upper end of the TSDB window was 05:42:54Z (time presumably generated by ES). That is a gap of just over 20 minutes. We believe that in this case the Kibana clock was probably around 20m ahead of the ES clock. Fleet code here: https://github.com/pmuellr/kibana/blob/e7b63115bef5f56c6f6a3e913f120ff258f65dcf/x-pack/plugins/fleet/server/services/metrics/fleet_metrics_task.ts#L139-L175
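For reference, the gap between the two timestamps in the error message can be worked out directly. This is only a small arithmetic sketch; `formatGap` is a hypothetical helper, not something from Kibana or Fleet.

```typescript
// Timestamps copied from the illegal_argument_exception message above.
const documentTimestampMs = Date.parse('2024-12-05T06:03:27.000Z');
const windowUpperBoundMs = Date.parse('2024-12-05T05:42:54.000Z');

// Hypothetical helper: render a millisecond gap as minutes and seconds.
function formatGap(gapMs: number): string {
  const totalSeconds = Math.round(Math.abs(gapMs) / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

const timestampGapMs = documentTimestampMs - windowUpperBoundMs;
console.log(formatGap(timestampGapMs)); // prints "20m 33s"
```

The gap only bounds how far the document timestamp fell outside the writable window; treating it as the clock skew itself is the same assumption made in the paragraph above.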

This message was an extremely valuable clue, because there typically aren't any direct clues when clocks get skewed. We just see unusual behavior, often in task manager. For example, https://github.com/elastic/sdh-kibana/issues/3309

So, I'm thinking we should add some kind of check, presumably in core. The idea would be to run, on an interval (maybe 5m), some kind of simple query against ES that returns its idea of the current date, perhaps using the date header in the ES response, or running the ES|QL command ROW now=NOW(). Maybe both, to make sure they are "close enough"? Then compare the result to Kibana's time.

We've seen differences of up to 7 seconds cause issues, so the tolerance is really tight, and the check has to take into account ES and Kibana latency in processing the ES query and response.
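One way to account for that latency, sketched here under assumptions, is the usual round-trip midpoint estimate: record Kibana's clock just before sending and just after receiving, and compare the ES-reported time against the midpoint. `estimateSkew` and `SkewEstimate` are hypothetical names, and how the ES time is obtained (date header, ES|QL, or something else) is left open.

```typescript
// Hypothetical result shape: estimated offset plus how far off it could be.
interface SkewEstimate {
  offsetMs: number; // positive: remote (ES) clock is ahead of the local (Kibana) clock
  uncertaintyMs: number; // half the round trip; the true offset lies within +/- this
}

// Assumes the remote timestamp was taken somewhere between send and receive.
function estimateSkew(sentAtMs: number, receivedAtMs: number, remoteNowMs: number): SkewEstimate {
  const roundTripMs = receivedAtMs - sentAtMs;
  const midpointMs = sentAtMs + roundTripMs / 2;
  return {
    offsetMs: remoteNowMs - midpointMs,
    uncertaintyMs: roundTripMs / 2,
  };
}

// Example: a 200ms round trip, with the remote clock reporting 7s past the midpoint.
const skewExample = estimateSkew(1_000, 1_200, 8_100);
console.log(skewExample); // { offsetMs: 7000, uncertaintyMs: 100 }
```

A check could then ignore any sample whose `uncertaintyMs` is not comfortably smaller than the skew it is trying to detect, which matters if the target is on the order of a few seconds.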

In the end, I don't think we want to do anything but log this condition, as I believe most of Kibana will operate well with skewed clocks (though perhaps with some unexpected ES responses, depending on where dates are generated and then evaluated). But it would be invaluable in being able to identify this as a system configuration issue.

@pmuellr pmuellr added enhancement New value added to drive a business result Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Dec 10, 2024
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rudolf
Contributor

rudolf commented Dec 18, 2024

Core does a version check on an interval, and we could easily inspect the date header in these responses to check for terrible clock skew. Because we never know what the response latency is, there are some limitations on how accurate this could be. For example, if the event loop is blocked, it would look like a clock skew problem when in fact it is not.

But I think warning of skew more than 60s would avoid most false positives and could be a helpful hint when debugging obscure behavior.

https://github.com/elastic/kibana/blob/main/packages/core/elasticsearch/core-elasticsearch-server-internal/src/version_check/ensure_es_version.ts#L162
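A rough sketch of what such a threshold check might look like follows. It is not taken from the linked version check code: `checkDateHeader` is a hypothetical function, the 60s default mirrors the figure suggested above, and it assumes the response actually carries an HTTP `Date` header in the standard format, which has only one-second resolution.

```typescript
// Hypothetical check: returns a warning message if the skew exceeds the
// threshold, or null if the header is missing, unparseable, or within bounds.
function checkDateHeader(
  dateHeader: string | undefined,
  localNowMs: number,
  thresholdMs: number = 60_000
): string | null {
  if (dateHeader === undefined) return null;
  const remoteMs = Date.parse(dateHeader);
  if (Number.isNaN(remoteMs)) return null;

  const skewMs = localNowMs - remoteMs; // positive: local clock is ahead
  if (Math.abs(skewMs) <= thresholdMs) return null;

  const direction = skewMs > 0 ? 'ahead of' : 'behind';
  const seconds = Math.round(Math.abs(skewMs) / 1000);
  return `Local clock appears to be ${seconds}s ${direction} Elasticsearch`;
}

// Example using the two times from the error message in the issue description.
const skewWarning = checkDateHeader(
  'Thu, 05 Dec 2024 05:42:54 GMT',
  Date.parse('2024-12-05T06:03:27.000Z')
);
console.log(skewWarning);
```

Returning null for a missing or unparseable header keeps the check quiet rather than noisy, which fits the goal of avoiding false positives.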


3 participants