Is the default refresh_interval a sensible default for Observability data? #78776

Open
jpountz opened this issue Oct 6, 2021 · 5 comments
Labels
>enhancement, :Search Foundations/Search, Team:Search Foundations

Comments

@jpountz
Contributor

jpountz commented Oct 6, 2021

With the default refresh interval, either a shard is considered search-active and it gets refreshed every second, or it is not and it doesn't get refreshed. A shard is considered search-active if it has received a search request in the last 30 seconds.
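The rule above can be sketched as a small Python model. It assumes a shard-level clock in seconds and hardcodes the defaults mentioned here: a 1 s refresh interval and a 30 s search-idle window, which Elasticsearch exposes as `index.search.idle.after`. `ShardState` and its methods are hypothetical names for illustration, not Elasticsearch code.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults quoted in the text above; this models the behaviour, it is not
# Elasticsearch's implementation.
REFRESH_INTERVAL_S = 1.0
SEARCH_IDLE_AFTER_S = 30.0


@dataclass
class ShardState:
    last_search_at: Optional[float] = None  # seconds; None means never searched

    def on_search(self, now: float) -> None:
        """Record that a search request hit this shard."""
        self.last_search_at = now

    def is_search_active(self, now: float) -> bool:
        """Search-active means a search arrived within the last 30 seconds."""
        return (
            self.last_search_at is not None
            and now - self.last_search_at < SEARCH_IDLE_AFTER_S
        )

    def should_run_scheduled_refresh(self, now: float) -> bool:
        """Evaluated every REFRESH_INTERVAL_S: only search-active shards refresh."""
        return self.is_search_active(now)
```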

I wonder if this default makes sense for Observability data. I believe that the intuition behind this default is that there might be someone starting a search session in Kibana that involves loading dashboards, Discover, etc. So when we notice that someone starts a search session, we start refreshing every second so that all searches see recent data and don't incur the cost of running the refresh (except for the first request).

But there are other usage patterns. Maybe someone is using Kibana to display dashboards on big screens in a war room. If Kibana refreshes those dashboards every 10 seconds, the shard will be considered search-active all the time, and 9 out of 10 refreshes that Elasticsearch performs will be useless. This can be a big deal given how much frequent refreshes hurt the indexing rate.
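To make the war-room arithmetic concrete, here is a rough simulation under a simplified timing model (an assumption of this sketch): refreshes happen at every whole second, a dashboard search runs every N seconds, and each search reads the most recent refresh. `useful_refresh_fraction` is a hypothetical helper that counts how many scheduled refreshes are ever observed by a search.

```python
def useful_refresh_fraction(search_period_s: int, duration_s: int) -> float:
    """Fraction of 1 s scheduled refreshes that some search actually observes.

    Simplified model: the shard stays search-active for the whole window
    because a dashboard searches every `search_period_s` seconds, so a
    refresh runs at every whole second t = 0, 1, ..., duration_s - 1.
    """
    refreshes = range(duration_s)
    # Each dashboard search fires half a second after its tick to avoid ties.
    searches = [t + 0.5 for t in range(0, duration_s, search_period_s)]
    observed = set()
    for s in searches:
        # A search reads the point-in-time view of the latest refresh before it.
        observed.add(max(r for r in refreshes if r <= s))
    return len(observed) / len(refreshes)


if __name__ == "__main__":
    print(useful_refresh_fraction(10, 600))
```

With `useful_refresh_fraction(10, 600)` the result is 0.1: 9 out of 10 refreshes are never read by any search, matching the estimate above.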

Another usage pattern is alerting. Some data streams are rarely queried except by Alerting, which runs every 5 minutes by default. In that case, every 5 minutes Elasticsearch would refresh the shard every second for 30 seconds and then stop refreshing. This gives the worst of both worlds: the first request incurs the cost of running the refresh as part of executing the search, and 29 out of the 30 refreshes are unnecessary.
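A similar back-of-the-envelope model for the alerting case. It assumes the shard is search-idle when the alert fires, the alert's own search triggers one refresh inline, and the shard then counts as search-active for exactly 30 s with no other searches arriving. The function name and the strict `< 30 s` boundary are assumptions of this sketch.

```python
REFRESH_INTERVAL_S = 1
SEARCH_IDLE_AFTER_S = 30


def refreshes_per_alert_cycle() -> tuple:
    """Return (total_refreshes, useful_refreshes) for one alert execution
    against an otherwise idle shard, under the simplified model above."""
    alert_at = 0
    # The shard is search-idle, so the alert's search pays for a refresh inline.
    total = 1
    useful = 1
    # Afterwards the shard is search-active for 30 s and gets a scheduled
    # refresh every second; no other search arrives to read those refreshes.
    t = alert_at + REFRESH_INTERVAL_S
    while t - alert_at < SEARCH_IDLE_AFTER_S:
        total += 1
        t += REFRESH_INTERVAL_S
    return total, useful


if __name__ == "__main__":
    print(refreshes_per_alert_cycle())
```

It returns `(30, 1)`, so 29 of the 30 refreshes are unnecessary. With one alert every 5 minutes that is 12 cycles per hour, i.e. 360 refreshes per hour per shard, of which only 12 are read.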

I wonder if we should consider alternatives for the default refresh behavior, such as refreshing as part of the search request unless a refresh already happened in the past second. This would make some searches a bit slower, but it would also significantly reduce the impact of searches on indexing. Or maybe there are even better approaches?
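A minimal sketch of the lazy alternative, assuming a search triggers a refresh only if the last refresh is at least one second old. `LazyRefreshShard` is a hypothetical class; the clock is injectable so the behavior can be tested, and actual query execution is elided.

```python
import time
from typing import Callable, Optional


class LazyRefreshShard:
    """Hypothetical shard that refreshes only when a search needs fresh data."""

    def __init__(self, max_staleness_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_staleness_s = max_staleness_s
        self.clock = clock
        self.last_refresh_at: Optional[float] = None
        self.refresh_count = 0

    def _refresh(self) -> None:
        self.refresh_count += 1
        self.last_refresh_at = self.clock()

    def search(self) -> bool:
        """Run a search; return True if this search had to pay for a refresh."""
        now = self.clock()
        stale = (self.last_refresh_at is None
                 or now - self.last_refresh_at >= self.max_staleness_s)
        if stale:
            self._refresh()
        # ... execute the query against the current point-in-time view here
        return stale
```

Under this model the war-room dashboard (one search every 10 s) causes one refresh per search instead of ten, and the alert causes a single refresh every 5 minutes. The trade-off is that each of those searches pays the refresh latency itself.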

jpountz added the >enhancement, team-discuss, and :Search/Search labels Oct 6, 2021
elasticmachine added the Team:Search label Oct 7, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@ywelsch
Contributor

ywelsch commented Oct 25, 2021

I agree that a single request to an indexing-heavy, search-idle index causing the index to refresh 30 times over the next 30 seconds would not be a good fit for the cases that you mentioned.

I was thinking about two complementary solutions:

  • Allow configuring refresh behavior at a more fine-granular level (i.e. allow coupling the search-idle behavior with longer refresh intervals, e.g. 5 seconds for observability data). Observability could then ship with a different set of defaults for these settings.
  • Avoid refreshes in the first place. For observability data, it's less important to have data that is extremely real-time; data a couple of seconds old would be good enough. Search requests could specify that they don't require the absolute latest data by specifying an upper time range bound of "now-5s", and Elasticsearch would track min/max timestamp values of data that was indexed but not refreshed, allowing it to forgo refreshes in cases where the requested time range would not match any data in the indexing buffer.
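The second idea, skipping a refresh when the requested time range cannot match buffered documents, could look roughly like this. It assumes documents carry an epoch-millisecond timestamp and that the query has an explicit `[gte, lte]` range on it; `IndexingBuffer` is a hypothetical name, and the sketch ignores concurrency between indexing and refresh.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexingBuffer:
    """Hypothetical tracker of timestamp bounds for indexed-but-unrefreshed docs."""
    min_ts: Optional[int] = None  # epoch millis
    max_ts: Optional[int] = None

    def on_index(self, ts: int) -> None:
        # Widen the pending time span to include this document.
        self.min_ts = ts if self.min_ts is None else min(self.min_ts, ts)
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)

    def on_refresh(self) -> None:
        # After a refresh everything buffered is searchable; reset the bounds.
        self.min_ts = None
        self.max_ts = None

    def needs_refresh(self, range_gte: int, range_lte: int) -> bool:
        if self.min_ts is None:
            return False  # nothing pending, the current view is complete
        # Refresh only if the query range overlaps the pending docs' time span.
        return range_gte <= self.max_ts and self.min_ts <= range_lte
```

Late-arriving documents are the main caveat: a single buffered document with an old timestamp widens `[min_ts, max_ts]` so that a `now-5s` query overlaps it again and must refresh.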

@jpountz
Contributor Author

jpountz commented Oct 27, 2021

Allow configuring refresh behavior at a more fine-granular level

Right, my intuition was that we could keep refresh_interval's semantics about how stale a point-in-time view of the index is allowed to be, and introduce a separate parameter that controls whether refreshes are performed lazily (whenever a search request comes in and the data is not fresh enough) or eagerly on a schedule. I'm unsure whether there is a real use case for the current default behavior, where you may or may not pay the cost of the refresh as part of the search request depending on how long the shard has been search-idle before your request.
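One way to read this proposal: `refresh_interval` keeps a single meaning (maximum staleness), and a separate setting decides who pays for the refresh. `RefreshMode` and both functions below are illustrative names, not existing Elasticsearch settings or APIs.

```python
from enum import Enum


class RefreshMode(Enum):
    """Hypothetical setting; not an existing Elasticsearch index option."""
    EAGER = "eager"  # refresh on a schedule so data is never staler than the interval
    LAZY = "lazy"    # refresh only when a search finds the view too stale


def scheduler_should_refresh(mode: RefreshMode, now: float,
                             last_refresh_at: float, refresh_interval_s: float) -> bool:
    # Only eager shards perform background refreshes.
    return mode is RefreshMode.EAGER and now - last_refresh_at >= refresh_interval_s


def search_should_refresh(mode: RefreshMode, now: float,
                          last_refresh_at: float, refresh_interval_s: float) -> bool:
    # refresh_interval means the same thing in both modes: the maximum staleness
    # a search accepts. Eager shards already meet it through the scheduler.
    return mode is RefreshMode.LAZY and now - last_refresh_at >= refresh_interval_s
```

Either way a search never sees data older than `refresh_interval`; the mode only moves the refresh cost between the indexing side (eager) and the search that needs it (lazy).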

Elasticsearch would track min/max timestamp values of data that was indexed but not refreshed

Ohhh I like this idea.

@StephanErb

Has there been any further discussion or decision on how refresh_interval will be used for observability data, especially with TSDB indices?

Some recent work by @martijnvg in #95776 goes towards optimizing refreshes during search on search-idle indices (#95544, #95541). However, I think in practice most observability users will have to run with a custom refresh interval that is much higher (i.e. tens of seconds, in the ballpark of the metricset period) to make ingestion cost-effective. Having the ability to couple the search-idle behaviour with longer refresh intervals could thus be really helpful.
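For reference, the workaround described here (a longer custom refresh interval) can be applied through a composable index template, sent as the body of `PUT _index_template/<name>`. The pattern `my-observability-*` and the `30s` value are placeholders. Note that explicitly setting `index.refresh_interval` also disables the search-idle behavior for those indices: background refreshes then run on schedule regardless of search traffic, which is why coupling the two would help.

```python
import json


def observability_template_body(index_pattern: str,
                                refresh_interval: str = "30s") -> dict:
    """Body for PUT _index_template/<name> that raises refresh_interval.

    The pattern and interval are illustrative placeholders. Setting
    refresh_interval explicitly also opts matching indices out of
    search-idle refresh skipping.
    """
    return {
        "index_patterns": [index_pattern],
        "data_stream": {},  # matching names are created as data streams
        "template": {
            "settings": {
                "index": {"refresh_interval": refresh_interval}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(observability_template_body("my-observability-*"), indent=2))
```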

javanna added the :Search Foundations/Search label and removed the :Search/Search label Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine added the Team:Search Foundations label and removed the Team:Search label Jul 17, 2024

7 participants