Is the default refresh_interval a sensible default for Observability data?
#78776
Comments
Pinging @elastic/es-search (Team:Search)
I agree that a single request to an index-heavy search-idle index resulting in the index refreshing 30 times for the next 30 seconds would not be a good fit for the cases that you mentioned. I was thinking about two complementary solutions:
Right, my intuition was that we could keep
Ohhh I like this idea.
Has there been any further discussion or decision on how Some recent work by @martijnvg on #95776 goes towards optimizing refreshes during search on search-idle indices (#95544, #95541). However, I think in practice most observability users will have to run with a custom refresh interval that is much higher (i.e. tens of seconds, in the ballpark of the metricset period) to make ingestion cost-effective. Having the ability to couple the search-idle behaviour with longer refresh intervals could thus be really helpful.
Pinging @elastic/es-search-foundations (Team:Search Foundations)
With the default refresh interval, either a shard is considered search-active and it gets refreshed every second, or it is not and it doesn't get refreshed. A shard is considered search-active if it has received a search request in the last 30 seconds.
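To make the rule above concrete, here is a minimal Python sketch of the decision as described in this issue. It is not Elasticsearch code: the function and constant names are hypothetical, and treating a gap of exactly 30 seconds as idle is an assumption.

```python
# Hypothetical model of the default behaviour described above.
# Assumption: a shard stops being search-active once 30 seconds have
# passed since its last search (a gap of exactly 30s counts as idle).
SEARCH_IDLE_AFTER_S = 30
REFRESH_INTERVAL_S = 1  # default refresh interval discussed in this issue


def is_search_active(now_s, last_search_s):
    """Return True if the shard received a search in the last 30 seconds."""
    if last_search_s is None:
        # The shard has never been searched, so it is search-idle.
        return False
    return now_s - last_search_s < SEARCH_IDLE_AFTER_S


def should_run_scheduled_refresh(now_s, last_search_s):
    """Scheduled (every-second) refreshes only run on search-active shards."""
    return is_search_active(now_s, last_search_s)
```

For example, `should_run_scheduled_refresh(10, 0)` is true, while a shard last searched 45 seconds ago, or never searched, is skipped by the scheduled refresh.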
I wonder if this default makes sense for Observability data. I believe that the intuition behind this default is that there might be someone starting a search session in Kibana that involves loading dashboards, Discover, etc. So when we notice that someone starts a search session, we start refreshing every second so that all searches see recent data and don't incur the cost of running the refresh (except for the first request).
But there are other usage patterns. Maybe someone is using Kibana to display dashboards on big screens in a war room. If they do this with Kibana refreshing dashboards every 10 seconds, then the shard will be considered search-active all the time and 9 refreshes out of 10 that Elasticsearch performs will be useless. This can be a big deal given how frequent refreshes hurt the indexing rate.
Another usage pattern is alerting. There are some data streams that rarely get queried, except by Alerting every 5 minutes by default. In that case, every 5 minutes Elasticsearch would refresh the shard every second for 30 seconds before no longer refreshing. This gives the worst of both worlds: the first request incurs the cost of running the refresh as part of executing the search, and 29 out of the 30 refreshes are unnecessary.
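The arithmetic for both patterns can be checked with a small simulation. This is only a sketch under simplifying assumptions: time advances in whole seconds, the scheduled refresh runs before any search that arrives in the same second, and a search on an idle shard triggers exactly one refresh. `count_refreshes` is a hypothetical helper, not an Elasticsearch API.

```python
def count_refreshes(search_times_s, duration_s, idle_after_s=30):
    """Simulate the default behaviour with one tick per second.

    Returns (scheduled, triggered): refreshes run by the every-second
    scheduler while the shard is search-active, and refreshes run as
    part of a search that hit a search-idle shard.
    """
    searches = set(search_times_s)
    last_search = None
    scheduled = 0
    triggered = 0
    for now in range(duration_s):
        active = last_search is not None and now - last_search < idle_after_s
        if active:
            scheduled += 1  # scheduler refreshes a search-active shard
        if now in searches:
            if not active:
                triggered += 1  # first search on an idle shard pays for a refresh
            last_search = now
    return scheduled, triggered


# Alerting: one search every 5 minutes, simulated for 10 minutes.
alerting = count_refreshes([0, 300], 600)

# War-room dashboard: one search every 10 seconds, simulated for 1 minute.
dashboard = count_refreshes(range(0, 60, 10), 60)
```

Under these assumptions the alerting case performs 60 refreshes for 2 searches (29 scheduled refreshes plus 1 search-triggered refresh per cycle), and the dashboard case performs 60 refreshes for 6 searches, which matches the 29-out-of-30 and 9-out-of-10 figures above.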
I wonder if we should consider alternatives for the default refresh, such as refreshing as part of the search request unless there was already a refresh in the past second. This would make search a bit slower in some cases, but this would also significantly reduce the impact of searches on indexing. Or maybe there are even better approaches?
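A rough sketch of the alternative suggested above, refreshing as part of the search unless a refresh already happened within the past second. The class and its fields are hypothetical and only model the bookkeeping, not how Elasticsearch would implement it.

```python
class RefreshOnSearchShard:
    """Hypothetical model: no scheduled refreshes, refresh lazily on search."""

    def __init__(self, min_refresh_gap_s=1.0):
        self.min_refresh_gap_s = min_refresh_gap_s
        self.last_refresh_s = None
        self.refresh_count = 0

    def search(self, now_s):
        """Refresh before searching unless one ran within the last gap.

        Returns True if this search had to run a refresh.
        """
        recently_refreshed = (
            self.last_refresh_s is not None
            and now_s - self.last_refresh_s < self.min_refresh_gap_s
        )
        if recently_refreshed:
            return False
        self.last_refresh_s = now_s
        self.refresh_count += 1
        return True


# Dashboard polling every 10 seconds for one minute.
shard = RefreshOnSearchShard()
for t in range(0, 60, 10):
    shard.search(float(t))
```

With this policy the dashboard pattern costs 6 refreshes per minute instead of roughly one per second, at the price of every one of those searches waiting for its own refresh.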