
Investigate elastic search max bucket behaviour #1822

Closed
PSeitz opened this issue Jan 23, 2023 · 3 comments

Comments

@PSeitz
Contributor

PSeitz commented Jan 23, 2023

As reported in quickwit-oss/quickwit#812, terms aggregation queries run into the aggregation bucket limit.

Investigate how Elasticsearch behaves for terms aggregation queries.

As opposed to other aggregations such as HistogramAggregation, the terms aggregation has a reasonably sensible upper bound: the number of terms in the dictionary.
This upper bound can be circumvented with nested aggregations, because each nested level can split every parent bucket again.
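To make the nesting point concrete: a single terms aggregation produces at most one bucket per dictionary term, but nesting multiplies the per-level bounds. The sketch below uses a hypothetical helper, `nested_terms_bucket_bound` (not a tantivy API), to compute the leaf-bucket bound for a chain of nested terms aggregations.

```rust
/// Hypothetical helper: upper bound on the number of leaf buckets produced by
/// nested terms aggregations, given the number of distinct terms in each
/// level's dictionary. Every level can split each parent bucket into at most
/// `terms` children, so the per-level bounds multiply.
fn nested_terms_bucket_bound(terms_per_level: &[u64]) -> u64 {
    terms_per_level
        .iter()
        .fold(1u64, |acc, &terms| acc.saturating_mul(terms))
}

fn main() {
    // One level: bounded by the dictionary size.
    println!("{}", nested_terms_bucket_bound(&[1_000]));
    // Two nested levels over 1,000-term fields: up to 1,000,000 leaf buckets.
    println!("{}", nested_terms_bucket_bound(&[1_000, 1_000]));
}
```

So even when each individual field has a modest dictionary, two or three nesting levels can push the bucket count far beyond any single-level limit.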

@PSeitz
Contributor Author

PSeitz commented Jan 23, 2023

Elasticsearch has been changed to count buckets against the bucket limit only in the reduce phase: elastic/elasticsearch#57042
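A minimal sketch of what "count only in the reduce phase" means, assuming a simplified model where each shard returns a `term -> doc count` map. `IntermediateTerms`, `AggError`, and `reduce_terms` are hypothetical names, not Elasticsearch or tantivy APIs.

```rust
use std::collections::HashMap;

/// Simplified intermediate result of a terms aggregation on one shard:
/// term -> document count. (Hypothetical type.)
type IntermediateTerms = HashMap<String, u64>;

#[derive(Debug, PartialEq)]
enum AggError {
    TooManyBuckets { count: usize, limit: usize },
}

/// Reduce phase: merge all per-shard results first, then compare the
/// number of merged buckets against the limit exactly once.
fn reduce_terms(
    shards: Vec<IntermediateTerms>,
    max_buckets: usize,
) -> Result<IntermediateTerms, AggError> {
    let mut merged = IntermediateTerms::new();
    for shard in shards {
        for (term, count) in shard {
            // The same term from different shards collapses into one bucket.
            *merged.entry(term).or_insert(0) += count;
        }
    }
    if merged.len() > max_buckets {
        return Err(AggError::TooManyBuckets {
            count: merged.len(),
            limit: max_buckets,
        });
    }
    Ok(merged)
}

fn main() {
    let shard_a = HashMap::from([("a".to_string(), 1), ("b".to_string(), 2)]);
    let shard_b = HashMap::from([("b".to_string(), 3), ("c".to_string(), 1)]);
    // Summing per-shard bucket counts would give 4, but only 3 distinct
    // buckets survive the reduce, so a limit of 3 is not exceeded.
    let merged = reduce_terms(vec![shard_a, shard_b], 3).unwrap();
    println!("{} buckets, b = {}", merged.len(), merged["b"]);
}
```

In this model, checking the limit per shard (or summing per-shard counts) would reject requests whose final, merged result actually fits within the limit.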

PSeitz added a commit that referenced this issue Jan 23, 2023
postpone bucket size aggregation limit for terms aggregation
improve the min_doc_count == 0 special case: it loaded more terms from the
dictionary than the segment_size limit allows

Note that we now effectively check that segment_size does not exceed the
bucket limit. Since segment_size is 10 * size, requests with size 6500
would fail for a bucket limit of 65000.

closes #1822
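The arithmetic in the commit message can be sketched as follows. `default_segment_size` and `check_segment_size` are hypothetical names, not tantivy's API; the sketch assumes a strict "exceeds" comparison, so whether a request sitting exactly at the limit is rejected depends on how the real check is written.

```rust
/// Hypothetical: segment_size defaults to 10 * size, as stated in the commit message.
fn default_segment_size(size: u32) -> u32 {
    size.saturating_mul(10)
}

/// Hypothetical check: reject the request if the per-segment candidate
/// count would exceed the bucket limit. Assumes a strict `>` comparison.
fn check_segment_size(size: u32, max_buckets: u32) -> Result<u32, String> {
    let segment_size = default_segment_size(size);
    if segment_size > max_buckets {
        Err(format!(
            "segment_size {} (10 * size {}) exceeds bucket limit {}",
            segment_size, size, max_buckets
        ))
    } else {
        Ok(segment_size)
    }
}

fn main() {
    println!("{:?}", check_segment_size(1_000, 65_000)); // Ok(10000)
    println!("{:?}", check_segment_size(7_000, 65_000)); // Err: 70000 > 65000
}
```

The takeaway is that the effective maximum `size` is roughly a tenth of the bucket limit, since the limit is checked against the 10x-inflated segment_size rather than against size itself.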
@PSeitz
Contributor Author

PSeitz commented Jan 24, 2023

After talking with @imotov, I have a better understanding of what Elasticsearch does and how we should adjust our current behavior:

  1. Add a memory circuit breaker: during collection, track the memory consumed by aggregations (an estimate is fine) and abort if it exceeds a configured threshold.
  2. Use max_bucket only to check the number of buckets before sending the response back to the client, probably when converting intermediate results into final results.
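The two steps above can be sketched together for a single-level terms aggregation. Everything here is hypothetical (`MemoryBreaker`, `collect_terms`, `into_final`, and the per-bucket size estimate are illustrative, not tantivy's API): the breaker is charged an estimated size for every new bucket during collection, and max_buckets is checked only when converting to the final result.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum AggError {
    MemoryLimitExceeded { used: usize, limit: usize },
    TooManyBuckets { count: usize, limit: usize },
}

/// Hypothetical memory circuit breaker for aggregation collection.
struct MemoryBreaker {
    used_bytes: usize,
    limit_bytes: usize,
}

impl MemoryBreaker {
    fn new(limit_bytes: usize) -> Self {
        MemoryBreaker { used_bytes: 0, limit_bytes }
    }

    /// Charge `bytes` against the budget; abort once the limit is exceeded.
    fn add(&mut self, bytes: usize) -> Result<(), AggError> {
        self.used_bytes = self.used_bytes.saturating_add(bytes);
        if self.used_bytes > self.limit_bytes {
            return Err(AggError::MemoryLimitExceeded {
                used: self.used_bytes,
                limit: self.limit_bytes,
            });
        }
        Ok(())
    }
}

/// Assumed rough estimate: key bytes plus the size of one (String, u64) entry.
fn estimate_bucket_bytes(term: &str) -> usize {
    term.len() + std::mem::size_of::<(String, u64)>()
}

/// Step 1: during collection, count docs per term and charge the breaker
/// for every newly created bucket. No bucket-count limit is checked here.
fn collect_terms<'a>(
    terms: impl IntoIterator<Item = &'a str>,
    breaker: &mut MemoryBreaker,
) -> Result<HashMap<String, u64>, AggError> {
    let mut buckets: HashMap<String, u64> = HashMap::new();
    for term in terms {
        if let Some(count) = buckets.get_mut(term) {
            *count += 1;
        } else {
            breaker.add(estimate_bucket_bytes(term))?;
            buckets.insert(term.to_string(), 1);
        }
    }
    Ok(buckets)
}

/// Step 2: only when converting to the final result is max_buckets checked.
/// Buckets are sorted by doc count (descending), then by term (ascending).
fn into_final(
    buckets: HashMap<String, u64>,
    max_buckets: usize,
) -> Result<Vec<(String, u64)>, AggError> {
    if buckets.len() > max_buckets {
        return Err(AggError::TooManyBuckets {
            count: buckets.len(),
            limit: max_buckets,
        });
    }
    let mut out: Vec<(String, u64)> = buckets.into_iter().collect();
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    Ok(out)
}

fn main() {
    let mut breaker = MemoryBreaker::new(1 << 20); // assumed 1 MiB budget
    let docs = ["rust", "java", "rust", "go"];
    let buckets = collect_terms(docs, &mut breaker).unwrap();
    match into_final(buckets, 65_000) {
        Ok(final_buckets) => println!("{:?}", final_buckets),
        Err(e) => println!("error: {:?}", e),
    }
}
```

Separating the two concerns means the memory budget protects the node during collection, while max_buckets only limits the size of the response, which is what the client actually receives.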

@PSeitz
Contributor Author

PSeitz commented Mar 21, 2023

Partially fixed by #1942.
