Queries for aggregate values from a single large field is slow #1620

Closed
robskillington opened this issue May 9, 2019 · 2 comments · Fixed by #1628

Comments

@robskillington
Collaborator

robskillington commented May 9, 2019

The new field matcher speeds up queries such as __name__ matching .*, because such a query is now turned into a field search during index query execution.

A problem remains, however, for queries that aggregate values of very frequently appearing fields (e.g. __name__), where literally every metric (index document) has the field. Instead of returning the values by walking the FST, we return the postings lists and walk each document to check whether that document's tag value is new or already present in the aggregate result.

We can special-case this in index/block.go at the top of the block.Query method: for queries that consist of a single field matcher, we can walk the FST of each segment we have, add the values to the aggregate results, and return that.

This will significantly speed up single-field queries that match a very large number of documents.
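
To make the difference concrete, here is a minimal sketch of the two strategies. It does not use the real m3ninx types: the segment struct, newSegment, exampleSegments, and both aggregate functions are hypothetical stand-ins, with a plain Go map playing the role of the FST. It only illustrates why walking the term dictionary touches far fewer items than walking postings lists when almost every document carries the field.

```go
package main

import (
	"fmt"
	"sort"
)

// segment is a hypothetical, simplified stand-in for an index segment.
// terms maps field -> term value -> postings list (document IDs); in a real
// segment the term dictionary would be an FST. docs maps document ID -> fields.
type segment struct {
	terms map[string]map[string][]int
	docs  map[int]map[string]string
}

// newSegment builds a toy segment where document i has __name__ = values[i].
func newSegment(values []string) segment {
	s := segment{
		terms: map[string]map[string][]int{"__name__": {}},
		docs:  map[int]map[string]string{},
	}
	for id, v := range values {
		s.terms["__name__"][v] = append(s.terms["__name__"][v], id)
		s.docs[id] = map[string]string{"__name__": v}
	}
	return s
}

// aggregateViaPostings mirrors the slow path described above: resolve every
// matching document, then inspect each one to see whether its value is new.
// It also returns how many documents were visited.
func aggregateViaPostings(segs []segment, field string) ([]string, int) {
	seen := map[string]struct{}{}
	visited := 0
	for _, s := range segs {
		for _, postings := range s.terms[field] {
			for _, id := range postings {
				visited++
				if v, ok := s.docs[id][field]; ok {
					seen[v] = struct{}{}
				}
			}
		}
	}
	return sortedKeys(seen), visited
}

// aggregateViaTerms mirrors the proposed fast path: walk only the distinct
// terms of the field in each segment. It also returns how many terms were visited.
func aggregateViaTerms(segs []segment, field string) ([]string, int) {
	seen := map[string]struct{}{}
	visited := 0
	for _, s := range segs {
		for term := range s.terms[field] {
			visited++
			seen[term] = struct{}{}
		}
	}
	return sortedKeys(seen), visited
}

func sortedKeys(m map[string]struct{}) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// exampleSegments returns one large segment (1000 "cpu" docs plus one "mem"
// doc) and one small segment ("cpu", "disk").
func exampleSegments() []segment {
	big := make([]string, 0, 1001)
	for i := 0; i < 1000; i++ {
		big = append(big, "cpu")
	}
	big = append(big, "mem")
	return []segment{newSegment(big), newSegment([]string{"cpu", "disk"})}
}

func main() {
	segs := exampleSegments()
	slow, docsVisited := aggregateViaPostings(segs, "__name__")
	fast, termsVisited := aggregateViaTerms(segs, "__name__")
	fmt.Println(slow, "documents visited:", docsVisited)
	fmt.Println(fast, "terms visited:", termsVisited)
}
```

Both functions return the same distinct values; the cost of the first grows with the number of documents, while the cost of the second grows only with the number of distinct terms per segment.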

@arnikola
Collaborator

I think we can do this in an easier, more generic way than special-casing the single matcher case. We could add a "MatchQuery" type to the AggregationOptions passed in to session.Aggregate(..), then follow a similar path: edit execBlockAggregateQueryFn to take in the AggregationOption TermFilter, and use that existing path with additional filtering?
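
A rough sketch of that idea, under loud assumptions: fieldTerms, aggregateOptions, fieldFilter, and aggregate below are hypothetical names invented for illustration and are not the actual m3 AggregationOptions, TermFilter, or execBlockAggregateQueryFn. The point is only that a filter carried in the options lets the generic aggregate path skip fields it was not asked about, without a separate single-matcher code path.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldTerms is a hypothetical per-segment term dictionary:
// field name -> distinct values of that field in the segment.
type fieldTerms map[string][]string

// aggregateOptions is a hypothetical options struct. An empty fieldFilter
// means "aggregate every field".
type aggregateOptions struct {
	fieldFilter []string
}

// allows reports whether the filter permits aggregating the given field.
func (o aggregateOptions) allows(field string) bool {
	if len(o.fieldFilter) == 0 {
		return true
	}
	for _, f := range o.fieldFilter {
		if f == field {
			return true
		}
	}
	return false
}

// aggregate merges distinct values per field across segments, skipping any
// field the filter does not allow. Values are returned sorted.
func aggregate(segs []fieldTerms, opts aggregateOptions) map[string][]string {
	sets := map[string]map[string]struct{}{}
	for _, seg := range segs {
		for field, values := range seg {
			if !opts.allows(field) {
				continue
			}
			if sets[field] == nil {
				sets[field] = map[string]struct{}{}
			}
			for _, v := range values {
				sets[field][v] = struct{}{}
			}
		}
	}
	out := make(map[string][]string, len(sets))
	for field, set := range sets {
		vals := make([]string, 0, len(set))
		for v := range set {
			vals = append(vals, v)
		}
		sort.Strings(vals)
		out[field] = vals
	}
	return out
}

func main() {
	segs := []fieldTerms{
		{"__name__": {"cpu", "mem"}, "host": {"a", "b"}},
		{"__name__": {"cpu", "disk"}, "host": {"c"}},
	}
	res := aggregate(segs, aggregateOptions{fieldFilter: []string{"__name__"}})
	fmt.Println(res["__name__"])
}
```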

@prateek
Collaborator

prateek commented May 11, 2019

I’d prefer the second path too (adding the casing in the aggregate path), but it’s a little more involved than the TermFilter. We can further avoid iterating all the fields in the Field FST (because we have exactly one field). That will require an additional interface change in m3ninx to restrict the range when getting an iterator from Fields(), but it should make it even faster.
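
As an illustration of the range restriction, the sketch below contrasts scanning every field with seeking directly to the one field of interest. The fieldsDict type and its helpers are hypothetical and do not reflect the m3ninx Fields() interface; a sorted slice with binary search via sort.SearchStrings stands in for seeking within an FST.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldsDict is a hypothetical stand-in for a segment's fields dictionary:
// field names kept in sorted order (as an FST would iterate them), each
// mapped to its distinct terms.
type fieldsDict struct {
	names []string
	terms map[string][]string
}

// newFieldsDict builds the dictionary and sorts the field names.
func newFieldsDict(terms map[string][]string) fieldsDict {
	names := make([]string, 0, len(terms))
	for name := range terms {
		names = append(names, name)
	}
	sort.Strings(names)
	return fieldsDict{names: names, terms: terms}
}

// termsByScan iterates every field and keeps only the requested one.
// It returns the terms and the number of fields the iterator yielded.
func termsByScan(d fieldsDict, field string) ([]string, int) {
	var out []string
	visited := 0
	for _, name := range d.names {
		visited++
		if name == field {
			out = d.terms[name]
		}
	}
	return out, visited
}

// termsBySeek models a range-restricted iterator: it seeks straight to the
// requested field, so at most one field is yielded.
func termsBySeek(d fieldsDict, field string) ([]string, int) {
	i := sort.SearchStrings(d.names, field)
	if i < len(d.names) && d.names[i] == field {
		return d.terms[field], 1
	}
	return nil, 0
}

func main() {
	d := newFieldsDict(map[string][]string{
		"__name__": {"cpu", "disk", "mem"},
		"host":     {"a", "b"},
		"region":   {"us-east"},
	})
	scanTerms, scanVisited := termsByScan(d, "__name__")
	seekTerms, seekVisited := termsBySeek(d, "__name__")
	fmt.Println(scanTerms, "fields yielded by scan:", scanVisited)
	fmt.Println(seekTerms, "fields yielded by seek:", seekVisited)
}
```

With only three fields the saving is small, but a segment with many distinct tag names would pay for every one of them in the scan, while the seek stays bounded by the single requested field.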
