Queries for aggregate values from a single large field is slow #1620

robskillington · 2019-05-09T14:35:35Z

With the new field matcher, a query such as __name__ matching .* now matches documents faster, because it is turned into a field search during index query execution.

A problem remains, however, with queries for aggregate values of very frequently appearing fields (e.g. __name__), where literally every metric (index document) has the field. Instead of returning the values by walking the FST values, we actually return the postings lists and walk each document to check whether the tag value for that document is new or already exists in our aggregate result.

We can special-case this in index/block.go at the top of the block.Query method: for queries that are single field matchers, we can just walk the FST of each segment we have, add the values to the aggregate results, and return that.

This will significantly speed up single-field queries that match a very large number of documents.
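
A minimal sketch of the idea, not the actual m3ninx code: the `segment` type and `aggregateFieldValues` function are hypothetical names, and a sorted slice of terms stands in for the per-field FST. It shows how the distinct values of one field can be collected from each segment's term dictionary without reading any postings list or document.

```go
package main

import "sort"

// segment is a hypothetical stand-in for an index segment. In a real
// segment the terms of a field would live in an FST; here a sorted
// slice per field plays that role.
type segment struct {
	// terms maps a field name to its sorted, de-duplicated values.
	terms map[string][]string
}

// aggregateFieldValues collects the distinct values of one field across
// all segments by walking each segment's term dictionary. No postings
// list or document is touched.
func aggregateFieldValues(segs []segment, field string) []string {
	seen := make(map[string]struct{})
	for _, seg := range segs {
		for _, v := range seg.terms[field] {
			seen[v] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for v := range seen {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}
```

The cost here scales with the number of distinct values of the field rather than with the number of documents that carry it, which is where the expected speedup for a field like __name__ comes from.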

arnikola · 2019-05-11T17:22:41Z

I think we can do this in an easier, more generic way than special-casing the single matcher case: we could add a "MatchQuery" type to the AggregationOptions passed in to session.Aggregate(..). We could then use a similar path by editing execBlockAggregateQueryFn to take in the AggregationOptions TermFilter, and use that existing path with additional filtering?
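
A rough sketch of what "one path with additional filtering" could look like. Everything here is illustrative: `aggregateOptions`, `fieldFilter`, `fieldTerms` and `aggregate` are hypothetical names, not the real AggregationOptions or execBlockAggregateQueryFn signatures.

```go
package main

import "sort"

// aggregateOptions is a hypothetical options struct. fieldFilter stands
// in for the filter discussed above.
type aggregateOptions struct {
	// fieldFilter, when non-empty, limits aggregation to these fields.
	fieldFilter []string
}

// fieldTerms maps a field name to its values within one segment.
type fieldTerms map[string][]string

// aggregate merges field values across segments through a single code
// path, whether or not a filter is set.
func aggregate(segs []fieldTerms, opts aggregateOptions) map[string][]string {
	allowed := make(map[string]struct{}, len(opts.fieldFilter))
	for _, f := range opts.fieldFilter {
		allowed[f] = struct{}{}
	}
	sets := make(map[string]map[string]struct{})
	for _, seg := range segs {
		for field, values := range seg {
			// Skip fields outside the filter; an empty filter allows all.
			if _, ok := allowed[field]; len(allowed) > 0 && !ok {
				continue
			}
			if sets[field] == nil {
				sets[field] = make(map[string]struct{})
			}
			for _, v := range values {
				sets[field][v] = struct{}{}
			}
		}
	}
	out := make(map[string][]string, len(sets))
	for field, set := range sets {
		vals := make([]string, 0, len(set))
		for v := range set {
			vals = append(vals, v)
		}
		sort.Strings(vals)
		out[field] = vals
	}
	return out
}
```

Note that this version still visits every field in every segment and only discards the ones that fail the filter.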

prateek · 2019-05-11T20:51:15Z

I’d prefer the second path too (adding the casing in the aggregate path), but it’s a little more involved than the TermFilter. We can further avoid iterating all the fields in the Field FST (because we have exactly one field). That will require additional interface changes in m3ninx to restrict the range when getting an iterator from Fields(), but it should make it even faster.
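
To illustrate the range restriction, here is a small sketch that assumes the field names are available in sorted order, as they would be in an FST. `fieldsInRange` and `singleField` are hypothetical helpers, not part of the m3ninx Fields() interface. Binary search jumps straight to the requested field instead of iterating all of them.

```go
package main

import "sort"

// fieldsInRange returns the fields f with lo <= f < hi from a sorted
// field list, using binary search to skip everything outside the range.
func fieldsInRange(sortedFields []string, lo, hi string) []string {
	start := sort.SearchStrings(sortedFields, lo)
	end := sort.SearchStrings(sortedFields, hi)
	if end < start {
		return nil
	}
	return sortedFields[start:end]
}

// singleField narrows the range to exactly one field name.
func singleField(sortedFields []string, field string) []string {
	// field+"\x00" is the smallest string strictly greater than field,
	// so the half-open range [field, field+"\x00") contains only field.
	return fieldsInRange(sortedFields, field, field+"\x00")
}
```

With a single-field query the iterator then yields at most one field, so the per-field work no longer depends on how many other fields the segment holds.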

prateek mentioned this issue May 11, 2019

[dbnode] Optimize index.Aggregate() for FieldQuery #1628

Merged

robskillington closed this as completed in #1628 May 13, 2019
