Posting fetch optimizations #6416
Problem Statement
When processing a series query on the store gateway, for each block we need to query we fetch the posting list of each matcher, intersect the lists, and then fetch the series for the resulting references.
For example, take a query like `up{cluster="us", env="prod"}`. It contains 3 matchers, and let's assume we have a block with 1M time series. Today each query fetches postings for all 3 matchers, so it fetches (500K + 100K + 100) postings first and then applies the intersection. Since the size of the expanded postings is bounded by the most selective matcher, the result might contain only 50 series, yet we end up fetching 600K+ postings.
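To make the cost concrete, here is a minimal sketch of that flow; `fetchPostings` is a hypothetical stand-in for the store gateway's bucket-backed posting fetch, and the intersection is a standard merge over sorted series references:

```go
package sketch

// fetchPostings is a hypothetical stand-in for reading one matcher's full
// posting list (sorted series refs) from the bucket or cache.
func fetchPostings(matcher string) []uint64 { return nil }

// intersect keeps only the refs present in every sorted list.
func intersect(lists [][]uint64) []uint64 {
	if len(lists) == 0 {
		return nil
	}
	out := lists[0]
	for _, l := range lists[1:] {
		var next []uint64
		i, j := 0, 0
		for i < len(out) && j < len(l) {
			switch {
			case out[i] < l[j]:
				i++
			case out[i] > l[j]:
				j++
			default:
				next = append(next, out[i])
				i, j = i+1, j+1
			}
		}
		out = next
	}
	return out
}

// expandedPostings mirrors the current behaviour: it fetches every posting
// list in full before intersecting, so for the example above it downloads
// 600K+ refs to produce a result of ~50 series.
func expandedPostings(matchers []string) []uint64 {
	lists := make([][]uint64, 0, len(matchers))
	for _, m := range matchers {
		lists = append(lists, fetchPostings(m))
	}
	return intersect(lists)
}
```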
A better way in this case might be to fetch the 100 series matching `__name__="up"` first, then decode the series labels and check whether they match `{cluster="us"}` and `{env="prod"}`. Another situation might be the same query, but where the matchers match 100, 200, and 500K series respectively. In this situation, maybe we can intersect the postings of the first 2 matchers, fetch those series, and apply the last label matcher to the fetched series directly.
In the last situation, the matchers might match 100, 200, and 300 series respectively, in which case we keep our current posting fetching logic.
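A hedged sketch of how these three situations could be told apart once per-posting cardinalities are known; the strategy names and the 100x threshold below are illustrative assumptions, not a settled design:

```go
package sketch

import "sort"

type strategy int

const (
	intersectAll            strategy = iota // current logic: fetch all postings, intersect
	fetchSeriesAndFilter                    // fetch series of the smallest list, match the rest lazily
	intersectSomeThenFilter                 // intersect the small lists, apply the huge matcher on series
)

// pickStrategy classifies matchers by posting-list cardinality. The 100x
// ratio is an invented placeholder for whatever the real cost model decides.
func pickStrategy(cards []int64) strategy {
	if len(cards) == 0 {
		return intersectAll
	}
	sorted := append([]int64(nil), cards...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	huge := 0
	for _, c := range sorted[1:] {
		if c > 100*sorted[0] {
			huge++
		}
	}
	switch {
	case huge == 0:
		return intersectAll // e.g. 100 / 200 / 300
	case huge == len(sorted)-1:
		return fetchSeriesAndFilter // e.g. 100 / 100K / 500K
	default:
		return intersectSomeThenFilter // e.g. 100 / 200 / 500K
	}
}
```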
Proposal
To address this problem, the first thing we need is a cheap way to get the cardinality of each posting list. The number of entries is already part of the index file, so maybe we can make it part of the index header as well.
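As an illustration of the lookup shape (this interface is hypothetical, not Thanos's actual `indexheader.Reader`):

```go
package sketch

// Reader is a hypothetical index-header interface extended with a
// cardinality lookup. The entry count is already stored at the start of
// each posting list in the TSDB index, so carrying it in the index header
// only duplicates an existing field per label pair.
type Reader interface {
	// PostingsOffset returns the byte range of the posting list for a
	// label pair (existing functionality, signature simplified here).
	PostingsOffset(name, value string) (start, end int64, err error)
	// PostingCardinality returns the number of series refs in that
	// posting list (the proposed addition).
	PostingCardinality(name, value string) (int64, error)
}
```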
Once we have the cardinality info, we need a cost model to calculate the cost of each strategy and decide which one to pick. The cost calculation itself must be cheap, which is why the number of entries per posting list should live in the index header.
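For illustration, a back-of-the-envelope model could compare estimated bytes fetched per strategy; the per-item byte sizes below are invented placeholders that a real model would have to measure:

```go
package sketch

const (
	postingRefBytes = 4   // assumed size of one series ref in a posting list
	avgSeriesBytes  = 200 // assumed average size of one decoded series entry
)

// costIntersectAll estimates the current approach: every posting list in
// full, plus the series of the final (intersected) result.
func costIntersectAll(cards []int64, resultEstimate int64) int64 {
	var postings int64
	for _, c := range cards {
		postings += c
	}
	return postings*postingRefBytes + resultEstimate*avgSeriesBytes
}

// costFetchSmallest estimates the alternative: fetch only the smallest
// posting list, then fetch all of its series and filter them in memory.
func costFetchSmallest(smallest int64) int64 {
	return smallest*postingRefBytes + smallest*avgSeriesBytes
}
```

Plugging in the first example, `costIntersectAll([]int64{500_000, 100_000, 100}, 50)` comes to roughly 600,100 × 4 + 50 × 200 ≈ 2.4 MB, while `costFetchSmallest(100)` is about 100 × 4 + 100 × 200 ≈ 20 KB; that gap is exactly what the cost model should detect.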
The strategy implementation is still TBD. We can start simple, perhaps by hardcoding a single strategy or making one the default.
Comments

If we have the cardinality per posting, there is an easy way to expand postings. This sounds like less work when calculating intersections, as described in https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf. Edit: this is probably unnecessary; the existing intersection algorithm is good enough. Benchmarks are needed.

Is this related to #6357?

#6357 is tackling the problem in a different way, IIUC.