-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shard ingester queries. #3852
Shard ingester queries. #3852
Conversation
This is still experimental but already yield from 2x to 6x faster for short period queries. I'm still playing with it but I want to share how I do it early. Signed-off-by: Cyril Tovena <[email protected]>
This is promising, I'm getting exceptional result in our biggest cluster. |
Signed-off-by: Cyril Tovena <[email protected]>
This PR introduces Series Queries Sharding. It does not check the boundaries of ingesters data since I'm assuming grafana#3852 will be merge first. Signed-off-by: Cyril Tovena <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some suggestions which I tested locally but didn't want to push to your branch. They are based off my fork of this PR.
This does two things:
- Fixes the hashing calculations - notably maintaining that the internal hash space must be consistent. i.e. moving from 2 buckets to 4, the first bucket (0), covers the first 50% of the hash ring. Moving this same space into 4 buckets should actually map shards
(0_of_2) -> (0_of_4, 1_of_4)
instead of(0_of_2) -> (0_of_4, 2_of_4)
. - Adds logic for mapping any schema sharding factor into the relevant superset
[shard]
of any inverted index sharding factor. Notably, this would allow us to decouple the two. However, because the shard factors wouldn't need to be mutually divisible any more, we'd need to amend theLookup
,LabelNames
, andLabelValues
to also filter any returned values, ensuring we filter out any not belonging to the desired schema shard. Because this code is currently protected by thevalidateShard
function, we can safely defer this decision for now.
One thing which I didn't include but expect we'll need is to align the hash functions used in the Cortex Index with the one used here. Notably, this is not exposed from cortex and is currently hardcoded into the Entries
implementations. Since the chunk store is deprecated at this point, I expect we can copy that as it's not subject to change. The reason why we'll need to agree on a specific hash function here is because the querier queries both the store and the ingester for a particular shard. Our queries will quickly prove incorrect when series belong to different shards on each of these query paths.
Not sure about that. |
So I have an idea :) I'm going to change sharding in the store to align it with ingester see you soon ! |
Co-authored-by: Owen Diehl <[email protected]>
Co-authored-by: Owen Diehl <[email protected]>
Utimately we should have a storage that relies on fingerprint, but that's harder to change. Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
@owen-d I think this is ready I've aligned the shard computation in the ingester. This means they use the same type of hash. I would have prefer to keep it based of the fingerprint but it's hard to change in the storage. Something to keep in mind when we redesign the storage. |
Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple nits, then LGTM! We'll need to eventually add a way to either enforce that schema configs are multiples of indexShards or find a way to refactor the inverted index to work with any arbitrary sharding factor.
Co-authored-by: Owen Diehl <[email protected]>
Co-authored-by: Owen Diehl <[email protected]>
Co-authored-by: Owen Diehl <[email protected]>
Co-authored-by: Owen Diehl <[email protected]>
I don't want to invest too much in this yet. Not sure what is the future of that code. |
Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
* Shards Series API. This PR introduces Series Queries Sharding. It does not check the boundaries of ingesters data since I'm assuming #3852 will be merge first. Signed-off-by: Cyril Tovena <[email protected]> * Fixes tests sorting. Signed-off-by: Cyril Tovena <[email protected]>
This is still experimental but already yield from 2x to 6x faster for short period queries.
I'm still playing with it but I want to share how I do it early.
Signed-off-by: Cyril Tovena [email protected]