Investigate and fix unbalanced series between ingesters (no shuffle sharding) #4736
Comments
This comment is a redacted copy-paste of an internal ticket at Grafana Labs.

Why are series unbalanced between ingesters?
In the past I've analysed the correlation between the number of series per ingester and the number of tokens registered in the ring, and I haven't found any correlation. There are two reasons for this:
So, I've computed the % of tokens owned by each ingester taking into account zone-awareness and RF=3, and there's an almost perfect correlation with the number of series (no shuffle sharding) per ingester.

Summary: series are uneven because token ranges are unevenly spaced, even when ownership is computed taking into account zone-awareness and RF=3.

The spread of series per-ingester vs owned tokens
We observe a 24% spread between the ingesters with the lowest and highest number of series for a user with shuffle sharding disabled. The same 24% spread is measured in the number of tokens owned per ingester taking into account zone-awareness and RF=3, and a similar spread of 22% is measured in the number of tokens registered in the ring.

Summary: zone-awareness and RF=3 do not add more spread than the initial one caused by uneven token ranges registered in the ring.

Experiment with a simulated ring with perfectly evenly spaced tokens
I've built a ring with perfectly evenly spaced tokens and then run the same analysis. Token ownership is still perfectly balanced even after applying RF=3 and zone-awareness.

Summary: this confirms that "zone-awareness and RF=3 do not add more spread than the initial one caused by uneven token ranges registered in the ring".

Conclusion
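For context, this is roughly how such a zone-aware, RF=3 ownership analysis can be computed. The following is a minimal sketch, not the actual Mimir code (ringToken and ownership are made-up names): for every token range, walk the ring clockwise and credit the range to the first RF instances found in distinct zones.

type ringToken struct {
	token    uint32
	instance string
	zone     string
}

// ownership returns, for each instance, the fraction of the replicated token space
// it owns. tokens must be sorted by token value; rf is the replication factor.
func ownership(tokens []ringToken, rf int) map[string]float64 {
	owned := map[string]float64{}
	total := 0.0
	n := len(tokens)
	for i := 0; i < n; i++ {
		// Size of the token range ending at tokens[i] (uint32 arithmetic wraps around the ring).
		size := float64(tokens[i].token - tokens[(i+n-1)%n].token)
		// Walk clockwise from tokens[i] and credit the range to the first rf instances
		// found in distinct zones (this is what zone-aware replication does).
		seenZones := map[string]bool{}
		for j := 0; j < n && len(seenZones) < rf; j++ {
			t := tokens[(i+j)%n]
			if seenZones[t.zone] {
				continue
			}
			seenZones[t.zone] = true
			owned[t.instance] += size
			total += size
		}
	}
	for instance := range owned {
		owned[instance] /= total
	}
	return owned
}

Comparing the resulting per-instance fractions with the per-ingester series counts is what surfaces the correlation described above.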
|
How much does the "number of tokens per ingester" impact the spread between ingesters?
I've run a simulation to check how the "tokens ownership spread" (the difference, in %, between the tokens owned by the lowest and highest ingester) changes as the number of tokens per ingester changes. How the spread changes (lower is better):
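A rough, hypothetical sketch of such a simulation (not the actual tool used for the analysis above): register a number of ingesters with K random tokens each, compute each ingester's share of the token space, and report the spread between the least and most loaded ingester.

import (
	"math/rand"
	"sort"
)

// spread simulates a ring of numIngesters instances with tokensPerIngester random
// tokens each and returns the ownership spread, i.e. (max - min) / max of the
// token space owned per ingester. No replication or zones in this simple version.
func spread(numIngesters, tokensPerIngester int) float64 {
	type tok struct {
		value uint32
		owner int
	}
	ring := make([]tok, 0, numIngesters*tokensPerIngester)
	for i := 0; i < numIngesters; i++ {
		for t := 0; t < tokensPerIngester; t++ {
			ring = append(ring, tok{value: rand.Uint32(), owner: i})
		}
	}
	sort.Slice(ring, func(a, b int) bool { return ring[a].value < ring[b].value })

	owned := make([]float64, numIngesters)
	for i := range ring {
		prev := ring[(i+len(ring)-1)%len(ring)].value
		owned[ring[i].owner] += float64(ring[i].value - prev) // uint32 wrap-around at i == 0
	}

	minOwned, maxOwned := owned[0], owned[0]
	for _, o := range owned {
		if o < minOwned {
			minOwned = o
		}
		if o > maxOwned {
			maxOwned = o
		}
	}
	return (maxOwned - minOwned) / maxOwned
}

Averaging the returned spread over several runs for different token counts gives a rough idea of how the spread shrinks as the number of tokens per ingester grows.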
|
This weekend I played a bit with the tool created by @pracucci and extended it a bit in the following way:
Then I did some analyses to see:
I didn't get perfect results, but I am glad I did this exercise. In analysis A1 I calculated both "spreads": per-zone and across all ingesters. The latter is pretty bad, but the former is actually good: depending on the combination of number of ingesters and number of tokens per ingester, we actually get either a spread of 0 or a spread of 0.5. The question is: which spread should we be more concerned about, per-zone or in total? I guess the answer is both. This is also what Cassandra tries to achieve, and I would like to try to apply an algorithm like theirs (which is pretty complicated, but I've studied it a lot in the last days) and see how it works. The good thing with this approach is that we don't need a huge number of tokens per ingester. Some thoughts:
|
Increasing the number of tokens can mitigate the effect, especially if the current number of tokens is relatively small (64). |
Inspired by the analysis done by @pracucci, I have spent some time researching how other companies faced similar problems. I was particularly curious about the solution provided by Cassandra (details). As a result of that research I have built a prototype for building better balanced rings, both with and without zone-awareness. The prototype is just a simulator and does NOT use the same

Problem Description
Starting from a ring of tokens and from a set of instances with a certain token ownership, we want to enrich the ring with additional tokens and assign them to a new instance in such a way that the token ownership of the new set of instances is as close as possible to the previous one. The main challenges here are:
Each token is owned by exactly one instance, and each instance owns the same fixed number of tokens (

Current Approach (Mimir-like)
In the current approach (the analyses done so far) the 3 challenges are addressed like this:
Proposed Approach (Cassandra-like)
The approach used in my prototype addresses the 3 challenges like this:
Evaluation of an insertion of a candidate token
When a candidate token is inserted in a ring of tokens, it might affect the ownership of other tokens in the ring, as well as of the instances that own the affected tokens. The change of a token is calculated as the difference between the standard deviation of the token's ownership in the original ring enriched with the candidate token and the standard deviation of the token's ownership in the original ring. The change of an instance is calculated as the sum of the changes of all the tokens owned by that instance. The sum of the changes of all tokens and instances represents the evaluation of the candidate token.

Adding a new instance to the ring
As stated above, for each new instance
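To make the evaluation step concrete, here is a minimal, hypothetical sketch of the idea. This is not the prototype's code: the types and helpers (ringToken, ownershipStdev, bestCandidate) are made up, and the scoring is simplified to the overall ownership standard deviation after insertion rather than the per-token/per-instance sums described above.

import (
	"math"
	"sort"
)

type ringToken struct {
	token uint32
	owner string
}

// ownershipStdev returns the standard deviation of the per-instance share of the
// token space for the given (unsorted) ring, relative to the optimal (mean) share.
func ownershipStdev(ring []ringToken) float64 {
	sorted := append([]ringToken(nil), ring...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].token < sorted[j].token })
	owned := map[string]float64{}
	n := len(sorted)
	for i := 0; i < n; i++ {
		// uint32 subtraction wraps around the ring for i == 0.
		owned[sorted[i].owner] += float64(sorted[i].token - sorted[(i+n-1)%n].token)
	}
	mean := math.Pow(2, 32) / float64(len(owned))
	sum := 0.0
	for _, o := range owned {
		sum += (o - mean) * (o - mean)
	}
	return math.Sqrt(sum / float64(len(owned)))
}

// bestCandidate returns the candidate token whose insertion for newInstance yields
// the smallest ownership standard deviation.
func bestCandidate(ring []ringToken, candidates []uint32, newInstance string) uint32 {
	best, bestScore := candidates[0], math.Inf(1)
	for _, c := range candidates {
		enriched := append(append([]ringToken(nil), ring...), ringToken{token: c, owner: newInstance})
		if score := ownershipStdev(enriched); score < bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

In the prototype the score is decomposed per token and per instance, but the principle is the same: try every candidate, measure how much it moves ownership away from the optimal value, and keep the best one.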
Preliminary Results

Ring generation
In this phase I wanted to generate different rings of 66 instances and compare them. I considered 3 scenarios:
In each scenario I generated rings with different numbers of tokens per instance (4, 16, 64, 128, 256 and 512), first by using random token selection (current approach) and then by using candidate token selection (new approach). For simplicity, I am sharing the results for scenario 3 only, and I am happy to share the rest of them. Optimal token ownership in this case is

The first chart shows the comparison of standard deviations (averaged over 15 executed iterations) of both approaches for different numbers of tokens per instance. It can be seen that the standard deviation with a lower number of tokens is much lower in the new approach, and that for a higher number of tokens the two are close to each other. For example, the precision reached with 256 tokens in the current approach is similar to the one reached with 64 tokens in the new approach, while the precisions with 128 tokens are similar. You can also see the token ownership distributions of the 4 rings mentioned above (64 and 128 tokens per instance, built with the 2 approaches). The new approach builds much more balanced rings compared to the current one.

Timeseries distributions
I have then used the 4 rings generated above to simulate the distribution of 10M random keys (simulating 10M timeseries). The charts below show how many of the 10M timeseries, replicated 3 times with zone-awareness enabled, are owned by each instance. In this case, optimal token ownership is

The final chart I am sharing here is the distribution of 10M timeseries in the rings of 513 instances with 64 and 128 tokens each, generated with the current and the new approach. We can notice that, having more tokens in the ring, the distribution becomes a bit more chaotic, but the standard deviation from the optimal token ownership (

A prototype for building better balanced rings |
Changes of ring topology
I have done an additional test: I started from a generated ring of 66 instances with a certain number of tokens per instance, and I collected all standard deviations from the current optimal ownership while I was:
Note: Next week I want to simulate removals of random instances (from a random zone) and the subsequent addition of another instance in the same zone.
|
As mentioned in @pracucci's feedback, I am adding some additional comparisons:
|
Hey, hi, can you explain to me why we are talking about zone-awareness & replication factor here? Maybe I'm missing the point, but I think it only makes the conversation harder, without adding a lot of value to the problem statement. I think we should consider just one zone, say

The only point where all zones matter is when designing an algorithm to choose the tokens, as unfortunately (or fortunately, if we keep adding features that use the ring) we use the same token space, so when deciding which tokens a new instance should take, it should:
|
I had a chat with Yuri and had some ideas that I want to write down.

My thoughts flow
In the simulations I see here, I see iterations of simulations based on different randomness seeds. This introduces an extra dimension to the problem to reason about, and also non-deterministic production behaviour. Why do we need randomness? Can we use the same seed all the time? Let's say we want 512 tokens per ingester. First I thought: can we make ingester-zone-a-0 take the tokens

Then I thought, well, that is still suboptimal; I see two issues:
But, it's easy for us to make each ingester know its index (0, 1, 2...) and zone (a, b, c...). (I know it's easy in Kubernetes, and I guess it's easy on bare-metal setups with Ansible, etc.) So, what if the ingesters don't base their decision on what they see in the ring, but on what should be in the ring instead? When the

But wait, the whole point of ingesters registering the tokens in the ring is so distributors know which series belong to each ingester, right? So, we still need to pass the tokens to the distributors. Well no, we only have to tell the distributors that we're

TL;DR, my proposal
Write a function that assigns the optimal

func tokens(count, idx, zone int) []uint32 {
	// Whatever is the optimal distribution.
	// Tested that it doesn't generate collisions for 0 <= zone < 10 and 0 <= idx < 10000.
	// Below: a naive approach, which of course will collide.
	s := rand.NewSource(int64(idx)) // deterministic seed from the instance index ("math/rand")
	out := make([]uint32, count)
	for i := 0; i < count; i++ {
		out[i] = uint32(s.Int63() + int64(zone)) // offset each zone by 1.
	}
	return out
}

Now, each ingester registers itself in the ring with its index, its zone and the number of tokens. All ingesters should have the same number of tokens (otherwise we would have to deal with a migration process, as what is optimal for 128 tokens may not be optimal for 512). Each distributor reads the available ingesters and rebuilds the ring in memory for the given set of ingesters. This way we remove all the randomness from this process, we make the messages lightweight and easy to understand, and the simulation will be much easier to follow.
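As an illustration of the distributor side of this proposal, here is a hypothetical sketch (illustrative types and names, not Mimir's actual ring API) of how a distributor could rebuild the whole ring in memory from just each ingester's index and zone plus the shared token count, by calling the same deterministic tokens() function:

type ringEntry struct {
	token uint32
	zone  int
	idx   int
}

type ingesterDesc struct {
	Idx  int
	Zone int
}

// rebuildRing recomputes every ingester's tokens deterministically and returns the
// sorted ring, so tokens never need to be gossiped or stored. (uses "sort")
func rebuildRing(count int, ingesters []ingesterDesc) []ringEntry {
	entries := make([]ringEntry, 0, count*len(ingesters))
	for _, ing := range ingesters {
		for _, tok := range tokens(count, ing.Idx, ing.Zone) { // same function as above
			entries = append(entries, ringEntry{token: tok, zone: ing.Zone, idx: ing.Idx})
		}
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].token < entries[j].token })
	return entries
}

Since every distributor derives identical tokens from the same inputs, nothing about the tokens themselves needs to be stored in or gossiped through the ring.
|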
I was thinking about generating the tokens for the first instance per zone like this, and then adding the tokens of all consecutive instances in the middle between 2 existing tokens of the same zone:

// Imports needed: "math" and "slices".
func tokens(tokensPerInstanceCount, zonesCount int) [][]uint32 {
	tokensPerZone := make([][]uint32, 0, zonesCount)
	for z := 0; z < zonesCount; z++ {
		tokensPerZone = append(tokensPerZone, make([]uint32, 0, tokensPerInstanceCount))
		for t := 0; t < tokensPerInstanceCount; t++ {
			// Evenly spaced tokens, shifted per zone so that the zones interleave
			// (3 = number of zones / RF assumed throughout this issue).
			token := uint32(math.Pow(2, 32)*(1.0-(3.0*float64(t)+float64(z))/float64(zonesCount*tokensPerInstanceCount)) - 1)
			tokensPerZone[z] = append(tokensPerZone[z], token)
		}
		slices.Sort(tokensPerZone[z])
	}
	return tokensPerZone
}
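And for the second half of the idea, a hypothetical sketch (names are illustrative, not from any prototype) of how a consecutive instance of a zone could take its tokens in the middle between 2 existing tokens of the same zone:

// nextInstanceTokens returns tokens for a new instance of a zone, each placed in the
// middle of a pair of consecutive tokens already registered for that zone.
// existing must be the sorted tokens currently owned by the zone. (uses "slices")
func nextInstanceTokens(existing []uint32, tokensPerInstanceCount int) []uint32 {
	out := make([]uint32, 0, tokensPerInstanceCount)
	n := len(existing)
	for t := 0; t < tokensPerInstanceCount && t < n; t++ {
		lo := existing[t]
		hi := existing[(t+1)%n] // wraps around the ring for the last pair
		gap := hi - lo          // uint32 arithmetic handles the wrap-around
		out = append(out, lo+gap/2)
	}
	slices.Sort(out)
	return out
}

Which gaps to split when the zone already has more tokens than the new instance needs is left open here; this naive version simply splits the first ones.
|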
Introduction of the Spread Minimizing Approach
The Cassandra-like approach has been abandoned, and we have opted for a different one, called the spread minimizing approach. Assuming that each instance owns
Some important properties of the spread minimizing approach are:
|
Preliminary Analyses (dedicated cells) of the Spread Minimizing Approach
We have done some preliminary analyses of the spread, calculated as
Test 1, Test 2, Test 3 |
Out-of-order instance registration
We have analyzed how the spread in the ring built by the spread minimizing approach changes when the instances are not registered with the ring in ascending order, but in a random order. We have calculated the worst spreads that we could have in rings with

For example, in the case of adding 3 instances (with ids

Q: What does this high spread mean for us? The instance |
In this PR the
|
How can we migrate to the SpreadMinimizingTokenGenerator?

Proposal 1 (working)
A possible way for migrating ingesters from the default,
Proposal 2 (not working because of automated downscaling)
Note: this proposal is possible only if there is no downscaling process (triggered by automated downscaling) going on. Similarly to Proposal 1, the steps need to be done zone-by-zone. Suppose we are migrating
|
@pracucci do you agree we should close this issue, since |
I do! |
While running Mimir in production (and Cortex before that) we've always observed some imbalance of series between ingesters. This imbalance exists both when shuffle sharding is enabled and when it's disabled, although it's more pronounced when shuffle sharding is enabled.
For example, the following Mimir cluster shows a 25% spread between the ingesters with the lowest and highest number of series:
This issue tracks the investigation and discussion of how to fix it in the case where shuffle sharding is not used. Shuffle sharding adds extra complexity to the fix, but I believe we first have to fix it for the "no shuffle sharding" case and then move on to shuffle sharding (because shuffle sharding was designed on the expectation that series for a given tenant are evenly sharded between the ingesters of its shard).