Refactor load balancing module #408
Comments
I'm convinced - composability is indeed pretty much only needed for token awareness, and we can afford dropping it in favor of specialized implementations. @piodul? @nemosupremo I'd appreciate your opinion as a user, since I remember that you implement your own load balancing policy. The kind of change described here will most likely break backward compatibility, but I'd assume it would not require many changes to your already coded policies, rather some interface updates. @ultrabug I'd also appreciate your opinion as a user (and a humble laureate of multiple scylla user awards) - in fact, all user opinions would be very much welcome.
First of all, thank you and bravo for this clear explanation @havaker, I'm in awe!
About composability: I'm all in favor of having the best implementations available straight from the driver, so I would also indeed drop it. The most advanced users who implement their own (which I'm not one of) should also be provided with a comprehensive documentation guide.
When things go wrong, most database drivers are quite poor at exposing the context of their errors. I find that this discussion has the potential to allow exposing more about the actual plan and cluster topology as seen by the driver, and I would definitely love to see that happening. Things like:
Yes please, please please 👍
Do we want to add backoff and retry strategies to this conversation or shall we have another one?
@ultrabug feel free to state your thoughts/proposals in a separate issue. Then, when we change retry/backoff strategies, that issue can be marked as resolved.
I agree with the proposal, this is a good direction. I must admit that I always found the composition aspect of the load balancing policies a bit confusing, so it's good that we are getting rid of it. If anybody wants to include the token- or dc-awareness aspect in their custom policy, then we can consider making appropriate helpers public - however, if we do that they become a part of the public interface and changing that interface will be painful after we reach 1.0, so it's not a straightforward decision.
Just want to point out one thing here - we should not rely solely on the cluster events. The data fetched during topology refresh should serve as a source of truth and we should call the
For reference: I ran a very simple experiment which counts how many allocations were performed (https://github.com/psarna/trace_alloc) on one of our examples, parallel-prepared.rs, and the current load balancing implementation indeed contributes quite a lot to the total number of allocations. While the original code, with token-aware load balancing enabled, needs ~10 mallocs per request, when you hardcode the code to not be token-aware, the number drops to ~6 per request, which is a 40% improvement.
FYI it's possible to reduce the number of allocations here to zero: scylladb/scylla-go-driver#255
What does the load balancing module do?
The load balancing module in our driver is responsible for creating lists of nodes. We call these lists plans (query plans), because they define the order in which nodes will be contacted during query execution. A load balancing plan is calculated for every executed query.
Design goals
One of the main design decisions was to encapsulate load balancing behavior into composable policies. Composability in this case is the possibility to combine the behavior of different policies.
Policies we wanted to be able to implement were:
- round robin
- dc-aware round robin
- token-aware load balancing
In addition to the above-mentioned, we wanted to support the creation of custom policies.
Current implementation overview
Plan representation
Right now, we are representing plans as:
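(A reconstruction of the alias; the exact bounds used by the driver may differ slightly.)

```rust
use std::sync::Arc;

// An iterator-based alias: the plan is produced lazily instead of being
// materialized up front. `Node` comes from the driver's cluster metadata.
pub type Plan<'a> = Box<dyn Iterator<Item = Arc<Node>> + Send + Sync + 'a>;
```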
By representing plans as iterators, there is no arbitrary limit to their length. The iterator does not have to keep the entire represented plan within itself (as opposed to e.g. using `pub type Plan<'a> = Vec<Arc<Node>>`). Elements of the plan can be calculated in a lazy way - only when they are definitely needed.

The disadvantages of such a representation appear when trying to compose policies. More on that later.
Load balancing policy trait
All load balancing policies implement the following trait:
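(A sketch of the trait's shape, reconstructed from the description below; the exact signatures are assumptions.)

```rust
pub trait LoadBalancingPolicy: Send + Sync {
    /// Called for every executed query; returns the order in which
    /// nodes should be contacted.
    fn plan<'a>(&self, statement: &Statement, cluster: &'a ClusterData) -> Plan<'a>;

    /// Human-readable name of the policy, useful for logging.
    fn name(&self) -> String;
}
```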
All load balancing policies have a `plan()` method. It is called every time a query is executed to calculate a query plan. A policy gets info about a query's statement from `statement: &Statement` and uses `cluster: &ClusterData` to get info about the cluster's topology and other metadata.

Example policy
Round robin is one of the simplest policies one can implement. Let's see how it fits the defined interfaces.
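Here is a rough sketch of such a policy (the `all_nodes()` accessor on `ClusterData` is hypothetical; the real field or method name may differ):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

pub struct RoundRobinPolicy {
    // Incremented on every plan; used to pick a different starting node
    // for consecutive queries.
    index: AtomicUsize,
}

impl LoadBalancingPolicy for RoundRobinPolicy {
    fn plan<'a>(&self, _statement: &Statement, cluster: &'a ClusterData) -> Plan<'a> {
        let nodes: &'a [Arc<Node>] = cluster.all_nodes(); // hypothetical accessor
        let i = self.index.fetch_add(1, Ordering::Relaxed);
        let len = nodes.len();
        // Start at a rotating offset and wrap around; no slice is copied,
        // only an iterator over the existing node list is built.
        let rotated = nodes.iter().cycle().skip(i % len.max(1)).take(len).cloned();
        Box::new(rotated)
    }

    fn name(&self) -> String {
        "RoundRobinPolicy".to_string()
    }
}
```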
What is good about this implementation is that it does not copy any slices, and scales really well with node count (`O(1)`).

Composition
One of the most important features of this driver is token-aware load balancing. This type of load balancing requires calculating replica groups for each query. These groups are then returned as plans.
It is often desired to combine the previously mentioned policies - token-aware and round-robin - to distribute the load between replicas. How can this be done using our previously defined interfaces? What we would like to do is to have a `TokenAwarePolicy` and a `RoundRobinPolicy`. Then we would like to have a way to pipe the plan produced by the token-aware policy to the round-robin one. There is no way for a `LoadBalancingPolicy` to receive a plan produced by some other policy. So how does it work now?

Extra trait
In order for one policy to process plans constructed by the previous one, a new trait was introduced.
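Its shape is roughly as follows (reconstructed from the `apply_child_policy` signature described later in this issue; treat it as a sketch, not the exact definition):

```rust
pub trait ChildLoadBalancingPolicy: LoadBalancingPolicy {
    /// Receives the parent's plan as an already materialized vector and
    /// returns a reordered/processed plan.
    fn apply_child_policy(
        &self,
        plan: Vec<Arc<Node>>,
    ) -> Box<dyn Iterator<Item = Arc<Node>> + Send + Sync>;
}
```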
It's called child policy because it allows further processing of plans made by its parent. Here is an example implementation for round robin policy:
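A sketch of how that could look, reusing the hypothetical `RoundRobinPolicy` from the sketch above:

```rust
impl ChildLoadBalancingPolicy for RoundRobinPolicy {
    fn apply_child_policy(
        &self,
        mut plan: Vec<Arc<Node>>,
    ) -> Box<dyn Iterator<Item = Arc<Node>> + Send + Sync> {
        let i = self.index.fetch_add(1, Ordering::Relaxed);
        // Rotate the parent's plan so consecutive queries start from
        // different replicas.
        if !plan.is_empty() {
            plan.rotate_left(i % plan.len());
        }
        Box::new(plan.into_iter())
    }
}
```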
Composition example
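Below is a rough sketch of such a composing policy (the `token` field on `Statement` and the `replicas_for_token` helper are illustrative assumptions, not the driver's real API):

```rust
pub struct TokenAwarePolicy {
    child_policy: Box<dyn ChildLoadBalancingPolicy>,
}

impl LoadBalancingPolicy for TokenAwarePolicy {
    fn plan<'a>(&self, statement: &Statement, cluster: &'a ClusterData) -> Plan<'a> {
        match statement.token {
            Some(token) => {
                // Compute the replica group owning this token and let the
                // child policy (e.g. round robin) order it.
                let replicas: Vec<Arc<Node>> = cluster.replicas_for_token(token); // hypothetical helper
                self.child_policy.apply_child_policy(replicas)
            }
            // No token available - fall back to the child policy's own plan.
            None => self.child_policy.plan(statement, cluster),
        }
    }

    fn name(&self) -> String {
        "TokenAwarePolicy".to_string()
    }
}
```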
This example shows a policy that can add token-awareness to another policy (`TokenAwarePolicy::child_policy`). In its `plan` method, it generates a replica list and forwards this list to the child policy (if a given statement has a token).

Current shortcomings
Replica sets calculation
Inaccessibility
The token-aware load balancing module has all the code for replica set calculation. It is not accessible from the outside, so writing a custom token-aware policy requires duplicating this code.
Volatility
Replica sets are calculated when executing each query; they are not cached anywhere. This can cause performance problems in big clusters that use `NetworkTopologyStrategy`, which is expensive to calculate.

Composability
Cost
In the current implementation, composability is implemented using the `ChildLoadBalancingPolicy` trait. Its method `apply_child_policy` has a signature that requires passing a plan as a `Vec<Arc<Node>>`. This causes cloning `Arc`s every time a plan is calculated, which can cause scalability issues. The decision to design the child policy interface in such a way was made after analysis of the default policy stack: token-aware round robin. The token-aware policy produced plans consisting of a few replicas, which had to be rotated by the round robin. Rotation is not a good transformation to do on iterators. To reduce its cost, it was decided (by me (; ) that the child policy plan argument will be a `Vec`.

Unintuitive
If a user wants to write a custom policy `FilteringPolicy` and use `RoundRobinPolicy` combined with it, there are two ways to do it:
- `FilteringPolicy` can implement `LoadBalancingPolicy`, use `RoundRobinPolicy` as a child policy (as `TokenAwarePolicy` does) and feed the round robin with filtered nodes
- `FilteringPolicy` can implement `LoadBalancingPolicy` and not use `RoundRobinPolicy` as a child policy (invoke `RoundRobinPolicy::plan()` and wrap the resulting plan in `Iterator::filter`)

The first option will cause a lot of copying of `Arc<Node>`s, and can become slow if the cluster is big and the filtered node count is low. The second option does not have this trap (because filtering is lazy and does not cause traversal of the whole iterator); it is only possible because filtering can be applied after round robin, though. A sketch of the second option is shown below.
State
Policies cannot react to any cluster events (node up/down/removed/added). Because of this, they cannot build their own cluster view, and have to rely on the `ClusterData` view provided in the `plan` method only. Reliance on `ClusterData` causes the addition of fields that only specific policies use.

When writing `DcAwareRoundRobin` I wanted to speed up lookups of local nodes, so I've added a per-dc view of nodes to `ClusterData`. I then used this view in `DcAwareRoundRobin::plan`. If I were to write `DcAwareRoundRobin` as a custom policy, without modifying anything in the driver, I would be at a disadvantage - there would be no way to return local nodes quickly without that per-dc view in `ClusterData`.

Proposed solution
Public replica sets calculation
Move replica sets calculation to another, public module.
Conduct benchmarks to see if it is worth introducing caching.
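One hypothetical shape for such a public, cacheable calculator (all names here are placeholders, not the driver's real API; `Token` is assumed to be a small `Copy` key type):

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub struct ReplicaLocator {
    // token -> replica list, filled lazily so that expensive
    // NetworkTopologyStrategy calculations are not repeated per query.
    cache: HashMap<Token, Vec<Arc<Node>>>,
}

impl ReplicaLocator {
    pub fn replicas_for(&mut self, token: Token, strategy: &Strategy) -> &[Arc<Node>] {
        self.cache
            .entry(token)
            .or_insert_with(|| compute_replicas(token, strategy)) // placeholder helper
    }
}
```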
Remove composability
Most users will end up with a token-aware [dc-aware] round robin load balancer. It makes sense to provide them with a high-quality implementation of such a policy, one which doesn't have to make any compromises imposed by composability. We wouldn't be the first to make such a decision - the DataStax Java driver has already done so.
If we were to implement replica sets calculation caching, our default policy could even produce plans without any `Arc<_>` copying.

React to events
Add a mechanism for load balancing policies to react to cluster events, e.g. add methods like `on_up(&self, node: Arc<Node>)` to the `LoadBalancingPolicy` trait.

Restricting access to `ClusterData` and removing unnecessary fields would be good too.
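A sketch of how the trait could be extended with such hooks (the exact method set and the default no-op bodies are assumptions):

```rust
pub trait LoadBalancingPolicy: Send + Sync {
    fn plan<'a>(&self, statement: &Statement, cluster: &'a ClusterData) -> Plan<'a>;
    fn name(&self) -> String;

    // Event hooks with no-op defaults, so existing policies keep compiling.
    fn on_add(&self, _node: Arc<Node>) {}
    fn on_up(&self, _node: Arc<Node>) {}
    fn on_down(&self, _node: Arc<Node>) {}
    fn on_remove(&self, _node: Arc<Node>) {}
}
```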