transport: load balancing module refactor #612

Merged Mar 17, 2023 · 32 commits

Commits
449d6c9
transport: iterator: do not exit function before error reporting
havaker Mar 3, 2023
59b8aed
transport: node: add NodeRef type alias
havaker Feb 12, 2023
e0bbba1
transport: locator: add TokenRing struct
cvybhu Aug 1, 2022
8af8931
transport: locator: add Replicas struct
havaker Dec 1, 2022
2507f74
transport: locator: add ReplicationInfo struct
cvybhu Aug 1, 2022
bd1496c
transport: locator: add PrecomputedReplicas struct
cvybhu Aug 1, 2022
18f6070
transport: locator: add ReplicaLocator struct
havaker Nov 15, 2022
48ea137
transport: locator: add test module
havaker Dec 3, 2022
672c2d9
transport: locator: add tests for replication_info module
havaker Jan 28, 2023
6baacb5
transport: locator: add tests for precomputed_replicas module
havaker Jan 28, 2023
4b88042
transport: locator: change rng in tests to be deterministic
havaker Mar 10, 2023
188d660
transport: cluster: add locator structs to ClusterData
cvybhu Aug 2, 2022
0db57c1
transport: cluster: remove unnecessary ClusterData fields
havaker Nov 24, 2022
9bc6db2
docs: remove mentions of load balancing
cvybhu Jul 29, 2022
9057022
transport: revert LatencyAware load balancing
wprzytula Jan 5, 2023
78f1e2e
transport: load_balancing: add a default policy
havaker Apr 26, 2022
48fd8a3
transport: execution_profile: use load_balancing::DefaultPolicy
havaker Apr 26, 2022
87d2725
transport: load_balancing: drop all tests
havaker Nov 14, 2022
049ec37
transport: load_balancing: remove policies other than the default one
havaker Apr 26, 2022
ed0b8b6
transport: load_balancing: remove ChildLoadBalancingPolicy trait
havaker Apr 26, 2022
b0e2d75
transport: load_balancing: rename Statement to RoutingInfo
havaker Mar 6, 2023
26c2cfc
transport: load_balancing: change interface
havaker Nov 17, 2022
971863a
transport: load_balancing: improve default policy
havaker Nov 21, 2022
c9a022d
default_policy: Warn about forbidden failover with SimpleStrategy
wprzytula Feb 20, 2023
9480d1c
transport: load_balancing: Node::is_alive() made stub - limit to is_e…
wprzytula Jan 10, 2023
073d57d
transport: load_balancing: test framework for DefaultPolicy
wprzytula Feb 1, 2023
f6a1712
transport: load_balancing: tests for DefaultPolicy
wprzytula Feb 1, 2023
a00227b
transport: load_balancing: add on_query_success/failure to the interface
wprzytula Jan 10, 2023
7508eaf
default_policy: Introduced pick_predicate
wprzytula Mar 15, 2023
d7174c9
transport: load_balancing: add latency-awareness to DefaultPolicy
wprzytula Jan 13, 2023
da2c23a
transport: load_balancing: tests for latency awareness in DefaultPolicy
wprzytula Mar 10, 2023
dac9456
docs: add load balancing module documentation
havaker Mar 9, 2023
5 changes: 1 addition & 4 deletions docs/source/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,10 +48,7 @@
- [UDT (User defined type)](data-types/udt.md)

- [Load balancing](load-balancing/load-balancing.md)
- [Round robin](load-balancing/robin.md)
- [DC Aware Round robin](load-balancing/dc-robin.md)
- [Token aware Round robin](load-balancing/token-robin.md)
- [Token aware DC Aware Round robin](load-balancing/token-dc-robin.md)
- [Default policy](load-balancing/default-policy.md)

- [Retry policy configuration](retry-policy/retry-policy.md)
- [Fallthrough retry policy](retry-policy/fallthrough.md)
Expand Down
12 changes: 2 additions & 10 deletions docs/source/execution-profiles/maximal-example.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ use scylla::query::Query;
use scylla::speculative_execution::SimpleSpeculativeExecutionPolicy;
use scylla::statement::{Consistency, SerialConsistency};
use scylla::transport::ExecutionProfile;
use scylla::transport::load_balancing::{DcAwareRoundRobinPolicy, TokenAwarePolicy};
use scylla::transport::load_balancing::DefaultPolicy;
use scylla::transport::retry_policy::FallthroughRetryPolicy;
use std::{sync::Arc, time::Duration};

Expand All @@ -19,15 +19,7 @@ let profile = ExecutionProfile::builder()
.serial_consistency(Some(SerialConsistency::Serial))
.request_timeout(Some(Duration::from_secs(30)))
.retry_policy(Box::new(FallthroughRetryPolicy::new()))
.load_balancing_policy(
Arc::new(
TokenAwarePolicy::new(
Box::new(
DcAwareRoundRobinPolicy::new("us_east".to_string())
)
)
)
)
.load_balancing_policy(Arc::new(DefaultPolicy::default()))
.speculative_execution_policy(
Some(
Arc::new(
Expand Down
2 changes: 1 addition & 1 deletion docs/source/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ Although optimized for Scylla, the driver is also compatible with [Apache Cassan
* [Making queries](queries/queries.md) - Making different types of queries (simple, prepared, batch, paged)
* [Execution profiles](execution-profiles/execution-profiles.md) - Grouping query execution configuration options together and switching them all at once
* [Data Types](data-types/data-types.md) - How to use various column data types
* [Load balancing](load-balancing/load-balancing.md) - Load balancing configuration, local datacenters etc.
* [Load balancing](load-balancing/load-balancing.md) - Load balancing configuration
* [Retry policy configuration](retry-policy/retry-policy.md) - What to do when a query fails, query idempotence
* [Driver metrics](metrics/metrics.md) - Statistics about the driver - number of queries, latency etc.
* [Logging](logging/logging.md) - Viewing and integrating logs produced by the driver
Expand Down
47 changes: 0 additions & 47 deletions docs/source/load-balancing/dc-robin.md

This file was deleted.

160 changes: 160 additions & 0 deletions docs/source/load-balancing/default-policy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
# DefaultPolicy

`DefaultPolicy` is the default load balancing policy in Scylla Rust Driver. It
can be configured to be datacenter-aware and token-aware. Datacenter failover
for queries with non-local consistency mode is also supported.

## Creating a DefaultPolicy

`DefaultPolicy` can be created only using `DefaultPolicyBuilder`. The
`builder()` method of `DefaultPolicy` returns a new instance of
`DefaultPolicyBuilder` with the following default values:

- `preferred_datacenter`: `None`
- `is_token_aware`: `true`
- `permit_dc_failover`: `false`
- `latency_awareness`: `None`

You can use the builder methods to configure the desired settings and create a
`DefaultPolicy` instance:

```rust
# extern crate scylla;
# fn test_if_compiles() {
use scylla::load_balancing::DefaultPolicy;

let default_policy = DefaultPolicy::builder()
    .prefer_datacenter("dc1".to_string())
    .token_aware(true)
    .permit_dc_failover(true)
    .build();
# }
```

### Semantics of `DefaultPolicy`

#### Preferred Datacenter

The `preferred_datacenter` field in `DefaultPolicy` allows the load balancing
policy to prioritize nodes based on their location. When a preferred datacenter
is set, the policy will treat nodes in that datacenter as "local" nodes, and
nodes in other datacenters as "remote" nodes. This affects the order in which
nodes are returned by the policy when selecting replicas for read or write
operations. If no preferred datacenter is specified, the policy will treat all
nodes as local nodes.

When datacenter failover is disabled (`permit_dc_failover` is set to
false), the default policy will only include local nodes in load balancing
plans. Remote nodes will be excluded, even if they are alive and available to
serve requests.
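
Below is a minimal sketch of a policy restricted to its preferred
datacenter; the datacenter name is illustrative, and `permit_dc_failover`
is simply left at its default of `false`:

```rust
# extern crate scylla;
# fn example() {
use scylla::load_balancing::DefaultPolicy;

// With a preferred datacenter and failover left disabled (the default),
// query plans will only contain nodes from "eu-central" (illustrative name).
let local_only_policy = DefaultPolicy::builder()
    .prefer_datacenter("eu-central".to_string())
    .build();
# }
```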

#### Datacenter Failover

In the event of a datacenter outage or network failure, the nodes in that
datacenter may become unavailable, and clients may no longer be able to access
the data stored on those nodes. To address this, `DefaultPolicy` supports
datacenter failover, which allows requests to be routed to nodes in other
datacenters if the local nodes are unavailable.

Datacenter failover can be enabled in `DefaultPolicy` via the
`permit_dc_failover` setting in the builder. When this flag is set, the policy
will also return alive remote replicas, provided that the query's consistency
constraints make failover possible.

#### Token awareness

Token awareness refers to a mechanism by which the driver is aware of the token
range assigned to each node in the cluster. Tokens are assigned to nodes to
partition the data and distribute it across the cluster.

When a user wants to read or write data, the driver can use token awareness to
route the request to the correct node based on the token range of the data
being accessed. This can help to minimize network traffic and improve
performance by ensuring that the data is accessed locally as much as possible.

In the case of `DefaultPolicy`, token awareness is enabled by default, meaning
that the policy will prefer to return alive local replicas if the token is
available. If the client requests data that falls within the token range of a
particular node, the policy will try to route the request to that node first,
assuming it is alive and responsive.

Token awareness can significantly improve the performance and scalability of
applications built on Scylla. By using token awareness, users can ensure that
data is accessed locally as much as possible, reducing network overhead and
improving throughput.

Please note that for token awareness to be applied, a statement must be
prepared before being executed.
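
As a sketch of the difference this makes (the keyspace, table, and bound
value below are illustrative), preparing a statement gives the driver the
partition key metadata it needs to compute a token:

```rust
# extern crate scylla;
# use scylla::Session;
# async fn example(session: &Session) -> Result<(), Box<dyn std::error::Error>> {
// Prepared statements carry partition key metadata, so the driver can
// compute the token and route the request to a replica first.
let prepared = session
    .prepare("SELECT * FROM ks.t WHERE pk = ?")
    .await?;
session.execute(&prepared, (42_i32,)).await?;
# Ok(())
# }
```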

### Latency awareness

Latency awareness is a mechanism that penalises nodes whose recently measured
average latency classifies them as falling behind the others.

Every `update_rate` interval, the global minimum average latency is computed,
and all nodes whose average latency is worse than `exclusion_threshold`
times that minimum become penalised for `retry_period`. Penalisation
involves putting those nodes at the very end of the query plan. Since
preferring a faster non-replica over replicas that lag behind it is often
not truly beneficial, this mechanism may actually worsen latencies and/or
throughput.
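
As a minimal illustration of the rule above (standalone arithmetic, not
the driver's implementation): with `exclusion_threshold = 3.0` and a best
average latency of 12 ms, only nodes averaging above 36 ms are penalised.

```rust
// Sketch of the penalisation rule: a node is penalised when its average
// latency exceeds `exclusion_threshold` times the global minimum average.
fn is_penalised(node_avg_ms: f64, min_avg_ms: f64, exclusion_threshold: f64) -> bool {
    node_avg_ms > exclusion_threshold * min_avg_ms
}

fn main() {
    let averages_ms = [12.0, 15.0, 80.0];
    let min = averages_ms.iter().copied().fold(f64::INFINITY, f64::min);
    for avg in averages_ms {
        // Only the 80 ms node exceeds 3.0 * 12 ms = 36 ms.
        println!("{avg} ms -> penalised: {}", is_penalised(avg, min, 3.0));
    }
}
```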

> **Warning**
>
> Using latency awareness is **NOT** recommended, unless prior
> benchmarks prove its beneficial impact on the specific workload's
> performance. Use with caution.

### Creating a latency aware DefaultPolicy

```rust
# extern crate scylla;
# fn example() {
use scylla::load_balancing::{
    LatencyAwarenessBuilder, DefaultPolicy
};
use std::time::Duration;

let latency_awareness_builder = LatencyAwarenessBuilder::new()
    .exclusion_threshold(3.)
    .update_rate(Duration::from_secs(3))
    .retry_period(Duration::from_secs(30))
    .minimum_measurements(200);

let policy = DefaultPolicy::builder()
    // Here further customisation is, of course, possible,
    // e.g. .prefer_datacenter(...)
    .latency_awareness(latency_awareness_builder)
    .build();
# }
```

### Node order in produced plans

`DefaultPolicy` prefers to return nodes in the following order:

1. Alive local replicas (if the token is available & token awareness is enabled)
2. Alive remote replicas (if datacenter failover is permitted & possible due to consistency constraints)
3. Alive local nodes
4. Alive remote nodes (if datacenter failover is permitted & possible due to consistency constraints)
5. Enabled down nodes
6. Only if latency awareness is enabled: penalised nodes, i.e. alive local replicas, alive remote replicas, ... (in the order given above)

If no preferred datacenter is specified, all nodes are treated as local ones.

Replicas within the same priority group are shuffled. Non-replicas are randomly
rotated (similarly to a round robin with a random starting index).
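
A tiny sketch of such a rotation (illustrative only, not the driver's
internals; in practice the starting offset is drawn at random per plan):

```rust
// Rotate a slice of non-replica nodes by a starting offset, like a round
// robin that begins at a random index.
fn rotated<T: Clone>(nodes: &[T], start: usize) -> Vec<T> {
    let n = nodes.len();
    (0..n).map(|i| nodes[(start + i) % n].clone()).collect()
}

fn main() {
    let non_replicas = ["node-a", "node-b", "node-c", "node-d"];
    // A fixed offset keeps this example deterministic.
    let start = 2;
    assert_eq!(rotated(&non_replicas, start), vec!["node-c", "node-d", "node-a", "node-b"]);
}
```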