Add documentation for replication and deployment in zones (#1532)

m3db · Apr 4, 2019 · 2d598d8
1 parent e16c41d
commit 2d598d8
Showing 6 changed files with 57 additions and 1 deletion.
diff --git a/docs/operational_guide/availability_consistency_durability.md b/docs/operational_guide/availability_consistency_durability.md
@@ -10,7 +10,7 @@ Generally speaking, [the default and example configuration for M3DB](https://git
 Database operators who are using M3DB for workloads that require stricter consistency and durability guarantees should consider tuning the default configuration to better suit their use case.
 
 The rest of this document describes the various configuration options that are available to M3DB operators to make such tradeoffs.
-While reading it, we recommend reffering to [the default configuration file](https://github.com/m3db/m3/blob/master/src/dbnode/config/m3dbnode-all-config.yml) (which has every possible configuration value set) to see how the described values fit into M3DB's configuration as a whole.
+While reading it, we recommend referring to [the default configuration file](https://github.com/m3db/m3/blob/master/src/dbnode/config/m3dbnode-all-config.yml) (which has every possible configuration value set) to see how the described values fit into M3DB's configuration as a whole.
 
 ## Tuning for Performance and Availability
 

diff --git a/docs/operational_guide/replication_and_deployment_in_zones.md b/docs/operational_guide/replication_and_deployment_in_zones.md
@@ -0,0 +1,55 @@
+# Replication and Deployment in Zones
+
+## Overview
+
+M3DB supports deployment either across multiple zones in a region or within a single zone with rack-level isolation. It can also be deployed across multiple regions for a global view of data, though both latency and bandwidth costs may increase as a result.
+
+### Replication
+
+A replication factor of at least 3 is highly recommended for any M3DB deployment, because some consistency levels (for both reads and writes) require a quorum of replicas in order to complete an operation. For more information on consistency levels, see the documentation concerning [tuning availability, consistency and durability](availability_consistency_durability.md).
+
+M3DB will do its best to distribute shards evenly among the availability zones while still taking each individual node's weight into account. However, if some availability zones have fewer available hosts than others, each host in those zones will be responsible for more shards than hosts in the other zones and will thus be subjected to heavier load.
+
+### Upgrading hosts in a deployment
+
+When an M3DB node is restarted it has to perform a bootstrap process before it can serve reads. During this time the node will continue to accept writes, but will not be available for reads.
+
+There is also a small window of time between when the process is stopped and when it is started again, during which the node is also unavailable for writes.
+
+## Deployment across multiple availability zones in a region
+
+For deployment in a region, it is recommended to set the `isolationGroup` host attribute to the name of the availability zone a host is in.
+
+In this configuration, shards are distributed among hosts such that no shard has more than one replica placed in the same availability zone. This allows an entire availability zone to be lost at any given time, as the loss is guaranteed to affect only one replica of the data.
+
+For example, in a multi-zone deployment with four shards spread over three availability zones:
+
+![Replication Region](replication_region.png)
+
+Typically, deployments have many more than four shards. This simple example illustrates how M3DB remains available after losing an availability zone, as two of three replicas are still intact.
+
+## Deployment in a single zone
+
+For deployment in a single zone, it is recommended to set the `isolationGroup` host attribute to the name of the rack a host is in or another logical unit that separates groups of hosts in your zone.
+
+In this configuration, shards are distributed among hosts such that no shard has more than one replica placed in the same defined rack or logical unit. This allows an entire unit to be lost at any given time, as the loss is guaranteed to affect only one replica of the data.
+
+For example, in a single-zone deployment with three shards spread over four racks:
+
+![Replication Single Zone](replication_single_zone.png)
+
+Typically, deployments have many more than three shards. This simple example illustrates how M3DB remains available after losing a single rack, as two of three replicas are still intact.
+
+## Deployment across multiple regions
+
+For deployment across regions, it is recommended to set the `isolationGroup` host attribute to the name of the region a host is in.
+
+As mentioned previously, latency and bandwidth costs may increase when using clusters that span regions.
+
+In this configuration, shards are distributed among hosts such that no shard has more than one replica placed in the same region. This allows an entire region to be lost at any given time, as the loss is guaranteed to affect only one replica of the data.
+
+For example, in a multi-region deployment with four shards spread over five regions:
+
+![Replication Global](replication_global.png)
+
+Typically, deployments have many more than four shards. This simple example illustrates how M3DB remains available after losing up to two regions, as three of five replicas are still intact.
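The availability claims in the three examples of the new document (two of three replicas intact, three of five replicas intact) follow from majority-quorum arithmetic. The following is a minimal sketch, not M3DB code. It assumes that "quorum" means a strict majority of replicas and that each isolation group holds at most one replica of any shard; the function names `quorum` and `tolerable_group_losses` are hypothetical and chosen only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority.

    Assumption: quorum means floor(RF / 2) + 1 replicas.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerable_group_losses(replication_factor: int) -> int:
    """Isolation groups that can be lost while a majority of replicas survives.

    Assumption: each isolation group (zone, rack, or region) holds at most
    one replica of any shard, so losing a group removes at most one replica.
    """
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    # Print the quorum size and tolerable losses for a few replication factors.
    for rf in (1, 2, 3, 5):
        print(
            f"RF={rf}: quorum={quorum(rf)}, "
            f"tolerable losses={tolerable_group_losses(rf)}"
        )
```

Under these assumptions, a replication factor of 2 tolerates no losses at all, which is consistent with the recommendation of at least 3. A replication factor of 3 tolerates the loss of one isolation group, and a replication factor of 5 tolerates the loss of two.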
diff --git a/docs/operational_guide/replication_global.png b/docs/operational_guide/replication_global.png
diff --git a/docs/operational_guide/replication_region.png b/docs/operational_guide/replication_region.png
diff --git a/docs/operational_guide/replication_single_zone.png b/docs/operational_guide/replication_single_zone.png
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -69,6 +69,7 @@ pages:
     - "M3DB on Kubernetes": "how_to/kubernetes.md"
     - "M3Query": "how_to/query.md"
   - "Operational Guides":
+    - "Replication and Deployment in Zones": "operational_guide/replication_and_deployment_in_zones.md"
     - "Tuning Availability, Consistency, and Durability": "operational_guide/availability_consistency_durability.md"
     - "Placement/Topology": "operational_guide/placement.md"
     - "Placement/Topology Configuration": "operational_guide/placement_configuration.md"