elastic · Tim-Brooks · Mar 21, 2019 · Mar 6, 2019 · Mar 6, 2019 · Mar 10, 2019
diff --git a/docs/reference/ccr/overview.asciidoc b/docs/reference/ccr/overview.asciidoc
@@ -17,7 +17,7 @@ follower index. This simplifies state management on the leader index and means
 that {ccr} does not interfere with indexing on the leader index.
 
 [float]
-=== Configuring replication
+=== Initiating replication
 
 Replication can be configured in two ways:
 
@@ -29,6 +29,41 @@ Replication can be configured in two ways:
 
 NOTE: You must also <<ccr-requirements,configure the leader index>>.
 
+When you initiate replication either manually or through an auto-follow pattern, the
+follower index is created on the local cluster. Once the follower index is created,
+the <<remote-recovery, remote recovery>> process copies all of the Lucene segment
+files from the remote cluster to the local cluster.
+
+When you initiate following manually using the API, the recovery process is, by
+default, asynchronous with respect to the
+{ref}/ccr-put-follow.html[create follower request]. The request returns before
+the <<remote-recovery, remote recovery>> process completes. If you
+want to wait for the process to complete, you can use the
+`wait_for_active_shards` parameter. For example:
+
+//////////////////////////
+
+[source,js]
+--------------------------------------------------
+PUT /follower_index/_ccr/follow?wait_for_active_shards=1
+{
+  "remote_cluster" : "remote_cluster",
+  "leader_index" : "leader_index"
+}
+--------------------------------------------------
+// CONSOLE
+// TESTSETUP
+// TEST[setup:remote_cluster_and_leader_index]
+
+[source,js]
+--------------------------------------------------
+POST /follower_index/_ccr/pause_follow
+--------------------------------------------------
+// CONSOLE
+// TEARDOWN
+
+//////////////////////////
+
 [float]
 === The mechanics of replication
 
@@ -56,7 +91,7 @@ If a read request fails, the cause of the failure is inspected. If the
 cause of the failure is deemed to be a failure that can be recovered from (for 
 example, a network failure), the follower shard task enters into a retry
 loop. Otherwise, the follower shard task is paused and requires user
-intervention before the it can be resumed with the
+intervention before it can be resumed with the
 {ref}/ccr-post-resume-follow.html[resume follower API].
 
 When operations are received by the follower shard task, they are placed in a
@@ -69,6 +104,10 @@ limits, no additional read requests are sent by the follower shard task. The
 follower shard task resumes sending read requests when the write buffer no
 longer exceeds its configured limits.
 
+NOTE: The details of how operations are replicated from the leader are
+governed by settings that you can configure in the
+{ref}/ccr-put-follow.html[create follower request].
+
 Mapping updates applied to the leader index are automatically retrieved
 as-needed by the follower index.
 
@@ -102,9 +141,71 @@ Using these APIs in tandem enables you to adjust the read and write parameters
 on the follower shard task if your initial configuration is not suitable for
 your use case.
 
+[float]
+=== Leader index retaining operations for replication
+
+If the follower is unable to replicate operations from a leader for a period of
+time, the following process can fail due to the leader lacking a complete history
+of operations necessary for replication.
+
+Operations replicated to the follower are identified using a sequence number
+generated when the operation was initially performed. Lucene segment files are
+occasionally merged in order to optimize searches and save space. When these
+merges occur, it is possible for operations associated with deleted or updated
+documents to be pruned during the merge. When the follower requests the sequence
+number for a pruned operation, the process fails because the operation is missing
+on the leader.
+
+This scenario is not possible in an append-only workflow. As documents are never
+deleted or updated, the underlying operation will not be pruned.
+
+Elasticsearch attempts to mitigate this potential issue for update workflows using
+a Lucene feature called soft deletes. When a document is updated or deleted, the
+underlying operation is retained in the Lucene index for a period of time. This
+period of time is governed by the `index.soft_deletes.retention_lease.period`
+setting, which can be <<ccr-requirements,configured on the leader index>>.
+
+When a follower starts following an index, it acquires a retention lease from
+the leader. This informs the leader that it should not allow a soft delete to be
+pruned until either the follower indicates that it has received the operation or
+the lease expires. It is valuable to have monitoring in place to detect a follower
+replication issue prior to the lease expiring so that the problem can be remedied
+before the follower falls fatally behind.
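+
+As an illustration only, assuming a leader index named `leader_index`, a
+retention period of `24h` (both hypothetical values), and that the setting can
+be updated on an existing index, the lease period could be raised with an
+update index settings request sent to the cluster that holds the leader index:
+
+[source,js]
+--------------------------------------------------
+PUT /leader_index/_settings
+{
+  "index.soft_deletes.retention_lease.period" : "24h"
+}
+--------------------------------------------------
+// NOTCONSOLE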
+
+[float]
+=== Remedying a follower that has fallen behind
+
+If a follower falls so far behind a leader that it can no longer replicate
+operations, this can be detected in {kib} or by using the
+{ref}/ccr-get-follow-stats.html[get follow stats API]. It is reported as an
+`indices[].fatal_exception`.
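+
+As a rough sketch, assuming the follower index is named `follower_index` (the
+placeholder name used elsewhere on this page), the follow stats for that index
+can be requested as shown here. A follower that has fallen fatally behind
+should then carry a `fatal_exception` entry in the response:
+
+[source,js]
+--------------------------------------------------
+GET /follower_index/_ccr/stats
+--------------------------------------------------
+// CONSOLE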
+
+To restart the follower, you must pause the following process, close the follower
+index, and call the create follower API again. For example:
+
+["source","js"]
+----------------------------------------------------------------------
+POST /follower_index/_ccr/pause_follow
+
+POST /follower_index/_close
+
+PUT /follower_index/_ccr/follow?wait_for_active_shards=1
+{
+  "remote_cluster" : "remote_cluster",
+  "leader_index" : "leader_index"
+}
+----------------------------------------------------------------------
+// CONSOLE
+
+Calling the create follower API again is a destructive action. All of the existing Lucene
+segment files are deleted on the follower cluster. The
+<<remote-recovery, remote recovery>> process is then used to copy the Lucene segment
+files from the leader again. After the follower index has been reinitialized, the
+following process starts again.
+
 [float]
 === Terminating replication
 
 You can terminate replication with the
 {ref}/ccr-post-unfollow.html[unfollow API]. This API converts a follower index
-to a regular (non-follower) index.
+to a regular (non-follower) index.
diff --git a/docs/reference/ccr/requirements.asciidoc b/docs/reference/ccr/requirements.asciidoc
@@ -32,11 +32,13 @@ Whether or not soft deletes are enabled on the index. Soft deletes can only be
 configured at index creation and only on indices created on or after 6.5.0. The
 default value is `true`.
 
-`index.soft_deletes.retention.operations`::
+`index.soft_deletes.retention_lease.period`::
 
-The number of soft deletes to retain. Soft deletes are collected during merges
-on the underlying Lucene index yet retained up to the number of operations
-configured by this setting. The default value is `0`.
+The maximum period to retain a shard history retention lease before it is considered
+expired. Shard history retention leases ensure that soft deletes are retained during
+merges on the Lucene index. If a soft delete is merged away before it can be replicated
+to a follower, the following process will fail due to incomplete history on the leader.
+The default value is `12h`.
 
 For more information about index settings, see {ref}/index-modules.html[Index modules].