Expand `following` documentation in ccr overview #39936

Tim-Brooks · 2019-03-11T19:14:41Z

This commit expands the ccr overview page to include more information
about the lifecycle of following an index. It adds a link to the remote
recovery documentation and describes how a follower index can fall behind
and how to fix it when this happens.


elasticmachine · 2019-03-11T19:14:43Z

Pinging @elastic/es-distributed

Tim-Brooks · 2019-03-11T19:20:54Z

Personally, I prefer this approach (expanding the overview) to the separate page (introduced in #39768). But I am open to hearing other opinions, and we can go with whatever the consensus is.

jasontedor · 2019-03-12T02:20:47Z

I like this approach indeed.

Tim-Brooks · 2019-03-12T17:19:55Z

I am closing the other PR in favor of this PR. The primary objection to this approach is that it makes the overview quite deep. However, the overview is already the place where we describe the replication/following lifecycle. If we want to split that up in the future, we can dedicate a different PR to that.

jasontedor

This is looking good. I left a few nits, and one comment to think about.


jasontedor · 2019-03-14T19:45:28Z

docs/reference/ccr/overview.asciidoc

+When a follower initiates the index following, it acquires a retention lease from
+the leader. This informs the leader that it should not allow a soft delete to be
+pruned until either the follower indicates that it has received the operation or
+the lease expires after `12 hours`. It is valuable to have monitoring in place to


I find the formatting on "12 hours" odd. Is there a reason that you chose to format it this way, as opposed to "twelve hours" (without being formatted as code) or `12h`? Another concern: if we change the default from 12h to some other value, we would have to update every page in the docs that mentions it. Maybe the best option is to add docs for the `index.soft_deletes.retention_lease.period` setting (or link to the CCR requirements page) and let the default be specified only there?

I linked the requirements page.
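
To make the behaviour discussed in this thread concrete, here is a minimal Python sketch of the rule in the quoted text: the leader may prune a soft-deleted operation only once the follower has indicated that it received the operation, or once the lease has expired. This is a toy model, not Elasticsearch's implementation. The class, field, and method names are hypothetical, and the lease period is passed in as a parameter (the quoted text gives 12 hours) rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass
class RetentionLease:
    """Toy model of a follower's retention lease on a leader (hypothetical names)."""

    period_seconds: float      # lease lifetime; the quoted docs text says 12 hours
    renewed_at: float = 0.0    # time of the last renewal, in seconds
    retained_seq_no: int = 0   # lowest operation the follower still needs

    def renew(self, now: float, received_up_to: int) -> None:
        # The follower reports the last operation it has received; every
        # operation after that one must still be retained by the leader.
        self.renewed_at = now
        self.retained_seq_no = max(self.retained_seq_no, received_up_to + 1)

    def expired(self, now: float) -> bool:
        # The lease lapses if it has not been renewed within the period.
        return now - self.renewed_at > self.period_seconds

    def may_prune(self, seq_no: int, now: float) -> bool:
        # Prunable if the follower already has the operation or the lease lapsed.
        return seq_no < self.retained_seq_no or self.expired(now)
```

In this model, a follower that stops renewing for longer than the period loses its guarantee, which is the "fallen behind" situation the new docs section describes.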

jasontedor

LGTM.

lcawl · 2019-03-19T20:29:10Z

docs/reference/ccr/overview.asciidoc

@@ -17,7 +17,7 @@ follower index. This simplifies state management on the leader index and means
 that {ccr} does not interfere with indexing on the leader index.

 [float]
-=== Configuring replication
+=== Initiating replication


I'm curious about the intended change in meaning here. Is it meant to cover both configuring and starting CCR?

I guess so. The actual create follower and auto follower actions seem more associated with initiating replication (although you can configure certain things using parameters). That is why I made that change. Thoughts?

I find "initiating" a bit unclear, so my vote would be simpler verbs like "configuring" or "setting up", but it's not a deal breaker.

I changed it back to configuring

lcawl · 2019-03-19T20:32:17Z

docs/reference/ccr/overview.asciidoc

@@ -29,6 +29,41 @@ Replication can be configured in two ways:

 NOTE: You must also <<ccr-requirements,configure the leader index>>.



I think it would be helpful to point out that all of those tasks can be done via Kibana too and add a link to {kibana-ref}/working-remote-clusters.html#managing-cross-cluster-replication[Managing {ccr}]

I have added an additional note. Let me know if that looks good.

I added a new commit that mentions Kibana directly in the bullet points, since I couldn't add a suggestion on that section.


lcawl · 2019-03-19T21:10:13Z

docs/reference/ccr/overview.asciidoc

+=== Remedying a follower that has fallen behind
+
+If a follower falls sufficiently behind a leader that it can no longer replicate
+operations this can be detected using the


I presume this exception would be visible in Kibana too?

Suggested change

-operations this can be detected using the
+operations this can be detected in {kib} or by using the

I have made this change. @cjcenizal a follower falling behind is reported as a fatal exception in the stats API. Is that shown in the Kibana UI?

I'm going to tag in @jen-huang and @sebelga to answer your question, since I'm on PTO today and focusing on stack upgrade testing for the rest of the week.

@tbrooks8 It doesn't appear we show `indices[].fatal_exception` in the UI. I've created a bug ticket for this: elastic/kibana#33628
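
Until the UI surfaces this, the check can be scripted against the stats response. The following is a minimal Python sketch, not an official client. It assumes the response body has a top-level `indices` list whose entries may carry `fatal_exception` either directly (as written in the comment above) or on entries of a nested `shards` list; the exact shape should be confirmed against the follow stats API documentation. The function name and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def find_fatal_exceptions(stats: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect fatal exceptions from a follow-stats-like response body.

    Assumed shape (not confirmed by this thread): a top-level "indices" list
    whose entries may hold "fatal_exception" directly or on each entry of a
    nested "shards" list.
    """
    found: List[Dict[str, Any]] = []
    for index in stats.get("indices", []):
        name = index.get("index")
        if index.get("fatal_exception"):
            found.append(
                {"index": name, "shard_id": None, "exception": index["fatal_exception"]}
            )
        for shard in index.get("shards", []):
            if shard.get("fatal_exception"):
                found.append(
                    {
                        "index": name,
                        "shard_id": shard.get("shard_id"),
                        "exception": shard["fatal_exception"],
                    }
                )
    return found


# Hand-written sample payload for illustration only; not real API output.
sample = {
    "indices": [
        {
            "index": "follower-1",
            "shards": [
                {"shard_id": 0},
                {
                    "shard_id": 1,
                    "fatal_exception": {"type": "example_exception", "reason": "illustrative only"},
                },
            ],
        },
        {"index": "follower-2", "shards": [{"shard_id": 0}]},
    ]
}

for hit in find_fatal_exceptions(sample):
    print(hit["index"], hit["shard_id"], hit["exception"]["type"])
```

Working on an already-parsed dictionary keeps the sketch independent of any particular HTTP client.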

lcawl · 2019-03-19T21:11:39Z

docs/reference/ccr/overview.asciidoc

+before the follower falls fatally behind.
+
+[float]
+=== Remedying a follower that has fallen behind


I think this content could actually be helpful in the Troubleshooting section (i.e. https://www.elastic.co/guide/en/elastic-stack-overview/master/troubleshooting.html)

I have added a task to the meta issue #35975



lcawl

LGTM and builds successfully

Tim-Brooks · 2019-03-21T18:07:50Z

thanks @lcawl


Tim-Brooks added 5 commits March 6, 2019 13:06

WIP

def58a4

WIP

ec7a030

Changes

97aadd5

Merge remote-tracking branch 'upstream/master' into index_following_d…

a5c4a1c

…ocumentation

Changes

9caf363

Tim-Brooks added labels Mar 11, 2019: >docs (General docs changes), :Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features), v6.7.0, v8.0.0, v7.2.0, v7.0.0-beta1

Tim-Brooks requested review from martijnvg, ywelsch, jasontedor, dnhatn and lcawl March 11, 2019 19:14

Fix case

6c7690e

Tim-Brooks mentioned this pull request Mar 11, 2019

Add documentation about index following #39768

Closed

jasontedor reviewed Mar 14, 2019


Changes

5ebf4c1

Tim-Brooks requested a review from jasontedor March 15, 2019 16:41

jasontedor approved these changes Mar 16, 2019
