Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand following documentation in ccr overview #39936

Merged
merged 13 commits into from
Mar 21, 2019

Conversation

Tim-Brooks
Copy link
Contributor

This commit expands the ccr overview page to include more information
about the lifecycle of following an index. It adds information linking
to the remote recovery documentation. And describes how an index can
fall-behind and how to fix it when this happens.

@Tim-Brooks Tim-Brooks added >docs General docs changes :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v6.7.0 v8.0.0 v7.2.0 v7.0.0-beta1 labels Mar 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Copy link
Contributor Author

Personally I prefer this approach (expanding overview) to the separate page (introduction in #39768). But I am opening to hearing other opinions and we can take whatever if the consensus.

@jasontedor
Copy link
Member

I like this approach indeed.

@Tim-Brooks
Copy link
Contributor Author

I am closing the other PR in favor of this PR. The primary objection to this approach is that it makes the overview quite deep. However, the overview is already the place where we describe the replication/following lifecycle. If we want to split that up in the future, we can dedicate a different PR to that.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. I left a few nits, and one comment to think about.

docs/reference/ccr/overview.asciidoc Outdated Show resolved Hide resolved
docs/reference/ccr/overview.asciidoc Outdated Show resolved Hide resolved
When a follower initiates the index following, it acquires a retention lease from
the leader. This informs the leader that it should not allow a soft delete to be
pruned until either the follower indicates that it has received the operation or
the lease expires after `12 hours`. It is valuable to have monitoring in place to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the formatting on "12 hours" odd. Is there a reason that you chose to format it this way, as opposed to "twelve hours" (without being formatted as code) or 12h. Another concern I have is if we change the defaults from 12h to some other value and we would have to maintain all the pages in the docs. Maybe the best option is to add docs for the index.soft_deletes.retention_lease.period setting (or link to the CCR requirements page) and let the default be only specified there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I linked the requirements page.

@Tim-Brooks Tim-Brooks requested a review from jasontedor March 15, 2019 16:41
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -17,7 +17,7 @@ follower index. This simplifies state management on the leader index and means
that {ccr} does not interfere with indexing on the leader index.

[float]
=== Configuring replication
=== Initiating replication
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the intended change in meaning here. Is it meant to cover both configuring and starting CCR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so. The actual create follower and auto follower actions seem more associated with initiating replication (although you can configure certain things using parameters). That is why I made that change. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find "initiating" a bit unclear, so my vote would be simpler verbs like "configuring" or "setting up", but it's not a deal breaker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it back to configuring

@@ -29,6 +29,41 @@ Replication can be configured in two ways:

NOTE: You must also <<ccr-requirements,configure the leader index>>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to point out that all of those tasks can be done via Kibana too and add a link to {kibana-ref}/working-remote-clusters.html#managing-cross-cluster-replication[Managing {ccr}]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added an additional note. Let me know if that looks good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new commit that mentions Kibana directly in the bullet points, since I couldn't add a suggestion on that section.

=== Remedying a follower that has fallen behind

If a follower falls sufficiently behind a leader that it can no longer replicate
operations this can be detected using the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this exception would be visible in Kibana too?:

Suggested change
operations this can be detected using the
operations this can be detected in {kib} or by using the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made this change. @cjcenizal a leader falling behind if reported as a fatal exception in the stats api. Is that shown on the Kibana ui?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to tag in @jen-huang and @sebelga to answer your question, since I'm on PTO today and focusing on stack upgrade testing for the rest of the week.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbrooks8 It doesn't appear we show indicies[].fatal_exception in the UI. I've created a bug ticket for this: elastic/kibana#33628

before the follower falls fatally behind.

[float]
=== Remedying a follower that has fallen behind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this content could actually be helpful in the Troubleshooting section (i.e. https://www.elastic.co/guide/en/elastic-stack-overview/master/troubleshooting.html)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a task to the meta issue #35975

@Tim-Brooks Tim-Brooks requested a review from lcawl March 20, 2019 22:27
Copy link
Contributor

@lcawl lcawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and builds successfully

@Tim-Brooks
Copy link
Contributor Author

thanks @lcawl

@Tim-Brooks Tim-Brooks merged commit 3d90fbc into elastic:master Mar 21, 2019
Tim-Brooks added a commit that referenced this pull request Mar 21, 2019
This commit expands the ccr overview page to include more information
about the lifecycle of following an index. It adds information linking
to the remote recovery documentation. And describes how an index can
fall-behind and how to fix it when this happens.
Tim-Brooks added a commit that referenced this pull request Mar 21, 2019
This commit expands the ccr overview page to include more information
about the lifecycle of following an index. It adds information linking
to the remote recovery documentation. And describes how an index can
fall-behind and how to fix it when this happens.
Tim-Brooks added a commit that referenced this pull request Mar 21, 2019
This commit expands the ccr overview page to include more information
about the lifecycle of following an index. It adds information linking
to the remote recovery documentation. And describes how an index can
fall-behind and how to fix it when this happens.
@Tim-Brooks Tim-Brooks deleted the combine_over_following branch December 18, 2019 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >docs General docs changes v6.7.1 v7.0.0-beta1 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants