Expand following documentation in ccr overview #39936
Changes from 8 commits
def58a4
ec7a030
97aadd5
a5c4a1c
9caf363
6c7690e
5ebf4c1
730d48f
e47e26c
98ef96e
dde981a
e55399f
9027c86
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -17,7 +17,7 @@ follower index. This simplifies state management on the leader index and means | |||||
that {ccr} does not interfere with indexing on the leader index. | ||||||
|
||||||
[float] | ||||||
=== Configuring replication | ||||||
=== Initiating replication | ||||||
|
||||||
Replication can be configured in two ways: | ||||||
|
||||||
|
@@ -29,6 +29,41 @@ Replication can be configured in two ways: | |||||
|
||||||
NOTE: You must also <<ccr-requirements,configure the leader index>>. | ||||||
|
||||||
Review comment: I think it would be helpful to point out that all of those tasks can be done via Kibana too and add a link to
Review comment: I have added an additional note. Let me know if that looks good.
Review comment: I added a new commit that mentions Kibana directly in the bullet points, since I couldn't add a suggestion on that section. |
||||||
When you initiate replication either manually or through an auto-follow pattern, the | ||||||
follower index is created on the local cluster. Once the follower index is created, | ||||||
the <<remote-recovery, remote recovery>> process copies all of the Lucene segment | ||||||
files from the remote cluster to the local cluster. | ||||||
|
||||||
By default, when you initiate following manually using the API, the recovery process | ||||||
runs asynchronously with respect to the | ||||||
{ref}/ccr-put-follow.html[create follower request]: the request returns before | ||||||
the <<remote-recovery, remote recovery>> process completes. To wait for the | ||||||
process to complete, use the `wait_for_active_shards` parameter. | ||||||
Review comment: Suggested change
Review comment: I'm not sure
Review comment: You're right, nevermind! |
||||||
|
||||||
////////////////////////// | ||||||
|
||||||
[source,js] | ||||||
-------------------------------------------------- | ||||||
PUT /follower_index/_ccr/follow?wait_for_active_shards=1 | ||||||
{ | ||||||
"remote_cluster" : "remote_cluster", | ||||||
"leader_index" : "leader_index" | ||||||
} | ||||||
-------------------------------------------------- | ||||||
// CONSOLE | ||||||
// TESTSETUP | ||||||
// TEST[setup:remote_cluster_and_leader_index] | ||||||
|
||||||
[source,js] | ||||||
-------------------------------------------------- | ||||||
POST /follower_index/_ccr/pause_follow | ||||||
-------------------------------------------------- | ||||||
// CONSOLE | ||||||
// TEARDOWN | ||||||
|
||||||
////////////////////////// | ||||||
|
||||||
[float] | ||||||
=== The mechanics of replication | ||||||
Review comment: I would suggest putting this content into a separate page (akin to the "How it works" pages in the other sections), since it's quite low level for an overview. That can be done in a separate PR if necessary.
Review comment: I have added a task to the meta issue #35975 |
||||||
|
||||||
|
@@ -56,7 +91,7 @@ If a read request fails, the cause of the failure is inspected. If the | |||||
cause of the failure is deemed to be a failure that can be recovered from (for | ||||||
example, a network failure), the follower shard task enters into a retry | ||||||
loop. Otherwise, the follower shard task is paused and requires user | ||||||
intervention before the it can be resumed with the | ||||||
intervention before it can be resumed with the | ||||||
{ref}/ccr-post-resume-follow.html[resume follower API]. | ||||||
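The retry-or-pause decision described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class, the failure names in `RECOVERABLE`, and the method names are hypothetical, and the real classification of failures is internal to Elasticsearch.

```python
from dataclasses import dataclass

# Hypothetical failure categories, for illustration only.
RECOVERABLE = {"network_failure", "timeout"}


@dataclass
class FollowerShardTask:
    """Toy stand-in for a follower shard task (not Elasticsearch code)."""
    paused: bool = False
    retries: int = 0

    def on_read_failure(self, cause: str) -> str:
        """Return the action taken for a failed read request."""
        if cause in RECOVERABLE:
            # Recoverable cause: stay in the retry loop.
            self.retries += 1
            return "retry"
        # Any other cause: pause and wait for user intervention.
        self.paused = True
        return "paused"

    def resume(self) -> None:
        """Models a user calling the resume follower API."""
        self.paused = False
        self.retries = 0


task = FollowerShardTask()
first = task.on_read_failure("network_failure")   # recoverable
second = task.on_read_failure("mapping_conflict")  # hypothetical fatal cause
```

The point of the sketch is the asymmetry: recoverable failures never need operator action, while anything else stops the task until it is resumed explicitly.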
|
||||||
When operations are received by the follower shard task, they are placed in a | ||||||
|
@@ -69,6 +104,10 @@ limits, no additional read requests are sent by the follower shard task. The | |||||
follower shard task resumes sending read requests when the write buffer no | ||||||
longer exceeds its configured limits. | ||||||
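The back-pressure behaviour of the write buffer can be sketched as follows. This is a toy model, not the Elasticsearch implementation: the `WriteBuffer` class and its `max_count` limit are hypothetical, and only a count limit is modelled here.

```python
from collections import deque


class WriteBuffer:
    """Toy write buffer; max_count is a hypothetical configured limit."""

    def __init__(self, max_count: int) -> None:
        self.max_count = max_count
        self.ops = deque()

    def add(self, operations) -> None:
        """Queue operations received from a read request."""
        self.ops.extend(operations)

    def drain(self, n: int) -> list:
        """Remove up to n operations, as if they were written to the follower shard."""
        return [self.ops.popleft() for _ in range(min(n, len(self.ops)))]

    def may_send_read_request(self) -> bool:
        # Reads stop only while the buffer exceeds its configured limit.
        return len(self.ops) <= self.max_count


buf = WriteBuffer(max_count=3)
buf.add([1, 2, 3, 4, 5])
blocked = not buf.may_send_read_request()  # 5 queued operations exceed the limit of 3
buf.drain(2)
resumed = buf.may_send_read_request()      # 3 queued operations no longer exceed it
```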
|
||||||
NOTE: The intricacies of how operations are replicated from the leader are | ||||||
governed by settings that can be configured in the | ||||||
{ref}/ccr-put-follow.html[create follower request]. | ||||||
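As a rough illustration of the note above, the following sketch assembles a create follower request that carries a few tuning settings. The helper `build_follow_request` is hypothetical, the values are illustrative rather than recommendations, and the parameter names reflect my understanding of the create follower API, so check them against the API reference for your version.

```python
import json
from urllib.parse import urlencode


def build_follow_request(follower_index, remote_cluster, leader_index,
                         wait_for_active_shards=None, **settings):
    """Return (method, path, body) for a create follower request."""
    path = f"/{follower_index}/_ccr/follow"
    if wait_for_active_shards is not None:
        path += "?" + urlencode({"wait_for_active_shards": wait_for_active_shards})
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    body.update(settings)  # optional read/write tuning settings
    return "PUT", path, json.dumps(body, indent=2)


method, path, body = build_follow_request(
    "follower_index", "remote_cluster", "leader_index",
    wait_for_active_shards=1,
    max_read_request_operation_count=5120,  # illustrative value
    max_outstanding_read_requests=12,       # illustrative value
    max_write_buffer_count=100000,          # illustrative value
    max_retry_delay="500ms",                # illustrative value
)
print(method, path)
print(body)
```

The helper only builds the request; sending it is left to whichever HTTP client or console you already use.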
|
||||||
Mapping updates applied to the leader index are automatically retrieved | ||||||
as-needed by the follower index. | ||||||
|
||||||
|
@@ -102,9 +141,71 @@ Using these APIs in tandem enables you to adjust the read and write parameters | |||||
on the follower shard task if your initial configuration is not suitable for | ||||||
your use case. | ||||||
|
||||||
[float] | ||||||
=== Leader index retaining operations for replication | ||||||
Review comment: If the "mechanics of replication" section is turned into a separate page, I think this info about how to retain operations might be a good fit there too.
Review comment: I have added a task to the meta issue #35975 |
||||||
|
||||||
If the follower is unable to replicate operations from a leader for a period of | ||||||
time, the following process can fail due to the leader lacking a complete history | ||||||
of operations necessary for replication. | ||||||
|
||||||
Operations replicated to the follower are identified using a sequence number | ||||||
generated when the operation was initially performed. Lucene segment files are | ||||||
occasionally merged in order to optimize searches and save space. When these | ||||||
merges occur, it is possible for operations associated with deleted or updated | ||||||
documents to be pruned during the merge. When the follower requests the sequence | ||||||
number for a pruned operation, the process will fail due to the operation missing | ||||||
on the leader. | ||||||
|
||||||
This scenario is not possible in an append-only workflow. As documents are never | ||||||
deleted or updated, the underlying operation will not be pruned. | ||||||
|
||||||
Elasticsearch attempts to mitigate this potential issue for update workflows using | ||||||
a Lucene feature called soft deletes. When a document is updated or deleted, the | ||||||
underlying operation is retained in the Lucene index for a period of time. This | ||||||
period of time is governed by the `index.soft_deletes.retention_lease.period` | ||||||
setting, which can be <<ccr-requirements,configured on the leader index>>. | ||||||
|
||||||
When a follower starts following a leader index, it acquires a retention lease from | ||||||
the leader. This informs the leader that it should not allow a soft delete to be | ||||||
pruned until either the follower indicates that it has received the operation or | ||||||
the lease expires. It is valuable to have monitoring in place to detect a follower | ||||||
replication issue prior to the lease expiring so that the problem can be remedied | ||||||
before the follower falls fatally behind. | ||||||
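The pruning rule implied by the retention lease can be written down as a small predicate. This is a toy model under stated assumptions: `RetentionLease`, its field names, and `can_prune` are hypothetical and do not mirror Elasticsearch internals.

```python
from dataclasses import dataclass


@dataclass
class RetentionLease:
    """Toy lease held by a follower on the leader (hypothetical fields)."""
    retained_from_seq_no: int  # follower still needs operations at or above this
    expires_at: float          # time after which the lease no longer applies


def can_prune(op_seq_no: int, lease: RetentionLease, now: float) -> bool:
    """A soft-deleted operation may be pruned once the follower has
    received it or once the lease has expired."""
    if now >= lease.expires_at:
        return True
    return op_seq_no < lease.retained_from_seq_no


lease = RetentionLease(retained_from_seq_no=100, expires_at=1_000.0)
already_received = can_prune(99, lease, now=10.0)    # follower has this one
still_needed = can_prune(100, lease, now=10.0)       # follower has not received it
after_expiry = can_prune(100, lease, now=1_000.0)    # lease has lapsed
```

The third case is the one monitoring should catch early: once the lease lapses, operations the follower still needs become eligible for pruning.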
|
||||||
[float] | ||||||
=== Remedying a follower that has fallen behind | ||||||
Review comment: I think this content could actually be helpful in the Troubleshooting section (i.e. https://www.elastic.co/guide/en/elastic-stack-overview/master/troubleshooting.html)
Review comment: I have added a task to the meta issue #35975 |
||||||
|
||||||
If a follower falls so far behind a leader that it can no longer replicate | ||||||
operations, this can be detected using the | ||||||
Review comment: I presume this exception would be visible in Kibana too?: Suggested change
Review comment: I have made this change. @cjcenizal a leader falling behind is reported as a fatal exception in the stats api. Is that shown on the Kibana ui?
Review comment: I'm going to tag in @jen-huang and @sebelga to answer your question, since I'm on PTO today and focusing on stack upgrade testing for the rest of the week.
Review comment: @tbrooks8 It doesn't appear we show |
||||||
{ref}/ccr-get-follow-stats.html[get follow stats API]. It is reported as an | ||||||
`indices[].fatal_exception`. | ||||||
|
||||||
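A monitoring script could scan a parsed follow stats response for fatal exceptions along these lines. This is a minimal sketch: `find_fatal_exceptions` is a hypothetical helper, the sample response is hand-written rather than real API output, and the per-shard layout it also checks is an assumption about how the response may be nested.

```python
def find_fatal_exceptions(stats: dict) -> list:
    """Collect (index_name, exception) pairs from a parsed follow stats response."""
    found = []
    for index in stats.get("indices", []):
        name = index.get("index")
        # Layout as described in the text: indices[].fatal_exception
        if index.get("fatal_exception"):
            found.append((name, index["fatal_exception"]))
        # Assumed alternative layout: exceptions nested per shard.
        for shard in index.get("shards", []):
            if shard.get("fatal_exception"):
                found.append((name, shard["fatal_exception"]))
    return found


sample = {  # hand-written sample, not real API output
    "indices": [
        {"index": "follower_index",
         "shards": [{"shard_id": 0,
                     "fatal_exception": {"type": "example", "reason": "example"}}]},
        {"index": "healthy_index", "shards": [{"shard_id": 0}]},
    ]
}
problems = find_fatal_exceptions(sample)
for name, exc in problems:
    print(f"{name}: {exc['type']}: {exc['reason']}")
```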
To restart the follower, you must pause the following process, close the follower | ||||||
index, and call the create follower API again. | ||||||
|
||||||
["source","js"] | ||||||
---------------------------------------------------------------------- | ||||||
POST /follower_index/_ccr/pause_follow | ||||||
|
||||||
POST /follower_index/_close | ||||||
|
||||||
PUT /follower_index/_ccr/follow?wait_for_active_shards=1 | ||||||
{ | ||||||
"remote_cluster" : "remote_cluster", | ||||||
"leader_index" : "leader_index" | ||||||
} | ||||||
---------------------------------------------------------------------- | ||||||
// CONSOLE | ||||||
|
||||||
Calling the create follower API is a destructive action. All of the existing Lucene | ||||||
segment files are deleted on the follower cluster. The | ||||||
<<remote-recovery, remote recovery>> process then copies the Lucene segment | ||||||
files from the leader again. After the follower index has been reinitialized, the | ||||||
following process starts again. | ||||||
|
||||||
[float] | ||||||
=== Terminating replication | ||||||
|
||||||
You can terminate replication with the | ||||||
{ref}/ccr-post-unfollow.html[unfollow API]. This API converts a follower index | ||||||
to a regular (non-follower) index. | ||||||
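The requests involved in terminating replication might be sequenced as in the sketch below. The helper `unfollow_steps` is hypothetical. The ordering rests on my understanding that the follower index has to be paused and closed before the unfollow API is called, and that it is reopened afterwards; treat both as assumptions and confirm them in the unfollow API reference.

```python
def unfollow_steps(follower_index: str) -> list:
    """Return the (method, path) requests that convert a follower index
    into a regular index, in the order they would be issued."""
    return [
        ("POST", f"/{follower_index}/_ccr/pause_follow"),  # stop the following process
        ("POST", f"/{follower_index}/_close"),             # close the follower index
        ("POST", f"/{follower_index}/_ccr/unfollow"),      # convert to a regular index
        ("POST", f"/{follower_index}/_open"),              # reopen for normal use
    ]


steps = unfollow_steps("follower_index")
for method, path in steps:
    print(method, path)
```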
Review comment: I'm curious about the intended change in meaning here. Is it meant to cover both configuring and starting CCR?
Review comment: I guess so. The actual `create follower` and `auto follower` actions seem more associated with initiating replication (although you can configure certain things using parameters). That is why I made that change. Thoughts?
Review comment: I find "initiating" a bit unclear, so my vote would be simpler verbs like "configuring" or "setting up", but it's not a deal breaker.
Review comment: I changed it back to `configuring`