-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calc follower vs leader indexing lag based on shard global checkpoints #104015
Conversation
Documentation preview: |
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I left a couple of nits/suggestions
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ccr/action/FollowStatsAction.java
Outdated
Show resolved
Hide resolved
@@ -74,6 +74,8 @@ task. In this situation, the following task must be resumed manually with the | |||
|
|||
`index`:: | |||
(string) The name of the follower index. | |||
`follower_to_leader_lagging_ops_count`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming nit: how about total_global_checkpoint_lag
? The follower_to_leader
bit seems redundant since this is the CCR stats API, and I think it would be helpful to include global_checkpoint
in the name in case we decide to add some other metrics in future.
Also you're missing a blank line above this item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"total_global_checkpoint_lag" sounds good, thx
@@ -219,6 +221,7 @@ The API returns the following results: | |||
"indices" : [ | |||
{ | |||
"index" : "follower_index", | |||
"follower_to_leader_lagging_ops_count" : 256, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"follower_to_leader_lagging_ops_count" : 256, | |
"lag_ops_count" : 256, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will use "total_global_checkpoint_lag" as per suggestion above
(builder, params) -> builder.startObject().field("index", indexEntry.getKey()).startArray("shards") | ||
(builder, params) -> builder.startObject() | ||
.field("index", indexEntry.getKey()) | ||
.field("follower_to_leader_lagging_ops_count", calcFollowerToLeaderLaggingOps(indexEntry.getValue())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised that this is per index, not per shard. Any reason why it deviates from the rest of the stats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason is be an accumulative stat across all shards per index. main motto is stated by parent issue:
These should be exposed as simple user-friendly metrics without going through extra mental arithmetic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks good to me, and nice that it is as simple as this! Will approve after handling reviewer comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One suggestion for the documentation.
Co-authored-by: Iraklis Psaroudakis <[email protected]>
Calculate follower vs leader indexing lag based on shard global checkpoints. Calculation based on doc stats after doing a shard refresh is not desired - it can significantly slow down overall process
Closes #89991
Sample API output: