CCR status should show lagging / doc count #89991
Labels
:Distributed Indexing/CCR
Issues around the Cross Cluster State Replication features
>enhancement
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Description
Today, CCR stats expose a lot of low-level detail, but that is not as useful as a document count. At the end of the day, users want to know whether the follower is keeping up with the leader and, if there is a delay, what the difference in document count is or how long it will take the follower to catch up with the leader. These should be exposed as simple, user-friendly metrics that require no extra mental arithmetic.
- A difference between `follower_max_seq_no` and `leader_max_seq_no` on a shard indicates that some operation hasn't been processed.
- A difference between `leader_global_checkpoint` and `follower_global_checkpoint` indicates some lag. However, we have seen cases where `_ccr/stats` reports identical checkpoint values while the doc counts are different. It seems some `xpack/ccr/shard_follow_task` tasks can get stuck when the connection fails, so the global checkpoints alone may not be a reliable indicator of lag.
- Sync Lag (Ops), according to [Monitoring] CCR UI kibana#23013, was described as follows: `leader_max_seq_no` subtracted against the delta of the max and min `follower_global_checkpoint` between the time period for each shard, then subtract those two from each other and take the max.
- `N`, and then time how long it takes for the follower's global checkpoint to be ≥ `N`.

Related: #86798
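To make the arithmetic in the list above concrete, here is a minimal Python sketch of the per-shard comparisons and of the "time until the follower's global checkpoint is ≥ N" measurement. It assumes the parsed `_ccr/stats` body nests shard entries under `follow_stats.indices[].shards[]`; the function names `shard_lags` and `time_to_reach`, and the `read_follower_checkpoint` callback, are hypothetical helpers for illustration and not part of Elasticsearch.

```python
import time
from typing import Callable, List, Optional


def shard_lags(ccr_stats: dict) -> List[dict]:
    """Compute per-shard lag figures from a parsed _ccr/stats body.

    Assumes shard entries live under follow_stats.indices[].shards[].
    """
    rows = []
    for index in ccr_stats.get("follow_stats", {}).get("indices", []):
        for shard in index.get("shards", []):
            rows.append({
                "index": index.get("index"),
                "shard_id": shard.get("shard_id"),
                # Operations known to the leader but not yet on the follower.
                "seq_no_lag": shard["leader_max_seq_no"]
                - shard["follower_max_seq_no"],
                # Gap between the leader and follower global checkpoints.
                "checkpoint_lag": shard["leader_global_checkpoint"]
                - shard["follower_global_checkpoint"],
            })
    return rows


def time_to_reach(
    n: int,
    read_follower_checkpoint: Callable[[], int],
    poll_seconds: float = 1.0,
    timeout_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return seconds until the follower checkpoint is >= n, or None on timeout.

    read_follower_checkpoint is a caller-supplied (hypothetical) function that
    fetches the current follower_global_checkpoint for one shard.
    """
    start = time.monotonic()
    while True:
        if read_follower_checkpoint() >= n:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_seconds:
            return None
        sleep(poll_seconds)
```

As noted above, identical checkpoints have been observed alongside differing doc counts, so a zero `checkpoint_lag` from this sketch would not by itself prove that the follower is caught up.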