-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CCR] Introduce leader index name & last fetch time stats to stats api response #33155
[CCR] Introduce leader index name & last fetch time stats to stats api response #33155
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment about using the relative time provider, so we would report how long it has been since the most recent fetch. In general, I think this changes like this would be better as two separate PRs, these are separate changes.
private final Queue<Translog.Operation> buffer = new PriorityQueue<>(Comparator.comparing(Translog.Operation::seqNo)); | ||
private final LinkedHashMap<Long, ElasticsearchException> fetchExceptions; | ||
|
||
ShardFollowNodeTask(long id, String type, String action, String description, TaskId parentTask, Map<String, String> headers, | ||
ShardFollowTask params, BiConsumer<TimeValue, Runnable> scheduler, final LongSupplier relativeTimeProvider) { | ||
ShardFollowTask params, BiConsumer<TimeValue, Runnable> scheduler, final LongSupplier relativeTimeProvider, | ||
LongSupplier currentTimeSupplier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use the relativeTimeSupplier
and merely report the time (elapsed) since the last fetch.
@@ -89,6 +89,7 @@ protected void assertEqualInstances(final ShardFollowNodeTask.Status expectedIns | |||
anyOf(instanceOf(ElasticsearchException.class), instanceOf(IllegalStateException.class))); | |||
assertThat(entry.getValue().getCause().getMessage(), containsString(expected.getCause().getMessage())); | |||
} | |||
assertThat(newInstance.timeSinceLastFetch(), equalTo(expectedInstance.timeSinceLastFetch())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂️
@jasontedor I've updated the PR to use report time since last fetch instead of the timestamp the last fetch occurred. Next time I will split a change like this into 2 PRs. |
because that would be confusing for the consumer of the stats api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good, I left a comment.
timeSinceLastFetchMillis = TimeUnit.NANOSECONDS.toMillis(relativeTimeProvider.getAsLong() - lastFetchTime); | ||
} else { | ||
// To avoid confusion when ccr didn't yet execute a fetch: | ||
timeSinceLastFetchMillis = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it should be 0, I think it should be -1 so that we can distinguish a fetch just happened less than 1ms ago from a fetch just happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means a serialization change too, to support -1.
@jasontedor I've addressed that comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one important comment, the rest is cosmetic. We will get this in on the next iteration.
@@ -412,7 +416,21 @@ public ShardId getFollowShardId() { | |||
|
|||
@Override | |||
public synchronized Status getStatus() { | |||
final String leaderIndex; | |||
if (params.getLeaderClusterAlias() != null) { | |||
leaderIndex = params.getLeaderClusterAlias() + ":" + params.getLeaderShardId().getIndexName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should avoid re-computing this value every time we get status (think of a monitoring system polling the stats every second). We are creating unnecessary garbage on every poll for every shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, that's a calculation done under a lock, a lock that synchronizes a lot of other methods in this class.
@@ -648,7 +679,14 @@ public long numberOfOperationsIndexed() { | |||
return fetchExceptions; | |||
} | |||
|
|||
private final long timeSinceLastFetchMillis; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for keeping these in the same order as the constructor parameters, etc. 🙏
Status( | ||
String leaderIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be final
.
@@ -648,7 +679,14 @@ public long numberOfOperationsIndexed() { | |||
return fetchExceptions; | |||
} | |||
|
|||
private final long timeSinceLastFetchMillis; | |||
|
|||
public long timeSinceLastFetch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add Millis
to the name.
Thanks @jasontedor. I've updated the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
* ccr: Fix ShardFollowNodeTask.Status equals and hash code (elastic#33189) Make soft-deletes settings final (elastic#33172) Only fetch mapping updates when necessary (elastic#33182)
No description provided.