-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: CommandRunner metric has correct metric displayed when thread dies #4653
fix: CommandRunner metric has correct metric displayed when thread dies #4653
Conversation
3a80fc6
to
6d5be45
Compare
if (currentCommand == null) { | ||
return CommandRunnerStatus.RUNNING; | ||
return lastPollTime.get() == null || | ||
Duration.between(lastPollTime.get(), clock.instant()).toMillis() | ||
< NEW_CMDS_TIMEOUT.toMillis() * 3 | ||
? CommandRunnerStatus.RUNNING : CommandRunnerStatus.ERROR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just get rid of this and set the status to ERROR if we haven't polled in a while?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only start polling after we processPriorCommands
so this could still be useful in the case of the CommandRunner getting stuck on start up.
6d5be45
to
9251d03
Compare
aef4638
to
310350d
Compare
private final Duration commandRunnerHealthTimeout; | ||
private final Clock clock; | ||
|
||
public enum CommandRunnerStatus { | ||
RUNNING, | ||
STUCK, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this additional status? Once it's here we have to support it forever. Why is it not enough to know that the runner is up or down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it would be helpful to be able to tell the difference between if the CommandRunner thread is stuck on a particular Command or the thread itself has died.
If we do force the server to shut down whenever the CommandRunner thread dies, then this additional status won't be needed.
310350d
to
b69ad12
Compare
5cb1af0
to
11c0849
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
Fixes: #4652
Add a new variable to CommandRunner that keeps track of the last time the command topic was polled. If there's too long of a duration between the current time when
CommandRunner.checkCommandRunnerStatus
is called and the last poll, that means the thread is inERROR
state.Testing done
Killed the command runner thread and watched the metric. It transitioned to error state after 15 seconds.
Reviewer checklist