-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
concurrent-api: save the timestamp of the SingleToFuture.get() calls #3051
concurrent-api: save the timestamp of the SingleToFuture.get() calls #3051
Conversation
Motivation: It can be very difficult to know if a thread is slow or completely stuck without taking multiple thread dumps and comparing them. Modifications: Try to mitigate this by saving the time stamp of the last blocking `.get()` call. This should then show up in heap dumps and we can at least compare them to see if they've all started blocking at about the same time or if there is a big spread. The latter would suggest they are truly stuck.
@@ -45,6 +45,9 @@ abstract class SourceToFuture<T> implements Future<T> { | |||
@Nullable | |||
private volatile Object value; | |||
|
|||
// The timestamp of the last `.get()` call. This is intended to help with debugging stuck threads via heap dumps. | |||
private long lastGetTimestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me if this needs some stronger ordering semantics or not. I imagine there is some sort of store guarantee from latch.await()
but I'm not certain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach lgtm, we should not have cases when multiple threads trying to invoke get()
at the same future. Most cases call get()
once and if they need to "retry" (in case of get timeout), they likely gonna do it from the same thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative implementation might be a bit more desirable: #3052. The benefits of that PR are that each call to future.get()
is tracked but only if we're using the BlockingUtils
, which should narrow down the usage to blocking client and server operations. It technically an allocate another object (which is likely a drop in the ocean) but that object should be trivial to stack allocate so chances are good it doesn't have any allocation overhead.
@@ -45,6 +45,9 @@ abstract class SourceToFuture<T> implements Future<T> { | |||
@Nullable | |||
private volatile Object value; | |||
|
|||
// The timestamp of the last `.get()` call. This is intended to help with debugging stuck threads via heap dumps. | |||
private long lastGetTimestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach lgtm, we should not have cases when multiple threads trying to invoke get()
at the same future. Most cases call get()
once and if they need to "retry" (in case of get timeout), they likely gonna do it from the same thread.
@@ -91,7 +94,12 @@ public final boolean isDone() { | |||
public final T get() throws InterruptedException, ExecutionException { | |||
final Object value = this.value; | |||
if (value == null) { | |||
latch.await(); | |||
lastGetTimestamp = System.currentTimeMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on System.currentTimeMillis()
vs System.nanoTime()
? I don't think for debugging precision matters, but I'm hesitant with possibility of System.currentTimeMillis()
to return t2 < t1 for two consecutive calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only using it as a timestamp so I don't think we need monotonic guarantees.
This should be mutually exclusive with an alternative, #3052. |
Motivation:
It can be very difficult to know if a thread is slow or completely stuck without taking multiple thread dumps and comparing them.
Modifications:
Try to mitigate this by saving the time stamp of the last blocking
.get()
call. This should then show up in heap dumps and we can at least compare them to see if they've all started blocking at about the same time or if there is a big spread. The latter would suggest they are truly stuck.