-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JS API should attempt to reconnect to the gRPC server #3502
Conversation
...client-api/src/main/java/io/deephaven/web/client/api/grpc/MultiplexedWebsocketTransport.java
Outdated
Show resolved
Hide resolved
} | ||
return; | ||
} | ||
assert success != null; | ||
|
||
// welp; the server is gone -- let everyone know | ||
connectionLost(); | ||
|
||
info.notifyConnectionError(new ResponseStreamWrapper.Status() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the notifyConnectionError here, and instead going with a disconnect event exposes a missing case that we need to consider for DHC, how to signal that the server has shut down and told the client this, but how to potentially handle the server coming back again afterwards. We probably need a new event, or metadata on the disconnect event, but likewise should probably welcome the server back by prompting another login (right now we'll get the "reauth failed" event) by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subscribers of TerminationNotification get a tombstone from the server with abnormalTermination
set to false
. We can probably use this to indicate a normal / expected shutdown? Though, I personally get these on my machines as the DH instances restart whenever they are redeployed. Either way, maybe this is the path you want to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do get "expected shutdown", but if you are testing an application or some server side debugging, you're going to edit, restart, edit, restart - an expected shutdown isn't the same as "never come back without user interaction".
Success or failure we can tell the client that the disconnect happened, but the question then is do we say "the server is down (here's the error|this was on purpose)" or just say "its down, it might be up again soon, please dismiss the message"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new event on IdeConnection, EVENT_SHUTDOWN that will emit just the status message from the shutdown, rather than the full grpc status. It will only fire if the client gets notification that the server is explicitly going away, as opposed to losing that stream (or any other stream).
@@ -366,7 +366,7 @@ public void record(LogBufferRecord record) { | |||
} | |||
// since the subscribe() method auto-replays all existing logs, filter to just once this consumer probably | |||
// hasn't seen | |||
if (record.getTimestampMicros() < request.getLastSeenLogTimestamp()) { | |||
if (record.getTimestampMicros() <= request.getLastSeenLogTimestamp()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a note, not really feedback or a request other than to consider whether or not we should change the API.
Interesting. The listener could miss log messages that occurred at the same timestamp, but I see how the alternative guarantees at least one duplicate message. Maybe a better API would count the log messages and you just provide an offset. (This is probably why Kafka uses an offset instead of timestamp for new subscriptions.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's fair. Dup messages were confusing me, and the comment made it appear that this was "filtering to [ones] this consumer probably hadn't seen" meant something other than "will definitely always dup the last message seen, and maybe more". @devinrsmith any thoughts? I think you touched this last?
I know we will sometimes have gaps since this is implemented as a ring buffer, and maybe we tacitly accept that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "guarantees" around the log buffer API are sketchy at best - the consumer never knows if they actually missed log messages or not. I'm happy to "break" the current API w/ the change as Colin has here... all else equal, if the client wants to they can request timestamp-1, and potentially handle dubious de-duping logic if necessary themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I'll revert this and fix the comment instead?
web/client-api/src/main/java/io/deephaven/web/client/api/WorkerConnection.java
Show resolved
Hide resolved
} | ||
return; | ||
} | ||
assert success != null; | ||
|
||
// welp; the server is gone -- let everyone know | ||
connectionLost(); | ||
|
||
info.notifyConnectionError(new ResponseStreamWrapper.Status() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subscribers of TerminationNotification get a tombstone from the server with abnormalTermination
set to false
. We can probably use this to indicate a normal / expected shutdown? Though, I personally get these on my machines as the DH instances restart whenever they are redeployed. Either way, maybe this is the path you want to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit out of my depth when it comes to reviewing JS API code.
Is there an sample client code to show EVENT_DISCONNECTED / EVENT_RECONNECTED event handling? Is there any "easy" way to simulate server disconnections?
public static final String EVENT_DISCONNECT = "disconnect"; | ||
public static final String EVENT_RECONNECT = "reconnect"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these being used - are these meant to be referenced by client code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we declare constants for events that can be fired on the class that can fire them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we decorate the event names on the JS types, so we can easily document what events are emitted from a type. We could probably do better with types (esp with incoming generated TS types), but for now this is how we do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(sorry for the dup, i had this open and it didnt show the old message, so thought i hadnt responded...)
I opened a draft PR for UI to update for this: deephaven/web-client-ui#1149 |
"worker shutdown" is definitely a "we might come back" scenario? At the very least, DnD PQs will have to anticipate this, and will expect (one day) to be able to attempt reconnect here (though at present the JS API can't support this, only due to re-auth concerns).
This is consistent with dhe workers, both PQs and ad hoc, right? Its actually N times, based on how long you are disconnected for, how fast the reconnect retries to connect again. The original idea was that the UI could get a sense of how long it had been since the last try. On the other hand, Tables (and formerly treetables, need to look closer at DHC here) only fire once, and suppress all later events until reconnected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with my Web UI changes: deephaven/web-client-ui#1149
I added one more commit to remove a logged error message that can occur when the user's session is resumed via a new connection. |
As long as a user's session on the server remains active, this should enable a client to reconnect to that session, and continue using already-exported objects. Some of the work is restored here for #3501 to be finished as well, but the client would need a way to create a new session without user interaction.
The dh.IdeConnection.HACK_CONNECTION_FAILURE event is now deprecated, and should be removed. Clients should instead use the EVENT_DISCONNECTED and EVENT_RECONNECTED events to signal to the user that an object isn't available.
Also fixes a bug where the last seen log item is replayed incorrectly.
Fixes #730
Fixes #225