Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JS API should attempt to reconnect to the gRPC server #3502

Merged
merged 18 commits into from
Mar 24, 2023

Conversation

niloc132
Copy link
Member

@niloc132 niloc132 commented Mar 7, 2023

As long as a user's session on the server remains active, this should enable a client to reconnect to that session, and continue using already-exported objects. Some of the work is restored here for #3501 to be finished as well, but the client would need a way to create a new session without user interaction.

The dh.IdeConnection.HACK_CONNECTION_FAILURE event is now deprecated, and should be removed. Clients should instead use the EVENT_DISCONNECTED and EVENT_RECONNECTED events to signal to the user that an object isn't available.

Also fixes a bug where the last seen log item is replayed incorrectly.

Fixes #730
Fixes #225

}
return;
}
assert success != null;

// welp; the server is gone -- let everyone know
connectionLost();

info.notifyConnectionError(new ResponseStreamWrapper.Status() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the notifyConnectionError here, and instead going with a disconnect event exposes a missing case that we need to consider for DHC, how to signal that the server has shut down and told the client this, but how to potentially handle the server coming back again afterwards. We probably need a new event, or metadata on the disconnect event, but likewise should probably welcome the server back by prompting another login (right now we'll get the "reauth failed" event) by the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subscribers of TerminationNotification get a tombstone from the server with abnormalTermination set to false. We can probably use this to indicate a normal / expected shutdown? Though, I personally get these on my machines as the DH instances restart whenever they are redeployed. Either way, maybe this is the path you want to use?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do get "expected shutdown", but if you are testing an application or some server side debugging, you're going to edit, restart, edit, restart - an expected shutdown isn't the same as "never come back without user interaction".

Success or failure we can tell the client that the disconnect happened, but the question then is do we say "the server is down (here's the error|this was on purpose)" or just say "its down, it might be up again soon, please dismiss the message"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new event on IdeConnection, EVENT_SHUTDOWN that will emit just the status message from the shutdown, rather than the full grpc status. It will only fire if the client gets notification that the server is explicitly going away, as opposed to losing that stream (or any other stream).

@@ -366,7 +366,7 @@ public void record(LogBufferRecord record) {
}
// since the subscribe() method auto-replays all existing logs, filter to just once this consumer probably
// hasn't seen
if (record.getTimestampMicros() < request.getLastSeenLogTimestamp()) {
if (record.getTimestampMicros() <= request.getLastSeenLogTimestamp()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a note, not really feedback or a request other than to consider whether or not we should change the API.

Interesting. The listener could miss log messages that occurred at the same timestamp, but I see how the alternative guarantees at least one duplicate message. Maybe a better API would count the log messages and you just provide an offset. (This is probably why Kafka uses an offset instead of timestamp for new subscriptions.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's fair. Dup messages were confusing me, and the comment made it appear that this was "filtering to [ones] this consumer probably hadn't seen" meant something other than "will definitely always dup the last message seen, and maybe more". @devinrsmith any thoughts? I think you touched this last?

I know we will sometimes have gaps since this is implemented as a ring buffer, and maybe we tacitly accept that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "guarantees" around the log buffer API are sketchy at best - the consumer never knows if they actually missed log messages or not. I'm happy to "break" the current API w/ the change as Colin has here... all else equal, if the client wants to they can request timestamp-1, and potentially handle dubious de-duping logic if necessary themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I'll revert this and fix the comment instead?

}
return;
}
assert success != null;

// welp; the server is gone -- let everyone know
connectionLost();

info.notifyConnectionError(new ResponseStreamWrapper.Status() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subscribers of TerminationNotification get a tombstone from the server with abnormalTermination set to false. We can probably use this to indicate a normal / expected shutdown? Though, I personally get these on my machines as the DH instances restart whenever they are redeployed. Either way, maybe this is the path you want to use?

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit out of my depth when it comes to reviewing JS API code.

Is there an sample client code to show EVENT_DISCONNECTED / EVENT_RECONNECTED event handling? Is there any "easy" way to simulate server disconnections?

Comment on lines +25 to +26
public static final String EVENT_DISCONNECT = "disconnect";
public static final String EVENT_RECONNECT = "reconnect";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see these being used - are these meant to be referenced by client code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we declare constants for events that can be fired on the class that can fire them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we decorate the event names on the JS types, so we can easily document what events are emitted from a type. We could probably do better with types (esp with incoming generated TS types), but for now this is how we do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry for the dup, i had this open and it didnt show the old message, so thought i hadnt responded...)

@mofojed
Copy link
Member

mofojed commented Mar 14, 2023

I opened a draft PR for UI to update for this: deephaven/web-client-ui#1149
I'm not sure how best to test this, but I did just try with just killing the connection to see what happens. Couple notes:

  • No error details are passed with the EVENT_DISCONNECT. Would be useful if there are possible different disconnect scenarios (such as "socket going down and we might come back" vs. "worker shutdown")
  • The disconnect event seems to be getting fired 3 times:
    image
    image
    image

@niloc132
Copy link
Member Author

No error details are passed with the EVENT_DISCONNECT. Would be useful if there are possible different disconnect scenarios (such as "socket going down and we might come back" vs. "worker shutdown")

"worker shutdown" is definitely a "we might come back" scenario? At the very least, DnD PQs will have to anticipate this, and will expect (one day) to be able to attempt reconnect here (though at present the JS API can't support this, only due to re-auth concerns).

The disconnect event seems to be getting fired 3 times:...

This is consistent with dhe workers, both PQs and ad hoc, right? Its actually N times, based on how long you are disconnected for, how fast the reconnect retries to connect again. The original idea was that the UI could get a sense of how long it had been since the last try.

On the other hand, Tables (and formerly treetables, need to look closer at DHC here) only fire once, and suppress all later events until reconnected.

mofojed
mofojed previously approved these changes Mar 23, 2023
Copy link
Member

@mofojed mofojed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with my Web UI changes: deephaven/web-client-ui#1149

@niloc132
Copy link
Member Author

I added one more commit to remove a logged error message that can occur when the user's session is resumed via a new connection.

mofojed
mofojed previously approved these changes Mar 23, 2023
@niloc132 niloc132 merged commit 9a5501b into deephaven:main Mar 24, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Complete, test JS API reconnects JS API should only request logs it hasn't seen yet
4 participants