-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complete, test JS API reconnects #730
Comments
Added basic error handling, but not proper reconnects or reauth. Three error types are handled for the main streams or polling calls: auth failure which suggests either that the server was restarted or the session timed out somehow - in either case, a new auth is required to resume working Code.INTERNAL, which our grpc-web client library will pass in cases where the response doesn't make sense (broken payloads) Code.UNKNOWN, which our grpc-web client library will pass in cases where the transport doesn't seem to work (no response before connection died) A new event was added, temporarily, to inform the UI that we are giving up if one of these happens, and the page needs to be reloaded. This will be removed when we have proper reconnect, and inform the UI again that the reconnect succeeded or failed. Also modified a few errors being sent from the server to hopefully better reflect the intent of the gRPC status codes, and avoid using "INTERNAL" when "something bad" happens. The UI will need a few small changes to handle this, and to handle a few cases where API calls can fail but not be registered in the UI. Fixes #1065, but #730 needs to formally address this.
Fetch/h2-based grpc-web clients do not support bidirectional streaming at this time, but the websocket client we are using from https://github.com/improbable-eng/grpc-web/ does. This lets us keep a single websocket open instead of making many calls. The client has support now to abstract how a bidirectional stream is created, either by a single websocket, or by a pair of fetch calls. Additionally, autocomplete was seeing problems that appeared related to the server-side grpc handling payloads out of order. These methods have been consolidated into a single bidirectional stream to ensure ordering is sane. The server now has support to only implement a bidirectional stream once, and mark the "open"/"next" rpc methods as part of that, and automatically have their requests enqueued to be handled serially. Other improvements got slipped into this work as well, such as only interacting with autocomplete from the gRPC service and never from DB, so that dependency was cut (as well as the dependency on gRPC utils, which only existed to get jsr305 annotations into various projects). The client is incomplete in this patch, we plan on continuing this work as part of #730. Fixes #929 Fixes #834 Fixes #1049
Note Devin says that adding an additional container to the docker-compose and restarting that is enough to cause docker to "hiccup" the entire network stack. This causes the UI to give up, but it sounds like an ideal test-bed for implementing the trivial reconnect scenario. |
The data flow is: crypto -> kafka -> grpc-api. Restarting just the crypto service container causes the web ui message to display
a few seconds after crypto restarts (and I notice that new data is coming in to tables for those few seconds). It's followed by messages on the server of the form
but, probably harmless according to #1205. |
As long as a user's session on the server remains active, this should enable a client to reconnect to that session, and continue using already-exported objects. Some of the work is restored here for #3501 to be finished as well, but the client would need a way to create a new session without user interaction. The dh.IdeConnection.HACK_CONNECTION_FAILURE event is now deprecated, and should be removed. Clients should instead use the EVENT_DISCONNECTED and EVENT_RECONNECTED events to signal to the user that an object isn't available. Alternatively, to detect shutdown, use EVENT_SHUTDOWN. Also fixes a bug where the last seen log item is replayed incorrectly. Fixes #730 Fixes #225
Since replacing the websocket and gwt-rpc serialization with http2/grpc/protobuf, we haven't had an actual server deployed suitable for testing reconnecting a session after some kind of network loss, so the work is incomplete still. Presently we only are testing with the websocket proxy transport (so as to be useful on localhost or in other ssl-less environments, since http2 can't be used in the browser without ssl).
While we're probably interested in websocket proxy reconnects working correctly, it isn't as much of a priority as actual http2 reconnects, so the first step of this task is to get the stack running on a server with ssl and no websockets, then bounce envoy or the network connection to periodically break the streams and make sure that we are able to restore the session, or in cases where the session cannot correctly come back, restore tables which can be re-fetched.
The text was updated successfully, but these errors were encountered: