-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pub/Sub connections can stop functioning without error #295
Comments
Use the PING command to detect broken connections. |
While this is possible it would add quite a bit of complexity to any use Pub/Sub, as it would require an new go routine and all the synchronisation which goes along with that. If Pub/Sub Recieve() was channel based then it would be easier as we could use a select to multiplex the recieve with a ticker that could trigger a Ping() something like. t := time.NewTicker(time.Minute)
defer t.Stop()
for {
select {
case v := <-psc.Receive():
// Handle message, pong, error etc
case <- t.C:
if err := psc.Ping(“”); err != nil {
// Handle recovery
}
}
} Either way having to explicitly add Ping() support adds extra complexity to every use of Pub/Sub. With this in mind I think the cleanest solution is to to enable keep-alive on the connections. This approach also means all users of redigo will benefit as soon as they update, with no need to add any additional code. This is also the method used the redis-server as of v3.2, so I would like to here your thoughts on our PR which adds connection keep-alive |
I will accept the PR, but PING is the more robust solution to the problem. PING woks through proxies and detects stuck servers. Regarding channel based API: It's a few lines of application code to start a goroutine that reads from a connection and sends to a channel. |
Thanks @garyburd I know that keep-alive can have issues with proxies, although not sure how many users that would actually impact, so do you think it would be worth building it in to Recieve or providing a simple way to facilitate managing a “pinger”? Either way I think it would be good to improve the docs around Pub/Sub to mention this, as its far from obvious so I wouldn’t be surprised if a lot of users are unaware of this potential problem. |
The pinger is a four or five extra lines of code for the application. Building the pinger into Receive adds an unnecessary goroutine. I'll add a pinger to the example. |
I agree that many people do not know how to check for liveness of the connection and server. Another problem is that it's not well understood how to cancel receive on a connection in pubsub mode. I added #296 to address these issues. |
Sounds good @garyburd 👍 |
I updated the pubsub example to include health checks with PING. https://godoc.org/github.com/garyburd/redigo/redis#example-PubSubConn |
If a network event occurs which partitions the server from the client then the client can end up with a connection which is half-closed, in this state the connection will remain broken unless a write is performed.
This isn't an issue for normal connections which issue commands as this involves a write, however for Pub/Sub connections this is a serious problem as they will currently never recover as there is no way for such connections to detect the fact that the TCP session is in a broken state.
We had this happen when we experienced VMWare switch migration, all Pub/Sub connections where shutdown by the server but our golang daemons never noticed, they just remained stuck in Receive().
The text was updated successfully, but these errors were encountered: