-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add rejectReadOnly option (to fix AWS Aurora failover) #604
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be awesome if you could add a test, i.e. by manually (temporarily) setting the MySQL server to read-only mode.
README.md
Outdated
``` | ||
|
||
|
||
RejectreadOnly causes mysql driver to reject read-only connections. This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/mysql driver/the driver/
Please keep the explanation general, i.e. independent from AWS Aurora, which can however be mentioned as an example. For example explain that failover instances might be read-only and that this option purges them from the connection pool. Please also don't mention internals like "returning a driver.ErrBadConnection".
@julienschmidt Thanks for the prompt response! Please take another look. |
(This is not comment about PR, just an advice) I recommend to set max lifetime to 10sec~1min. |
Thanks @methane; that's good advice! I don't think we set it for now; we should as you said. In case of AWS Aurora though, the Aurora cluster terminates all connections during the failover; we just get connected to the wrong replica right after the failover :( But yeah, setting the max life time would definitely prevent the service from being stuck in a read-only connection forever. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for some minor documentation stuff this PR seems fine to me now.
README.md
Outdated
``` | ||
|
||
|
||
RejectreadOnly causes the driver to reject read-only connections. This is for a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/RejectreadOnly/rejectReadOnly
/
README.md
Outdated
Note that this should be a fairly rare case, as an automatic failover normally | ||
happens when the primary is down, and the race condition shouldn't happen | ||
unless it comes back up online as soon as the failover is kicked off. On the | ||
other hand, when this happens, an mysql application can get stuck on a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/mysql/MySQL/
dsn.go
Outdated
@@ -53,6 +53,7 @@ type Config struct { | |||
InterpolateParams bool // Interpolate placeholders into query string | |||
MultiStatements bool // Allow multiple statements in one query | |||
ParseTime bool // Parse time values to time.Time | |||
RejectReadOnly bool // Reject read-only connections; see README.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the "; see README.md", all of these params are explained in the README.md
(did a force push to trigger travis build) |
Thanks! |
@julienschmidt Thanks for accepting the patch! |
Description
Hi there, we use this (awesome) driver with AWS Aurora's MySQL-compatible Cluster. In some recent manual tests with Aurora's automatic failover, we noticed that there might be a race condition in Aurora's failover implementation, where our servers (mysql client) get connected to a read-only replica after failover.
This should be a rare case, since normally during a failover the primary is unreachable before failover to the read replica, and reconnecting should only succeed with the new primary (writable) instance). However, in an event where the old primary (new read-only) replica recovers before failover completes, this could still happen. In addition, if we manually trigger a failover from AWS's web control panel, both replicas are still accessible, making this is very easy to reproduce.
This PR adds a
rejectReadOnly
option to DSN, which when set totrue
, causes the driver to drop the connection if error 1792 (ER_CANT_EXECUTE_IN_READ_ONLY_TRANSACTION
) is received. This way, the connection pool (fromdatabase/sql
) purges the connection, and makes a new one for the next statement, bringing the storage layer back to healthy state.This is apparently an AWS bug, and they should be fixing it. We are making this PR because 1) it's a much longer process to get AWS to fix the bug (assuming they acknowledge), while adding an option to the driver is a quick fix for everybody; 2) if the same behavior happens with other service providers, this can be a useful remedy too.
Checklist