add rejectReadOnly option (to fix AWS Aurora failover) #604

songgao · 2017-05-29T21:01:49Z

Description

Hi there, we use this (awesome) driver with AWS Aurora's MySQL-compatible Cluster. In some recent manual tests with Aurora's automatic failover, we noticed that there might be a race condition in Aurora's failover implementation, where our servers (mysql client) get connected to a read-only replica after failover.

This should be a rare case, since normally during a failover the primary is unreachable before failover to the read replica, and reconnecting should only succeed with the new primary (writable) instance). However, in an event where the old primary (new read-only) replica recovers before failover completes, this could still happen. In addition, if we manually trigger a failover from AWS's web control panel, both replicas are still accessible, making this is very easy to reproduce.

This PR adds a rejectReadOnly option to DSN, which when set to true, causes the driver to drop the connection if error 1792 (ER_CANT_EXECUTE_IN_READ_ONLY_TRANSACTION) is received. This way, the connection pool (from database/sql) purges the connection, and makes a new one for the next statement, bringing the storage layer back to healthy state.

This is apparently an AWS bug, and they should be fixing it. We are making this PR because 1) it's a much longer process to get AWS to fix the bug (assuming they acknowledge), while adding an option to the driver is a quick fix for everybody; 2) if the same behavior happens with other service providers, this can be a useful remedy too.

Checklist

Code compiles correctly
Created tests which fail without the change (if possible)
All tests passing
Extended the README / documentation, if necessary
Added myself / the copyright holder to the AUTHORS file

coveralls · 2017-05-29T21:05:57Z

Coverage decreased (-0.6%) to 73.943% when pulling 83d56b7 on keybase:master into 382e13d on go-sql-driver:master.

julienschmidt

It would be awesome if you could add a test, i.e. by manually (temporarily) setting the MySQL server to read-only mode.

julienschmidt · 2017-05-29T21:23:44Z

README.md

+```
+
+
+RejectreadOnly causes mysql driver to reject read-only connections. This is


s/mysql driver/the driver/

Please keep the explanation general, i.e. independent from AWS Aurora, which can however be mentioned as an example. For example explain that failover instances might be read-only and that this option purges them from the connection pool. Please also don't mention internals like "returning a driver.ErrBadConnection".

songgao · 2017-05-29T22:11:05Z

@julienschmidt Thanks for the prompt response! Please take another look.

coveralls · 2017-05-29T22:12:54Z

Coverage decreased (-0.04%) to 74.531% when pulling 36053f8 on keybase:master into 382e13d on go-sql-driver:master.

coveralls · 2017-05-29T22:51:02Z

Coverage decreased (-0.04%) to 74.531% when pulling a2e9438 on keybase:master into 382e13d on go-sql-driver:master.

coveralls · 2017-05-29T22:54:22Z

Coverage decreased (-0.04%) to 74.531% when pulling be8ab32 on keybase:master into 382e13d on go-sql-driver:master.

coveralls · 2017-05-29T22:57:26Z

Coverage decreased (-0.04%) to 74.531% when pulling d2fefe8 on keybase:master into 382e13d on go-sql-driver:master.

methane · 2017-05-30T00:53:45Z

(This is not comment about PR, just an advice)
@songgao Do you use DB.SetConnMaxLifetime()?
Without it, database/sql can keep connection for very long time.
Very slow failover is one of many troubles of long living connections.

I recommend to set max lifetime to 10sec~1min.

songgao · 2017-05-30T01:02:15Z

Thanks @methane; that's good advice! I don't think we set it for now; we should as you said.

In case of AWS Aurora though, the Aurora cluster terminates all connections during the failover; we just get connected to the wrong replica right after the failover :( But yeah, setting the max life time would definitely prevent the service from being stuck in a read-only connection forever.

coveralls · 2017-05-30T18:19:13Z

Coverage decreased (-0.04%) to 74.846% when pulling fb11b2c on keybase:master into 44fa292 on go-sql-driver:master.

julienschmidt

Except for some minor documentation stuff this PR seems fine to me now.

julienschmidt · 2017-05-30T18:34:22Z

README.md

+```
+
+
+RejectreadOnly causes the driver to reject read-only connections. This is for a


s/RejectreadOnly/rejectReadOnly/

julienschmidt · 2017-05-30T18:34:59Z

README.md

+Note that this should be a fairly rare case, as an automatic failover normally
+happens when the primary is down, and the race condition shouldn't happen
+unless it comes back up online as soon as the failover is kicked off. On the
+other hand, when this happens, an mysql application can get stuck on a


s/mysql/MySQL/

julienschmidt · 2017-05-30T18:36:42Z

dsn.go

@@ -53,6 +53,7 @@ type Config struct {
 	InterpolateParams       bool // Interpolate placeholders into query string
 	MultiStatements         bool // Allow multiple statements in one query
 	ParseTime               bool // Parse time values to time.Time
+	RejectReadOnly          bool // Reject read-only connections; see README.md


Remove the "; see README.md", all of these params are explained in the README.md

coveralls · 2017-05-31T01:45:24Z

Coverage decreased (-0.04%) to 74.846% when pulling 91f7309 on keybase:master into 44fa292 on go-sql-driver:master.

songgao · 2017-05-31T02:21:45Z

(did a force push to trigger travis build)

coveralls · 2017-05-31T02:25:33Z

Coverage decreased (-0.04%) to 74.846% when pulling bf068b4 on keybase:master into 44fa292 on go-sql-driver:master.

julienschmidt · 2017-05-31T05:52:55Z

Thanks!

songgao · 2017-05-31T06:11:34Z

@julienschmidt Thanks for accepting the patch!

songgao added 3 commits May 25, 2017 16:41

add RejectReadOnly

a3bdfa2

update README.md

6b8624c

close connection explicitly before returning ErrBadConn for 1792 (#2)

83d56b7

julienschmidt requested changes May 29, 2017

View reviewed changes

add test and improve doc

d2fefe8

Merge remote-tracking branch 'upstream/master'

fb11b2c

julienschmidt requested changes May 30, 2017

View reviewed changes

doc/comment changes

bf068b4

julienschmidt approved these changes May 31, 2017

View reviewed changes

julienschmidt merged commit b644770 into go-sql-driver:master May 31, 2017

julienschmidt added the enhancement label May 31, 2017

julienschmidt added this to the v1.4 milestone May 31, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add rejectReadOnly option (to fix AWS Aurora failover) #604

add rejectReadOnly option (to fix AWS Aurora failover) #604

songgao commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

julienschmidt left a comment

julienschmidt May 29, 2017

songgao commented May 29, 2017

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

methane commented May 30, 2017

songgao commented May 30, 2017

coveralls commented May 30, 2017 •

edited

Loading

julienschmidt left a comment

julienschmidt May 30, 2017

julienschmidt May 30, 2017

julienschmidt May 30, 2017

coveralls commented May 31, 2017 •

edited

Loading

songgao commented May 31, 2017

coveralls commented May 31, 2017 •

edited

Loading

julienschmidt commented May 31, 2017

songgao commented May 31, 2017

		```


		RejectreadOnly causes mysql driver to reject read-only connections. This is

		```


		RejectreadOnly causes the driver to reject read-only connections. This is for a

add rejectReadOnly option (to fix AWS Aurora failover) #604

add rejectReadOnly option (to fix AWS Aurora failover) #604

Conversation

songgao commented May 29, 2017 • edited Loading

Description

Checklist

coveralls commented May 29, 2017 • edited Loading

julienschmidt left a comment

Choose a reason for hiding this comment

julienschmidt May 29, 2017

Choose a reason for hiding this comment

songgao commented May 29, 2017

coveralls commented May 29, 2017 • edited Loading

coveralls commented May 29, 2017 • edited Loading

coveralls commented May 29, 2017 • edited Loading

coveralls commented May 29, 2017 • edited Loading

methane commented May 30, 2017

songgao commented May 30, 2017

coveralls commented May 30, 2017 • edited Loading

julienschmidt left a comment

Choose a reason for hiding this comment

julienschmidt May 30, 2017

Choose a reason for hiding this comment

julienschmidt May 30, 2017

Choose a reason for hiding this comment

julienschmidt May 30, 2017

Choose a reason for hiding this comment

coveralls commented May 31, 2017 • edited Loading

songgao commented May 31, 2017

coveralls commented May 31, 2017 • edited Loading

julienschmidt commented May 31, 2017

songgao commented May 31, 2017

songgao commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 29, 2017 •

edited

Loading

coveralls commented May 30, 2017 •

edited

Loading

coveralls commented May 31, 2017 •

edited

Loading

coveralls commented May 31, 2017 •

edited

Loading