-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MySQL HA: Return an error if we fail to get a lock on standby #8229
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Not important but we could potentially test this.
…nto b-fix-mysql-lock-panic * 'b-fix-mysql-lock-panic' of github.com:hashicorp/vault: (52 commits) Fix minor typo in doc string (#8277) Update gen_openapi.sh (#8273) update dependencies (#8271) docs: update vault k8s to 0.2.0 (#8269) Fix flaky test of api renewer by moving away from legacy api. (#8265) Clean AlibabaCloud physical backend code (#8186) Update GH issue template to point to forum (#8226) Fix default max_open_connections for db plugins (#8262) Fix broken link (#8259) Removing timing-dependent aspects of test. (#8261) Changelog++ Added flag to disable X-Vault-Token header proxy if client passes the token (#8101) changelog++ changelog++ test: fix TestAgent_Template_Basic (#8257) docs: fix api path for merge entity identity doc (#8258) Bump etcd client API dep (#8037) Add Consul TLS options to access API endpoint (#8253) Docs: Add nomad TLS options (#8254) Update CHANGELOG.md ...
@michelvocks you're totally right, a regression test was easier than I expected and added in 8ea43ab , please take another look when you can! Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Commit 16e3711 dismissed @michelvocks' approval, however that commit simply fixes a race in the test so I'm going to merge as is |
* return an error if we fail to get a lock on standby * Add regression test * minor refactoring to remove a race condition in the test
if this created new probleb with mysql Mysql Hashicorp #10318 |
Standby node leaving all connection with mysql as established while trying to acquire lock pi@serlog:~ $ sudo netstat | grep mysql |grep -i serlog|grep -i established On standby vault node I have following output when I trace a connection pi@vaultf2:~ $ sudo lsof -i tcp:59860 pi@vaultf2:~ $ ps -ef|grep -i 9180 |
On a standby node, attempting to grab a lock with
NewMySQLLock
can fail if the MySQL fails to respond (in this case, is completely down), and return anil
MySQLBackend
. If the method returns an error, we pass it on to thefailLock
channel which returns the error, but at present we don't abort the internalattemptLock
method, instead we continue on an eventually get a nil pointer deference panic as shown in #8203.In this PR, if
NewMySQLLock
returns an error we send on thefailLock
channel as normal, but we also callreturn
. I checked the Postgresql storage backend and it too returns after failed lock:vault/physical/postgresql/postgresql.go
Lines 420 to 435 in ac33c32
Fixes #8203