Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL HA: Return an error if we fail to get a lock on standby #8229

Merged
merged 5 commits into from
Feb 5, 2020

Conversation

catsby
Copy link
Contributor

@catsby catsby commented Jan 23, 2020

On a standby node, attempting to grab a lock with NewMySQLLock can fail if the MySQL fails to respond (in this case, is completely down), and return a nil MySQLBackend. If the method returns an error, we pass it on to the failLock channel which returns the error, but at present we don't abort the internal attemptLock method, instead we continue on an eventually get a nil pointer deference panic as shown in #8203.

In this PR, if NewMySQLLock returns an error we send on the failLock channel as normal, but we also call return. I checked the Postgresql storage backend and it too returns after failed lock:

for {
select {
case <-stop:
return
case <-ticker.C:
gotlock, err := l.writeItem()
switch {
case err != nil:
errors <- err
return
case gotlock:
close(success)
return
}
}
}

Fixes #8203

michelvocks
michelvocks previously approved these changes Jan 24, 2020
Copy link
Contributor

@michelvocks michelvocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Not important but we could potentially test this.

catsby and others added 3 commits February 3, 2020 06:31
…nto b-fix-mysql-lock-panic

* 'b-fix-mysql-lock-panic' of github.com:hashicorp/vault: (52 commits)
  Fix minor typo in doc string (#8277)
  Update gen_openapi.sh (#8273)
  update dependencies (#8271)
  docs: update vault k8s to 0.2.0 (#8269)
  Fix flaky test of api renewer by moving away from legacy api. (#8265)
  Clean AlibabaCloud physical backend code (#8186)
  Update GH issue template to point to forum (#8226)
  Fix default max_open_connections for db plugins (#8262)
  Fix broken link (#8259)
  Removing timing-dependent aspects of test. (#8261)
  Changelog++
  Added flag to disable X-Vault-Token header proxy if client passes the token (#8101)
  changelog++
  changelog++
  test: fix TestAgent_Template_Basic (#8257)
  docs: fix api path for merge entity identity doc (#8258)
  Bump etcd client API dep (#8037)
  Add Consul TLS options to access API endpoint (#8253)
  Docs: Add nomad TLS options (#8254)
  Update CHANGELOG.md
  ...
@catsby
Copy link
Contributor Author

catsby commented Feb 4, 2020

LGTM. Not important but we could potentially test this.

@michelvocks you're totally right, a regression test was easier than I expected and added in 8ea43ab , please take another look when you can! Thanks

@catsby catsby requested a review from michelvocks February 4, 2020 17:26
michelvocks
michelvocks previously approved these changes Feb 5, 2020
Copy link
Contributor

@michelvocks michelvocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@catsby
Copy link
Contributor Author

catsby commented Feb 5, 2020

Commit 16e3711 dismissed @michelvocks' approval, however that commit simply fixes a race in the test so I'm going to merge as is

@catsby catsby merged commit 66bf106 into master Feb 5, 2020
@catsby catsby deleted the b-fix-mysql-lock-panic branch February 5, 2020 20:08
catsby added a commit that referenced this pull request Feb 5, 2020
* return an error if we fail to get a lock on standby

* Add regression test

* minor refactoring to remove a race condition in the test
catsby added a commit that referenced this pull request Feb 14, 2020
…#8300)

* return an error if we fail to get a lock on standby

* Add regression test

* minor refactoring to remove a race condition in the test
@sksingh20
Copy link

if this created new probleb with mysql Mysql Hashicorp #10318
Mysql Hashicorp Vault in HA leaving multiple sleep connections with Mysql

@sksingh20
Copy link

sksingh20 commented May 13, 2021

Standby node leaving all connection with mysql as established while trying to acquire lock

pi@serlog:~ $ sudo netstat | grep mysql |grep -i serlog|grep -i established
tcp 0 0 serlog:mysql 192.168.0.213:58760 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58762 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58834 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58818 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58822 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58694 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58728 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58790 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58736 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58668 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58688 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58750 ESTABLISHED
tcp 0 0 serlog:57480 serlog:mysql ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58714 ESTABLISHED
tcp 0 0 serlog:57482 serlog:mysql ESTABLISHED
tcp 0 0 serlog:mysql serlog:46002 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58800 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58734 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58784 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58708 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58842 ESTABLISHED
tcp 0 0 serlog:mysql serlog:57482 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58840 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58686 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58792 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58812 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58810 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58700 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58774 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58742 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58756 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58768 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58744 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58780 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58680 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58766 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58832 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58816 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58786 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58806 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58838 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58682 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58716 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58828 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58676 ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58722 ESTABLISHED
tcp 0 0 serlog:46002 serlog:mysql ESTABLISHED
tcp 0 0 serlog:mysql 192.168.0.213:58710 ESTABLISHED
tcp 0 0 serlog:mysql serlog:57480 ESTABLISHED
pi@serlog:~ $

On standby vault node I have following output when I trace a connection

pi@vaultf2:~ $ sudo lsof -i tcp:59860
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
vault 9180 root 78u IPv4 63158 0t0 TCP 192.168.0.213:59860->192.168.0.212:mysql (ESTABLISHED)
pi@vaultf2:~ $

pi@vaultf2:~ $ ps -ef|grep -i 9180
root 9180 1 2 18:27 ? 00:00:38 /usr/bin/vault server -config=/etc/vault/config.hcl
pi 10527 664 0 18:52 pts/0 00:00:00 grep --color=auto -i 9180
pi@vaultf2:~ $

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mysql backend HA lock handling is unsafe
3 participants