[ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool" #9175

Open
jacekjaros opened this issue Jun 9, 2020 · 2 comments
Labels: bug, storage/cassandra

Comments

@jacekjaros

Describe the bug
At random moments, Vault loses its connection to Cassandra, which is used as the secrets storage backend. When this happens, Vault is unable to recover and keeps logging the following error:

Jun 09 08:25:30 cluster1-vault01 vault[14227]: 2020-06-09T08:25:30.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:25:40 cluster1-vault01 vault[14227]: 2020-06-09T08:25:40.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:25:50 cluster1-vault01 vault[14227]: 2020-06-09T08:25:50.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:00 cluster1-vault01 vault[14227]: 2020-06-09T08:26:00.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:10 cluster1-vault01 vault[14227]: 2020-06-09T08:26:10.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:20 cluster1-vault01 vault[14227]: 2020-06-09T08:26:20.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:30 cluster1-vault01 vault[14227]: 2020-06-09T08:26:30.869Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"

To Reproduce
Steps to reproduce the behavior:

  1. Run vault server
  2. Wait

Expected behavior
Vault should recover (reconnect to Cassandra?)

Environment:

  • Vault Server Version (retrieve with vault status):
    Version 1.4.2
  • Vault CLI Version (retrieve with vault version):
    Version 1.4.2
  • Server Operating System/Architecture:
    Ubuntu 18.04.4 LTS / x86_64

Vault server configuration file(s):

cluster_name = "dc1"
max_lease_ttl = "768h"
default_lease_ttl = "768h"
disable_clustering = "False"
cluster_addr = "https://cluster1-vault01.mydomain.com:8201"
api_addr = "https://cluster1-vault01.mydomain.com:8200"

plugin_directory = "/usr/local/lib/vault/plugins"

listener "tcp" {
  address = "192.168.1.2:8200"
  cluster_address = "192.168.1.2:8201"
  tls_cert_file = "/etc/vault/tls/global.mydomain.com.crt"
  tls_key_file = "/etc/vault/tls/global.mydomain.com.key"
  tls_client_ca_file="/etc/vault/tls/MyDomain.crt"
  tls_min_version  = "tls12"
  tls_prefer_server_cipher_suites = "false"
  tls_disable = "false"
}

backend "cassandra" {
  hosts = "dev-cassandra.mydomain.com"
  consistency = "LOCAL_QUORUM"
  protocol_version = "4"
  username = "vault"
  password = "XXXXXX"
  tls = "1"
  pem_bundle_file = "/etc/vault/tls/gossip.pem"
  tls_skip_verify = "1"
  connection_timeout = "5"
}

ha_backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault"
  service = "vault"
  scheme = "http"
}

ui = true

telemetry {
    prometheus_retention_time = "180s"
}

Additional context
The cluster was built on top of 6 nodes. For now we have only one test Vault agent, which pulls a single secret, so traffic is very low.

@ncabatoff added the bug and storage/cassandra labels on Jun 9, 2020
@jacekjaros
Author

Hi,

Good news: I was able to find the root cause of my issue. Cassandra passes the client (Vault) a list of servers that contains private IP addresses, which are not accessible from the Vault cluster.
I'm aware that this is a Cassandra misconfiguration; however, Vault doesn't allow me to use the workaround provided by the gocql driver, which is to set the DisableInitialHostLookup option to true.

Is it possible to add this parameter to the Vault configuration?
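
For anyone diagnosing the same symptom, here is a minimal Go sketch (standard library only) that flags which advertised peer addresses fall in private ranges. It assumes you have already collected the addresses Cassandra hands to clients, for example from the system.peers table. The function name privatePeers and the sample addresses are hypothetical and are not part of Vault or gocql.

```go
package main

import "net"

// privatePeers returns the entries of peers that parse as IP addresses in a
// private range (RFC 1918 for IPv4, RFC 4193 for IPv6). Entries that are not
// valid IP literals, such as hostnames, are skipped.
// Hypothetical helper for illustration only; requires Go 1.17+ for IsPrivate.
func privatePeers(peers []string) []string {
	var private []string
	for _, p := range peers {
		ip := net.ParseIP(p)
		if ip != nil && ip.IsPrivate() {
			private = append(private, p)
		}
	}
	return private
}
```

Calling privatePeers([]string{"10.0.0.5", "52.1.2.3"}) returns only "10.0.0.5". Any address it returns has to be routable from the Vault hosts; if it is not, the driver may be left with no usable hosts once it switches to the advertised list.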

Best regards,
Jacek

@kilocaleb
Contributor

Hi,

It looks like this option would be very helpful in a lot of Vault + Cassandra deployments (especially in AWS). I created a PR for it: #9733

--
kilocaleb
