Consul backend with TLS: Bad Certificate #4930

Closed
monwolf opened this issue Jul 16, 2018 · 3 comments · Fixed by #8084
Labels: bug, storage/consul

monwolf commented Jul 16, 2018

Good morning. I'm trying to set up a Vault cluster (v0.10.3) using Consul as the storage backend. In this setup I have two types of Consul node: one node is the server and the others are clients that join it. When I tried to run Vault on a client node, I saw this error message:

Jul 16 07:34:33 ildes01 vault: 2018-07-16T07:34:33.293+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:34 ildes01 vault: 2018-07-16T07:34:34.064+0200 [WARN ] storage.consul: check unable to talk with Consul backend: error="Put https://127.0.0.1:8500/v1/agent/check/fail/vault:10.10.0.128:8200:vault-sealed-check?note=Vault+Sealed: remote error: tls: bad certificate"

This error doesn't happen on the Consul server node. Below is the output of consul members, showing the state of my cluster:

consul_ssl  members
Node     Address           Status  Type    Build  Protocol  DC       Segment
des01    10.10.0.125:8301  alive   server  1.2.0  2         bardock  <all>
ildes01  10.10.0.128:8301  alive   client  1.2.0  2         bardock  <default>

I generated the SSL certificates using cfssl and cfssljson in my ansible playbook:

- name: Generate server private key and certificate
  command: >
     bash -c "echo '{\"CN\":\"{{ item }}\",\"key\":{\"algo\":\"rsa\",\"size\":2048}}' |
     cfssl gencert -ca=consul-ca.pem -ca-key=consul-ca-key.pem
     -config=cfssl.json -hostname=\"{{ item }},{{ item }}.local.com,{{ item }}.node.global.consul,server.global.nomad,localhost,127.0.0.1,{{ hostvars[item]['ansible_default_ipv4']['address'] }}\" -
     | cfssljson -bare server-{{ item }}"
  args:
    chdir: "{{ consul_ssl_dir }}"
  with_items: "{{ groups['server'] }}"
  when: consul_bootstrap

- name: Generate client private key and certificate
  command: >
     bash -c "echo '{\"CN\":\"{{ item }}\",\"key\":{\"algo\":\"rsa\",\"size\":2048}}' |
     cfssl gencert -ca=consul-ca.pem -ca-key=consul-ca-key.pem
     -config=cfssl.json -hostname=\"{{ item }},{{ item }}.local.com,{{ item }}.node.global.consul,client.global.nomad,localhost,127.0.0.1,{{ hostvars[item]['ansible_default_ipv4']['address'] }}\" -
     | cfssljson -bare client-{{ item }}"
  args:
    chdir: "{{ consul_ssl_dir }}"
  with_items:  "{{ groups['client'] }}"
  when: consul_bootstrap

If I inspect the certificates with openssl, I can see all the subject alternative names that I provided.

Server certificate:

X509v3 Subject Alternative Name: 
DNS:des01, DNS:des01.local.com, DNS:des01.node.global.consul, DNS:server.global.nomad, DNS:localhost, IP Address:127.0.0.1, IP Address:10.10.0.125

Client certificate:

X509v3 Subject Alternative Name: 
DNS:ildes01, DNS:ildes01.local.com, DNS:ildes01.node.global.consul, DNS:client.global.nomad, DNS:localhost, IP Address:127.0.0.1, IP Address:10.10.0.128
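One way to rule out a SAN mismatch is to check mechanically that the address Vault dials (127.0.0.1 in the storage stanza) is listed in the certificate. The snippet below is only a sketch: `parse_sans` and `covers` are hypothetical helper names, it works on the text that openssl prints rather than on the certificate itself, and it does exact matching only (no wildcard handling).

```python
import ipaddress


def parse_sans(san_line):
    """Split an openssl-style SAN line into DNS names and IP addresses."""
    dns_names, ips = set(), set()
    for entry in san_line.split(","):
        # Split on the first colon only, so IPv6 values stay intact.
        kind, _, value = entry.strip().partition(":")
        if kind == "DNS":
            dns_names.add(value.lower())
        elif kind == "IP Address":
            ips.add(ipaddress.ip_address(value))
    return dns_names, ips


def covers(san_line, host):
    """Return True if host (IP literal or DNS name) is listed in the SANs."""
    dns_names, ips = parse_sans(san_line)
    try:
        return ipaddress.ip_address(host) in ips
    except ValueError:
        # Not an IP literal: compare as a DNS name (exact match, no wildcards).
        return host.lower() in dns_names


# SAN line copied from the client certificate shown above.
CLIENT_SANS = (
    "DNS:ildes01, DNS:ildes01.local.com, DNS:ildes01.node.global.consul, "
    "DNS:client.global.nomad, DNS:localhost, IP Address:127.0.0.1, "
    "IP Address:10.10.0.128"
)

print(covers(CLIENT_SANS, "127.0.0.1"))    # True
print(covers(CLIENT_SANS, "10.10.0.125"))  # False
```

Since 127.0.0.1 is covered, the SAN list by itself does not explain the rejected handshake.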

Reproduction Steps

Steps to reproduce this issue:

  1. Create a Consul client and server with TLS verification enabled for incoming and outgoing connections (verify_incoming and verify_outgoing):

Client configuration:

{
    "server": false,
    "node_name": "ildes01",
    "enable_debug": true,
    "datacenter": "bardock",
    "data_dir": "/opt/consul/data",
    "encrypt": "XXXXXX",
    "disable_update_check": true,
    "bind_addr":"0.0.0.0",
    "advertise_addr": "10.10.0.128",
    "addresses": {
        "https": "0.0.0.0"
    },
    "ports": {
        "https": 8500,
        "http": -1
    },
    "key_file": "/opt/consul/ssl/client-ildes01-key.pem",
    "cert_file": "/opt/consul/ssl/client-ildes01.pem",
    "ca_file": "/opt/consul/ssl/consul-ca.pem",
    "verify_incoming": true,
    "verify_outgoing": true,
    "retry_join":[
        "10.10.0.125"
    ]
}

Server configuration:

{
    "bootstrap": true,
    "server": true,
    "node_name": "des01",
    "datacenter": "bardock",
    "data_dir": "/opt/consul/data",
    "encrypt": "XXXX",
    "disable_update_check": true,
    "bind_addr":"0.0.0.0",
    "advertise_addr": "10.10.0.125",
    "addresses": {
        "https": "0.0.0.0"
    },
    "ports": {
        "https": 8500,
        "http": -1
    },
    "key_file": "/opt/consul/ssl/server-des01-key.pem",
    "cert_file": "/opt/consul/ssl/server-des01.pem",
    "ca_file": "/opt/consul/ssl/consul-ca.pem",
    "verify_incoming": true,
    "verify_outgoing": true,
    "retry_join":[
        "10.10.0.125"
    ]
}

  2. Create the Vault configuration on each node:

Server config:

storage "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
  token = "XXXX"
  scheme = "https"
  tls_skip_verify = 0
  tls_cert_file = "/opt/consul/ssl/server-des01.pem"
  tls_key_file = "/opt/consul/ssl/server-des01-key.pem"
  tls_ca_file = "/opt/consul/ssl/consul-ca.pem"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address  = "10.10.0.125:8201"
  tls_disable = 0
  tls_cert_file = "/opt/consul/ssl/server-des01.pem"
  tls_key_file = "/opt/consul/ssl/server-des01-key.pem"
}

api_addr = "https://10.10.0.125:8200"
cluster_addr = "https://10.10.0.125:8201"

ui=true

Client config:

storage "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
  token = "XXXXX"
  scheme = "https"
  tls_skip_verify = 0
  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_cert_file = "/opt/consul/ssl/client-ildes01-key.pem"
  tls_ca_file = "/opt/consul/ssl/consul-ca.pem"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address  = "10.10.0.128:8201"
  tls_disable = 0
  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_key_file = "/opt/consul/ssl/client-ildes01-key.pem"
}

api_addr = "https://10.10.0.128:8200"
cluster_addr = "https://10.10.0.128:8201"

  3. Start Vault:

/usr/bin/vault server -config=/opt/vault/conf

Log Fragments

After running Vault on the client node, I saw these logs:

Jul 16 07:34:29 ildes01 systemd: Started Vault Service.
Jul 16 07:34:29 ildes01 systemd: Starting Vault Service...
Jul 16 07:34:29 ildes01 vault: ==> Vault server configuration:
Jul 16 07:34:29 ildes01 vault: Api Address: https://10.10.0.128:8200
Jul 16 07:34:29 ildes01 vault: Cgo: disabled
Jul 16 07:34:29 ildes01 vault: Cluster Address: https://10.10.0.128:8201
Jul 16 07:34:29 ildes01 vault: Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "10.10.0.128:8201", tls: "enabled")
Jul 16 07:34:29 ildes01 vault: Log Level: info
Jul 16 07:34:29 ildes01 vault: Mlock: supported: true, enabled: true
Jul 16 07:34:29 ildes01 vault: Storage: consul (HA available)
Jul 16 07:34:29 ildes01 vault: Version: Vault v0.10.3
Jul 16 07:34:29 ildes01 vault: Version Sha: c69ae68faf2bf7fc1d78e3ec62655696a07454c7
Jul 16 07:34:29 ildes01 vault: ==> Vault server started! Log data will stream in below:
Jul 16 07:34:29 ildes01 vault: 2018-07-16T07:34:29.213+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:30 ildes01 vault: 2018-07-16T07:34:30.231+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:31 ildes01 vault: 2018-07-16T07:34:31.250+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:32 ildes01 vault: 2018-07-16T07:34:32.272+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:33 ildes01 vault: 2018-07-16T07:34:33.293+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:34 ildes01 vault: 2018-07-16T07:34:34.064+0200 [WARN ] storage.consul: check unable to talk with Consul backend: error="Put https://127.0.0.1:8500/v1/agent/check/fail/vault:10.10.0.128:8200:vault-sealed-check?note=Vault+Sealed: remote error: tls: bad certificate"

Maybe I need some other SAN or flag in the certificate? I spent a few hours reviewing your documentation and everything looks correct to me, but it doesn't start. Could you help me with this issue?

@jefferai (Member)

The Consul logs will likely have a more detailed explanation of the problem.

monwolf commented Jul 17, 2018

Good morning, and thanks for the advice. I ran Consul with trace-level logging (-log-level=trace), but I can't see anything wrong:

 /usr/bin/consul agent -config-dir=/opt/consul/conf -log-level=trace
WARNING: LAN keyring exists but -encrypt given, using keyring
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.0'
           Node ID: 'c6e560ae-551c-1dc9-41f6-aaaed240cff3'
         Node name: 'ildes01'
        Datacenter: 'bardock' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: -1, HTTPS: 8500, DNS: 8600)
      Cluster Addr: 10.10.0.128 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true

==> Log data will now stream in as it occurs:

    2018/07/17 07:42:36 [INFO] serf: EventMemberJoin: ildes01 10.10.0.128
    2018/07/17 07:42:36 [DEBUG] agent: restored service definition "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b" from "/opt/consul/data/services/1eeec430722bec3dc8bc18122a17917c"
    2018/07/17 07:42:36 [DEBUG] agent: restored service definition "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4" from "/opt/consul/data/services/dbca3984643c6d3aaabc42121670215d"
    2018/07/17 07:42:36 [DEBUG] agent: restored health check "9e71d1d465ef90c6d1ce95ec006a390969014166" from "/opt/consul/data/checks/052736bd31672306e8254efc01cfc810"
    2018/07/17 07:42:36 [DEBUG] agent/proxy: managed Connect proxy manager started
    2018/07/17 07:42:36 [WARN] agent/proxy: running as root, will not start managed proxies
    2018/07/17 07:42:36 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2018/07/17 07:42:36 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2018/07/17 07:42:36 [INFO] agent: Started HTTPS server on [::]:8500 (tcp)
    2018/07/17 07:42:36 [INFO] agent: started state syncer
    2018/07/17 07:42:36 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce os scaleway softlayer triton
    2018/07/17 07:42:36 [INFO] agent: Joining LAN cluster...
    2018/07/17 07:42:36 [INFO] agent: (LAN) joining: [10.10.0.125]
    2018/07/17 07:42:36 [WARN] manager: No servers available
    2018/07/17 07:42:36 [ERR] agent: failed to sync remote state: No known Consul servers
    2018/07/17 07:42:36 [DEBUG] memberlist: Initiating push/pull sync with: 10.10.0.125:8301
    2018/07/17 07:42:36 [WARN] memberlist: Refuting a suspect message (from: ildes01)
    2018/07/17 07:42:36 [INFO] serf: EventMemberJoin: des01 10.10.0.125
    2018/07/17 07:42:36 [DEBUG] serf: Refuting an older leave intent
    2018/07/17 07:42:36 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2018/07/17 07:42:36 [DEBUG] agent: systemd notify failed: No socket
    2018/07/17 07:42:36 [INFO] agent: Join LAN completed. Synced with 1 initial agents
    2018/07/17 07:42:36 [INFO] consul: adding server des01 (Addr: tcp/10.10.0.125:8300) (DC: bardock)
    2018/07/17 07:42:36 [DEBUG] http: Request GET /v1/kv/config/openid-server.properties?recurse&wait=55s&index=110507 (22.661137ms) from=172.17.0.2:54594
    2018/07/17 07:42:36 [DEBUG] http: Request GET /v1/kv/config/openid-server.yaml?recurse&wait=55s&index=110507 (1.256058ms) from=172.17.0.2:54598
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:37 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
    2018/07/17 07:42:37 [INFO] agent: Synced service "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b"
    2018/07/17 07:42:37 [INFO] agent: Synced service "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4"
    2018/07/17 07:42:37 [DEBUG] agent: Check "9e71d1d465ef90c6d1ce95ec006a390969014166" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Node info in sync
    2018/07/17 07:42:37 [DEBUG] agent: Service "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Service "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Check "9e71d1d465ef90c6d1ce95ec006a390969014166" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Node info in sync
    2018/07/17 07:42:38 [DEBUG] agent: Check "9e71d1d465ef90c6d1ce95ec006a390969014166" is passing
    2018/07/17 07:42:38 [DEBUG] agent: Service "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b" in sync
    2018/07/17 07:42:38 [DEBUG] agent: Service "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4" in sync
    2018/07/17 07:42:38 [INFO] agent: Synced check "9e71d1d465ef90c6d1ce95ec006a390969014166"
    2018/07/17 07:42:38 [DEBUG] agent: Node info in sync
    2018/07/17 07:42:38 [DEBUG] memberlist: Stream connection from=10.10.0.130:32429
    2018/07/17 07:42:43 [DEBUG] memberlist: Stream connection from=10.10.0.127:61038

monwolf commented Jul 31, 2018

Sorry for the delay, I was on holiday. I found the issue; I had a typo in the storage stanza of my Vault config file:

  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_cert_file = "/opt/consul/ssl/client-ildes01-key.pem"

I duplicated the tls_cert_file property and never set tls_key_file. I think the application could handle this case and show a warning when a certificate is configured without a key.
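As a rough illustration of the kind of check meant here, the sketch below scans the body of one stanza for keys assigned more than once and for a certificate configured without a key (or the reverse). It is not how Vault parses its configuration: `lint_stanza` is a hypothetical helper, and the regex only understands simple `key = value` lines, not full HCL. My assumption about the symptom is that, with no key configured, Vault ended up presenting no usable client certificate, and the Consul agent (verify_incoming enabled) rejected the handshake.

```python
import re
from collections import Counter

# Matches simple "key = value" lines; this is not a full HCL parser.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def lint_stanza(stanza_text):
    """Return warnings for duplicated keys and an incomplete TLS cert/key pair."""
    keys = []
    for line in stanza_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            keys.append(match.group(1))

    counts = Counter(keys)
    warnings = [
        f"duplicate key: {key} (set {n} times)"
        for key, n in counts.items()
        if n > 1
    ]

    has_cert = "tls_cert_file" in counts
    has_key = "tls_key_file" in counts
    if has_cert != has_key:
        missing = "tls_key_file" if has_cert else "tls_cert_file"
        warnings.append(f"{missing} is not set; the client certificate pair is incomplete")
    return warnings


# Body of the broken storage stanza from this issue (token omitted).
BROKEN = """
  address = "127.0.0.1:8500"
  scheme = "https"
  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_cert_file = "/opt/consul/ssl/client-ildes01-key.pem"
  tls_ca_file = "/opt/consul/ssl/consul-ca.pem"
"""

for warning in lint_stanza(BROKEN):
    print(warning)
```

Run against the stanza above, this reports both the duplicated tls_cert_file and the missing tls_key_file, which would have pointed at the typo directly.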
