Add Raft TLS Helm examples #12982

Closed

@in0rdr in0rdr commented Nov 1, 2021

We had a hard time finding the connections to the Raft concepts and to the discussion on that page about TLS server name verification, especially during the initial bootstrap phase.

This request links the Helm chart usage examples to the concepts related to server name verification.

Additionally, it includes the examples that we used to understand the relevance of matching the server name in the Raft stanza to the actual TLS certificate.

It might prove helpful to others as well.

Limitation: No example is given for the third alternative, using a certificate with a CIDR range.

Co-authored-by: Pascal Reeb [email protected]

@vercel vercel bot temporarily deployed to Preview – vault-storybook November 1, 2021 13:20 Inactive
@in0rdr in0rdr marked this pull request as ready for review November 1, 2021 13:24
@in0rdr in0rdr requested a review from taoism4504 as a code owner November 1, 2021 13:24
@hghaf099 hghaf099 added the docs label Feb 4, 2022
@robmonte robmonte self-requested a review February 16, 2022 08:00
@robmonte (Member) commented Feb 16, 2022

Hello @in0rdr - Thank you for your contribution! After reviewing the new document you wrote, I have some questions and feedback:

We followed along with the doc by starting with the two examples linked at the top (build cluster, create TLS certificate). We aren't quite sure what led to the error you described after that, as we did not encounter it. Could you please elaborate on the steps that were taken leading up to the expected error? It would be helpful to include where that error presented itself as well.

Additionally, it would be useful to readers if the doc included more concrete steps that show exactly how and when you are specifying the raft config via the helm chart, as part of the above elaboration.

Lastly, two small code changes. The first is minor: we use shell-session for the terminal language tag instead of just shell.
The second is that the file website/data/docs-nav-data.json actually needs to have this new page linked in it or the website cannot build. It simply needs:

{
   "title": "HA Cluster with TLS",
   "path": "platform/k8s/helm/examples/ha-tls"
},

added to the file with the other HA cluster examples (under "HA Cluster with Raft" and above "HA Enterprise Cluster with Raft").

@in0rdr (Author) commented Feb 17, 2022

Thanks for the review. I set up a completely new "kind" cluster (single node) to revisit our initial aim with this PR.

We recognized that no guide clearly shows how to set up a Vault cluster on Kubernetes with Raft integrated storage and TLS certificates. The primary purpose of this PR was therefore to address this gap in the documentation and provide guidance on setting up a Vault cluster with Helm, Raft integrated storage, and TLS.

Now, for the concrete instructions, this will obviously be a mix of the two guides mentioned in the beginning (ha-with-raft and standalone-tls).

However, some changes are needed to make them work in harmony (not least the concrete steps you mentioned).

Following both guides, I ended up with a mixed values file that looked like this:

global:
  enabled: true
  tlsDisable: false

server:
  injector:
    enabled: false
  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file  = "/vault/userconfig/vault-server-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }

        storage "raft" {
          path = "/vault/data"
        }
  
        #storage "file" {
        #  path = "/vault/data"
        #}
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca

  extraVolumes:
    - type: secret
      name: vault-server-tls # Matches the ${SECRET_NAME} from above

  # Affinity Settings
  # Commenting out or setting as empty the affinity variable, will allow
  # deployment to single node services such as Minikube
  # This should be either a multi-line string or YAML matching the PodSpec's affinity field.
  affinity:


#  standalone:
#    enabled: true

Did you end up with something similar?

How I constructed this YAML:

  • I started from the starter file given in standalone-tls#3-helm-configuration.
  • Then I made a short "excursion" into the other guide, ha-with-raft, to verify that we need to set the ha.raft.config section in order to override the default ha.raft.config.
  • The above ha.raft.config is equal to the default config, with the additional tls_ parameters in the listener stanza, as recommended by the standalone-tls document.
  • One minor detail was to disable the default affinity setting, which assumes a real multi-node k8s cluster, which I did not have. I think that will also become clear to the reader from the comment in the default values.yaml file. But while we are at it, we might as well add an explicit comment here in the guide that points out the possibility of single-node clusters for development setups and briefly references the affinity configuration (if this exists somewhere).
  • Moreover, I had to disable the standalone config section, because only one storage backend can be configured. I first tried to run the Helm chart without any storage config section, but that did not work: the Vault Pod showed "A storage backend must be specified" with "CrashLoopBackOff".
  • Because of the above-mentioned choice to use ha.raft (not only ha, but ha with raft), we need to specify the config section inside ha.raft and not in standalone (as suggested in the standalone-tls doc), but that change should be straightforward and completely logical.
  • Last but not least, we need to set tlsDisable: false and omit tls_disable = true in the listener stanza, but again, this will be obvious, since we really want to enable TLS in this instance.

One further note: I followed all the steps for creating the CSR and signing the CertificateSigningRequest k8s object as described in the standalone-tls document, just to make sure that these certificates are actually available in the cluster (the k8s secret holding the three files: CA, server key, and cert). I will not go into details here, because this is described at length in the original guide. It might be interesting to mention, though, that my v1.21.1 kind server was not happy with a CertificateSigningRequest without "signerName", so I went with signerName: kubernetes.io/legacy-unknown to make it work; this could probably be addressed in a separate PR.

I hope that you could understand how we came up with the above YAML config file as a basis to reconstruct the error which presented itself regarding the certificates. If not, we should make this very clear initially.

Now, back to the error that we experienced about the TLS certificate only being signed for the k8s cluster internal names, not the Pod name.

If you throw the above YAML at a k8s cluster, this should deploy just fine, also on single-node "kind" clusters:

helm install vault hashicorp/vault -f custom-values.yaml

I skipped the basic setup of Helm and the additional HashiCorp repo here (prerequisite).

We can then go on to initialize and unseal the first cluster node vault-0:

kubectl exec -it vault-0 -- sh
/ $ vault operator init -key-shares=1 -key-threshold=1
/ $ vault operator unseal <UNSEAL_KEY_HERE>

This works fine for the first node, because no communication to other nodes (where the TLS issue appears) is involved.

So we go ahead and join the second Vault Pod, vault-1:

kubectl exec -it vault-1 -- sh
/ $ vault operator raft join https://vault-0.vault-internal:8200
Error joining the node to the Raft cluster: Error making API request.

URL: POST https://127.0.0.1:8200/v1/sys/storage/raft/join
Code: 500. Errors:

* failed to join raft cluster: failed to join any raft leader node

This is where we hit our error, the main reason for opening this PR, and where we were not quite clear on how to proceed.

The issue at hand will be visible in the log of the node we tried to join, vault-1:

$ kubectl logs vault-1
2022-02-17T22:00:58.001Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault-0.vault-internal:8200
2022-02-17T22:00:58.010Z [WARN]  core: join attempt failed: error="error during raft bootstrap init call: Put \"https://vault-0.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge\": x509: certificate is valid for vault-server-tls, vault-server-tls.vault-namespace, vault-server-tls.vault-namespace.svc, vault-server-tls.vault-namespace.svc.cluster.local, not vault-0.vault-internal"
2022-02-17T22:00:58.010Z [ERROR] core: failed to join raft cluster: error="failed to join any raft leader node"

As you can see, by default, the join attempt was made with the Pod name "vault-0", which is not part of the certificate's Common Name (CN) or Subject Alternative Names (SANs). This is an issue because, while bootstrapping the Raft cluster, the first request goes directly to the HTTP API port of the other node, and if the certificate names do not match the requested name, the request fails with the error above. The general process is also nicely described here:

https://www.vaultproject.io/docs/concepts/integrated-storage#vault-networking-recap

This is also the part where we would like to clarify the documentation, especially regarding the configuration of the ha.raft.config.storage section.

Basically, we found two ways to make the configuration work so that we can join the Vault Pods by their Pod names and still use the CSR and the suggested CN/SAN from the existing guide:

  1. Use leader_tls_servername (see code example in this PR). The main thing to be careful about here is to make it match one of the names advertised in the CN/SAN of the Vault server certificate.
  2. Use only one retry_join block that references the load balancer name. During the bootstrap challenge, the nodes connect back to the cluster from the outside, through the load balancer, where the CN/SAN should in most cases also match the DNS name of the load balancer. The nodes that are not ready yet (not in the 200 "unsealed and OK" state) will then probably be hit first, in nondeterministic order, which is fine as long as they eventually get joined to the cluster.
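
A minimal sketch of option 1, assuming the storage stanza lives inside server.ha.raft.config and reuses the CA path from the values file above. Only vault-0 and vault-1 appear in this thread, so the third peer (vault-2) is an assumption, and the leader_tls_servername value is simply one of the SANs from the log output; it would need to match your own certificate:

```hcl
storage "raft" {
  path = "/vault/data"

  # One retry_join block per expected peer. The addresses follow the
  # vault-N.vault-internal pattern from the join command above.
  retry_join {
    leader_api_addr       = "https://vault-0.vault-internal:8200"
    leader_ca_cert_file   = "/vault/userconfig/vault-server-tls/vault.ca"
    # Must equal one of the CN/SAN entries of the server certificate,
    # not the Pod name used in leader_api_addr.
    leader_tls_servername = "vault-server-tls.vault-namespace.svc"
  }

  retry_join {
    leader_api_addr       = "https://vault-1.vault-internal:8200"
    leader_ca_cert_file   = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_tls_servername = "vault-server-tls.vault-namespace.svc"
  }

  # Assumption: a third replica named vault-2 exists.
  retry_join {
    leader_api_addr       = "https://vault-2.vault-internal:8200"
    leader_ca_cert_file   = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_tls_servername = "vault-server-tls.vault-namespace.svc"
  }
}
```

With this in place, each node still dials its peers by Pod name, but verifies the presented certificate against the configured server name instead.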

Both ways turned out to work just fine, with the first option (leader_tls_servername) looking a bit more deterministic. We experienced some time-outs with option 2 (one retry_join block) during repeated unsealing (it eventually succeeds after repeated attempts), so it is probably not the preferred choice. But this might also be specific to our test environment ("kind").
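
For comparison, option 2 could be sketched roughly as follows. The load balancer host name vault.example.com is purely hypothetical; the only requirement taken from the discussion above is that this name appears in the CN/SAN of the server certificate:

```hcl
storage "raft" {
  path = "/vault/data"

  # A single retry_join block for all nodes, pointing at the load balancer.
  retry_join {
    # Hypothetical load balancer DNS name; it must be listed in the
    # CN/SAN of the Vault server certificate.
    leader_api_addr     = "https://vault.example.com:8200"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
  }
}
```

No leader_tls_servername is set here, because the requested name and the certificate name are expected to be identical.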

But in any case, we found these two configuration options relevant, when attempting to create a cluster with Raft integrated storage and TLS on Kubernetes.

Thanks for pointing out the details regarding the formatting (shell-session, docs-nav-data.json). We can certainly make this work once we have discussed the main points of the article and the general storyline (and whether it's worth writing down this information in the first place).

@schavis schavis added the docs-abandoned Possibly abandoned docs PR. To be closed by content team if no activity from creator after 6 months label Aug 17, 2023
@schavis (Contributor) commented Aug 23, 2023

This PR appears abandoned. If there is no activity on the PR by September 8, it will be closed without merging.

@in0rdr (Author) commented Aug 25, 2023

Hi @schavis can we please catch up and not silently close this PR here?

Last time I checked, I was waiting for some feedback from your side.

I'm still convinced that this additional deployment example for "HA with Raft storage backend + TLS" in the docs for "Helm chart examples" would be very helpful deployment guidance for anyone trying to deploy a highly available HashiCorp Vault cluster on Kubernetes with TLS.

In that sense, it's the lovechild of the "Standalone with TLS" and the "HA with Raft" example :)

What do you think?

@schavis (Contributor) commented Aug 30, 2023

Hi @schavis can we please catch up and not silently close this PR here?

@in0rdr No one is "silently" closing PRs. I explicitly left a comment (plus a healthy window to respond) to determine if the PR was still useful. I'll review the doc and provide feedback today.

@schavis schavis requested review from schavis and removed request for taoism4504 August 30, 2023 17:04
@schavis (Contributor) left a comment

Thanks for drafting these examples :)

I made suggestions to keep things in line with our style guide and to also help improve the scannability of the document. Let me know if you have any questions or concerns.

Describes how to set up a Raft HA Vault cluster with TLS certificate
---

# Raft HA Server with TLS

Suggested change
- # Raft HA Server with TLS
+ # Raft HA server with TLS

Style correction: use sentence case for headings

Comment on lines +11 to +13
Follow the steps from the example [HA Vault Cluster with Integrated Storage](/docs/platform/k8s/helm/examples/ha-with-raft) to build the cluster.

Follow the examples and instructions in [Standalone Server with TLS](/docs/platform/k8s/helm/examples/standalone-tls) to create a TLS certificate.

Suggested change
- Follow the steps from the example [HA Vault Cluster with Integrated Storage](/docs/platform/k8s/helm/examples/ha-with-raft) to build the cluster.
- Follow the examples and instructions in [Standalone Server with TLS](/docs/platform/k8s/helm/examples/standalone-tls) to create a TLS certificate.
+ ## Before you start
+ 1. Follow the steps in [HA Vault Cluster with Integrated Storage](/docs/platform/k8s/helm/examples/ha-with-raft) to build your cluster.
+ 1. Follow the steps in [Standalone Server with TLS](/docs/platform/k8s/helm/examples/standalone-tls) to create a TLS certificate.

Suggest making it clear that these are (essentially) prerequisites to following the instructions you provide below.

Follow the steps from the example [HA Vault Cluster with Integrated Storage](/docs/platform/k8s/helm/examples/ha-with-raft) to build the cluster.

Follow the examples and instructions in [Standalone Server with TLS](/docs/platform/k8s/helm/examples/standalone-tls) to create a TLS certificate.


Suggested change
+ ## Bootstrap your Raft cluster

Suggest adding a heading to make it obvious what the suggestions below will have folks do


Follow the examples and instructions in [Standalone Server with TLS](/docs/platform/k8s/helm/examples/standalone-tls) to create a TLS certificate.

Before cluster initialization and without proper configuration, the following warning is to be expected:

Suggested change
- Before cluster initialization and without proper configuration, the following warning is to be expected:
+ Without proper configuration, you will see the following warning before cluster initialization:

style correction: write in active voice

core: join attempt failed: error="error during raft bootstrap init call: Put "https://vault-${N}.${SERVICE}:8200/v1/sys/storage/raft/bootstrap/challenge": x509: certificate is valid for ${SERVICE}, ${SERVICE}.${NAMESPACE}, ${SERVICE}.${NAMESPACE}.svc, ${SERVICE}.${NAMESPACE}.svc.cluster.local, not vault-${N}.${SERVICE}"
```

The concepts for [Integrated Storage and TLS](/docs/concepts/integrated-storage#integrated-storage-and-tls) elaborate three possible solutions to mitigate this TLS verification warning and bootstrap the Raft cluster, two of which are discussed in further detail here.

Suggested change
- The concepts for [Integrated Storage and TLS](/docs/concepts/integrated-storage#integrated-storage-and-tls) elaborate three possible solutions to mitigate this TLS verification warning and bootstrap the Raft cluster, two of which are discussed in further detail here.
+ The concepts overview for [Integrated Storage and TLS](/docs/concepts/integrated-storage#integrated-storage-and-tls) covers the various options for mitigating TLS verification warnings and bootstrapping your Raft cluster.
+ The examples below demonstrate two specific solutions. Both solutions ensure that the common name (CN) used for the `leader_api_addr` in the Raft stanza matches the name(s) listed in the TLS certificate.

I suggest moving this up to the beginning of the document (before the suggested "Before you start" section) as it's a useful introduction to what your document will cover. Also, I would add the last sentence from your doc here as well since it provides a good summary of what your specific examples do.

Style correction: write in active voice.


The concepts for [Integrated Storage and TLS](/docs/concepts/integrated-storage#integrated-storage-and-tls) elaborate three possible solutions to mitigate this TLS verification warning and bootstrap the Raft cluster, two of which are discussed in further detail here.

The warning disappears when the the [expected TLS servername](/docs/concepts/integrated-storage#autojoin-with-tls-servername) is correctly configured using [`leader_tls_servername`](/docs/configuration/storage/raft#leader_tls_servername) in the Raft stanza (`${CN}` in the example below):

Suggested change
- The warning disappears when the the [expected TLS servername](/docs/concepts/integrated-storage#autojoin-with-tls-servername) is correctly configured using [`leader_tls_servername`](/docs/configuration/storage/raft#leader_tls_servername) in the Raft stanza (`${CN}` in the example below):
+ ## Solution 1: Use auto-join and set the TLS server in your Raft configuration
+ The join warning disappears if you use autojoin and set the expected TLS server name (`${CN}`) with [`leader_tls_servername`](/docs/configuration/storage/raft#leader_tls_servername) in the Raft stanza for your Vault configuration.
+ For example:

Suggest making it easier for folks scanning the doc to differentiate between the two solutions you're demonstrating.

Style correction: write in active voice


The warning disappears when the the [expected TLS servername](/docs/concepts/integrated-storage#autojoin-with-tls-servername) is correctly configured using [`leader_tls_servername`](/docs/configuration/storage/raft#leader_tls_servername) in the Raft stanza (`${CN}` in the example below):

```hcl

It might be useful to highlight the important line for folks so they know which part of the Raft config you're talking about. You can highlight the relevant line by wrapping the CodeBlockConfig component around your code example (with whitespace before/after each tag) and noting the specific line(s) you want to highlight. For example:

<CodeBlockConfig highlight="6,14,22">

YOUR_CODE_BLOCK

</CodeBlockConfig>

}
```

Alternatively, specify one `retry_join` which references the [name of the load balancer](/docs/concepts/integrated-storage#load-balancer-instead-of-autojoin) instead:

Suggested change
- Alternatively, specify one `retry_join` which references the [name of the load balancer](/docs/concepts/integrated-storage#load-balancer-instead-of-autojoin) instead:
+ ## Solution 2: Add a load balancer to your Raft configuration
+ If you have a load balancer for your Vault cluster, you can add a single `retry_join` stanza to your Raft configuration and use the load balancer address for `leader_api_addr`.
+ For example:

Same suggestion as above: make it easier on folks scanning quickly to find the start of your second recommended solution

```

Alternatively, specify one `retry_join` which references the [name of the load balancer](/docs/concepts/integrated-storage#load-balancer-instead-of-autojoin) instead:
```hcl

If you want to highlight the relevant line for this example:

<CodeBlockConfig highlight="5">

YOUR_CODE_BLOCK

</CodeBlockConfig>

}
```

Both configuration options ensure that the the common name (CN) used for the `leader_api_addr` in the Raft stanza matches the name(s) listed in the TLS certificate.

Suggested change
- Both configuration options ensure that the the common name (CN) used for the `leader_api_addr` in the Raft stanza matches the name(s) listed in the TLS certificate.

Suggest moving this statement to earlier in the doc (I added it to a previous suggestion).

@schavis schavis added pr/no-milestone and removed docs-abandoned Possibly abandoned docs PR. To be closed by content team if no activity from creator after 6 months labels Aug 30, 2023
@schavis (Contributor) commented Sep 1, 2023

Would you mind closing this PR now that the update is captured in PR #22714?
