External key wrapping for keyring #14852

Closed
schmichael opened this issue Oct 7, 2022 · 1 comment · Fixed by #23580
Comments

@schmichael (Member)

Nomad 1.4.0 shipped with Variables, which are stored encrypted with server-specific keys.

The next step toward securing variables would be to allow the use of external root encryption keys (the key-encrypting-key used to decrypt the server-local data-encryption-key) so that the root key is never on the server, either in memory or on disk.

Luckily HashiCorp has a library for this and we already use it: https://github.com/hashicorp/go-kms-wrapping 🎉

Enabling the use of external KMSes (key management services) should be as easy as plumbing through the configuration options, plus testing and documentation. Vault and the major cloud KMSes are the primary targets for support.
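
At the library level, the plumbing is roughly the sketch below, assuming the server already holds its DEK as raw bytes and has a go-kms-wrapping `wrapping.Wrapper` configured for whichever backend the operator enables. The `wrapDEK`/`unwrapDEK` helpers and the `keyring` package name are illustrative, not Nomad's actual code:

```go
package keyring

import (
	"context"
	"fmt"

	wrapping "github.com/hashicorp/go-kms-wrapping/v2"
)

// wrapDEK encrypts the server-local data-encryption-key with an external
// wrapper (Vault Transit, a cloud KMS, etc.) so that only the wrapped blob
// is ever written to disk. The wrapper argument is any go-kms-wrapping
// wrapping.Wrapper already configured for the chosen KMS.
func wrapDEK(ctx context.Context, w wrapping.Wrapper, dek []byte) (*wrapping.BlobInfo, error) {
	blob, err := w.Encrypt(ctx, dek)
	if err != nil {
		return nil, fmt.Errorf("failed to wrap DEK: %w", err)
	}
	return blob, nil
}

// unwrapDEK asks the external KMS to decrypt the wrapped blob loaded from
// the on-disk keystore; the KEK never leaves the KMS.
func unwrapDEK(ctx context.Context, w wrapping.Wrapper, blob *wrapping.BlobInfo) ([]byte, error) {
	dek, err := w.Decrypt(ctx, blob)
	if err != nil {
		return nil, fmt.Errorf("failed to unwrap DEK: %w", err)
	}
	return dek, nil
}
```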

Security Model

The end result of this change is that servers would be safe from offline attacks to retrieve variables. For example, in Nomad 1.4, if an attacker has root access on a server, even one where the Nomad agent is not running, the attacker can read the plaintext KEK (key-encryption-key), decrypt the DEK (data-encryption-key), and then decrypt all variables.

If Nomad used a KMS as proposed in this issue, the KEK would not be present on the server, whether or not the Nomad agent was running. The KEK would never leave the KMS, so the DEK could not be decrypted without compromising the KMS itself, and therefore no variables could be decrypted.

Shortcomings

Nomad operators should still treat root* access to an online Nomad server as equivalent to full access to all Nomad Variables. There are a variety of ways root could access variables even when a KMS is in use:

  1. root users can reset ACL management tokens, and management tokens have access to all variables.
  2. root users have access to the running Nomad server agent's memory and therefore could extract the DEK.
  3. Most cloud KMSes use intrinsic instance identity/credentials for auth, so obtaining root on the Nomad server instance may grant use of the KEK via the KMS.

I'm sure a motivated attacker can come up with even more ways to extract secrets from a live Nomad server agent, but the point is that while using a KMS would significantly protect variables from a wide variety of attacks, it is not a panacea. Specifically, Shortcoming #3 above is something Nomad operators would need to prevent themselves, as Nomad has no ability to deauthenticate or unenroll itself from a KMS on server shutdown.

* or whatever user the Nomad server agent is running as. root is not required (or recommended) for Nomad server agents.

Out of scope

Nomad could rely on a KMS for per-variable encryption and decryption to prevent extracting the plaintext DEK from memory, but this would only address Shortcoming #2 above: the other shortcomings would remain. Relying on an external KMS for every variable operation would also incur significant performance costs, and potentially monetary costs, for operators.

If this level of security is desirable for a variable, it is recommended that the variable be stored in Vault or the KMS directly. The costs of using a KMS for every Nomad variable operation vastly outweigh the benefits compared to an app using a KMS directly.

@schmichael schmichael added this to the 1.6.0 milestone Mar 8, 2023
@schmichael schmichael self-assigned this Mar 8, 2023
tgross added a commit that referenced this issue Mar 27, 2023
When cluster administrators restore from Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
@tgross tgross modified the milestones: 1.6.0, 1.6.x Jun 21, 2023
@tgross tgross modified the milestones: 1.6.x, 1.7.x Oct 27, 2023
@tgross tgross modified the milestones: 1.7.x, 1.8.x Jun 4, 2024
@tgross tgross added the hcc/jira label Jul 2, 2024
@tgross tgross assigned tgross and unassigned schmichael Jul 9, 2024
tgross added a commit that referenced this issue Jul 17, 2024
In Nomad 1.4.0, we shipped support for encrypted Variables and signed Workload
Identities, but the key material is protected only by an AEAD encrypting the
KEK. Add support for Vault transit encryption and external KMS from major cloud
providers. The servers call out to the external service to decrypt each key in
the on-disk keystore.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Fixes: #14852
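
For illustration only (this is not the code from #23580), a server could construct the Vault Transit wrapper from go-kms-wrapping at startup and use it to unwrap each key blob loaded from the on-disk keystore. The config map keys and the example address, token, and key name below are assumptions about the transit wrapper's configuration, not verified against the library docs:

```go
package keyring

import (
	"context"
	"fmt"

	wrapping "github.com/hashicorp/go-kms-wrapping/v2"
	transit "github.com/hashicorp/go-kms-wrapping/wrappers/transit/v2"
)

// decryptKeystore configures a Vault Transit wrapper and uses it to decrypt
// every wrapped key blob found in the on-disk keystore. The KEK stays inside
// Vault; only the unwrapped DEKs are returned to the caller.
func decryptKeystore(ctx context.Context, blobs []*wrapping.BlobInfo) ([][]byte, error) {
	w := transit.NewWrapper()
	// NOTE: these config keys are assumed; consult the go-kms-wrapping
	// transit wrapper documentation for the authoritative names.
	if _, err := w.SetConfig(ctx, wrapping.WithConfigMap(map[string]string{
		"address":    "https://vault.example.com:8200", // example address
		"token":      "s.example-token",                // example token
		"mount_path": "transit/",
		"key_name":   "nomad-keyring", // hypothetical key name
	})); err != nil {
		return nil, fmt.Errorf("failed to configure transit wrapper: %w", err)
	}

	keys := make([][]byte, 0, len(blobs))
	for _, blob := range blobs {
		dek, err := w.Decrypt(ctx, blob)
		if err != nil {
			return nil, fmt.Errorf("failed to unwrap key: %w", err)
		}
		keys = append(keys, dek)
	}
	return keys, nil
}
```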
tgross added a commit that referenced this issue Jul 17, 2024
In #23580 we're implementing support for encrypting Nomad's key material with
external KMS providers or Vault Transit. This changeset breaks out the
documentation from that PR to keep the review manageable and present it to a
wider set of reviewers.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Ref: #14852
Ref: #23580
tgross added a commit that referenced this issue Jul 17, 2024
In #23580 we're implementing support for encrypting Nomad's key material with
external KMS providers or Vault Transit. This changeset breaks out the E2E
infrastructure and testing from that PR to keep the review manageable.

Ref: https://hashicorp.atlassian.net/browse/NET-10334
Ref: #14852
Ref: #23580
@tgross tgross modified the milestones: 1.8.x, 1.8.3 Jul 18, 2024
@github-actions (bot)

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 20, 2024