Workload identity : lack of usable user_claim when using Nomad namespaces and Vault entities #23510

Closed
the-nando opened this issue Jul 6, 2024 · 11 comments · Fixed by #23675
Labels: hcc/jira, stage/accepted, theme/workload-identity, type/enhancement

Comments

@the-nando
Contributor

I'm working on migrating some clusters from the legacy Vault token-based integration to the new workload-identity-based one.

My aim is to be able to create a single Vault entity per workload, set entity specific policies and use that in addition to the generic role's token policy.

The tutorial suggests using "user_claim": "/nomad_job_id" and a templated Vault policy that uses the claim-mapped metadata, something along the lines of:

path "secrets/data/{{identity.entity.aliases.AUTH_METHOD_ACCESSOR.metadata.nomad_namespace}}/{{identity.entity.aliases.AUTH_METHOD_ACCESSOR.metadata.nomad_job_id}}" {
  capabilities = ["read"]
}
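
A minimal sketch of how such a templated path resolves for one workload. It only mimics the substitution for illustration and is not Vault's implementation; the accessor value and the `render_policy_path` helper are hypothetical.

```python
import re

# Hypothetical accessor value; a real one is listed by `vault auth list`.
ACCESSOR = "auth_jwt_12345678"

TEMPLATE = (
    "secrets/data/"
    "{{identity.entity.aliases.%s.metadata.nomad_namespace}}/"
    "{{identity.entity.aliases.%s.metadata.nomad_job_id}}"
) % (ACCESSOR, ACCESSOR)


def render_policy_path(template: str, accessor: str, metadata: dict) -> str:
    """Substitute alias metadata into a templated policy path (illustration only)."""
    pattern = re.compile(
        r"\{\{identity\.entity\.aliases\."
        + re.escape(accessor)
        + r"\.metadata\.(\w+)\}\}"
    )
    # Each placeholder is replaced by the metadata value of the same name.
    return pattern.sub(lambda m: metadata[m.group(1)], template)


path = render_policy_path(
    TEMPLATE, ACCESSOR, {"nomad_namespace": "myns", "nomad_job_id": "myjob"}
)
print(path)
```

The path depends on both the namespace and the job ID, which is why the metadata mapped from the token claims has to be trustworthy.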

To cater for jobs that may require additional ad-hoc policies, I want to pre-create Vault entities for workloads that will have one or more additional identity policies.
To get this to work, I would use an entity alias based on the user_claim to map the workload to that entity. This would allow me to set up a default workload token policy with templated paths, as in the tutorial, and for any exception I could just create a policy with the same name as the one assigned to the entity.

The problem is that the user_claim isn't unique when /nomad_job_id is used together with Nomad namespaces, because a job ID is unique only within a namespace, not across a Nomad cluster.
On the Vault side, this means that any jobs with the same name are assigned the same implied identity, which is a potential security risk and could lead to unintended access to Vault resources.
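
The collision can be sketched as follows, assuming Vault derives the alias name from the value of the configured user_claim. The namespaces, job ID, and the `alias_names` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical jobs: the same job ID registered in two Nomad namespaces.
jobs = [
    {"nomad_namespace": "team-a", "nomad_job_id": "api"},
    {"nomad_namespace": "team-b", "nomad_job_id": "api"},
]


def alias_names(claims_list, user_claim):
    """Group namespaces by the alias name derived from user_claim."""
    by_alias = defaultdict(list)
    for claims in claims_list:
        by_alias[claims[user_claim]].append(claims["nomad_namespace"])
    return dict(by_alias)


shared = alias_names(jobs, "nomad_job_id")
print(shared)  # both namespaces end up under the single alias "api"
```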

A workaround is to create a Vault JWT role per workload and configure bound_claims:

"bound_claims": {
  "nomad_namespace": "myns",
  "nomad_job_id": "myjob"
}

But this completely defeats the purpose of Vault entity management. Furthermore, to my knowledge, a JWT user claim must be unique within the system. It would perhaps be better to recommend that users set "user_claim": "/sub" if they don't intend to use bound_claims.
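
To show what the workaround costs, here is a rough sketch that builds one JWT role body per workload. The field names are Vault JWT role parameters; the role naming scheme, the policy names, and the `jwt_role_payload` helper are assumptions, and nothing is sent to Vault.

```python
import json


def jwt_role_payload(namespace: str, job_id: str, policies: list) -> dict:
    """Build the body for one per-workload Vault JWT role (sketch only)."""
    return {
        "role_type": "jwt",
        "bound_audiences": ["vault.io"],
        "user_claim": "/nomad_job_id",
        "user_claim_json_pointer": True,
        # The role only accepts tokens for this exact namespace and job.
        "bound_claims": {"nomad_namespace": namespace, "nomad_job_id": job_id},
        "token_policies": policies,
    }


# Hypothetical workloads; every (namespace, job) pair needs its own role.
workloads = [("myns", "myjob"), ("otherns", "myjob")]
roles = {
    f"{ns}-{job}": jwt_role_payload(ns, job, [f"{ns}-{job}"])
    for ns, job in workloads
}
print(json.dumps(roles["myns-myjob"], indent=2))
```

The number of roles grows with the number of workloads, which is the operational burden described above.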

What I would like is to be able to use a unique claim, something like nomad_workload_id: "<namespace>:::<job_id>", which can then be leveraged on the Vault side to configure entities and aliases accordingly. "/sub" wouldn't work, as it contains additional details such as region/taskgroup/task/identity, which a Vault operator may not know upfront for each job.
Can such a user_claim be made available?
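
A sketch of the pre-provisioning this would enable, assuming the requested claim existed and were used as user_claim. The bodies use the fields of Vault's identity entity and entity-alias endpoints; the entity naming scheme, the IDs, and the accessor are placeholders, and nothing is sent to Vault.

```python
def entity_payload(namespace: str, job_id: str, extra_policies: list) -> dict:
    """Body for creating an entity that carries the ad-hoc policies."""
    return {
        "name": f"nomad-{namespace}-{job_id}",  # hypothetical naming scheme
        "policies": extra_policies,
    }


def alias_payload(workload_id: str, entity_id: str, mount_accessor: str) -> dict:
    """Body for an alias tying the claim value to the pre-created entity."""
    return {
        "name": workload_id,  # must equal the value of the user_claim
        "canonical_id": entity_id,
        "mount_accessor": mount_accessor,
    }


entity = entity_payload("myns", "myjob", ["myjob-extra"])
alias = alias_payload("myns:::myjob", "entity-id-placeholder", "auth_jwt_12345678")
print(entity)
print(alias)
```

With one unique claim value per job, a single alias per workload is enough, regardless of how many groups or tasks the job has.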

@tgross
Member

tgross commented Jul 9, 2024

Hi @the-nando! So just to summarize the problem as you see it here, it's not that bound claims don't work, but that the user_claim on the Vault side can't be composed from multiple fields (i.e. Nomad namespace + Nomad job ID)?

(For what it's worth, I suspect our intent here is that there's a 1:1 mapping between Nomad namespace and Vault namespace, but I realize that's not always going to be feasible. Especially because Nomad namespaces are in CE and Vault namespaces are in ENT.)

@the-nando
Contributor Author

Hey @tgross 👋 Bound claims work as intended, but user_claim doesn't let me easily and uniquely identify a (job, namespace) pair on the Vault side without resorting to /sub, which carries more information and is therefore impractical from a Vault operator's point of view (identity aliases, etc.).

It would also be worth adding a note to the tutorial about the possible implications of using /nomad_job_id in combination with namespaces or, perhaps, suggesting /sub instead.

@tgross added the hcc/jira and stage/accepted labels Jul 9, 2024
@tgross moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Jul 9, 2024
@tgross
Member

tgross commented Jul 9, 2024

Ok thanks @the-nando. I'll get this surfaced for roadmapping.

@tgross moved this from Needs Roadmapping to In Progress in Nomad - Community Issues Triage Jul 23, 2024
@tgross self-assigned this Jul 23, 2024
tgross added a commit that referenced this issue Jul 23, 2024
When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.

Add a new claim `nomad_workload_id` that uniquely identifies a Nomad job by
using the namespaced job ID (with a separator that cannot appear inside a
namespace name). This will allow any external consumer of WI to use a single
claim field for binding rules, so long as that consumer is ok with sharing the
binding rule across groups within a job or tasks within a group (at which point
they'll need to go look at the task/service fields).

Fixes: #23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
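
A minimal sketch of the namespaced job ID idea from this commit message. The namespace character set and the ":" separator are assumptions chosen for illustration, and both helpers are hypothetical rather than Nomad code.

```python
import re

# Assumption: namespace names are limited to letters, digits, and dashes,
# so the separator below can never appear inside a namespace name.
NAMESPACE_RE = re.compile(r"^[a-zA-Z0-9-]{1,128}$")
SEPARATOR = ":"


def make_workload_id(namespace: str, job_id: str) -> str:
    """Join namespace and job ID into one claim value."""
    if not NAMESPACE_RE.match(namespace):
        raise ValueError(f"invalid namespace name: {namespace!r}")
    return f"{namespace}{SEPARATOR}{job_id}"


def split_workload_id(workload_id: str) -> tuple:
    """Split on the first separator only, since a job ID may contain it."""
    namespace, _, job_id = workload_id.partition(SEPARATOR)
    return namespace, job_id


wid = make_workload_id("myns", "batch:nightly")
print(wid, split_workload_id(wid))
```

Because the separator cannot occur in the namespace part, the first separator always marks the boundary, so two different (namespace, job) pairs cannot produce the same value.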
@tgross
Member

tgross commented Jul 23, 2024

I've got a draft PR up here #23675. The implementation is easy, but I want to do some testing with Vault to make sure it's getting us what we want so that'll need some E2E testing.

@schmichael
Member

without resorting to /sub which carries more information making its use impractical from a Vault operator point of view (identity aliases, etc.)

Hi @the-nando, I was wondering if you could elaborate on this. I can understand that agreeing upon a fully qualified name ahead of time might be a hassle, but I'm worried about the security implications of relying purely on <namespace>:<job> as multiple regions may have overlapping namespaces and job names (especially in circumstances where there are dev/staging/prod clusters; I'd hate for a misconfiguration to end up granting prod Vault access to dev region Jobs).

If we do add a new field would it make sense to make it <region>:<namespace>:<job> to "fully" namespace the identity from Nomad's perspective?

Out of curiosity would #19438 (custom claims) also address this? It would not prevent multiple jobs from sharing a value, but perhaps there's no concern with job submitters being able to do that.

If custom claims would address your use case, I have a slight preference for it since it seems very difficult to articulate to users when to use sub vs nomad_job_id vs the new nomad_workload_id. Any claim we add to Nomad also has to live more or less forever, so I'd like to be very confident and conservative in what we hardcode.

@tgross
Member

tgross commented Jul 25, 2024

Out of curiosity would #19438 (custom claims) also address this? It would not prevent multiple jobs from sharing a value, but perhaps there's no concern with job submitters being able to do that.

Custom claims as described in #19438 could totally solve it but they make the ergonomics for job authors not very nice, as now the job author is responsible for describing the claim for all their jobs. Maybe not bad for "this one job needs it" but if there was a case where many many jobs need third-party auth that needs a claim like $region:$namespace:$job it becomes painful for authors.

However, along those lines what if we made this a server configuration? Ex. cluster administrators could specify extra claims in their vault.$cluster_identity block or a new server.identity block. Then the extra claims would be applied to all identities signed without the job author getting involved. It'd need to have some kind of templating over the allocation/job. Something like this:

server {
  identity {
    extra_claims = {
      "example"  = "${region}:${namespace}:${id}"
      "whatever" = "${region}:${namespace}:${id}"
    }
  }
}

vault {
  default_identity {
    aud = ["vault.io"]
    ttl = "1h"
    extra_claims = {
      "example"  = "${region}:${namespace}:${id}"
      "whatever" = "${region}:${namespace}:${id}"
    }
  }
}

If we did this, we could allow job authors to have identity.extra_claims blocks too so they can override the default. But that lets job authors have their jobs masquerade as other jobs. Which sounds bad?
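
The masquerading concern can be sketched like this. The sketch uses Python's `string.Template` as a stand-in for whatever templating Nomad would use; the `render_claims` helper, the `allow_override` switch, and all field values are hypothetical.

```python
from string import Template

# Server-side default, mirroring the shape of the example config above.
SERVER_CLAIMS = {"example": "${region}:${namespace}:${id}"}


def render_claims(server_claims, job_fields, job_overrides=None, allow_override=False):
    """Render server templates, optionally letting the job replace them."""
    rendered = {
        name: Template(tmpl).substitute(job_fields)
        for name, tmpl in server_claims.items()
    }
    if allow_override and job_overrides:
        # A job-supplied value wins over the server-rendered one.
        rendered.update(job_overrides)
    return rendered


fields = {"region": "global", "namespace": "dev", "id": "web"}
honest = render_claims(SERVER_CLAIMS, fields)
spoofed = render_claims(
    SERVER_CLAIMS,
    fields,
    job_overrides={"example": "global:prod:payments"},
    allow_override=True,
)
print(honest, spoofed)
```

With overrides enabled, a job in the dev namespace presents the claim value of a prod job, which is exactly the risk raised in the question above.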

@the-nando
Contributor Author

the-nando commented Jul 25, 2024

Hi @schmichael, /sub includes region/taskgroup/task/identity, which Vault operators often don't know upfront for a given job, and that makes pre-provisioning Vault entity aliases cumbersome.
I'm basically after the simplest unique (within a federated Nomad cluster) user_claim which would allow me to identify a given Nomad job in Vault for the purpose of provisioning entities and entity aliases in a similar manner to what @tgross did for the E2E test in #23675.
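
To make the pre-provisioning cost concrete, here is a rough sketch comparing how many alias names an operator would need per job under each choice of user_claim. The colon-separated layout of sub and the identity name vault_default are assumptions for illustration (the thread only says that sub includes region, task group, task, and identity); the group and task names are made up.

```python
def sub_alias_names(region, namespace, job_id, groups, identity="vault_default"):
    """One alias name per task when user_claim is /sub (layout is assumed)."""
    return [
        f"{region}:{namespace}:{job_id}:{group}:{task}:{identity}"
        for group, tasks in groups.items()
        for task in tasks
    ]


def job_alias_names(namespace, job_id):
    """One alias name per job when user_claim is a namespaced job ID."""
    return [f"{namespace}:{job_id}"]


# Hypothetical job layout: two groups with three tasks in total.
groups = {"web": ["app", "sidecar"], "worker": ["runner"]}
per_task = sub_alias_names("global", "myns", "myjob", groups)
per_job = job_alias_names("myns", "myjob")
print(len(per_task), len(per_job))
```

The per-task list also changes whenever a job author renames a group or task, which the Vault operator may not hear about in advance.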

but I'm worried about the security implications of relying purely on <namespace>:<job> as multiple regions may have overlapping namespaces and job names (especially in circumstances where there are dev/staging/prod clusters; I'd hate for a misconfiguration to end up granting prod Vault access to dev region Jobs).

I do have overlapping namespaces and job names across clusters but they are connected to different Vault clusters.
Within a single cluster, as far as Vault access is concerned, I treat every (namespace, job) pair the same regardless of which region it runs in.
Prefixing the claim with the region would require additional entity aliases, but it's something I can live with.

Out of curiosity would #19438 (custom claims) also address this? It would not prevent multiple jobs from sharing a value, but perhaps there's no concern with job submitters being able to do that.

@tgross thanks for the input on the custom claims; your answer sums up my point of view as well. A generic solution for custom claims is more versatile and welcome, as long as it can also be controlled at the server configuration level. Introducing changes to job specs is often a non-trivial exercise when running hundreds of them deployed by different teams.
IMHO being able to configure identity claims in the job spec is a security hazard when coupled with the Vault integration and templated policies. I understand the point made in #19438 about how a job used to be able to pass arbitrary policies, but that doesn't make it less of a potential problem. In my setup I already use Sentinel to control which policies a given job can specify, and I can easily extend that to forbid configuring extra_claims at the job level.

@tgross
Member

tgross commented Jul 29, 2024

Ok, so @schmichael and I had a chat and I think we've settled on the idea of introducing an extra_claims block that accepts template strings in the server configuration. So in the Vault block you'll do something like this:

vault {
  address = "https://vault.example.com:8200"
  enabled = true

  default_identity {
    aud = ["vault.io"]
    ttl = "1h"
    extra_claims {
      nomad_workload_id = "${job.namespace}:${job.id}"
      some_other_claim  = "foo"
    }
  }
}

We'll need to do a little investigation to see the exact objects we can expose in those templates, but that's the gist of things.

This allows us to avoid adding lots more claims to the JWT that some users might not need, while giving cluster admins the flexibility they need to meet their requirements for controls. We'll also probably want to add the same feature for a top-level server.default_identity, but we can do that in follow-up work. That'll cover a lot of the remaining use cases described in #19438.
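
A minimal sketch of how template strings like the ones in the extra_claims block could be interpolated. This is an illustration in Python rather than Nomad's implementation; the placeholder grammar, the `interpolate` helper, and the shape of the context are assumptions, and the set of objects actually exposed is still open per the comment above.

```python
import re

# Matches ${a.b.c} style placeholders with lowercase dotted names.
PLACEHOLDER = re.compile(r"\$\{([a-z_]+(?:\.[a-z_]+)*)\}")


def interpolate(template: str, context: dict) -> str:
    """Replace each placeholder by walking the dotted path in context."""

    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError for unknown fields
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


extra_claims = {
    "nomad_workload_id": "${job.namespace}:${job.id}",
    "some_other_claim": "foo",
}
context = {"job": {"namespace": "myns", "id": "myjob"}}
rendered = {name: interpolate(tmpl, context) for name, tmpl in extra_claims.items()}
print(rendered)
```

Failing on unknown fields, rather than leaving the placeholder in the claim, keeps a typo in the server configuration from silently producing a shared claim value.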

tgross added a commit that referenced this issue Jul 31, 2024
Upcoming work to add extensibility to identity claims for Vault (ref #23510)
will require exposing server configuration and more objects from state to the
process of creating an `IdentityClaims` struct.

Depending on how we inject these parameters into the constructor, we end up
creating circular dependencies or a lot more logic in the setup in the plan
applier and alloc endpoint. There are three contexts where we call
`NewIdentityClaims`: the plan applier (where we only care about the default
identity), signing task identities, and signing service identities. Each needs
different parameters. So we'll refactor the constructor as a builder with
methods that the caller can decide to use (or not) depending on context. I've
pulled this work out of #23675 to make it easier to review separately.

Ref: #23510
Ref: #23675
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
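
The constructor-to-builder refactor described in this commit message can be sketched roughly as below. The real code is Go and is not shown in this thread; the class, its method names, and the nesting of extra claims under an extra_claims key are all hypothetical and only illustrate the pattern.

```python
class IdentityClaimsBuilder:
    """Collects only the inputs a given call site has available."""

    def __init__(self, namespace: str, job_id: str):
        self._claims = {"nomad_namespace": namespace, "nomad_job_id": job_id}
        self._extra = {}

    def with_task(self, task_name: str) -> "IdentityClaimsBuilder":
        self._claims["nomad_task"] = task_name
        return self

    def with_service(self, service_name: str) -> "IdentityClaimsBuilder":
        self._claims["nomad_service"] = service_name
        return self

    def with_extra_claims(self, extra: dict) -> "IdentityClaimsBuilder":
        self._extra = dict(extra)
        return self

    def build(self) -> dict:
        claims = dict(self._claims)
        if self._extra:
            claims["extra_claims"] = dict(self._extra)
        return claims


# Plan applier context: default identity only, no extra inputs needed.
default_claims = IdentityClaimsBuilder("myns", "myjob").build()

# Task identity context: task name plus server-configured extra claims.
task_claims = (
    IdentityClaimsBuilder("myns", "myjob")
    .with_task("app")
    .with_extra_claims({"nomad_workload_id": "myns:myjob"})
    .build()
)
print(default_claims, task_claims)
```

Each caller chains only the methods it has data for, so no call site has to supply parameters that exist solely for another context.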
tgross added a commit that referenced this issue Aug 5, 2024
Although we encourage users to use Vault roles, sometimes they're going to want
to assign policies based on entity and pre-create entities and aliases based on
claims. This allows them to use a single default role (or at least a small number
of them) that has a templated policy, but have an escape hatch from that.

When defining Vault entities the `user_claim` must be unique. When writing Vault
binding rules for use with Nomad workload identities the binding rule won't be
able to create a 1:1 mapping because the selector language allows accessing only
a single field. The `nomad_job_id` claim isn't sufficient to uniquely identify a
job because of namespaces. It's possible to create a JWT auth role with
`bound_claims` to avoid this becoming a security problem, but this doesn't allow
for correct accounting of user claims.

Add support for an `extra_claims` block on the server's `default_identity`
blocks for Vault. This allows a cluster administrator to add a custom claim on
all allocations. The values for these claims are interpolatable with a limited
subset of fields, similar to how we interpolate the task environment.

Fixes: #23510
Ref: https://hashicorp.atlassian.net/browse/NET-10372
Ref: https://hashicorp.atlassian.net/browse/NET-10387
@github-project-automation bot moved this from In Progress to Done in Nomad - Community Issues Triage Aug 5, 2024
@tgross
Member

tgross commented Aug 5, 2024

#23675 has been merged and will ship in the upcoming Nomad 1.8.3 (with backports to Nomad Enterprise 1.7.x and 1.6.x).

@the-nando
Contributor Author

Thanks a LOT @tgross!


I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Dec 19, 2024