Add arbitrary claims to a job's workload identity #19438

Open
vftaylor opened this issue Dec 11, 2023 · 16 comments

Comments

@vftaylor

Exposing Nomad workload identity JWTs inside jobs greatly enhances the flexibility that operators have when configuring jobs to authenticate to external systems like Vault, Consul, etc. Currently, a job's identity JWT is limited to four main claims: nomad_namespace, nomad_job_id, nomad_allocation_id, and nomad_task. While that is a good start, operators would gain even more flexibility if they could configure Nomad to add arbitrary keys from the jobspec as claims to a job's identity JWT. Even allowing a subset of the jobspec keys, especially the "Meta" key, would be good enough.

Proposal

Within the new identity block, allow an operator to specify arbitrary keys from the jobspec that would be added as claims to a job's JWT. The primary use case for me would be to have the contents of the "Meta" block for a job baked into the JWT. This would allow dramatically more flexibility when using templated Vault policies, since a richer source of metadata would be available for use. Example jobspec:

job "docs" {
  group "example" {
    task "api" {

      # ... other task configuration ...

      identity {
        name         = "example"
        aud          = ["oidc.example.com"]
        file         = true
        ttl          = "1h"
        extra_claims = ["job.meta"] # proposed: some suitable way to identify keys of interest
      }
    }
  }
  meta {
    foo = "bar"
  }
}

Would give the following JWT, with the contents of the meta block coming out as the job.meta claim:

{
  "aud": "oidc.example.com",
  "exp": 1702121612,
  "iat": 1702118012,
  "jti": "8a5f5fb9-2a0c-a4f7-7583-b75c4a1b0766",
  "nbf": 1702118012,
  "nomad_allocation_id": "c30140d4-6106-2350-ff90-6cc9a3b9e7ab",
  "nomad_job_id": "docs",
  "nomad_namespace": "default",
  "nomad_task": "api",
  "sub": "foo",
  "job.meta": {
    "foo": "bar"
  }
}
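
To make the proposal concrete, here is a minimal Python sketch of how the requested claims might be assembled. It is not Nomad code: `build_claims`, `lookup`, and the dict shapes are hypothetical, and the `extra_claims` field is the proposed one from this issue rather than an existing jobspec option.

```python
from typing import Any

# Hypothetical model of the proposal; none of these names come from Nomad itself.
BASE_CLAIMS = ("nomad_namespace", "nomad_job_id", "nomad_allocation_id", "nomad_task")


def lookup(jobspec: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as "job.meta" against a jobspec-like dict."""
    node: Any = jobspec
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown jobspec key: {path}")
        node = node[part]
    return node


def build_claims(identity: dict[str, Any], jobspec: dict[str, Any],
                 workload: dict[str, str]) -> dict[str, Any]:
    """Merge the four standard claims with any requested extra claims."""
    claims: dict[str, Any] = {name: workload[name] for name in BASE_CLAIMS}
    claims["aud"] = identity["aud"]
    for path in identity.get("extra_claims", []):
        if path in claims:
            # Refuse to let a requested key overwrite a built-in claim.
            raise ValueError(f"extra claim would shadow a built-in claim: {path}")
        claims[path] = lookup(jobspec, path)
    return claims


jobspec = {"job": {"id": "docs", "meta": {"foo": "bar"}}}
identity = {"aud": ["oidc.example.com"], "extra_claims": ["job.meta"]}
workload = {
    "nomad_namespace": "default",
    "nomad_job_id": "docs",
    "nomad_allocation_id": "c30140d4-6106-2350-ff90-6cc9a3b9e7ab",
    "nomad_task": "api",
}

claims = build_claims(identity, jobspec, workload)
```

Signing and the time-based claims (exp, iat, nbf, jti) are left out of the sketch; it only shows where the jobspec data would enter the payload.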

Use-cases

  • Have much more control over using Vault policies that are templated with metadata claims.
  • Provide richer information to external systems by securely passing the JWTs from Nomad jobs.

Attempted Solutions

There's no real way to satisfy this requirement using existing Nomad features. This functionality requires cryptographic guarantees from the workload identity engine itself.

@tgross
Member

tgross commented Dec 12, 2023

Hi @vftaylor! As it turns out, we've had this same request through various backchannels. The main thing that looks tricky with this is making sure that job submitters can't use this to escalate privileges in a way the cluster administrator doesn't expect. So we're talking through the feasibility.

@vftaylor
Author

vftaylor commented Dec 12, 2023

The main thing that looks tricky with this is making sure that job submitters can't use this to escalate privileges in a way the cluster administrator doesn't expect.

@tgross I agree with the concern. Several thoughts:

  • Before workload identity, a job submitter could escalate privileges in Vault (for example) by specifying arbitrary Vault policies in the jobspec. Is this scenario a lot different?
  • To mitigate the privilege escalation concern, you could make it so that "customised" workload identities are only allowed in tokens with specific aud claims. That way, the ultimate recipient of the token has a way to distinguish super-trustworthy vs. semi-trustworthy tokens.
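
A rough sketch of the second idea from the recipient's side, in Python. It assumes the token's signature has already been verified elsewhere, and the audience name `custom.example.com` and the `trust_level` helper are made up for illustration.

```python
from typing import Any

# Assumption: audiences the cluster administrator reserves for identities that
# may carry job-submitter-controlled claims. These names are illustrative only.
CUSTOM_CLAIM_AUDIENCES = {"custom.example.com"}


def trust_level(claims: dict[str, Any]) -> str:
    """Classify an already-verified JWT payload by its audience."""
    aud = claims.get("aud", [])
    # The aud claim may be a single string or a list of strings.
    audiences = {aud} if isinstance(aud, str) else set(aud)
    if audiences & CUSTOM_CLAIM_AUDIENCES:
        return "semi-trusted"
    return "trusted"


standard = {"aud": "oidc.example.com", "nomad_job_id": "docs"}
customised = {"aud": ["custom.example.com"], "job.meta": {"foo": "bar"}}
levels = (trust_level(standard), trust_level(customised))
```

A recipient could then grant narrower permissions to "semi-trusted" tokens, since their extra claims were chosen by the job submitter.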

@benvanstaveren

Was requested to come discuss here; currently our existing way of doing things would be covered by having multiple roles that receive different policies, but the only real distinguishing feature in there is application names (i.e. every application currently has 1 policy for itself, and 1 policy for each database it needs to access); this does translate back to jwt roles (given that I can compose the role out of several policies).

However, we do at this point in time use a separate service to authenticate apps against each other. This service also talks to Nomad to extract a few things from the job's meta block, where we define such fun things as "who owns this app", and some other stuff I don't want to discuss openly (secrets... terrible, terrible secrets - but not that secret since they're in a jobspec file :P). Ideally I would love to be able to declare as a configuration option (marked as "if you do this, you could open yourself up to privilege escalation and other such fun things" in the docs, obviously) that I can either copy the entire meta block of a jobspec into a meta key in the claim mappings, or only a subset - just so we can get rid of that service-auth-service thing.

I figure the easiest bit would be to just flat out copy the meta block.

As far as privilege escalation goes, though, I don't see how this would allow for it unless specifically implemented on the consuming end. In our case, we want to use the workload identity to get rid of said service that authenticates services to each other and use the identity instead. A consuming service won't blindly accept the claim mappings - at least, it shouldn't, but I've got my developers trained right and the attitude adjustment stick is never far away. Also, with the way we use Nomad, nobody gets to submit jobs directly; everything goes through an API where I do a painful amount of scrubbing on the supplied data before it generates a job, so just in my (super special snowflake maybe) case, it's a trivial non-issue. Can't say if that floats for other people, though...

And what @vftaylor said, you could already do this by specifying additional policies in the vault.policies list, unless you're thinking of something entirely different re: escalation.

@Lord-Y

Lord-Y commented Jan 15, 2024

I'm also waiting for these extra claims too. That's a must need. @benvanstaveren when you are using workload identity, you cannot use vault.policies. You will see a warning like:

Job Warnings:
1 warning:

* Task xxxx has a Vault block with policies but uses workload identity to authenticate with Vault, policies will be ignored

@benvanstaveren

I'm also waiting for these extra claims too. That's a must need. @benvanstaveren when you are using workload identity, you cannot use vault.policies. You will see a warning like:

Job Warnings:
1 warning:

* Task xxxx has a Vault block with policies but uses workload identity to authenticate with Vault, policies will be ignored

I know I can't use vault.policies - I'm just illustrating how we do it now. I don't honestly think we'll be upgrading past 1.8 if the current implementation of workload identity stays the way it is and the docs aren't updated to very explicitly and clearly explain how to get it set up on multiple federated clusters and still keep the ability to emergency-schedule things on a different cluster.

@Lord-Y

Lord-Y commented Mar 8, 2024

@benvanstaveren you need to use vault.role instead. Then create the role under your jwtpath/role and associate the policy that you want with it. I've been doing that for maybe 3 weeks now and it works. I think vault.role came in version 1.7.4, but I don't remember exactly.

@EtienneBruines
Contributor

The main thing that looks tricky with this is making sure that job submitters can't use this to escalate privileges in a way the cluster administrator doesn't expect. So we're talking through the feasibility.

For a lot of use-cases, having additional claims somewhere would suffice, even if it's in a mandatory metadata field that we cannot 'escape' out of.

{
  "aud": "oidc.example.com",
  "exp": 1702121612,
  "iat": 1702118012,
  "jti": "8a5f5fb9-2a0c-a4f7-7583-b75c4a1b0766",
  "nbf": 1702118012,
  "nomad_allocation_id": "c30140d4-6106-2350-ff90-6cc9a3b9e7ab",
  "nomad_job_id": "docs",
  "nomad_namespace": "default",
  "nomad_task": "api",
  "metadata": {
    "foo": "bar"
  }
}

That should prevent privilege escalation, while still enabling access to custom values.
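
A small Python illustration of why the nesting matters. The two merge functions are hypothetical, and the flat merge is deliberately naive: it stands in for any design where submitter-supplied keys land at the root of the token.

```python
from typing import Any


def flat_merge(base: dict[str, Any], custom: dict[str, Any]) -> dict[str, Any]:
    """Put custom values at the root, where they can collide with built-in claims."""
    return {**base, **custom}


def nested_merge(base: dict[str, Any], custom: dict[str, Any]) -> dict[str, Any]:
    """Confine custom values to a single "metadata" claim they cannot escape."""
    return {**base, "metadata": dict(custom)}


base = {"nomad_namespace": "default", "nomad_job_id": "docs"}
# A job submitter trying to impersonate another namespace.
custom = {"foo": "bar", "nomad_namespace": "prod"}

flat = flat_merge(base, custom)
nested = nested_merge(base, custom)
```

With the flat merge the submitter's value replaces nomad_namespace, while the nested form leaves the built-in claim intact and keeps the submitter's value visibly under metadata. Whether that is sufficient still depends on how the consuming system uses the metadata claim.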


Our use-case:

  • Nomad generates identity token for Vault
  • Vault parses this identity token and saves whatever it can as user_claims into the metadata of that auth alias
  • Vault generates an OIDC token for the Nomad task (secret), which makes use of the claims that were saved into the alias metadata

But nowhere does Vault allow us to (for example) add a prefix/suffix to it. Not even capitalization. So enabling Nomad to specify more useful values to the feature-lacking Vault would help a lot.


Then again, if Nomad allows custom values at root level, we could forego Vault for our use-case and then we can make our application (SurrealDB) just accept the identity tokens that Nomad generates directly.

@shoeffner

Before workload identity, a job submitter could escalate privileges in Vault (for example) by specifying arbitrary Vault policies in the jobspec.

How does this work? As far as my experience goes, this is not possible, because you must pass a VAULT_TOKEN with the correct policies when you are attempting to schedule a job with arbitrary Vault policies.

@tgross
Member

tgross commented Jul 29, 2024

How does this work? As far as my experience goes, this is not possible, because you must pass a VAULT_TOKEN with the correct policies when you are attempting to schedule a job with arbitrary Vault policies.

Workload Identity removes the requirement for users to submit the Vault token, in lieu of creating a trust relationship between Nomad and Vault such that Nomad gets the Vault token via signed Workload Identities. That's entirely the purpose of WI.

@shoeffner

Thanks, my question referred to the state before WI though. I don't see how the privilege escalation would work right now, because as far as I know I need to provide a token which already proves that I have access to a certain policy.

Workload identities weaken this assumption it seems, as a user I can now schedule any workload and get whatever the job identity might have access to, unless I am misunderstanding:

So before WI a user would have to:

  • specify arbitrary vault policies
  • get a vault token with those policies
  • present this token to nomad to prove they are allowed to request these policies and pass them to the job

With WI:

  • A user can specify one pre-defined role
  • Does not need to prove they are allowed to schedule a job with access to this role, they just schedule it (given they are allowed to schedule a job in the given namespace)

How would a user prove that they are allowed to schedule that particular workload? The only two things I can come up with are:

  • namespace policies
  • sentinel policies

Namespaces are extremely inflexible, we would need to create namespaces per role to limit access properly (and create hundreds of roles to allow the same flexibility we had before with policies).
Sentinel policies are an enterprise feature which our organization cannot afford right now, so they are not an option.

I don't see any other ACLs listed in https://developer.hashicorp.com/nomad/tutorials/access-control/access-control-create-policy#write-the-policy-rules that would allow controlling this.

This would not be resolved by arbitrary claims added from the user side either, but it might be resolved by Nomad adding, e.g., a user identity, or roles and groups derived from the Nomad token.

@tgross
Member

tgross commented Jul 29, 2024

Thanks, my question referred to the state before WI though. I don't see how the privilege escalation would work right now, because as far as I know I need to provide a token which already proves that I have access to a certain policy.

Workload identities weaken this assumption it seems

Right. Namespace policies, Sentinel policies, and templated Vault policies on Vault roles (example from tutorial) are the intended controls in the Workload Identity workflow.

Namespaces are extremely inflexible, we would need to create namespaces per role to limit access properly (and create hundreds of roles to allow the same flexibility we had before with policies)

There's definitely a bit more up-front work involved, but the resulting workflow for job authors is easier (they don't need a Nomad token, Vault token, and Consul token to submit a job).

In any case, this discussion isn't about whether we're going to have users submitting their own Vault tokens (which as published late last year is the legacy workflow we're removing in Nomad 1.10). This discussion is around features intended to add additional flexibility so that cluster admins have finer-grained control over policies, just as you're suggesting you'd want. An alternate proposal is over in #23510.

@tgross
Member

tgross commented Jul 29, 2024

I wanted to follow up on this issue with something we've discussed over in #23510. From my most recent comment there (#23510 (comment)):

Ok, so @schmichael and I had a chat and I think we've settled on the idea of introducing an extra_claims block that accepts template strings in the server configuration. So in the Vault block you'll do something like this:

vault {
  address = "https://vault.example.com:8200"
  enabled = true

  default_identity {
    aud = ["vault.io"]
    ttl = "1h"
    extra_claims {
      nomad_workload_id = "${job.namespace}:${job.id}"
      some_other_claim  = "foo"
    }
  }
}

We'll need to do a little investigation to see the exact objects we can expose in those templates, but that's the gist of things.
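
For readers who want a feel for how such templates would expand, here is a rough Python sketch. It is not how Nomad implements interpolation: `render_claim`, `render_extra_claims`, and the allowlist are assumptions, and the set of exposed variables is limited to the two that appear in the example above.

```python
import re

# Assumption: only these variables are exposed to claim templates. The exact set
# of objects was still to be investigated, so this list is illustrative.
ALLOWED_VARIABLES = {"job.namespace", "job.id"}

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_claim(template: str, values: dict[str, str]) -> str:
    """Expand ${...} placeholders, rejecting anything outside the allowlist."""
    def substitute(match: re.Match) -> str:
        name = match.group(1).strip()
        if name not in ALLOWED_VARIABLES:
            raise ValueError(f"variable not allowed in claim template: {name}")
        return values[name]

    return _PLACEHOLDER.sub(substitute, template)


def render_extra_claims(extra_claims: dict[str, str],
                        values: dict[str, str]) -> dict[str, str]:
    """Render every claim template defined by the cluster administrator."""
    return {claim: render_claim(tmpl, values) for claim, tmpl in extra_claims.items()}


extra_claims = {
    "nomad_workload_id": "${job.namespace}:${job.id}",
    "some_other_claim": "foo",
}
values = {"job.namespace": "default", "job.id": "docs"}
rendered = render_extra_claims(extra_claims, values)
```

Because the templates live in the server configuration and can only read allowlisted fields, job authors influence the claim values only through fields the administrator chose to expose.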

This allows us to avoid adding lots more claims to the JWT that some users might not need, while giving cluster admins the flexibility they need to meet their requirements for controls. We'll also probably want to add the same feature for a top-level server.default_identity, but we can do that in follow-up work. That'll cover a lot of the remaining use cases described in #19438.

I think that'll mostly cover what folks want to do here, and if there are use cases leftover, we can discuss what the best way forward is within that context.

@EtienneBruines
Contributor

EtienneBruines commented Aug 13, 2024

vault {
 address = "https://vault.example.com:8200"
 enabled = true

 default_identity {
   aud = ["vault.io"]
   ttl = "1h"
   extra_claims {
     nomad_workload_id = "${job.namespace}:${job.id}"
     some_other_claim  = "foo"
   }
 }
}

Even though it's not mentioned in the release notes, I believe this is released as part of v1.8.3 #23675 🎉

@tgross
Member

tgross commented Aug 13, 2024

Almost! Only vault.default_identity can have extra_claims as of 1.8.3, which we think will get folks very far. We're keeping this issue open for a more broadly-reaching ability to add these.

@schematis

schematis commented Dec 6, 2024

Are there any plans for making extra_claims available at the job level? Our use case involves review apps, which are named something like <project-name>-<branch-name>. Currently with WI we would need to have one entry in Vault per review app, instead of all review apps for a project being able to share a single entry. It would be great if we could assign a static string to a claim (such as parent_project = "foo") within the job and pass it to the templated policy in Vault.

@tgross
Member

tgross commented Dec 6, 2024

This issue covers exactly that kind of thing, but it's still unclear how we want to expose that in a way that doesn't allow for easy privilege escalation. The vault.extra_claims we added protects you from job authors creating arbitrary claims by limiting templating to a narrow subset of fields. But there's currently no option there to template from other metadata beyond the interpolations shown in the docs.

I suspect the answer will be to let cluster admins add claim templates for keys in meta blocks. So for example that would let you have a jobspec with:

meta {
  project = "example"
  branch = "final-final-2"
}

And then have a vault.extra_claims template like:

vault {
  extra_claims = "${meta.project}-${meta.branch}"
}
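
As a sketch of how that could play out for the review-app case, the Python below renders admin-defined templates over a job's meta block. Everything here is hypothetical: the `${meta.*}` syntax follows the example above, while `render_meta_claim`, the allowlist of meta keys, and the second branch name are invented for illustration.

```python
import re

# Assumption: the cluster admin lists which meta keys claim templates may read.
ALLOWED_META_KEYS = {"project", "branch"}

_META_REF = re.compile(r"\$\{meta\.([A-Za-z0-9_]+)\}")


def render_meta_claim(template: str, meta: dict[str, str]) -> str:
    """Expand ${meta.<key>} references using only allowlisted meta keys."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in ALLOWED_META_KEYS:
            raise ValueError(f"meta key not allowed in claim template: {key}")
        return meta[key]

    return _META_REF.sub(substitute, template)


# Two review apps of the same project; the second branch name is made up.
review_apps = [
    {"project": "example", "branch": "final-final-2"},
    {"project": "example", "branch": "hotfix-1"},
]

per_branch = [render_meta_claim("${meta.project}-${meta.branch}", m) for m in review_apps]
shared = {render_meta_claim("${meta.project}", m) for m in review_apps}
```

A template that reads only the project key yields the same claim value for every review app of a project, which is what would let them share a single Vault entry.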

In any case, this isn't currently on the roadmap and is still under discussion. I am going to mark it so that it can get roadmapped though, as it seems like a valuable project.

tgross added the hcc/jira label Dec 6, 2024
Projects
Status: Needs Roadmapping