title | gep-number | creation-date | status | authors | reviewers
---|---|---|---|---|---
Workload Identity - Trust Based Authentication | 26 | 2024-02-26 | implementable | |
Gardener issues and distributes JSON Web Tokens (JWTs) that can be used for authentication with external services. Gardener also exposes metadata documents in an OIDC compatible way, if needed, on the public internet. This allows Gardener users to establish trust towards Gardener through their service providers, leveraging identity federation and parts of the OIDC protocol. By employing the JWTs and the trust federation, static credentials are no longer needed for authentication with the service provider, i.e. they can be replaced by tokens issued by Gardener, which will be recognized by the external entity because trust was established beforehand. For example, machine controller manager will no longer require static credentials to create virtual machines in the cloud provider account of the Gardener user.
Cloud services such as AWS, Azure, Alicloud, GCP and others support identity federation with identity providers external to them via trust configuration using the OIDC protocol. This way, remotely running workloads can use JWTs issued by the external identity provider to authenticate with the cloud service, so no static credentials such as service keys are needed. More details can be found in the documentation of the cloud providers:
- AWS: Creating OpenID Connect (OIDC) identity providers
- Azure: Workload identity federation
- Alicloud: Overview of OIDC-based SSO
- GCP: Workload identity federation
Gardener uses a variety of external services to create the resources needed for the lifecycle of a shoot or seed cluster, such as virtual machines, volumes, object storage, load balancers, DNS records, etc. Each request to create, update or delete such a resource needs to be authenticated with some kind of credentials. Sometimes the resources reside in different cloud accounts and credentials with mixed ownership are used. For example, when Gardener Operator and Gardener User are different entities, Gardener Users bring their own credentials for their account and let Gardener use them on their behalf to create resources. These credentials usually have a long lifetime (and are often non-expiring), are reused in different scenarios by various tools, are occasionally granted broader permissions than needed, and are stored in different locations. Such handling poses various security risks.
The static long-lived credentials can be replaced with short-lived auto-rotated JWTs issued and used only by Gardener, never leaving the Gardener environment except for representing Gardener workloads before an external entity, eliminating the security burden of managing and storing static credentials. Static credentials can expire or get accidentally invalidated, which will cause reconciliation flows to fail, preventing delivery of updates, fixes, and improvements. This risk is better managed in an automated way by Gardener itself.
The JSON Web Tokens are ephemeral and not stored anywhere by the issuer, a feature that Gardener can benefit from as well because it will not have to store the credentials of Gardener users.
- Manage shoot clusters without credentials provided by the user.
- Replace static credentials for gardener system components, e.g. DNS and backup controllers.
- Rotate credentials regularly. Rotating the token signing keys will effectively invalidate all previously issued tokens.
- Relieve Gardener users of the burden to store and manage static credentials for their accounts.
- Register Gardener as trusted identity in the shoot clusters.
- The tokens to be usable for authentication with the Gardener API.
- Compatibility with gardenctl integration with cloud provider CLIs. JSON Web Tokens will not be a drop-in replacement for the static credentials, therefore gardenctl and other tools will have to adapt to the JWTs as infrastructure credentials.
- OIDC compliance - just as with Kubernetes, Gardener's goal is not to have full OIDC compliance, but to implement the bare minimum for OIDC compatible trust federation.
In short, Gardener API server will generate JWTs on request by gardenlet. Gardenlet will ensure that the tokens can be consumed conveniently by the various components in the seed clusters. For example, it will write them into secrets. Service provider extensions will be responsible for making adjustments so that the token is consumable by the service SDK. For shoot clusters, it would be natural that the `cloudprovider` secret is reused as storage target for the token.
Gardenlet will take care to refresh the token regularly and on time so that the target storage always contains a valid token. For example, a token can be refreshed when it reaches 80% of its lifetime. The right moment depends on the validity duration of the tokens, which is why these parameters will be configurable.
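The refresh rule above can be sketched as follows. This is a minimal illustration, not gardenlet's implementation; the function name `refresh_at` and the `fraction` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def refresh_at(issued_at: datetime, expires_at: datetime, fraction: float = 0.8) -> datetime:
    """Return the moment at which a token has used up `fraction` of its lifetime."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    if expires_at <= issued_at:
        raise ValueError("expires_at must be after issued_at")
    return issued_at + (expires_at - issued_at) * fraction


# Example: a token valid for 10 hours is due for refresh 8 hours after issuance.
issued = datetime(2024, 2, 7, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=10)
print(refresh_at(issued, expires).isoformat())
```

Keeping the fraction configurable reflects the statement that the right refresh moment depends on the token validity.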
The OIDC metadata discovery documents will be served in a network segment that service providers can reach, e.g. the public internet, so that service providers can be configured to trust Gardener as an OIDC compatible token issuer.
A new resource `WorkloadIdentity` in the `authentication.gardener.cloud` API group will be implemented. It will specify different characteristics of the JWT, like the value for the `aud` claim.
Similarly to `providerConfig` in other Gardener APIs, the `WorkloadIdentity` resource will feature a `providerConfig` field of byte array type, allowing service provider specific configurations. Usually, the clients for services supporting identity federation need additional information about the cloud account and the federated identity in order to successfully use the JWT. This information is known to the cloud account owners and they will provide it via this `providerConfig` field. For example, when AWS is the external service, the AWS IAM Role ARN will be set in the `providerConfig` field.
The value of the `sub` claim of the OIDC token will be computed by Gardener and will have the following format: `gardener.cloud:workloadidentity:<workloadidentity-namespace-name>:<workloadidentity-name>:<workloadidentity-uuid>`. A validation must ensure that the `WorkloadIdentity` name and namespace name do not exceed a certain limit. This restriction is required because, as per the OIDC Specification, the `sub` claim cannot exceed 255 ASCII characters in length. Gardener API server will write the value of the `sub` claim in the `status.sub` field to make it explicit; otherwise Gardener users would have to deduce it themselves, which could turn out to be error prone.
apiVersion: authentication.gardener.cloud/v1alpha1
kind: WorkloadIdentity
metadata:
  name: banana-testing
  namespace: garden-local
  uid: 12b580fe-1f74-4195-852b-e1a74b03496a # generated by the API server.
spec:
  audiences: # Required field.
  - team-foo
  targetSystem: # Required field.
    type: aws # Required field.
    providerConfig: # Optional field of type *runtime.RawExtension, extensions can make it mandatory via admission webhooks.
      apiVersion: aws.authentication.gardener.cloud/v1alpha1
      kind: Config
      iamRoleARN: arn:aws:iam::112233445566:role/gardener-dev
status:
  sub: gardener.cloud:workloadidentity:garden-local:banana-testing:12b580fe-1f74-4195-852b-e1a74b03496a
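As an illustration of how the `sub` value and its length validation could fit together, here is a minimal sketch. The function name `compute_sub` and the constants are hypothetical; only the format and the 255 ASCII character limit come from the text above.

```python
MAX_SUB_LENGTH = 255  # upper bound for the sub claim stated above
SUB_PREFIX = "gardener.cloud:workloadidentity"


def compute_sub(namespace: str, name: str, uid: str) -> str:
    """Build the sub claim and reject values that break the length or ASCII rule."""
    sub = f"{SUB_PREFIX}:{namespace}:{name}:{uid}"
    if not sub.isascii():
        raise ValueError("sub claim must contain only ASCII characters")
    if len(sub) > MAX_SUB_LENGTH:
        raise ValueError(f"sub claim is {len(sub)} characters long, limit is {MAX_SUB_LENGTH}")
    return sub


print(compute_sub("garden-local", "banana-testing", "12b580fe-1f74-4195-852b-e1a74b03496a"))
```

Because the prefix and the UID have a fixed length, a limit on the combined length of name and namespace is enough to keep the claim within bounds.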
JWTs will be available when the clients send `create` requests on the `WorkloadIdentity/token` subresource. As the clients will be providing various custom information that will be used for the generation of the JWT, yet another resource, `TokenRequest` in the API group `authentication.gardener.cloud`, will be used, similar to `TokenRequest` from the `authentication.k8s.io/v1` API. This resource is envisioned to contain just metadata for the context where the JWT is being used, e.g. shoot or backup entry identifier. Gardener API server must verify the provided metadata and it can enhance the JWT with additional information derived from the context, for example with information for the project and the seed of the shoot cluster. Gardener API can also add global information like a garden cluster identity. `TokenRequest` will feature an optional field `duration` that will allow clients to specify for how long the issued workload identity token should be valid. This duration will be ensured to be between certain limits of minimal and maximal validity, in order to avoid frequent token renewals as well as tokens with too long validity. If the duration field is not set, a default duration will be applied.
`TokenRequest` resources will never be persisted in the storage layer. The generated token will be written in the `.status.token` field and returned to the client as response. The expiration timestamp of the token will also be available in the status via the `.status.expirationTimestamp` field.
apiVersion: authentication.gardener.cloud/v1alpha1
kind: TokenRequest
spec:
  contextObject: # Optional field, various metadata about context of use of the token
    apiVersion: core.gardener.cloud/v1beta1
    kind: Shoot
    name: foo
    namespace: garden-local
    uid: 54d09554-6a68-4f46-a23a-e3592385d820
  duration: 48h # Optional field, gardener will have default value of token duration if the field is unset.
status:
  token: eyJhbGciOiJ....OkBBrVWA # The generated OIDC token
  expirationTimestamp: 2024-02-09T16:35:02Z
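The handling of the optional `duration` field could look roughly like the sketch below. It assumes that out-of-range values are clamped to the configured limits rather than rejected, which the text does not specify; the function name `effective_duration` and the concrete limits are hypothetical.

```python
from datetime import timedelta
from typing import Optional


def effective_duration(
    requested: Optional[timedelta],
    minimum: timedelta,
    maximum: timedelta,
    default: timedelta,
) -> timedelta:
    """Pick the token validity: the default if unset, otherwise the request clamped to the limits."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    if requested is None:
        return default
    return max(minimum, min(requested, maximum))


# Hypothetical limits, for illustration only.
limits = dict(minimum=timedelta(minutes=10), maximum=timedelta(hours=48), default=timedelta(hours=6))
print(effective_duration(timedelta(hours=100), **limits))  # clamped to the maximum
print(effective_duration(None, **limits))                  # falls back to the default
```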
Currently, shoot clusters set the infrastructure credentials via an intermediate resource named `SecretBinding`, which refers to the actual Kubernetes secret that contains the static credentials. If `SecretBinding` is extended to refer to `WorkloadIdentity` as infrastructure credentials, then from a user experience point of view `SecretBinding` is not the best name for such a resource, because it is no longer limited to referring only to secrets as its name implies. Therefore, a new resource named `CredentialsBinding` in the API group `authentication.gardener.cloud` will be implemented. It will have all features of `SecretBinding`, but on top of that will be extended to refer to `WorkloadIdentity` resources via the `.workloadIdentityRef` field.
`CredentialsBinding` will be allowed to set exactly one of the `.secretRef` or `.workloadIdentityRef` fields, but not both or none of them.
In a nutshell, the changes introduced compared to `SecretBinding` are:
- `CredentialsBinding.secretRef` field will be optional and mutable.
- `CredentialsBinding.workloadIdentityRef` field will be optional and mutable. It will refer to a `WorkloadIdentity` resource by its name and namespace. If the namespace is unset, the namespace of the `CredentialsBinding` will be used.
- `quotas` and `provider` fields have the same semantics as their respective counterparts in the `SecretBinding` API. `provider` will be made a mandatory field also via the API specification.
- Static validation will ensure that exactly one of `secretRef` or `workloadIdentityRef` fields is set, but not both or none of them.
- On update of the `workloadIdentityRef`, extension admission controller should ensure that both the old and the new `WorkloadIdentity` are for the same cloud provider account, if such validation is possible for the given extension.
apiVersion: authentication.gardener.cloud/v1alpha1
kind: CredentialsBinding
metadata:
  name: my-credentials
  namespace: garden-local
provider:
  type: aws # {aws,azure,gcp,...}
secretRef: # unlike SecretBindings, this field will be optional and mutable
  name: static
  # namespace: "...", allow reference across namespaces
workloadIdentityRef:
  name: banana-testing
  # namespace: "...", allow reference across namespaces
quotas: []
# - name: quota-1
#   # namespace: garden-quotas
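The static validation and the namespace defaulting described for `CredentialsBinding` can be sketched as below. The manifest is modelled as a plain dictionary and the function names are hypothetical; this is not the Gardener API server validation code.

```python
from typing import List, Tuple


def validate_credentials_binding(binding: dict) -> List[str]:
    """Return validation errors; an empty list means the binding is valid."""
    errors = []
    has_secret = bool(binding.get("secretRef"))
    has_workload_identity = bool(binding.get("workloadIdentityRef"))
    if has_secret and has_workload_identity:
        errors.append("only one of secretRef and workloadIdentityRef may be set")
    elif not has_secret and not has_workload_identity:
        errors.append("one of secretRef or workloadIdentityRef must be set")
    if not (binding.get("provider") or {}).get("type"):
        errors.append("provider.type is required")
    return errors


def workload_identity_key(binding: dict) -> Tuple[str, str]:
    """Resolve (namespace, name) of the referenced WorkloadIdentity, defaulting the namespace."""
    ref = binding["workloadIdentityRef"]
    return ref.get("namespace") or binding["metadata"]["namespace"], ref["name"]


binding = {
    "metadata": {"name": "my-credentials", "namespace": "garden-local"},
    "provider": {"type": "aws"},
    "workloadIdentityRef": {"name": "banana-testing"},
}
print(validate_credentials_binding(binding), workload_identity_key(binding))
```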
Shoot API will be extended with a new field `.spec.credentialsBindingName` whose value will be the name of a `CredentialsBinding` resource from the namespace of the shoot resource. The shoot field `.spec.secretBindingName` and the `SecretBinding` API will be deprecated in favour of `CredentialsBinding`, and eventually removed in a future version of Gardener. A static validation will ensure exactly one of the fields `.spec.secretBindingName` and `.spec.credentialsBindingName` is set. `secretBindingName` will be made mutable to allow already existing shoot clusters to migrate to credentials binding. Shoots will be able to use workload identity as infrastructure credentials only via the `CredentialsBinding` resource.
Features associated with the SecretBinding like the shoot quotas will be extended to also cover clusters using WorkloadIdentity for authentication.
While the infrastructure credentials for the shoot cluster are the main driver behind this GEP, various extensions can benefit from this feature as well. For this purpose, a new optional field named `workloadIdentity` will be introduced in `shoot.spec.extensions`. It will refer to a workload identity by name, assuming the `WorkloadIdentity` resource is in the namespace of the shoot. The seed CRD `extensions.extensions.gardener.cloud` will also be extended to reflect that the extension is using workload identity, letting extension controllers know whether a JWT or another kind of credentials is used.
A similar approach can be taken to provide an alternative to `shoot.spec.dns.providers.secretName`, e.g. a new field `workloadIdentity` will extend `shoot.spec.dns.providers`.
spec:
  extensions:
  - type: some-extension
    workloadIdentity:
      name: foo
  dns:
    providers:
    - type: some-dns-provider
      workloadIdentity:
        name: bar
Wherever the Seed API is referring to secrets, it will be extended to refer to workload identities. As of now these are the fields `spec.backup` and `spec.dns.provider`, and they will have a new field `workloadIdentityRef` holding the name and the namespace of a workload identity resource. The respective `secretRef` fields will be made optional and validation will ensure only one of `secretRef` and `workloadIdentityRef` is used at a time.
apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
  name: seed
spec:
  backup:
    workloadIdentityRef:
      name: backup-workloadidentity
      namespace: garden
  dns:
    provider:
      workloadIdentityRef:
        name: ingress-workloadidentity
        namespace: garden
A new component, presumably the metadata server from GEP-24, will be used to publish the public OIDC metadata discovery documents `/.well-known/openid-configuration` and `jwks_uri`. This component will only be provided with access to the public keys or any other public information; it will not hold or have access to any private information related to token generation and signing. To support key rotation, it will also serve the older set of public keys so that already issued, still valid tokens can be used for identity federation with external services.
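For orientation, a discovery document of the kind served at `/.well-known/openid-configuration` could be assembled as in the sketch below. The field names come from the OpenID Connect Discovery specification; the `/jwks` path, the `RS256` algorithm and the function name are assumptions made for this illustration, not decisions of this proposal.

```python
import json


def discovery_document(issuer: str, jwks_path: str = "/jwks") -> dict:
    """Build a minimal OIDC discovery document containing only public information."""
    issuer = issuer.rstrip("/")
    return {
        "issuer": issuer,
        "jwks_uri": f"{issuer}{jwks_path}",  # assumed path, not defined by the proposal
        "response_types_supported": ["id_token"],
        "subject_types_supported": ["public"],
        "id_token_signing_alg_values_supported": ["RS256"],  # assumed signing algorithm
    }


doc = discovery_document("https://workload-identity.gardener-local.gardener.cloud")
print(json.dumps(doc, indent=2))
```

The `issuer` value must match the `iss` claim of the issued tokens exactly, otherwise relying parties will reject them.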
On key rotation, the new key pair might need to be published before it is used to sign tokens, to give external services enough time to discover the new public key. This rotation strategy could be useful for external services that do not automatically rediscover the OIDC issuer metadata when a token is signed with a key still unknown to them. The major infrastructure providers do not publicly document how often they run OIDC rediscovery, but hands-on experience shows that some do it immediately, while others need several minutes. As workload identity is not limited to the major infrastructure providers, the duration of this period will be configurable, and it is recommended to be at least one day long.
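The "publish first, sign later" strategy can be sketched as follows. The data model (`SigningKey`, `published_at`) and both function names are hypothetical; the sketch only illustrates that all public keys are served while a new key starts signing after the configurable period has passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class SigningKey:
    key_id: str
    published_at: datetime


def published_key_ids(keys: List[SigningKey]) -> List[str]:
    """All known public keys are served, old and new alike."""
    return [key.key_id for key in keys]


def active_signing_key(keys: List[SigningKey], now: datetime, propagation_period: timedelta) -> SigningKey:
    """Sign with the newest key that has been published for at least the propagation period."""
    eligible = [key for key in keys if now - key.published_at >= propagation_period]
    if not eligible:
        # Bootstrap case: no key is old enough yet, so use the one published first.
        return min(keys, key=lambda key: key.published_at)
    return max(eligible, key=lambda key: key.published_at)


now = datetime(2024, 2, 26, tzinfo=timezone.utc)
keys = [
    SigningKey("old", now - timedelta(days=30)),
    SigningKey("new", now - timedelta(hours=1)),
]
print(published_key_ids(keys), active_signing_key(keys, now, timedelta(days=1)).key_id)
```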
The Kubernetes API server extended by the Gardener API server is already issuing JWTs for the Kubernetes service accounts. To completely separate workload identity JWTs from service account JWTs, Gardener API will accept an issuer URL parameter whose value should not be the same as the issuer of the Kubernetes service accounts. The workload identity issuer URL should not be among the accepted issuers of the Kubernetes API server. Other configuration options for the Gardener API server will be the private key used to sign the tokens, and the minimal, maximal and default durations for each token. The private key also should not be shared with the Kubernetes API server. When `gardener-operator` is used to manage the Garden cluster, it will also be responsible for the Workload Identity token signing key rotation; a strategy similar to the one for the Kubernetes Service Account token signing key rotation will be used.
When Gardener API server uses its own issuer and signing keys, the service account token authenticator of the Kubernetes API server will reject the workload identity JWTs because:
- the issuer of the tokens is not accepted
- the tokens are not signed by a trusted key
- workload identity JWTs are not referring to any Kubernetes service account
- Gardener API will not serve the purpose of authentication or authorization webhook, it will also not implement any authentication or authorization based on the workload identity JWTs, it will just generate and sign them.
Gardener API server will use the global configurations, `WorkloadIdentity` and `TokenRequest` specifications to issue JSON Web Tokens. Later, if a use case is identified, it could feature custom claims in the `gardener.cloud` claim namespace that contain additional information about the context of use of the token, e.g. metadata about the shoot, seed, project, garden, etc.
A sample payload of a token will look like:
{
  "aud": [
    "service-foo-provider-bar"
  ],
  "exp": 1707315742,
  "iat": 1707312142,
  "nbf": 1707312142,
  "iss": "https://workload-identity.gardener-local.gardener.cloud",
  "sub": "gardener.cloud:workloadidentity:<workloadidentity-namespace>:<workloadidentity-name>:<workloadidentity-uid>",
  "gardener.cloud": {
    "workloadIdentity": {
      "name": "<workloadidentity-name>",
      "namespace": "<workloadidentity-namespace>",
      "uid": "<workloadidentity-uid>"
    },
    "shoot": {
      "name": "<shoot-name>",
      "namespace": "<shoot-namespace>",
      "uid": "<shoot-uid>"
    },
    "project": {
      "name": "<project-name>",
      "uid": "<project-uid>"
    },
    "seed": {
      "name": "<seed-name>",
      "uid": "<seed-uid>"
    },
    "garden": {
      "id": "<garden-cluster-identity>"
    }
  }
}
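A payload of this shape could be assembled from a `WorkloadIdentity` and the verified request context roughly as in the sketch below. The function name `build_claims`, its parameters and the dictionary-based resource model are hypothetical, and signing is deliberately left out.

```python
import json
import time
from typing import Optional


def build_claims(issuer: str, workload_identity: dict, duration_seconds: int,
                 context: Optional[dict] = None, now: Optional[int] = None) -> dict:
    """Assemble the unsigned JWT payload for a workload identity token."""
    issued_at = int(time.time()) if now is None else now
    meta = workload_identity["metadata"]
    gardener_claims = {
        "workloadIdentity": {"name": meta["name"], "namespace": meta["namespace"], "uid": meta["uid"]},
    }
    gardener_claims.update(context or {})  # e.g. shoot, project, seed, garden metadata
    return {
        "aud": list(workload_identity["spec"]["audiences"]),
        "exp": issued_at + duration_seconds,
        "iat": issued_at,
        "nbf": issued_at,
        "iss": issuer,
        "sub": workload_identity["status"]["sub"],
        "gardener.cloud": gardener_claims,
    }


uid = "12b580fe-1f74-4195-852b-e1a74b03496a"
workload_identity = {
    "metadata": {"name": "banana-testing", "namespace": "garden-local", "uid": uid},
    "spec": {"audiences": ["team-foo"]},
    "status": {"sub": f"gardener.cloud:workloadidentity:garden-local:banana-testing:{uid}"},
}
claims = build_claims("https://workload-identity.gardener-local.gardener.cloud",
                      workload_identity, 3600, now=1707312142)
print(json.dumps(claims, indent=2))
```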
Gardenlet will request tokens as per the global configurations and renew them regularly. Gardenlet is expected not to specify a duration for the tokens and to rely on the default token duration, but this can be changed later based on the actual needs. It will be responsible for providing information about the specific usage of the token, e.g. shoot name, namespace and UID, via the `TokenRequest` API. `gardenlet` will be the only Gardener component authorized to create and refresh workload identity tokens. Seed Authorizer will be extended to allow gardenlets to request workload identity tokens only for the `WorkloadIdentity` resources that they are responsible for.
As the tokens will usually have a lifetime shorter than the period between two reconciliations, it is essential that the token creation and management are decoupled from the current control loops of gardenlet and implemented by a dedicated controller, also running as part of the gardenlet. Forced renewal of the tokens will be performed when the resource referring to a `WorkloadIdentity` is annotated with `gardener.cloud/operation=renew-workload-identity-token`. The annotation is deliberately not set on the `WorkloadIdentity` because a single `WorkloadIdentity` can be used by multiple shoots potentially running on different seeds, i.e. multiple controllers would be responsible for reacting to the annotation, which is usually fine, but all of them would have to negotiate when the operation is completed and the annotation can be removed.
Kubernetes Secrets will be the resource holding the JWT, the provider config, and metadata about the used workload identity. The JWT will be stored under the `token` data key, while the provider config will use the `config` data key. The name and namespace of the used `WorkloadIdentity` will be stored in the annotations `workloadidentity.authentication.gardener.cloud/name` and `workloadidentity.authentication.gardener.cloud/namespace` respectively. The annotation `workloadidentity.authentication.gardener.cloud/context-object`, if present, will hold a reference to the object using the workload identity. The value will be a JSON document in the format `{"apiVersion":"...","kind":"...","name":"...","namespace":"...","uid":"..."}`. It will be the source for the `spec.contextObject` field of the `TokenRequest`.
To let the controller(s) easily select or distinguish these secrets, they will be labeled with `authentication.gardener.cloud/purpose: workload-identity-token-requestor`. The secrets will also be labeled with `workloadidentity.authentication.gardener.cloud/provider=<WorkloadIdentity.spec.targetSystem.type>` so that the extensions can easily select them and make adjustments via admission webhooks, e.g. transform the service provider config and the token into a canonical form usable by the respective service provider SDK.
A sample secret bearing workload identity token will look like:
apiVersion: v1
kind: Secret
data:
  config: YXBpVmV...bmZpZw==
  token: eyJhbGciOiJ....OkBBrVWA
stringData:
  credentialsFile: | # Field injected by extension admission webhook, key name is controlled by the extension
    [default]
    role_arn=arn:aws:iam::112233445566:role/gardener-dev
    web_identity_token_file=/var/run/workload-identity/aws/token
metadata:
  name: cloudprovider
  namespace: shoot--local--foo
  annotations:
    workloadidentity.authentication.gardener.cloud/namespace: garden-local
    workloadidentity.authentication.gardener.cloud/name: banana-testing
    workloadidentity.authentication.gardener.cloud/context-object: '{"apiVersion":"core.gardener.cloud/v1beta1","kind":"Shoot","name":"foo","namespace":"garden-local","uid":"54d09554-6a68-4f46-a23a-e3592385d820"}'
  labels:
    authentication.gardener.cloud/purpose: workload-identity-token-requestor
    workloadidentity.authentication.gardener.cloud/provider: aws # {aws,azure,gcp,...}
type: Opaque
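To make the relation between a `WorkloadIdentity` and such a secret concrete, the sketch below derives the annotations, labels and the `config` key. The function name `token_secret` is hypothetical, and serializing the provider config as JSON is an assumption; the `token` key is left out because it is written later by the token controller.

```python
import base64
import json
from typing import Optional

ANNOTATION_PREFIX = "workloadidentity.authentication.gardener.cloud/"
PURPOSE_LABEL = "authentication.gardener.cloud/purpose"
PROVIDER_LABEL = "workloadidentity.authentication.gardener.cloud/provider"


def token_secret(name: str, namespace: str, workload_identity: dict,
                 context_object: Optional[dict] = None) -> dict:
    """Build the Secret manifest that will later receive the workload identity token."""
    meta = workload_identity["metadata"]
    target = workload_identity["spec"]["targetSystem"]
    annotations = {
        ANNOTATION_PREFIX + "name": meta["name"],
        ANNOTATION_PREFIX + "namespace": meta["namespace"],
    }
    if context_object is not None:
        annotations[ANNOTATION_PREFIX + "context-object"] = json.dumps(context_object, separators=(",", ":"))
    data = {}
    if target.get("providerConfig") is not None:
        raw = json.dumps(target["providerConfig"]).encode()  # assumed JSON serialization
        data["config"] = base64.b64encode(raw).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": annotations,
            "labels": {
                PURPOSE_LABEL: "workload-identity-token-requestor",
                PROVIDER_LABEL: target["type"],
            },
        },
        "data": data,
    }


identity = {
    "metadata": {"name": "banana-testing", "namespace": "garden-local"},
    "spec": {"targetSystem": {"type": "aws", "providerConfig": {"iamRoleARN": "arn:aws:iam::112233445566:role/gardener-dev"}}},
}
shoot_ref = {"apiVersion": "core.gardener.cloud/v1beta1", "kind": "Shoot", "name": "foo",
             "namespace": "garden-local", "uid": "54d09554-6a68-4f46-a23a-e3592385d820"}
secret = token_secret("cloudprovider", "shoot--local--foo", identity, shoot_ref)
print(json.dumps(secret["metadata"], indent=2))
```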
The secret `cloudprovider` that now holds the static credentials will be reused to store the token and the provider config when the shoot is using workload identity as infrastructure credentials. For each extension using workload identity, the secret will be named `workloadidentity-extension-<extension-type>`, and for dns providers `workloadidentity-dns-<dns-provider-type>`.
The reconciliation flow for a component using workload identity tokens will look like this:
- A gardenlet controller creates/updates a `secret` resource in the seed with the above mentioned annotations and labels based on the configuration of the `WorkloadIdentity` and the resource using it. The workload identity provider config is also written into the secret at this step.
- Optionally, an extension webhook intercepts the `CREATE` or `UPDATE` request for the secret and makes extension specific adjustments to the secret.
- The dedicated controller watches these secrets and receives an event to reconcile the secret. The controller reads the current token from the secret and if it does not exist or is due for rotation, a new token is requested via `TokenRequest` on the respective `WorkloadIdentity/token` subresource. The controller writes the returned token into the secret.
- Optionally, an extension webhook intercepts the `UPDATE` request for the secret and makes extension specific adjustments to the secret.
- The `secret` is requeued for reconciliation again when the token will be suitable for renewal.
- The secret is mounted as a volume to the pods that require credentials to interact with the external service. Optionally, extension webhooks can change mount paths, mount additional keys from the secret, set environment variables, etc.
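The renewal decision and the requeue delay in this flow could be derived from the token itself, as in the sketch below. It decodes the JWT payload without verifying the signature, which is sufficient only for scheduling. The function names and the 0.8 renewal fraction are hypothetical.

```python
import base64
import json
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload of a compact JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a compact JWT has exactly three dot-separated parts")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def requeue_after(token: Optional[str], now: int, fraction: float = 0.8) -> float:
    """Seconds until the token should be renewed; 0 means a new token is needed now."""
    if not token:
        return 0.0
    claims = jwt_claims(token)
    renew_at = claims["iat"] + (claims["exp"] - claims["iat"]) * fraction
    return max(0.0, renew_at - now)


def unsigned_demo_token(claims: dict) -> str:
    """Build a structurally valid but unsigned token, for demonstration only."""
    def b64(obj: dict) -> str:
        return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()
    return f"{b64({'alg': 'none'})}.{b64(claims)}.signature"


demo = unsigned_demo_token({"iat": 1707312142, "exp": 1707315742})
print(requeue_after(demo, now=1707312142))
```

A controller following this sketch would request a new token whenever the result is 0 and otherwise requeue the secret after the returned delay.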
Just for completeness, here is how the flow will look for cloud controller manager when an AWS shoot cluster is using workload identity:
- The shoot controller from gardenlet creates the `cloudprovider` secret in the shoot namespace in the seed with the above described labels and annotations, also writing the workload identity `providerConfig` into the secret under the `config` key.
- The `gardener-extension-provider-aws` webhook intercepts the create request and injects the `credentialsFile` data key, whose value is derived from the content of the `config` key. The `credentialsFile` content is an AWS profile config file containing the ARN of the IAM role to be assumed and the path to the web identity token file.
- The gardenlet controller dedicated to managing the tokens reconciles the secret. A token is issued and written into the `token` key of the secret.
- The `gardener-extension-provider-aws` webhook is triggered again and the same adjustments are made.
- The `controlplane` controller from `gardener-extension-provider-aws` deploys the `cloud-controller-manager` with the required adjustments, e.g. mount the `token` key from the `cloudprovider` secret on path `/var/run/workload-identity/aws/token` and set the environment variables `AWS_ROLE_ARN=arn:aws:iam::112233445566:role/gardener-dev` and `AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/workload-identity/aws/token`, or use other options to configure the AWS SDK, like the `AWS_SHARED_CREDENTIALS_FILE` environment variable pointing to the mounted `credentialsFile` key from the secret.
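The webhook adjustment in the second step can be illustrated with the sketch below, which derives `credentialsFile` from the `config` key. It assumes the provider config is stored as base64-encoded JSON with an `iamRoleARN` field; the function names are hypothetical and this is not the code of `gardener-extension-provider-aws`.

```python
import base64
import json

TOKEN_PATH = "/var/run/workload-identity/aws/token"


def render_credentials_file(role_arn: str, token_path: str = TOKEN_PATH) -> str:
    """Render an AWS profile that assumes the role with a web identity token."""
    return "\n".join([
        "[default]",
        f"role_arn={role_arn}",
        f"web_identity_token_file={token_path}",
        "",
    ])


def inject_credentials_file(secret: dict) -> dict:
    """Return a copy of the secret with stringData.credentialsFile derived from data.config."""
    config = json.loads(base64.b64decode(secret["data"]["config"]))  # assumed JSON encoding
    string_data = dict(secret.get("stringData") or {})
    string_data["credentialsFile"] = render_credentials_file(config["iamRoleARN"])
    mutated = dict(secret)
    mutated["stringData"] = string_data
    return mutated


provider_config = {"iamRoleARN": "arn:aws:iam::112233445566:role/gardener-dev"}
cloudprovider = {"data": {"config": base64.b64encode(json.dumps(provider_config).encode()).decode()}}
print(inject_credentials_file(cloudprovider)["stringData"]["credentialsFile"])
```

Since the function depends only on the `config` key, running it again after the token is written yields the same result, matching the repeated webhook invocation in the flow.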
- Replaces static infrastructure credentials for shoot clusters running on AWS, Azure, Alibaba and GCP.
- Replaces static credentials used by backup controllers to interact with object storage service on AWS, Azure, Alibaba and GCP.
- Replaces static credentials used by dns controllers to interact with DNSaaS on AWS, Azure, Alibaba and GCP.
- Replaces static credentials used by certificate controllers to interact with CERTaaS providers.
- Others, for example Alertmanager receivers.
- Image pull secrets for private images.
It should be possible to run a SPIRE server in the Gardener landscape, presumably in the runtime cluster, and SPIRE agents on each seed cluster.
However, this would come with considerable overhead:
- Bootstrapping the system and managing the credentials for different agents will need to be automated.
- The solution could easily lock in to SPIFFE/SPIRE as identity issuer and make it hard to change the implementation, if needed.
- It is a 3rd party solution not so similar to Kubernetes that needs to be learned, operated and maintained.
- Limited flexibility to inject Gardener's own data in the tokens. Eventually this could be achieved with custom plugins.
- A standalone API endpoint that needs to be properly operated; also, all agents will need to be bootstrapped securely.
- SPIRE agents associate the identities with the nodes where the workload is running, but in Gardener we are more interested in the Seeds and the workload itself, not the nodes. With custom plugins it should be possible to make it fit the Gardener case.
- SPIRE server needs a database to store various information about the cluster; operating an additional stateful component will require a certain investment.
- It does not seem well suited to setups where the server and agents are running in different clusters, especially on different clouds, as node attestation done by the server needs to somehow evaluate the nodes with the cloud providers. With custom plugins it should be possible to make it fit the Gardener case.
Using a dedicated resource instead of the `ServiceAccount` from the k8s core API is preferred for several reasons:
- Requesting a workload identity token should be accessible only by workloads running in the Gardener environment, i.e. the tokens should not be exported and used by other tools, services, applications, etc. Gardener users are already granted access to create tokens for `ServiceAccounts` in their project namespace and this cannot be restricted without introducing breaking changes.
- To ensure that workload identity tokens cannot be used for authentication with the Gardener API. `WorkloadIdentity` is designed to provide authentication with external services and not with the Gardener API.
A Gardener landscape is highly dynamic and seed clusters are added and removed regularly. Also, shoot clusters are migrated between different seeds on demand. Managing trust configuration toward multiple seeds (tens or even hundreds of seeds) is cumbersome work, especially when those responsible for the trust configurations are not responsible for the seeds.