Istio 1.21+ unable to get OS CA certificate bundle from SPIRE #5687

Open
vassilvk opened this issue Dec 6, 2024 · 9 comments

@vassilvk

vassilvk commented Dec 6, 2024

  • Version: 1.11.0
  • Platform: Kubernetes 1.29.6 (EKS)
  • Istio: 1.22
  • Subsystem: server, agent, (istio?)

I'm not sure whether this is an Istio or a SPIRE issue. After following this Istio + SPIRE guide, the Istio gateway pods detect the SPIRE-provided SDS API as expected, but they fail to obtain certificates from their SPIRE agents.

The gateway proxy logs contain repeated instances of the following warning:

2024-12-06T18:23:13.287724Z	warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_stream.h:155	StreamSecrets gRPC config stream to sds-grpc closed: 3, workload is not authorized for the requested identities ["file-root:system"]	thread=28

The SPIRE agents log the following corresponding message:

time="2024-12-06T18:24:09Z" level=error msg="Error building stream secrets response" error="rpc error: code = InvalidArgument desc = workload is not authorized for the requested identities [\"file-root:system\"]" method=StreamSecrets pid=1364350 service=SDS.v3 subsystem_name=endpoints

The SPIRE infrastructure was deployed using spire helm chart version 0.24.1 (app version 1.11.0).

The ClusterSPIFFEID CRDs for the Istio gateway pods function correctly - I can see the SPIFFE registration entries for the gateway pods added to the SPIRE server.

What I don't see is an entry for identity file-root:system.

I'm not sure why the gateway is attempting to obtain a certificate for identity file-root:system, but Istio's code indicates that the proxy is asking for the OS root certificates.

I understand that this might be an Istio question, but I figured someone here might have experienced the same.

@vassilvk vassilvk changed the title from “Istio gateway unable to get secrets from SPIRE's SDS: workload is not authorized for the requested identities ["file-root:system"]” to “Istio gateway unable to get secrets from SPIRE's SDS” on Dec 6, 2024
@vassilvk
Author

vassilvk commented Dec 7, 2024

Found out what causes the issue.

Istio 1.21 switched to verifying server certificates at proxies by default.

This leads to Istio proxies (both gateways and sidecars) trying to obtain the operating system's CA bundle by asking the SDS service for the file-root:system resource. When the SDS service is SPIRE, no file-root:system registration entry exists on the SPIRE server, so the SPIRE agent fails to provide this secret. This breaks Istio's TLS verification and its ability to originate HTTPS connections to non-mTLS endpoints.

Setting VERIFY_CERTIFICATE_AT_CLIENT env variable on istiod to false solves the issue for Istio 1.22 (note the docs incorrectly talk about VERIFY_CERT_AT_CLIENT).
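As a rough sketch of that workaround, assuming Istio is installed through an IstioOperator resource (an assumption; this thread does not say how istiod was installed), the variable could be passed to istiod through the chart's pilot.env map. Only the variable name and value come from this thread.

```yaml
# Sketch only: assumes an IstioOperator-based install (istioctl install -f <file>).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    pilot:
      env:
        # Restores the pre-1.21 default described in this issue.
        VERIFY_CERTIFICATE_AT_CLIENT: "false"
```

With a Helm-based install, the same pilot.env key in the istiod chart values would presumably serve the same purpose.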

The above workaround is not ideal as we need to have proxies verify TLS certificates coming from external servers.
Does it make sense to implement support for the file-root:system resource in SPIRE agent SDS implementation?

@vassilvk vassilvk changed the title from “Istio gateway unable to get secrets from SPIRE's SDS” to “Istio 1.21+ unable to get secrets from SPIRE's SDS” on Dec 7, 2024
@vassilvk vassilvk changed the title from “Istio 1.21+ unable to get secrets from SPIRE's SDS” to “Istio 1.21+ unable to get secrets from SPIRE” on Dec 7, 2024
@rturner3 rturner3 added the triage/in-progress Issue triage is in progress label Dec 10, 2024
@evan2645
Member

Hi @vassilvk, thank you for opening this. I briefly read through the provided links. Can you confirm the exact Istio behavior change? Is it that the gateway previously handled TLS for connections to external services and then spoke mTLS back to the client, and now external connections can pass through and sidecars can directly validate TLS certificates presented by external services?

@vassilvk
Author

Hi @evan2645,

The difference in Istio behavior is caused by a change in the default value of the VERIFY_CERTIFICATE_AT_CLIENT istiod flag. Prior to 1.21, this flag was set to false. Starting with 1.21, the flag is set to true. When this flag is set to true, all Istio proxies in the mesh will attempt to validate server-side certificates during TLS (not mTLS) handshake.

The use case that leads to issues with SPIRE is as follows:

When the Istio proxy attempts to speak TLS (not mTLS) to an endpoint and VERIFY_CERTIFICATE_AT_CLIENT is set to true on the control plane, the proxy checks whether the DestinationRule for the endpoint includes a CA certificate bundle (specified through the DestinationRule's spec.trafficPolicy.tls.caCertificates). If it does, the proxy uses that bundle to verify the server-provided certificate during the TLS handshake.

So that's all good.
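For illustration, a DestinationRule that sets this field might look like the sketch below. The resource name, namespace, host, and bundle path are all hypothetical, the path must exist inside the proxy container, and this thread does not confirm whether this form avoids the failing SDS request when SPIRE is the secrets provider.

```yaml
# Sketch only: name, namespace, host, and CA path are illustrative assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: example-external-tls
  namespace: default
spec:
  host: api.example.com
  trafficPolicy:
    tls:
      mode: SIMPLE
      # Explicit CA bundle used to verify the server certificate.
      caCertificates: /etc/ssl/certs/ca-certificates.crt
```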

If, however, the caCertificates setting of the DestinationRule is not provided (which is quite common), the proxy will attempt to use the operating system's CA bundle to verify server-side certificates.

When the proxy is configured to pull its secrets from an SDS stream, it will try to get the operating system's CA bundle by asking the SDS service (in our case SPIRE) for a resource called file-root:system. SPIRE has no such resource registered as a SPIFFE identity, nor does it recognize it as a well-known resource (the way it recognizes default and ROOTCA), so the agent rejects the request.

My question is: does it make sense to add support for the file-root:system resource to SPIRE agent, similarly to how it is done for default and ROOTCA, and have the agent return its own operating system CA bundle when asked to provide this resource?

@evan2645
Member

Thank you @vassilvk - yes I think it will make sense to support this, just trying to make sure I fully understand first.

> When this flag is set to true, all Istio proxies in the mesh will attempt to validate server-side certificates during TLS (not mTLS) handshake.

When this flag is set to false as it was before .. what is the behavior? Gateway validates server-side cert on the client's behalf? Or something else?

@vassilvk
Author

vassilvk commented Dec 17, 2024

> When this flag is set to false as it was before.. what is the behavior?

When VERIFY_CERTIFICATE_AT_CLIENT is set to false at the control plane level, proxies in the mesh do not validate certificates for destinations whose CA certificate is not provided by the corresponding DestinationRule. At least that's my understanding.

@vassilvk vassilvk changed the title from “Istio 1.21+ unable to get secrets from SPIRE” to “Istio 1.21+ unable to get OS CA certificate bundle from SPIRE” on Dec 21, 2024
@sorindumitru
Contributor

Could you assign this to me? I'll try to look into what Istio is trying to do.

@sorindumitru
Contributor

@vassilvk I've tried to reproduce the issue using the details in https://istio.io/latest/docs/ops/integrations/spire, but I wasn't able to see this error. Would you be able to share some more details about the configuration you have when you run into this so I can adjust my local environment?

As for the issue itself, I don't think spire-agent should handle the file-root:system resource. It would likely only be able to answer with the certificates from its own container, which might not be the right list of CAs. For example, it might be missing some private CAs.

@vassilvk
Author

vassilvk commented Jan 15, 2025

> Would you be able to share some more details about the configuration you have when you run into this so I can adjust my local environment?

Please see reproduction steps below.

> As for the issue itself, I don't think spire-agent should handle the file-root:system resource.

When spire-agent registers itself as the secrets provider for an Istio proxy (through the SDS API), the proxy expects all secrets to come from it. This maintains a proper abstraction of the secret source. If the proxy asks for a secret and the secrets provider fails to provide it, the proxy cannot serve requests related to that secret.

> It would likely only be able to answer with the certificates from its own container which might not be the right list of CA. For example it might be missing some private CAs.

That's right. There are many ways to mount CA certificates into a container. When customers use private certificates with vanilla Istio, they typically mount these certificates into Istio's control plane. However, when SPIRE is used as the PKI provider for the mesh, replacing Istio's control plane, it becomes logical for customers to configure SPIRE with the private CA certificates instead.

The exact method for achieving this is an open question, as there are multiple approaches. One common option is using a well-known ConfigMap, similar to how Istio utilizes the istio-ca-secret ConfigMap.

But all of this is beside the point. The main issue I see here is that, as it stands now, SPIRE + Istio (current versions) is broken out of the box for the very common use case of HTTPS origination.

Reproduction steps

  • Install Istio 1.21
  • Follow these instructions to replace Istio's control plane PKI with SPIRE. Make sure to configure SPIRE to expose a federation bundle endpoint.
  • Create the following Istio destination rule for the federation bundle endpoint:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: spire-server-federation
  namespace: spire-server
spec:
  host: spire-server.spire-server.svc.cluster.local
  trafficPolicy:
    tls:
      mode: SIMPLE
  • Create the following virtual service to direct traffic to the federation bundle endpoint:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: spire-server-federation
  namespace: spire-server
spec:
  hosts:
    - spire-server.spire-server.svc.cluster.local
  http:
    - route:
        - destination:
            host: spire-server.spire-server.svc.cluster.local
            port:
              number: 8443
  • Restart any of the Istio proxies (sidecar, or ingress) configured to use SPIRE agent as their secrets provider.
  • Observe the restarted proxy's logs.
  • You should see a log line similar to this one:
warning	envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_stream.h:155	StreamSecrets gRPC config stream to sds-grpc closed: 3, workload is not authorized for the requested identities ["file-root:system"]	thread=28
  • Observe the corresponding spire agent's logs.
  • You should see a log line similar to this:
level=error msg="Error building stream secrets response" error="rpc error: code = InvalidArgument desc = workload is not authorized for the requested identities [\"file-root:system\"]" method=StreamSecrets pid=1364350 service=SDS.v3 subsystem_name=endpoints

@sorindumitru
Contributor

@vassilvk Seems like this will be handled in Istio (see #5727 (comment) and the Istio PR)
