Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow runtime configuration of issuers #141

Merged
merged 2 commits into from
May 16, 2024

Conversation

SgtCoDFish
Copy link
Member

@SgtCoDFish SgtCoDFish commented May 9, 2024

Currently, installing csi-driver-spiffe requires that users have already installed cert-manager and already created an issuer, because issuer config must be passed in at startup.

This presents problems when users wish to install everything and configure it later. It also means that it's not possible to change issuers without risking downtime of the driver, which in turn implies downtime in being able to create any pod which relies on csi-driver-spiffe.

This PR adds support for runtime configuration through a ConfigMap. If enabled, the ConfigMap will be watched and if it exists and the configuration is value, the issuer specified in that ConfigMap will override any issuer configured at runtime.

To support the e2e tests for this, we also add lots of helper functions for generating issuers for e2e tests and retrieving certificate details from pods without copy+pasting boilerplate test code.

EDIT: I forgot to link the design doc which this PR implements.

@cert-manager-prow cert-manager-prow bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 9, 2024
@SgtCoDFish SgtCoDFish force-pushed the runtimeconfiguration branch from 52a607b to ae5573a Compare May 9, 2024 13:36
SgtCoDFish

This comment was marked as outdated.

@SgtCoDFish SgtCoDFish force-pushed the runtimeconfiguration branch from ae5573a to 4e8990d Compare May 9, 2024 17:16
@SgtCoDFish
Copy link
Member Author

/test pull-cert-manager-csi-driver-spiffe-test

@SgtCoDFish SgtCoDFish force-pushed the runtimeconfiguration branch 2 times, most recently from 452f0bb to 9e1ea4f Compare May 10, 2024 14:59
@cert-manager-prow cert-manager-prow bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 10, 2024
@maelvls
Copy link
Member

maelvls commented May 13, 2024

Hey,

Regarding the why, I assume that the main problem this is trying to solve is to avoid creating an immediately-failing daemonset. I'm not sure why restarting the csi-driver-spiffe pods as a consequence of updating the flag --issuer-name is a problem as it should restart almost instantenously. Is it slow due to something like a leader election?

Regarding the how, I'm trying to wrap my head around the user experience of this new feature. I guess what I am missing is an example of a configmap manifest that people will be expected to write, here is an example for anyone looking at this PR:

apiVersion: v1
kind: ConfigMap
metadata:
  name: csidriverspiffe-config
  namespace: cert-manager
data:
  issuer.group: cert-manager.io
  issuer.kind: ClusterIssuer
  issuer.name: ca

Creating the configmap is a one-liner:

kubectl create configmap csidriverspiffe-config -n cert-manager \
  --from-literal=issuer.name=ca \
  --from-literal=issuer.kind=ClusterIssuer \
  --from-literal=issuer.group=cert-manager.io

By the way, is the dot in issuer.name needed, or is it an established pattern used elsewhere? The flags are --issuer-name etc, so I was kind of expecting the keys issuer-name, issuer-kind and so on 😅

@SgtCoDFish
Copy link
Member Author

SgtCoDFish commented May 13, 2024

Regarding the why, I assume that the main problem this is trying to solve is to avoid creating an immediately-failing daemonset.

The main problem this is trying to solve is definitely to enable csi-driver-spiffe to be installed at the same time as cert-manager, with the best UX possible. I think the UX of this works pretty well, but it would feel super janky to install csi-driver-spiffe with some dummy issuer name and cert-manager and then have to reconfigure csi-driver-spiffe entirely when the issuer is created.

I'm not sure why restarting the csi-driver-spiffe pods as a consequence of updating the flag --issuer-name is a problem as it should restart almost instantenously.

It feels like a poor UX to me! This design is what was agreed in the design doc when it was discussed. I should've added the link to the design doc to the original description - I'll do that now.

By the way, is the dot in issuer.name needed, or is it an established pattern used elsewhere? The flags are --issuer-name etc, so I was kind of expecting the keys issuer-name, issuer-kind and so on 😅

It couldn't really matter less! I can change it easily if you'd prefer dashes, I really don't mind! EDIT: Dashes are actually what was used in the design doc - I'll change to dashes now.

@maelvls
Copy link
Member

maelvls commented May 13, 2024

The Helm chart might be slightly off, the flag --issuer-name defaults to spiffe-ca even when using runtimeIssuanceConfigMap. Is that expected? Is that because csi-driver-spiffe needs some value for --issuer-name to be able to start?

What I've done:

  1. kind cluster create (1.29.2)

  2. Followed the getting started guide at https://cert-manager.io/docs/usage/csi-driver-spiffe/, with the exception of the helm upgrade step that I slightly edited:

    helm upgrade -i -n cert-manager cert-manager-csi-driver-spiffe jetstack/cert-manager-csi-driver-spiffe --wait \
       --set "app.logLevel=1" \
       --set "app.trustDomain=my.trust.domain" \
       --set "app.approver.signerName=clusterissuers.cert-manager.io/csi-driver-spiffe-ca" \
       --set "app.runtimeIssuanceConfigMap=csidriverspiffe-config"
  3. Created the CM:

    kubectl create cm csidriverspiffe-config -n cert-manager \
      --from-literal=issuer.name=csi-driver-spiffe-ca \
      --from-literal=issuer.kind=ClusterIssuer \
      --from-literal=issuer.group=cert-manager.io
    
    helm template -n cert-manager deploy/charts/csi-driver-spiffe/ \
     --set "app.logLevel=1" \
     --set "app.trustDomain=my.trust.domain" \
     --set "app.approver.signerName=clusterissuers.cert-manager.io/csi-driver-spiffe-ca" \
     --set app.runtimeIssuanceConfigMap=csidriverspiffe-config

At this point, the pods example-app weren't starting. I found that the csi spiffee approver kept showing these errors related to an unexpected spiffe-ca:

admission webhook webhook.cert-manager.io denied the request: status.conditions: Forbidden: user system:serviceaccount:cert-manager:cert-manager-csi-driver-spiffe-approver does not have permissions to set approved/denied conditions for issuer {spiffe-ca ClusterIssuer cert-manager.io}


I found that there is some `--issuer-name` that keeps appearing in the template, although I used `runtimeIssuanceConfigMap`:


```bash
helm template -n cert-manager deploy/charts/csi-driver-spiffe/ \
     --set "app.logLevel=1" \
     --set "app.trustDomain=my.trust.domain" \
     --set "app.approver.signerName=clusterissuers.cert-manager.io/csi-driver-spiffe-ca" \
     --set "app.runtimeIssuanceConfigMap=csidriverspiffe-config" | grep spiffe-ca

shows:

  resourceNames: ["clusterissuers.cert-manager.io/csi-driver-spiffe-ca"]
          - --issuer-name=spiffe-ca

@SgtCoDFish SgtCoDFish force-pushed the runtimeconfiguration branch from 9e1ea4f to 20a03a8 Compare May 13, 2024 14:06
@SgtCoDFish
Copy link
Member Author

The Helm chart might be slightly off, the flag --issuer-name defaults to spiffe-ca even when using runtimeIssuanceConfigMap. Is that expected? Is that because csi-driver-spiffe needs some value for --issuer-name to be able to start?

The default for issuer-name is not great, yeah. We've discussed changing it in standups and we're going to change it, but it's a breaking change and I didn't want to do that in this PR!

I'd rather default to not having an issuer.

This PR adds some validation of how the issuer is set - previously there wasn't any.

@SgtCoDFish
Copy link
Member Author

--set "app.approver.signerName=clusterissuers.cert-manager.io/csi-driver-spiffe-ca"

This explicitly shouldn't be set. The default is to leave it empty now. Anywhere we say to set this we need to update.

@maelvls
Copy link
Member

maelvls commented May 13, 2024

To clarify, I think having to change a flag in the deployment (--issuer-name) is a terrible experience because of this "dummy" issuer. The dummy issuer name shows unhelpful error messages about an issuer that was never actually created. What's even worse is that this dummy issuer will probably not have the approve privilege enabled, meaning that the cert-manager webhook will forbid csi-driver-spiffe-approver from creating CRs...

It's not possible to change issuers without risking downtime of the driver.

That's the part I'm confused about: the design doesn't talk about that problem. I'm probably not familiar enough with csi-driver-spiffe's architecture. How is changing --issuer-name going to cause downtime? Existing CSI requests will still be served, then the driver pods restart and start processing CSI requests using the other issuer.

Is the downtime due to the fact that the driver won't know when there is a change of --issuer-name and won't trigger a new CR? I'm assuming that watching the configmap lets you trigger re-issuance, unlike the --issuer-name flag.

@maelvls
Copy link
Member

maelvls commented May 13, 2024

I've successfully tested the feature.

Instructions:

kind create cluster
make oci-load-manager oci-load-approver helm-chart kind_cluster_name=kind git_version=v0.0.0
helm upgrade -i -n cert-manager cert-manager jetstack/cert-manager \
  --set extraArgs={--controllers='*\,-certificaterequests-approver'} \
  --set installCRDs=true \
  --create-namespace
helm upgrade -i -n cert-manager cert-manager-csi-driver-spiffe _bin/scratch/image/cert-manager-csi-driver-spiffe-v0.0.0.tgz --wait \
 --set "app.logLevel=1" \
 --set "app.trustDomain=my.trust.domain" \
 --set "app.approver.signerName=clusterissuers.cert-manager.io/csi-driver-spiffe-ca" \
 --set image.repository.driver=cert-manager.local/cert-manager-csi-driver-spiffe \
 --set image.repository.approver=cert-manager.local/cert-manager-csi-driver-spiffe-approver \
 --set image.pullPolicy=Never \
 --set app.issuer.name= \
 --set app.issuer.group= \
 --set app.issuer.kind= \
 --set app.runtimeIssuanceConfigMap=csidriverspiffe-config

Then, install the example app:

kubectl create -f https://raw.githubusercontent.com/cert-manager/csi-driver-spiffe/ed646ccf28b1ecdf63f628bf16f1d350a9b850c1/deploy/example/example-app.yaml

The driver logs tells me there is something wrong. I wish it told me that csidriverspiffe-config wasn't found or something like that:

kubectl logs -n cert-manager -l app.kubernetes.io/name=cert-manager-csi-driver-spiffe

server.go:109] "failed processing request" err="generating certificate signing request: no issuerRef is currently active for csi-driver-spiffe; configure one using runtime configuration"

Then, I created the clusterissuer and the config:

kubectl apply -f https://raw.githubusercontent.com/cert-manager/csi-driver-spiffe/ed646ccf28b1ecdf63f628bf16f1d350a9b850c1/deploy/example/clusterissuer.yaml
kubectl create cm csidriverspiffe-config -n cert-manager \
  --from-literal=issuer-name=csi-driver-spiffe-ca \
  --from-literal=issuer-kind=ClusterIssuer \
  --from-literal=issuer-group=cert-manager.io

This time, the pods are running:

$ k get pods -n sandbox                              
NAME                         READY   STATUS    RESTARTS   AGE
my-csi-app-6f58c7589-5rqnk   1/1     Running   0          83m
my-csi-app-6f58c7589-cpt8g   1/1     Running   0          83m
my-csi-app-6f58c7589-d5bgj   1/1     Running   0          83m
my-csi-app-6f58c7589-s4dss   1/1     Running   0          83m
my-csi-app-6f58c7589-xpbmt   1/1     Running   0          83m

Questions:

  1. You mentioned that I shouldn't be using the signerName in the Helm chart:

    --set app.approver.signerName=clusterissuers.cert-manager.io/csi-driver-spiffe-ca
    

    I think I understand what's up with this 😅

  2. Is there a way to show a friendlier error message when the configmap is missing?

  3. In my helm upgrade command, I had:

    --set app.issuer.name= \
    --set app.issuer.group= \
    --set app.issuer.kind= \
    

    Is this recommended way to configure empty --issuer-name flags?

@maelvls
Copy link
Member

maelvls commented May 13, 2024

We had a quick call. Here are the answers:

(1) The problem with --set app.approver.signerName=clusterissuers.cert-manager.io/csi-driver-spiffe-ca is that the user had to know the name of the issuer name when they deploy csi-driver-spiffe, which causes the same problem as having to know --issuer-name when deploying csi-driver-spiffe rather than later on.

(3) yes, the recommended way is to use --set app.issuer.name= to avoid confusing people with a dummy issuer name.

@SgtCoDFish SgtCoDFish force-pushed the runtimeconfiguration branch from 20a03a8 to 8f1b96b Compare May 14, 2024 16:26
Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

  • Code looks clear and well commented,
  • The way to implemented switching from the configmap back to --issuer-name, and from --issuer-name to the configmap makes sense, and most importantly it is backwards compatible and makes it easy to switch away from --issuer-name to the configmap.

/lgtm
/approve
/hold in case you want to address the nit 😇

test/e2e/util/util.go Outdated Show resolved Hide resolved
@cert-manager-prow cert-manager-prow bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels May 16, 2024
@cert-manager-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maelvls

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 16, 2024
This commit allows users to specify the name of a ConfigMap which should
be watched for issuer configuration. Importantly, this allows for
csi-driver-spiffe to be installed at the same time as cert-manager
without needing an issuer to already exist

The ConfigMap is currently assumed in the helm chart to be in the same
namespace as the csi-driver-spiffe DaemonSet

Signed-off-by: Ashley Davis <[email protected]>
@SgtCoDFish SgtCoDFish force-pushed the runtimeconfiguration branch from 8f1b96b to a7f2470 Compare May 16, 2024 10:01
@cert-manager-prow cert-manager-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 16, 2024
@SgtCoDFish
Copy link
Member Author

/unhold

I've addressed the nit now and rebased against main!

@cert-manager-prow cert-manager-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2024
@maelvls
Copy link
Member

maelvls commented May 16, 2024

/lgtm

@cert-manager-prow cert-manager-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 16, 2024
@cert-manager-prow cert-manager-prow bot merged commit 288f033 into cert-manager:main May 16, 2024
5 checks passed
@SgtCoDFish SgtCoDFish deleted the runtimeconfiguration branch May 16, 2024 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants