Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid kubeconfig exec #2356

Conversation

2uasimojo
Copy link
Member

@2uasimojo 2uasimojo commented Jul 17, 2024

Users with write access to the admin kubeconfig Secret for a given ClusterDeployment should not be able to execute arbitrary code in the privileged environment in which we run the controllers that use those Secrets. Funnel all code paths that load such Secrets through a validator to ensure that the AuthInfos[].Exec path is not used.

HIVE-2485

Users with write access to the admin kubeconfig Secret for a given
ClusterDeployment should not be able to execute arbitrary code in the
privileged environment in which we run the controllers that use those
Secrets. Funnel all code paths that load such Secrets through a
validator to ensure that the AuthInfos[].Exec path is not used.

HIVE-2485
@2uasimojo
Copy link
Member Author

/assign @suhanime

/hold for testing

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 17, 2024
@openshift-ci openshift-ci bot requested review from dlom and lleshchi July 17, 2024 22:22
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 17, 2024
Copy link

codecov bot commented Jul 18, 2024

Codecov Report

Attention: Patch coverage is 78.68852% with 13 lines in your changes missing coverage. Please review.

Project coverage is 58.61%. Comparing base (c5571a9) to head (df1ea18).
Report is 5 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2356      +/-   ##
==========================================
+ Coverage   58.51%   58.61%   +0.09%     
==========================================
  Files         182      182              
  Lines       25884    26987    +1103     
==========================================
+ Hits        15147    15818     +671     
- Misses       9460     9865     +405     
- Partials     1277     1304      +27     
Files Coverage Δ
...roller/argocdregister/argocdregister_controller.go 50.50% <100.00%> (-0.22%) ⬇️
...roller/awsprivatelink/awsprivatelink_controller.go 68.09% <100.00%> (+0.27%) ⬆️
...controller/clusterclaim/clusterclaim_controller.go 63.59% <100.00%> (ø)
.../clusterdeployment/clusterdeployment_controller.go 66.78% <100.00%> (+0.12%) ⬆️
...kg/controller/clusterdeployment/clusterinstalls.go 75.57% <100.00%> (ø)
.../controller/clusterdeployment/clusterprovisions.go 67.89% <100.00%> (+5.73%) ⬆️
pkg/remoteclient/remoteclient.go 73.18% <100.00%> (+1.07%) ⬆️
pkg/controller/awsprivatelink/cleanup.go 46.10% <50.00%> (ø)
pkg/remoteclient/kubeconfig.go 0.00% <0.00%> (ø)
pkg/controller/utils/secrets.go 69.81% <69.44%> (-7.97%) ⬇️

... and 9 files with indirect coverage changes

Copy link
Contributor

@suhanime suhanime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@2uasimojo Approving this right now because it is an important/urgent change, but there are many commented entries from your code-walkthrough for these changes, which I do not think are necessary to be in the code base. Can we remove them in a fup?
I also wanted to bring up the fact that we're not validating the kubeconfig when we're adopting it, which I think we should.
Also, do we need to document this anywhere?

@@ -211,7 +211,7 @@ func (r *ArgoCDRegisterController) reconcileCluster(cd *hivev1.ClusterDeployment
return reconcile.Result{}, nil
}

kubeConfigSecretName := cd.Spec.ClusterMetadata.AdminKubeconfigSecretRef.Name
kubeConfigSecretName := cd.Spec.ClusterMetadata.AdminKubeconfigSecretRef.Name // HIVE-2485 ✓
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to leave a paper trail for posterity since this is a whack-a-mole effort by nature. At worst it's harmless cruft, but ideally if someone comes along later and needs to whack more moles, they will have a useful way to search for these breadcrumbs.

If you think they're more crufty than potentially useful, I can scrub them out in a fup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to scrub them per-se, but maybe label them something other than a ticket that's inaccessible to many? Or - how about we include a note where AdminKubeconfigSecretRef is defined, and all someone has to do is Find usages on that constant to see where we need to change/debug? And if that constant is just fetched and not used, we can include a note there as well. Essentially decouple the ticket and its specific use-case from here's-where-we-actually-use-kubeconfig-to-connect-to-the-cluster.

cd,
&hivev1.ClusterMetadata{
InfraID: *cp.Spec.InfraID,
AdminKubeconfigSecretRef: *cp.Spec.AdminKubeconfigSecretRef, // HIVE-2585: via ClusterMetadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have perms to view this issue but just wanted to ensure this comment is intentional

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have perms to view this issue

You do now. But that does put rather an ick on the posterity argument. Then again, hopefully someone whacking moles in this area later would be given access 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh, the comment has a typo here.

Suggested change
AdminKubeconfigSecretRef: *cp.Spec.AdminKubeconfigSecretRef, // HIVE-2585: via ClusterMetadata
AdminKubeconfigSecretRef: *cp.Spec.AdminKubeconfigSecretRef, // HIVE-2485: via ClusterMetadata

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't catch it earlier but yeah, this is a typo - 2485 not 2585

@@ -533,7 +537,7 @@ func (r *ReconcileAWSPrivateLink) reconcilePrivateLink(cd *hivev1.ClusterDeploym

// Figure out the API address for cluster.
apiDomain, err := initialURL(r.Client,
client.ObjectKey{Namespace: cd.Namespace, Name: clusterMetadata.AdminKubeconfigSecretRef.Name})
client.ObjectKey{Namespace: cd.Namespace, Name: clusterMetadata.AdminKubeconfigSecretRef.Name}) // HIVE-2485 ✓
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too

// and returns a rest.Config loaded therefrom.
// If tryRaw is true, we will look for `raw-kubeconfig` first and use it if present, falling
// back to `kubeconfig` otherwise.
// The error return is non-nil if:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The error return is non-nil if:
// The error returned is non-nil if:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentionally a noun ("the error return variable"?) but I can change it in a fup if it bugs ya :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh - either makes sense to me so ignore

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 19, 2024
Copy link
Contributor

openshift-ci bot commented Jul 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, suhanime

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@2uasimojo
Copy link
Member Author

/override "Red Hat Konflux / hive-on-pull-request"

Thanks @suhanime, I’ll respond to your comments shortly. TL:DR all intentional, but perhaps not necessary.

Copy link
Contributor

openshift-ci bot commented Jul 19, 2024

@2uasimojo: Overrode contexts on behalf of 2uasimojo: Red Hat Konflux / hive-on-pull-request

In response to this:

/override "Red Hat Konflux / hive-on-pull-request"

Thanks @suhanime, I’ll respond to your comments shortly. TL:DR all intentional, but perhaps not necessary.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@2uasimojo
Copy link
Member Author

/hold cancel

QE hit this real good

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 19, 2024
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 73527a8 and 2 for PR HEAD df1ea18 in total

@2uasimojo
Copy link
Member Author

/override ci/prow/security

Copy link
Contributor

openshift-ci bot commented Jul 19, 2024

@2uasimojo: Overrode contexts on behalf of 2uasimojo: ci/prow/security

In response to this:

/override ci/prow/security

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@2uasimojo
Copy link
Member Author

2uasimojo commented Jul 19, 2024

I also wanted to bring up the fact that we're not validating the kubeconfig when we're adopting it, which I think we should.

Yes, good eye. I did come across those code paths. I deliberately left them alone because the Secrets won't do damage unless they're used, and we should be covering all those cases. (It's also not super clear what the error path would look like in that case. Webhook validation can't be relied upon, so while we could write such to reject CD creation, it wouldn't be an effective barrier. So then we would have to "allow" the adoption but... put a status condition on the CD? Etc.)

Also, do we need to document this anywhere?

I don't think so. The normal use case is for our provisioner to lay down the Secret, and then for hive to use it. The user should really never need to modify it (or even look at it) unless they're rotating KAS certs on the spoke -- and then they're just updating the cert fields.

Copy link
Contributor

openshift-ci bot commented Jul 19, 2024

@2uasimojo: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 92e4501 into openshift:master Jul 19, 2024
10 of 11 checks passed
@2uasimojo 2uasimojo deleted the HIVE-2485/forbid-kubeconfig-exec branch July 19, 2024 20:33
@suhanime
Copy link
Contributor

It's also not super clear what the error path would look like in that case. Webhook validation can't be relied upon, so while we could write such to reject CD creation, it wouldn't be an effective barrier. So then we would have to "allow" the adoption but... put a status condition on the CD? Etc.

I think we shouldn't create a CD and throw an error. We should let them know this kubeconfig isn't going to work early on. Or - we can create a CD but atleast throw a warning in the log. Setting AuthenticationFailure condition seems too premature at the adoption stage.

@2uasimojo
Copy link
Member Author

It's also not super clear what the error path would look like in that case. Webhook validation can't be relied upon, so while we could write such to reject CD creation, it wouldn't be an effective barrier. So then we would have to "allow" the adoption but... put a status condition on the CD? Etc.

I think we shouldn't create a CD and throw an error. We should let them know this kubeconfig isn't going to work early on. Or - we can create a CD but atleast throw a warning in the log. Setting AuthenticationFailure condition seems too premature at the adoption stage.

I think we shouldn't create a CD

Wait, are we talking about adoption through hiveutil? Yeah, I'm totally not going to sweat that code path. I thought you were talking about adding logic to adoption-specific paths in the controller.

Remember hiveutil is positioned as an internal dev tool, not for public use. If someone is going to break that seal, ultimately they're just using it as a convenience to generate yamls so they don't have to write them by hand... but either way ultimately they're creating CRs, which we can't stop them from doing (webhooks notwithstanding).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants