-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add insecure_skip_verify in kubeletMetrics preset #1034
Conversation
Signed-off-by: ChrsMark <[email protected]>
@@ -169,6 +169,7 @@ receivers: | |||
collection_interval: 20s | |||
auth_type: "serviceAccount" | |||
endpoint: "${env:K8S_NODE_NAME}:10250" | |||
insecure_skip_verify: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to enable this by default since it isn't the recommended behavior for production. Really the receiver's README should be updated to not set that value. If the cluster's certificates are in order this value is not necessary. @jinja2 do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I agree, though these certificates are a bit of a hassle to manage and we don't really have good docs how to add these to collector.
Also it used to work before 0.87 as there was a bug in kubeletstats, which caused to ignore ceritficate verification -> https://github.com/open-telemetry/opentelemetry-collector-contrib/releases/tag/v0.87.0
So I'm a bit conflicted here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment the preset
does not add any of the ca_file
, key_file
, cert_file
, right? So I wonder in which cases the preset is actually useful with the current implementation.
ref: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27070/files#r1334828771
From my perspective, insecure_skip_verify
serves as a good base and starting point for the cases where certs are not required. For production clusters, our users would need to explicitly configure the receiver
(without the preset) and manually set the certs anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone correct me if I'm wrong, but I thought the receiver was looking in an expected place for the certs and ca_file
, key_file
, and cert_file
was only necessary if your certs didn't live in that expected place.
Are the certs for a standard production k8s cluster no longer installed at that location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preset in the chart is using the serviceAccount
auth type. This auth type does not use the ca_file
, key_file
, cert_file
options at all. These are used for auth type = TLS so basically if you are using client certificates to authenticate. With serviceAccount auth we get the cluster's root CA cert so it can be added to the trust store/cert pool, and this is the ca.crt
we see along with the token for service account at the well-known path that Tyler linked. The standard practice is to setup kubelet to serve certificate signed by the cluster CA which we have access to already when using serviceaccount auth.
Some setups might not set the k8s node name in the Subject alternative name in the kubelet's server cert. In such cases users might get the hostname mismatch
error since we use the K8S_NODE_NAME as host here. The referenced comment has more details on my findings on some of the more commonly used distros.
I would recommend to not add insecure_skip_verify: true
as default cause that would be a step back in terms of secure practices. This is the first issue about this setting after the change in behavior went out so it might not be impacting most users. A compromise might be an option under the preset variable?
The error posted in the issue with running in kind cluster can most likely be fixed by adding below patch to the kind cluster's config.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
kind: KubeletConfiguration
serverTLSBootstrap: true
After the cluster is up, you'll probably need to approve the certificate signing request for the kubelet certs by running kubectl certificate approve {PENDING CSR}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank's for the details @jinja2 @TylerHelmuth on how the internals of serviceAccount
auth type work.
Based on what you shared insecure_skip_verify: true
does not make a lot of sense to be the default.
I tested the preset on a GKE cluster and works without any issues.
However would that make sense to document these details around the serviceAccount
type so as to make it clear how it works and what is required in order for this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChrsMark Yes, let's definitely document it and the possible fix.
Closing this in honor of open-telemetry/opentelemetry-collector-contrib#31280 as per the discussions at #1034 (comment). Let's continue the conversation there. |
#31280) **Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> This PR adds some clarifications in the documentation of the `serviceAccount` auth type of the `kubeletstats` receiver. This is an outcome of the discussions at open-telemetry/opentelemetry-helm-charts#1034 (comment). Signed-off-by: ChrsMark <[email protected]>
open-telemetry#31280) **Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> This PR adds some clarifications in the documentation of the `serviceAccount` auth type of the `kubeletstats` receiver. This is an outcome of the discussions at open-telemetry/opentelemetry-helm-charts#1034 (comment). Signed-off-by: ChrsMark <[email protected]>
Bumping this up again since I got an error while testing the preset on GKE. Config: mode: daemonset
presets:
kubeletMetrics:
enabled: true
config:
exporters:
debug:
verbosity: detailed
service:
pipelines:
metrics:
receivers: [kubeletstats]
processors: [batch]
exporters: [debug] Logs: 2024-07-03T13:16:16.966Z error [email protected]/scraper.go:77 call to /stats/summary endpoint failed {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://gke-otel-chrismark-test-default-pool-b71e7458-hnfn:10250/stats/summary\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match gke-otel-chrismark-test-default-pool-b71e7458-hnfn"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver.(*kubletScraper).scrape
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/scraper.go:77
go.opentelemetry.io/collector/receiver/scraperhelper.ScrapeFunc.Scrape
go.opentelemetry.io/collector/[email protected]/scraperhelper/scraper.go:20
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:194
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:169
2024-07-03T13:16:16.967Z error scraperhelper/scrapercontroller.go:197 Error scraping metrics {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://gke-otel-chrismark-test-default-pool-b71e7458-hnfn:10250/stats/summary\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match gke-otel-chrismark-test-default-pool-b71e7458-hnfn", "scraper": "kubeletstats"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:169 Container image: I don't get these same on EKS, so I guess that sth changed recently on how GKE generates these certs. |
The OTel documentation for the Kubeletstats Receiver component suggests the following config:
However kubeletMetrics preset of the respective Helm Chart does not include the
insecure_skip_verify
.Thus, using the preset results in not being able to scrape metrics from Kubelet:
2024-02-12T13:06:38.943Z error scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://kind-worker2:10250/stats/summary\": tls: failed to verify certificate: x509: certificate signed by unknown authority", "scraper": "kubeletstats"}
I guess it would be nice to skip the tls verification with the preset as well since this is mostly the case and would also be aligned with what the plain config example suggests.