New Agent pools do not have kubelet metrics #1601
Comments
port |
@andyzhangx has the plain http port on 10255 been deactivated without mention in the changelog? This worked on clusters deployed a few months ago, so very surprised to see it gone. |
cc @palma21 |
@andyzhangx I've attempted to set it to https:10250 but I get "Unauthorized" as a response. Note that this is still working on the original node pool... so I'm not sure what's actually changed on the new nodepool. Can you clarify? |
Can we open this ticket back up? Not sure that anything got resolved here. I've attempted to scale up the original node pool and can confirm those new nodes have working kubelet endpoints. I've also scaled up the new nodepool and those new nodes do not have working endpoints for kubelet |
@andyzhangx We attempted to use https:10250 and were not successful. Can you suggest other steps to try? Or why we are getting "Unauthorized" when trying to use this port? We only see this issue on secondary nodepools. It does not occur on the primary nodepool. |
This is mentioned in the release notes here: https://github.com/Azure/AKS/blob/master/CHANGELOG.md#release-2020-01-27 Did you upgrade your original nodepool to 1.16, or was it created on 1.16? There is currently a bug, which is being fixed, where the upgrade did not pick up that change, so you might have 10255 working on that one. New pools created originally on 1.16 would not. That could explain your different pool behavior; could you confirm? Kubelet port 10255 is disabled by default: I believe in your case what is missing is needed flags on kubelet which we are enabling right now as well. In the meantime you can workaround it with and confirm if that solves it? |
Thanks for responding and reopening the ticket! It looks like it was referenced under "Azure Monitor" in the release notes, so I missed it; that's my mistake. The original node pool was created on an older version but was upgraded to 1.16, including the control plane; the secondary nodepool came after. The issue we were concerned with was the difference between the two, and the bug you mention sounds like it could be what we are seeing. We've since set up Prometheus authorization over https to scrape the new endpoint. Do you know if this bug is isolated to this endpoint, or are there other things we should be looking at as well? |
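For anyone hitting the same "Unauthorized" response on port 10250, here is a minimal Python sketch of what an authenticated request to the secure kubelet port could look like. It assumes the kubelet accepts service-account bearer tokens and that the account has been granted access to kubelet metrics; neither is confirmed in this thread. The helper name `build_secure_request` is hypothetical, and the token path is the standard in-pod service-account mount.

```python
import urllib.request

# Standard in-pod service-account token mount (assumes this runs inside a pod).
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
SECURE_PORT = 10250  # HTTPS kubelet port discussed in this issue


def read_token(path: str = TOKEN_PATH) -> str:
    """Read the bearer token from disk, stripping the trailing newline."""
    with open(path, encoding="utf-8") as fh:
        return fh.read().strip()


def build_secure_request(node_ip: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET for the kubelet /metrics endpoint.

    Hypothetical helper: sending this without the Authorization header is
    one plausible cause of the "Unauthorized" response reported above.
    """
    url = f"https://{node_ip}:{SECURE_PORT}/metrics"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
```

The request is only constructed here; sending it with `urllib.request.urlopen` additionally requires a TLS context that can deal with the kubelet's serving certificate.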
System information/context:
I can now query Problem: I want to verify the "self-signed" kubelet certificate, but the serviceaccount ca.crt does not work. Said ca.crt works for verifying the master node api endpoint but not the nodes' kubelet endpoint. |
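To make the certificate problem above concrete, the following Python sketch shows the two TLS options a client has: verify against a CA bundle, or skip verification entirely. Skipping verification is an assumption about a possible workaround, not a fix confirmed by the maintainers, and it gives up protection against a spoofed endpoint. The function name `kubelet_ssl_context` is hypothetical.

```python
import ssl
from typing import Optional


def kubelet_ssl_context(ca_file: Optional[str] = None,
                        verify: bool = True) -> ssl.SSLContext:
    """Return an SSL context for talking to a kubelet HTTPS endpoint.

    verify=True  : validate the server certificate against ca_file
                   (or the system trust store when ca_file is None).
    verify=False : accept any certificate, including self-signed ones.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    if not verify:
        # check_hostname must be turned off before verify_mode is relaxed.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

With `verify=True` and the service-account ca.crt passed as `ca_file`, a handshake against a kubelet whose certificate was not issued by that CA would be expected to fail, which matches the behavior described in the comment above.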
For those of you using ServiceMonitor CRDs this worked for me. Update the
Change from
Now both kubelet+cadvisor targets in Prometheus are up and metrics are once again flowing :) |
What happened:
Added secondary node pools due to the inflexibilities of the original nodepool.
What you expected to happen:
Expect non-default Nodepools to have kubelet stats to allow for container stat monitoring.
How to reproduce it (as minimally and precisely as possible):
Add a new nodepool to an existing AKS cluster and set up Prometheus to scrape kubelet http-metrics from each node. Observe that only the default nodepool is reachable via
http://<node_IP>:10255/metrics
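The reproduction step can be checked outside Prometheus with a small probe. This is a rough Python sketch that uses only the standard library; the helper names `metrics_url` and `probe` are made up for illustration, and it has to run from somewhere with network access to the node IPs.

```python
import urllib.error
import urllib.request

READ_ONLY_PORT = 10255  # plain-HTTP kubelet port named in this issue


def metrics_url(node_ip: str, port: int = READ_ONLY_PORT) -> str:
    """Build the plain-HTTP kubelet metrics URL for one node."""
    return f"http://{node_ip}:{port}/metrics"


def probe(url: str, timeout: float = 3.0) -> str:
    """Return a short status string describing whether the URL answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 401).
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, no route, and similar failures.
        return f"unreachable ({exc})"
```

Calling `probe(metrics_url(ip))` for one node in each pool should show the difference: per this report, a node in the default pool answers, while a node in the new pool is expected to be unreachable on this port.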
Anything else we need to know?:
We currently use kubelet to provide container-level metrics to Prometheus, such as CPU and memory stats, among other things.
Support Ticket: 120050624005838
Environment:
Kubernetes version (kubectl version): v1.16.7
This is happening on our smaller dev AKS clusters, stage and production currently. So anywhere from 1-15 nodes that vary in VM size.
Webservices/Java JVM applications
This is a huge pain point for us as we can't troubleshoot any container resources in our new primary nodepools.