cri_dockerd as included in rke-tools has extremely bad performance #2938
Comments
I'm seeing the same behavior in a simple setup involving one node with 2 CPUs. Here's my spec for reference: RKE / Kubernetes version: Docker version: (docker version, docker info preferred) Operating system and kernel: (cat /etc/os-release, uname -r preferred) Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) With Small excerpt from
I also noticed that metrics server does not seem to work properly when enabling this option
|
This PR in the upstream repo possibly fixes this issue. |
rke-tools 0.1.81 is available; it contains the latest cri-dockerd, 0.2.1. We will use this new image in the coming KDM release. Update: |
Ticket #2938 - Test Results Reproduced with rke
Screenshot: |
In my case, on Flatcar Linux on a couple of nodes with 48+ cores, it used all available CPU and caused very high load, eventually crashing workloads in the cluster. @jiaqiluo When will rke-tools 0.1.82 be used in Rancher-provisioned RKE1 clusters? |
@Raboo The plan is in Rancher |
This can be tested on v1.3.13-rc2: https://github.com/rancher/rke/releases/tag/v1.3.13-rc2 The validation steps should be the same as the reproduction steps in #2938 (comment) |
Ticket #2938 - Test Results [pt. 2] With rke
Screenshot: |
If I read the release notes correctly, this is not fixed in Rancher v2.6.6. |
@Raboo You are right; the fix is not in Rancher 2.6.6 but will be in 2.6.7 (as the milestone indicates), which is planned to be released this week. |
We are still seeing massive performance problems when using |
The correct fix is applied because the upstream cri-dockerd has fixed the metrics performance issue in v0.2.1 |
I can confirm - a fresh RKE installation 1.24.4 (from Rancher 2.6.8) shows the same behavior as before, with very high CPU usage and low performance. Looking at the system itself, the process consuming all available CPU is dockerd. |
Well this makes me afraid of upgrading to k8s v1.24. Last time I enabled the |
Also confirmed here on Rancher 2.6.8, k8s v1.24.2 - dockerd using 100% cpu. @jiaqiluo is it possible 2.6.8 reverted the fix in 2.6.7? |
Same problem with 2.7.0 and v1.23.10-rancher1-1 with cri enabled |
We should probably track rancher/rancher#38816; it seems to aggregate the abnormal CPU usage issues with the CRI |
RKE version: 1.3.11 - v1.23.6
Docker version: (docker version, docker info preferred) 20.10
Operating system and kernel: (cat /etc/os-release, uname -r preferred) Ubuntu 20.04 LTS
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Azure
cluster.yml file:
Steps to Reproduce:
Setting enable_cri_dockerd leads to constant, near 100% CPU consumption on an empty cluster running on a 4 vCPU VM (Standard_D4s_v3).
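The cluster.yml used in this report was not included, so the following is only a hypothetical minimal fragment showing where the option would sit. The node address, SSH user, and single-node role layout are placeholder assumptions, not values from the report; only the enable_cri_dockerd key comes from the steps above.

```yaml
# Hypothetical minimal cluster.yml; NOT the reporter's actual file.
nodes:
  - address: 10.0.0.4          # placeholder address (assumption)
    user: ubuntu               # placeholder SSH user (assumption)
    role: [controlplane, etcd, worker]

# The option named in the reproduction step: run cri-dockerd
# so the kubelet talks to Docker through the CRI.
enable_cri_dockerd: true
```

Running rke up against a file like this on an otherwise empty cluster should be enough to observe whether CPU usage climbs as described.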
Results:
With 1.24 approaching fast, we should provide a working option for the installed base that wants to continue with RKE1 and Docker.