cgroup2: map io.stats to v1 blkio.stats correctly #2968

Merged
merged 3 commits into from
Jun 1, 2021


cyphar
Member

@cyphar cyphar commented May 28, 2021

Kubelet and cAdvisor depend on the metrics having the same values as in
cgroupv1, but we didn't correctly map the number of read and write IOs
to the correct cgroupv1 stats table (blkio.io_serviced).

In addition, don't leak any extra stats in our output -- if users need
that information we can always add a new field for it.

Change-log entry:

* cgroupv2: correctly convert "number of IOs" statistics in a cgroupv1-compatible way (#2967, #2964)
* cgroupv2: support larger than 32-bit IO statistics on 32-bit architectures

Fixes #2967
Reported-by: Yashpal Choudhary [email protected]
Signed-off-by: Aleksa Sarai [email protected]
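
As a rough illustration of the mapping described above, the sketch below parses cgroup v2 `io.stat` lines and sorts the counters into two v1-style tables: `rbytes`/`wbytes` into `blkio.io_service_bytes`, and `rios`/`wios` into `blkio.io_serviced`. Everything else (for example `dbytes` and `dios`) is dropped rather than leaked into the output. The names `BlkioEntry`, `V1Stats`, and `convertIOStat` are hypothetical and are not taken from runc's source; this is a sketch under those assumptions, not the patch itself.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// BlkioEntry is a hypothetical stand-in for one row of a cgroup v1 blkio table.
type BlkioEntry struct {
	Major, Minor uint64
	Op           string
	Value        uint64
}

// V1Stats holds the two v1-style tables referred to in the description.
type V1Stats struct {
	IoServiceBytes []BlkioEntry // v1: blkio.io_service_bytes
	IoServiced     []BlkioEntry // v1: blkio.io_serviced
}

// convertIOStat parses the contents of a cgroup v2 io.stat file, e.g.
// "8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0".
func convertIOStat(data string) (V1Stats, error) {
	var out V1Stats
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue // blank line or device with no counters
		}
		dev := strings.SplitN(fields[0], ":", 2)
		if len(dev) != 2 {
			return out, fmt.Errorf("bad device field %q", fields[0])
		}
		major, err := strconv.ParseUint(dev[0], 10, 64)
		if err != nil {
			return out, err
		}
		minor, err := strconv.ParseUint(dev[1], 10, 64)
		if err != nil {
			return out, err
		}
		for _, kv := range fields[1:] {
			pair := strings.SplitN(kv, "=", 2)
			if len(pair) != 2 {
				continue
			}
			// Always parse as 64-bit so large counters work on 32-bit hosts.
			value, err := strconv.ParseUint(pair[1], 10, 64)
			if err != nil {
				return out, err
			}
			entry := BlkioEntry{Major: major, Minor: minor, Value: value}
			switch pair[0] {
			case "rbytes":
				entry.Op = "Read"
				out.IoServiceBytes = append(out.IoServiceBytes, entry)
			case "wbytes":
				entry.Op = "Write"
				out.IoServiceBytes = append(out.IoServiceBytes, entry)
			case "rios":
				entry.Op = "Read"
				out.IoServiced = append(out.IoServiced, entry)
			case "wios":
				entry.Op = "Write"
				out.IoServiced = append(out.IoServiced, entry)
			default:
				// dbytes, dios and any unknown keys are intentionally skipped.
			}
		}
	}
	return out, sc.Err()
}

func main() {
	stats, err := convertIOStat("8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("io_service_bytes: %+v\n", stats.IoServiceBytes)
	fmt.Printf("io_serviced:      %+v\n", stats.IoServiced)
}
```

Keeping the two tables separate is what lets a consumer such as cAdvisor read operation counts and byte counts from the same places it would on cgroup v1.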

@cyphar cyphar added this to the 1.0.0 milestone May 28, 2021
@cyphar
Member Author

cyphar commented May 28, 2021

/cc @iyashu Does this seem reasonable?

@dims
Contributor

dims commented May 28, 2021

@iyashu can you please test it locally and give us a thumbs up here?

@iyashu
Contributor

iyashu commented May 28, 2021

@dims @cyphar LGTM. I've also tested this patch by recompiling cAdvisor with the updated runc dependency. All container_fs_{reads|writes}{_bytes|}_total Prometheus metrics now show the expected values. Thanks for promptly fixing the issue. Hoping to get a new release of cAdvisor and kubelet soon :).

container_fs_reads_total{container_label_app="",container_label_app_kubernetes_io_name="",container_label_app_kubernetes_io_version="",container_label_controller_revision_hash="",container_label_environment="",container_label_io_cri_containerd_kind="container",container_label_io_kubernetes_container_name="perfrunner-ssd",container_label_io_kubernetes_pod_name="test-ssd-a-0",container_label_io_kubernetes_pod_namespace="demo",container_label_io_kubernetes_pod_uid="8f151d4f-88fa-4d94-ae23-ad3573b3454a",container_label_k8s_app="",container_label_openebs_io_component_name="",container_label_openebs_io_version="",container_label_pod_template_generation="",container_label_role="",container_label_statefulset_kubernetes_io_pod_name="",device="/dev/dm-3",id="/kubepods.slice/kubepods-pod8f151d4f_88fa_4d94_ae23_ad3573b3454a.slice/cri-containerd-24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d.scope",image="xxx/buster-fio:0.0.1-rc1",name="24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d"} 146 1622216624267

container_fs_reads_bytes_total{container_label_app="",container_label_app_kubernetes_io_name="",container_label_app_kubernetes_io_version="",container_label_controller_revision_hash="",container_label_environment="",container_label_io_cri_containerd_kind="container",container_label_io_kubernetes_container_name="perfrunner-ssd",container_label_io_kubernetes_pod_name="test-ssd-a-0",container_label_io_kubernetes_pod_namespace="demo",container_label_io_kubernetes_pod_uid="8f151d4f-88fa-4d94-ae23-ad3573b3454a",container_label_k8s_app="",container_label_openebs_io_component_name="",container_label_openebs_io_version="",container_label_pod_template_generation="",container_label_role="",container_label_statefulset_kubernetes_io_pod_name="",device="/dev/dm-3",id="/kubepods.slice/kubepods-pod8f151d4f_88fa_4d94_ae23_ad3573b3454a.slice/cri-containerd-24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d.scope",image="xxx/buster-fio:0.0.1-rc1",name="24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d"} 598016 1622216624267

container_fs_writes_total{container_label_app="",container_label_app_kubernetes_io_name="",container_label_app_kubernetes_io_version="",container_label_controller_revision_hash="",container_label_environment="",container_label_io_cri_containerd_kind="container",container_label_io_kubernetes_container_name="perfrunner-ssd",container_label_io_kubernetes_pod_name="test-ssd-a-0",container_label_io_kubernetes_pod_namespace="demo",container_label_io_kubernetes_pod_uid="8f151d4f-88fa-4d94-ae23-ad3573b3454a",container_label_k8s_app="",container_label_openebs_io_component_name="",container_label_openebs_io_version="",container_label_pod_template_generation="",container_label_role="",container_label_statefulset_kubernetes_io_pod_name="",device="/dev/dm-3",id="/kubepods.slice/kubepods-pod8f151d4f_88fa_4d94_ae23_ad3573b3454a.slice/cri-containerd-24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d.scope",image="xxx/openebs/buster-fio:0.0.1-rc1",name="24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d"} 2.06602564e+08 1622216624267

container_fs_writes_bytes_total{container_label_app="",container_label_app_kubernetes_io_name="",container_label_app_kubernetes_io_version="",container_label_controller_revision_hash="",container_label_environment="",container_label_io_cri_containerd_kind="container",container_label_io_kubernetes_container_name="perfrunner-ssd",container_label_io_kubernetes_pod_name="test-ssd-a-0",container_label_io_kubernetes_pod_namespace="demo",container_label_io_kubernetes_pod_uid="8f151d4f-88fa-4d94-ae23-ad3573b3454a",container_label_k8s_app="",container_label_openebs_io_component_name="",container_label_openebs_io_version="",container_label_pod_template_generation="",container_label_role="",container_label_statefulset_kubernetes_io_pod_name="",device="/dev/dm-3",id="/kubepods.slice/kubepods-pod8f151d4f_88fa_4d94_ae23_ad3573b3454a.slice/cri-containerd-24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d.scope",image="xxx/buster-fio:0.0.1-rc1",name="24f81c30c494bf78da70e6b6d0d719a5f38ea723d448f4743f7d81c2fa83ef1d"} 3.76242522271744e+14 1622216624267

iyashu
iyashu previously approved these changes May 28, 2021
dims
dims previously approved these changes May 28, 2021
Contributor

@dims dims left a comment

LGTM

@cyphar cyphar requested review from AkihiroSuda, kolyshkin and a team May 29, 2021 03:30
AkihiroSuda
AkihiroSuda previously approved these changes May 30, 2021
Member

@AkihiroSuda AkihiroSuda left a comment

LGTM, but can we have a unit test?

@cyphar cyphar dismissed stale reviews from AkihiroSuda, dims, and iyashu via 4512e52 May 31, 2021 03:01
@cyphar
Member Author

cyphar commented May 31, 2021

Added a unit test.

@cyphar cyphar requested a review from AkihiroSuda May 31, 2021 03:02
cyphar added 2 commits June 1, 2021 10:35
Kubelet and cAdvisor depend on the metrics having the same values as in
cgroupv1, but we didn't correctly map the number of read and write IOs
to the correct cgroupv1 stats table (blkio.io_serviced).

In addition, don't leak any extra stats in our output -- if users need
that information we can always add a new field for it.

Reported-by: Yashpal Choudhary <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
strconv.ParseUint(..., 0) is not really safe, because on 32-bit
architectures it will trigger runtime errors when trying to parse large
numbers (which in the case of the cgroupv2 io controller, is almost
certainly going to happen).

Fixes: 1932917 ("libcontainer: add initial support for cgroups v2")
Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar
Member Author

cyphar commented Jun 1, 2021

Turns out there was a bug in our handling of 64-bit values on 32-bit architectures which the new unit test picked up. It's fixed now.

@cyphar
Member Author

cyphar commented Jun 1, 2021

/ping @opencontainers/runc-maintainers This is ready for review now, and is the last PR for 1.0.0.

@dqminh dqminh merged commit 3a0234e into opencontainers:master Jun 1, 2021
@cyphar
Member Author

cyphar commented Jun 1, 2021

Coolio. I'll send out the vote for 1.0.0 now. 🎉


Successfully merging this pull request may close these issues.

Capture number of IO operations performed with cgroup v2 hierarchy
5 participants