CPU user time over 8000% ? #30

Closed
tiagoalves83 opened this issue Nov 11, 2015 · 13 comments

Comments

@tiagoalves83

Hi,

I am using App Docker Template active (30s interval). Zabbix discovery is working nice when I start a new container.

I am running a stress test container. I have an Intel Core 2 Duo processor (MacBook Pro 2011). The container is running inside a VirtualBox VM (2 CPUs, 2 GB RAM).

The curious thing is that the CPU user time graph for this container is showing over 8000% (eight thousand). Is that right?

On the Docker host, if I run docker stats <stress_container> I get around 250% CPU; sometimes it goes up to 300%. I have seen brief spikes of over 410%.

How does this measurement work?

Thank you.

@jangaraj
Member

That's normal. The right concept for CPU monitoring is a collector. Unfortunately, a collector is not easy to implement as a module, because consistent storage is required. So only basic CPU math on the CPU counters is used. See http://www.labouisse.com/how-to/2014/11/18/simple-monitoring-for-docker-part-2/

You can use docker.stats[], which takes its values directly from Docker rather than from the CPU counters.

@tiagoalves83
Author

Thanks for the link. So, basically, I will need to use a Python script (like the one from labouisse.com) to get CPU math that looks like docker stats, right?

The script will probably look like this: https://github.com/docker/docker/blob/master/api/client/stats.go#L81
and this:
https://github.com/docker/docker/blob/master/api/client/stats.go#L316

right?

I've just started to learn zabbix :)
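
For reference, a minimal sketch of the kind of calculation the linked stats.go performs: the container's CPU time delta is divided by the host's CPU time delta over the same interval and scaled by the number of CPUs. The function and parameter names here are hypothetical; the inputs are assumed to be two consecutive readings of the container's total CPU usage and the host's system CPU usage, both in nanoseconds.

```python
def docker_cpu_percent(prev_total, prev_system, cur_total, cur_system, online_cpus):
    """CPU % in the style of docker stats: container delta over host delta, scaled by CPU count."""
    cpu_delta = cur_total - prev_total        # container CPU time consumed, ns
    system_delta = cur_system - prev_system   # host CPU time across all CPUs, ns
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0  # no usable delta yet (first sample or counter reset)
    return (cpu_delta / system_delta) * online_cpus * 100.0


# Hypothetical readings: the container used 0.5 s of CPU while the
# 2-CPU host accumulated 2 s of CPU time in total.
percent = docker_cpu_percent(0, 0, 500_000_000, 2_000_000_000, 2)
```

Because the result is scaled by the CPU count, a container that keeps two CPUs busy can legitimately report close to 200%.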

@jangaraj
Member

No.

See https://www.kernel.org/doc/Documentation/cgroups/cpuacct.txt for what the CPU cgroup counters are.

cpuacct.stat file lists a few statistics which further divide the
CPU time obtained by the cgroup into user and system times. Currently
the following statistics are supported:

user: Time spent by tasks of the cgroup in user mode.
system: Time spent by tasks of the cgroup in kernel mode.

How do you calculate a CPU percentage if you have only a CPU counter in Zabbix? You need to turn it into a rate (delta per second in the Zabbix template). Then you multiply it by 100 (multiplier in the Zabbix template). That's what the links you provided are about. You don't need root permission for the CPU counters, so this is the default method in my template. You may see an unrealistic peak, e.g. 8000%, but the values are correct on average, e.g. averaged over 1 day.
The peaks can be eliminated only by using the collector concept. Zabbix (docker server, sysdig, ...) uses a collector for OS CPU metrics, so OS CPU metrics are nice and smooth without any peaks.

If you like the CPU usage value reported by Docker (the docker stats command), that option is also available in my module. See the README about docker.stats[]:
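
The counter-to-rate step can be sketched as follows. This is a minimal illustration, not the module's or Zabbix's code: it assumes two readings of a cpuacct.stat file taken a known number of seconds apart, and the function names and sample values are hypothetical. The kernel document linked above states that the user and system values are in USER_HZ units (ticks).

```python
def parse_cpuacct_stat(text):
    """Parse cpuacct.stat content ("user N" / "system N" lines) into a dict of tick counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def ticks_per_second(prev, cur, interval_s):
    """Per-second rate of each counter between two samples taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return {key: (cur[key] - prev[key]) / interval_s for key in cur if key in prev}


# Hypothetical samples taken 30 s apart (the template's polling interval).
prev = parse_cpuacct_stat("user 1000\nsystem 200\n")
cur = parse_cpuacct_stat("user 3400\nsystem 500\n")
rate = ticks_per_second(prev, cur, 30)
```

The rate is only as accurate as the interval between the two reads, so a delayed or early poll shows up as a peak or a dip in a single value while longer averages stay correct.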

docker.stats[cid,cpu_stats,cpu_usage,total_usage]

If you want to use this method, you need to modify my Docker template or create a new one. Just keep in mind that it's an average value over the last X seconds (IMO 5).

Docker monitoring is not the easiest way to learn Zabbix. I recommend starting with the official documentation: https://www.zabbix.com/documentation/2.4/manual

@tiagoalves83
Author

Thank you once again. I am learning a lot.

What I am trying to achieve is getting some CPU / memory / Disk I/O Graphs for the docker host (individually and comparing with one or more running containers).
I need 6h to 12h graphs. (google/cAdvisor just gives me real time ...)

The thing is that I am not getting just an unrealistic peak of 8000% (even with google/cAdvisor we can get strange peaks ...); I am getting an avg of 4000% and a min of 2000% for a CPU stress container that runs for 1 hour.

I will try docker.stats[..]

@jangaraj
Member

What's your config (Docker version/execution driver, OS, number of CPUs)?

@tiagoalves83
Author

My machine is a MacBook Pro Mid 2010 - Intel Core 2 Duo 2.4 with 8 GB RAM

Running 2 Ubuntu 15 VMs (VirtualBox) with 2 GB RAM each.

Zabbix Server (1 CPU /proc/cpuinfo shows 1 CPU 1 core):
Zabbix server v2.4.6 (revision 54796) (10 August 2015)

Docker Host (2 CPUs /proc/cpuinfo shows 2 CPUs with 2 cores each):
Docker version 1.9.0, build 76d6bc9
Zabbix agent v2.2.7 (revision 50148) (24 October 2014)
zabbix_docker_module.so (downloaded 10 Nov 2015)
AllowRoot=1

@tiagoalves83
Author

Hello again.

I have implemented a shell script to calculate the CPU % of a running container. I based the code on the "docker stats" golang source. Zabbix is collecting the results fine.

But I noticed a 5%-15% difference between docker-stats and systemd-cgtop.
systemd-cgtop seems more accurate (https://github.com/systemd/systemd/blob/master/src/cgtop/cgtop.c#L126).

I was not able to collect the systemd-cgtop CPU% using a bash script, so I was wondering if your implementation could use the systemd-cgtop approach.

Is it possible? What is necessary?

@jangaraj
Member

Great.

1.) Is it really the best CPU usage algorithm?
I have tested different container monitoring tools before: this module, cAdvisor, sysdig, Control Center (some Go lib), and all of them gave different outputs for CPU usage. Which one is the right one?

2.) systemd-cgtop is not the best implementation for Zabbix
It'll be OK only when you are using a single Zabbix server. However, you can pull data from many Zabbix servers, and then we are back to the collector. There should be some process in the background that processes CPU metrics periodically (the cgtop algorithm can be used here), and all CPU metrics would be provided by this process. It'll be a challenge to implement a collector as a Zabbix module.

@tiagoalves83
Author

I don't know if systemd-cgtop is the best CPU usage algorithm, but it seems more accurate than docker-stats (I will explain my tests below). I like this module. It's elegant, but in my environment I couldn't get it to show CPU usage correctly (while I get 8000% with the module, docker-stats shows 105% and systemd-cgtop 99.5%).

My tests were simple:
One VM running Zabbix server, one VM running the Docker host with 2 containers (google/cadvisor and agileek/cpuset-test).

docker run   --volume=/:/rootfs:ro   --volume=/var/run:/var/run:rw   --volume=/sys:/sys:ro   --volume=/var/lib/docker/:/var/lib/docker:ro  --publish=8080:8080   --detach=true   --name=cadvisor   google/cadvisor:latest
docker run -ti --name cpuset-test-1 agileek/cpuset-test /cpus 2

Docker host has 2 vCPUs,
systemd-cgtop shows:

# systemd-cgtop
Path                                                                                    Tasks   %CPU   Memory  Input/s Output/s

/                                                                                          67  199.8     1.0G        -        -
/system.slice                                                                               -  192.2   645.1M        -        -
/system.slice/docker-6ebbd82...a996f1e8aaaa0654a9af5fb1707dab680e352618f93f0669.scope       3  114.1   288.0K        -        -
/system.slice/docker-eb2b02b...58e9b1ab23a3f95b9dc5ecfc13e58111ee6628847ccae072.scope       2   68.3    48.6M        -        -

docker-stats shows:

# docker stats 6ebbd82 eb2b02b
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O              BLOCK I/O
6ebbd82             116.70%             294.9 kB / 2.098 GB   0.01%               648 B / 648 B        0 B / 0 B
eb2b02b             72.79%              51.02 MB / 134.2 MB   38.01%              4.524 MB / 69.7 MB   0 B / 0 B

The CPU usage changes a little when I am using cpuset-cpus to limit the container to 1 CPU:

# docker stop 6ebbd82
# docker rm 6ebbd82
# docker run -ti --cpuset-cpus=0 --name cpuset-test-1 agileek/cpuset-test

systemd-cgtop shows:

# systemd-cgtop
Path                                                                                 Tasks   %CPU   Memory  Input/s Output/s

/                                                                                       70  143.1     1.0G        -        -
/system.slice                                                                            -  135.8   646.4M        -        -
/system.slice/docker-5d17c2...49a8a48633ce455cb669bdfb4a353577834b80a012457d.scope       2   97.6   128.0K        -        -
/system.slice/docker-eb2b02...e9b1ab23a3f95b9dc5ecfc13e58111ee6628847ccae072.scope       1   36.5    50.1M        -        -

docker-stats shows:

# docker stats 5d17c2 eb2b02
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O              BLOCK I/O
5d17c2              105.05%             131.1 kB / 2.098 GB   0.01%               648 B / 648 B        0 B / 0 B
eb2b02              33.55%              52.64 MB / 134.2 MB   39.22%              4.525 MB / 69.7 MB   0 B / 0 B

The point is: when I limit the container to just 1 CPU, systemd-cgtop almost never goes over 100% in a 2-vCPU VM, while with docker-stats this happens very often.

I am not a C expert, but if I got it right, this module leaves the CPU % usage calculation to Zabbix (using deltas). I was wondering how to do the calculation inside the module instead (no Zabbix delta calculation).
I believe that the Zabbix delta calculation and this module are not playing well together in my environment.
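
A minimal sketch of doing the delta calculation inside the collecting process instead of in Zabbix. It assumes the cgroup's cumulative CPU time in nanoseconds (for example from cpuacct.usage) is sampled together with a monotonic timestamp, and divides the CPU time delta by the elapsed wall-clock time, which is the general approach cgtop-style tools take. This is not the module's code; the class, method and parameter names are hypothetical.

```python
class CpuUsageSampler:
    """Keeps the previous sample per container so each call returns CPU % since the last one."""

    def __init__(self):
        self._last = {}  # container id -> (usage_ns, timestamp_ns)

    def update(self, cid, usage_ns, now_ns):
        """Record a new sample; return CPU % since the previous one, or None if unavailable."""
        prev = self._last.get(cid)
        self._last[cid] = (usage_ns, now_ns)
        if prev is None:
            return None  # first time this container is seen
        prev_usage, prev_time = prev
        elapsed = now_ns - prev_time
        if elapsed <= 0 or usage_ns < prev_usage:
            return None  # clock did not advance, or the counter was reset
        return (usage_ns - prev_usage) / elapsed * 100.0


# Hypothetical samples one second apart: 0.75 s of CPU time consumed.
sampler = CpuUsageSampler()
first = sampler.update("5d17c2", 0, 0)
second = sampler.update("5d17c2", 750_000_000, 1_000_000_000)
```

The state kept between calls is what the thread refers to as the collector: the value no longer depends on when, or how often, a Zabbix server happens to poll.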

@jangaraj
Member

jangaraj commented Dec 6, 2015

I know that the CPU monitoring implementation is not perfect. It's on my long-term TODO. Feel free to create a PR: a cgtop-style CPU monitoring implementation can at least be an experimental feature, e.g. docker.xcpu.

@berlic

berlic commented Mar 21, 2017

@jangaraj Thanks for your work! Great for monitoring Docker hosts with Zabbix!

But I find your template formula for CPU stats, (delta)*100, a bit misleading. cpuacct.stat is given in ticks, and you can't make percentages out of it! I think just using the delta without a multiplier and labelling it as ticks is far easier to understand.

@jangaraj
Member

jangaraj commented Mar 21, 2017

@berlic You can - http://serverfault.com/questions/441897/how-to-calculate-cpu-based-on-raw-cpu-ticks-in-snmp.

Linux: usually 100 ticks per second => 100tick/sec = 100% CPU usage
So delta/sec = % usage. You are right - multiplier is not required 👍 I'll fix it.
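
The arithmetic above can be sketched as follows, assuming the tick rate is USER_HZ ticks per second (commonly 100 on Linux, readable via sysconf). The function name and the sample rate of 80 ticks per second are hypothetical; the example only illustrates how an additional x100 multiplier inflates the displayed value.

```python
import os


def ticks_rate_to_percent(ticks_per_sec, clk_tck=None):
    """Convert a tick rate to percent of one CPU; clk_tck is ticks per second (USER_HZ)."""
    if clk_tck is None:
        clk_tck = os.sysconf("SC_CLK_TCK")  # commonly 100 on Linux
    return ticks_per_sec * 100.0 / clk_tck


rate = 80.0                                          # hypothetical delta per second from cpuacct.stat
correct = ticks_rate_to_percent(rate, clk_tck=100)   # percent of one CPU
inflated = correct * 100                             # what an extra x100 multiplier would display
```

With 100 ticks per second the rate and the percentage are numerically equal, which is why the multiplier is not needed.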

@jangaraj jangaraj reopened this Mar 22, 2017
@jangaraj jangaraj mentioned this issue Mar 22, 2017
@jangaraj
Member

Closing - change has been merged.