Use docker client CPU & memory percent helper functions #2457

Closed
georgyturevich opened this issue Feb 22, 2017 · 8 comments
Labels
area/docker bug unexpected problem or unintended behavior

@georgyturevich

Hello there,

Relevant telegraf.conf:

[[inputs.docker]]
  endpoint = "ENV"
  container_names = []
  timeout = "10s"
  perdevice = false
  total = true

System info:

Telegraf version:
Telegraf v1.1.1 (git: release-1.1.0 94de9dca1fc6efb3a4bf3ec6869c356278c6755a)

Operating system:
Windows Server 2016 Version 1607 (OS Build 14393.693)

Output of docker version

Client:
 Version:      1.13.1-cs1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   8709b81
 Built:        Thu Feb  9 02:05:36 2017
 OS/Arch:      windows/amd64

Server:
 Version:      1.13.1-cs1
 API version:  1.26 (minimum version 1.24)
 Go version:   go1.7.5
 Git commit:   8709b81
 Built:        Thu Feb  9 02:05:36 2017
 OS/Arch:      windows/amd64
 Experimental: false

Steps to reproduce:

When we request stats for a container from the Docker API (e.g. http://host:2375/containers/gt_test_iis/stats), it returns:

{
   "read":"2017-01-11T08:32:46.2413794Z",
   "preread":"0001-01-01T00:00:00Z",
…
   "num_procs":64,
…
   "cpu_stats":{
      "cpu_usage":{
         "total_usage":536718750,
         "usage_in_kernelmode":390468750,
         "usage_in_usermode":390468750
      },
      "throttling_data":{
         "periods":0,
         "throttled_periods":0,
         "throttled_time":0
      }
   },
   "precpu_stats":{
      "cpu_usage":{
         "total_usage":0,
         "usage_in_kernelmode":0,
         "usage_in_usermode":0
      },
      "throttling_data":{
         "periods":0,
         "throttled_periods":0,
         "throttled_time":0
      }
   },
   "memory_stats":{
      "commitbytes":77160448,
      "commitpeakbytes":105000960,
      "privateworkingset":59961344
   },
   "name":"/gt_test_iis",
...
}

which is quite different from the output we see on Linux:

{
   "read":"2017-01-11T08:41:36.739447065Z",
   "precpu_stats":{
      "cpu_usage":{
         "total_usage":0,
         "percpu_usage":null,
         "usage_in_kernelmode":0,
         "usage_in_usermode":0
      },
      "system_cpu_usage":0,
      "throttling_data":{
         "periods":0,
         "throttled_periods":0,
         "throttled_time":0
      }
   },
   "cpu_stats":{
      "cpu_usage":{
         "total_usage":101417508,
         "percpu_usage":[
            0,
			...
            0,
            1112514,
            4651620,
			...
         ],
         "usage_in_kernelmode":60000000,
         "usage_in_usermode":30000000
      },
      "system_cpu_usage":30758197360000000,
      "throttling_data":{
         "periods":19,
         "throttled_periods":0,
         "throttled_time":0
      }
   },
   "memory_stats":{
      "usage":6504448,
      "max_usage":7073792,
      "stats":{
         "active_anon":675840,
         "active_file":8192,
         "cache":90112,
         "dirty":0,
         "hierarchical_memory_limit":4294967296,
         "inactive_anon":45056,
         "inactive_file":4096,
         "mapped_file":0,
         "pgfault":856,
         "pgmajfault":0,
         "pgpgin":560,
         "pgpgout":381,
         "rss":643072,
         "rss_huge":0,
         "total_active_anon":675840,
         "total_active_file":8192,
         "total_cache":90112,
         "total_dirty":0,
         "total_inactive_anon":45056,
         "total_inactive_file":4096,
         "total_mapped_file":0,
         "total_pgfault":856,
         "total_pgmajfault":0,
         "total_pgpgin":560,
         "total_pgpgout":381,
         "total_rss":643072,
         "total_rss_huge":0,
         "total_unevictable":0,
         "total_writeback":0,
         "unevictable":0,
         "writeback":0
      },
      "failcnt":0,
      "limit":4294967296
   },
...
}

As a result, we see a lot of zero-valued memory stats:

docker_container_mem_active_anon{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_active_file{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_cache{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_fail_count{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_hierarchical_memory_limit{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_inactive_anon{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_inactive_file{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_limit{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_mapped_file{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_max_usage{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_pgfault{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_pgmajfault{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_pgpgin{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_pgpgout{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_rss{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_rss_huge{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_active_anon{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_active_file{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_cache{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_inactive_anon{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_inactive_file{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_mapped_file{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_pgfault{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_pgmafault{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_pgpgin{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_pgpgout{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_rss{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_rss_huge{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_unevictable{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_total_writeback{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_unevictable{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_usage{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_usage_percent{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_mem_writeback{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0

Another issue is that the CPU percentage is also calculated incorrectly. At least these two metrics are wrong:

docker_container_cpu_usage_percent{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_cpu_usage_system{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0

I also do not clearly understand what the following stats mean for Windows containers:

docker_container_cpu_throttling_periods{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_cpu_throttling_throttled_periods{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_cpu_throttling_throttled_time{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 0
docker_container_cpu_usage_in_kernelmode{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 3.10625e+08
docker_container_cpu_usage_in_usermode{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 3.10625e+08
docker_container_cpu_usage_total{container_image="microsoft/iis",container_name="gt_test_iis",container_version="unknown",cpu="cpu-total",engine_host="EC2AMAZ-MNUSQD8",host="EC2AMAZ-MNUSQD8"} 5.2765625e+08

This is how Docker itself calculates the CPU percentage on Windows:

func calculateCPUPercentWindows(v *types.StatsJSON) float64 {
	// Max number of 100ns intervals between the previous time read and now
	possIntervals := uint64(v.Read.Sub(v.PreRead).Nanoseconds()) // Start with number of ns intervals
	possIntervals /= 100                                         // Convert to number of 100ns intervals
	possIntervals *= uint64(v.NumProcs)                          // Multiple by the number of processors

	// Intervals used
	intervalsUsed := v.CPUStats.CPUUsage.TotalUsage - v.PreCPUStats.CPUUsage.TotalUsage

	// Percentage avoiding divide-by-zero
	if possIntervals > 0 {
		return float64(intervalsUsed) / float64(possIntervals) * 100.0
	}
	return 0.00
}

but Telegraf uses a different algorithm to calculate the CPU percentage:

func calculateCPUPercent(stat *types.StatsJSON) float64 {
	var cpuPercent = 0.0
	// calculate the change for the cpu and system usage of the container in between readings
	cpuDelta := float64(stat.CPUStats.CPUUsage.TotalUsage) - float64(stat.PreCPUStats.CPUUsage.TotalUsage)
	systemDelta := float64(stat.CPUStats.SystemUsage) - float64(stat.PreCPUStats.SystemUsage)

	if systemDelta > 0.0 && cpuDelta > 0.0 {
		cpuPercent = (cpuDelta / systemDelta) * float64(len(stat.CPUStats.CPUUsage.PercpuUsage)) * 100.0
	}
	return cpuPercent
}

It works fine for Docker on Linux but not on Windows, where stat.CPUStats.SystemUsage is not available at all, as you can see in the stats output above.

Proposal

So maybe it makes sense to keep only the following mem/CPU stats on Windows, since we know how to calculate them correctly:

docker_container_cpu_usage_percent
docker_container_mem_commitbytes
docker_container_mem_commitpeakbytes
docker_container_mem_privateworkingset

Maybe you can also suggest other statistics.

Thanks in advance!

@sparrc sparrc changed the title Docker input shows incorrect mem/cpu stats on Windows Use docker client CPU percent helper functions Feb 22, 2017
@sparrc
Contributor

sparrc commented Feb 22, 2017

looks like the official docker client may have a better way of getting the cpu usage percentages than what we're currently doing. Changing this issue to re-evaluate that as a whole.

@sparrc sparrc added this to the 1.3.0 milestone Feb 22, 2017
@sparrc sparrc added the bug unexpected problem or unintended behavior label Feb 22, 2017
@georgyturevich
Author

@sparrc Hi Cameron

I see that you renamed the issue to "Use docker client CPU percent helper functions", but the bug is not only about CPU. As I wrote, the memory stats output is also incorrect.

@sparrc sparrc changed the title Use docker client CPU percent helper functions Use docker client CPU & memory percent helper functions Feb 23, 2017
georgyturevich added a commit to georgyturevich/telegraf that referenced this issue Feb 24, 2017
@georgyturevich
Author

@sparrc Hi Cameron,

In case it is useful, I pushed the quick changes I made to fix our internal monitoring:

georgyturevich@80cf1cb

Sorry for my ugly code, these were my first hours with Golang :)

One more thing: I ran into the following error when I tried to run make windows:

 github.com/influxdata/telegraf/plugins/inputs/docker
 plugins/inputs/docker/docker.go:103: cannot use c (type *client.Client) as type DockerClient in assignment:
    *client.Client does not implement DockerClient (wrong type for ContainerList method)
            have ContainerList("github.com/docker/docker/vendor/golang.org/x/net/context".Context, types.ContainerListOptions) ([]types.Container, error)
            want ContainerList("context".Context, types.ContainerListOptions) ([]types.Container, error)
 make: *** [build] Error 2
 sh-4.2# vi plugins/inputs/docker/docker.go

To resolve it I had to delete this folder from the docker vendor directory:
$GOPATH/src/github.com/docker/docker/vendor/golang.org

@georgyturevich
Author

Forgot to mention that this is a branch from version 1.1, as that is the exact version we use on our servers now.

@danielnelson
Contributor

@georgyturevich We have changed the docker client for the next release, are you able to test with the latest nightly and provide an update?
https://dl.influxdata.com/telegraf/nightlies/telegraf-nightly_windows_amd64.zip

@georgyturevich
Author

@danielnelson Hi Daniel,

I can have a look, but I do not see significant changes in the code. It still uses a single function for calculating CPU (docker.go#L387), but for Windows we should use a different one, like in my simple fix commit (#diff-691fdaed...R431). Also, for Windows we do not have enough data to calculate the memory percentage. The Docker API returns only the following three stats:

  • stat.MemoryStats.Commit,
  • stat.MemoryStats.CommitPeak,
  • stat.MemoryStats.PrivateWorkingSet,
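
As a rough illustration of what can be collected on Windows, the sketch below decodes only those three values from the memory_stats object shown in the Windows output earlier in this issue. The type and function names (windowsMemoryStats, windowsMemFields) are hypothetical, and the field names simply follow the metric names in the proposal; this is not the actual Telegraf code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// windowsMemoryStats is a hypothetical local type covering the three memory
// values that appear in the Windows stats output.
type windowsMemoryStats struct {
	Commit            uint64 `json:"commitbytes"`
	CommitPeak        uint64 `json:"commitpeakbytes"`
	PrivateWorkingSet uint64 `json:"privateworkingset"`
}

// windowsMemFields extracts the Windows memory values from a raw stats
// document and returns them keyed by the proposed field names.
func windowsMemFields(raw []byte) (map[string]uint64, error) {
	var doc struct {
		MemoryStats windowsMemoryStats `json:"memory_stats"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	m := doc.MemoryStats
	return map[string]uint64{
		"commitbytes":       m.Commit,
		"commitpeakbytes":   m.CommitPeak,
		"privateworkingset": m.PrivateWorkingSet,
	}, nil
}

func main() {
	// memory_stats values copied from the Windows output in this issue.
	raw := []byte(`{"memory_stats":{"commitbytes":77160448,"commitpeakbytes":105000960,"privateworkingset":59961344}}`)
	fields, err := windowsMemFields(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(fields)
}
```

There is no limit value in this output, so a usage percentage cannot be derived from these three numbers alone.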

@eesprit
Contributor

eesprit commented Jul 20, 2017

@georgyturevich there is one thing wrong in your code: using runtime.GOOS to detect the OS gives you the OS on which Telegraf is running, which might be different from the OS of the host Docker is running on (users can collect metrics from a docker daemon running on a different host, using the tcp socket).

You would do better to rely on the OSType obtained from the daemon info, as is done for the name of the daemon here: https://github.com/georgyturevich/telegraf/blob/80cf1cbbfe9be2f28a4ca3555f84f82eba227c94/plugins/inputs/docker/docker.go#L152
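
A minimal sketch of that idea, under the assumption that the daemon info exposes an OSType string such as "windows" or "linux". The daemonInfo type, the statsMode constants and modeFor are hypothetical names used only for illustration; in the plugin the value would come from the daemon info call rather than a literal.

```go
package main

import (
	"fmt"
	"runtime"
)

// daemonInfo is a hypothetical stand-in for the daemon info response.
// Only the OSType field matters here.
type daemonInfo struct {
	OSType string
}

type statsMode int

const (
	linuxStats statsMode = iota
	windowsStats
)

// modeFor picks the stats formulas from the OS the daemon reports,
// not from the OS that the collector itself runs on.
func modeFor(info daemonInfo) statsMode {
	if info.OSType == "windows" {
		return windowsStats
	}
	return linuxStats
}

func main() {
	// Example: a collector on any OS talking to a remote Windows daemon over tcp.
	remote := daemonInfo{OSType: "windows"}
	fmt.Println("collector OS:", runtime.GOOS)
	fmt.Println("use Windows formulas:", modeFor(remote) == windowsStats)
}
```

With this approach the choice stays correct when Telegraf on Linux monitors a Windows daemon, or the other way round.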

@danielnelson
Contributor

Fixed in #3043
