
How to monitor GPU metrics on container level #2095

Closed
Cherishty opened this issue Nov 7, 2018 · 33 comments

@Cherishty

Cherishty commented Nov 7, 2018

I am a newcomer to cAdvisor. When I deploy kube-prometheus on my k8s cluster to monitor my GPUs, there is no GPU usage info at either the container level or the machine level.
My k8s version is v1.9.5, and I give containers access to the Nvidia GPU by setting --feature-gates=Accelerators=true on the kubelet instead of using the device plugin. It does work when running TensorFlow with the GPU in a container.

I checked "Not able to collect metrics for nvidia GPU", where @mindprince stated that:

GPU monitoring support in Kubernetes was added in 1.9

I also found "add accelerator info to container spec", where @mindprince stated that:

cAdvisor only supports container level accelerator metrics

I also checked cAdvisor's running.md, which says GPU monitoring can be enabled by adding some parameters when starting cAdvisor.

So my questions are:

  1. By default the kubelet starts cAdvisor embedded in itself; how do I override this to apply the Hardware Accelerator Monitoring settings from running.md?
  2. Which metrics about GPU usage does cAdvisor expose? I can find metrics like kube_pod_container_resource_requests_nvidia_gpu_devices and kube_node_status_capacity_nvidia_gpu_cards in kube-prometheus, but I guess they are produced by node-exporter, not cAdvisor. Am I right?
  3. For the cAdvisor embedded in the kubelet, I can only check it at <nodeip>:10255/metrics, but can NOT access <nodeip>:4194. Why does this happen?

Can anyone kindly give me a hand, since we really do need to monitor our GPU jobs running in k8s? Thanks!

@WanLinghao
Contributor

/cc

@Cherishty
Author

Hi @WanLinghao, what do you mean by /cc?
Since we are actively working on exposing GPU metrics to Prometheus, do you have any suggestions or guidance?

Best Regards!

@WanLinghao
Contributor

@Cherishty hello, I am interested in both GPUs and cAdvisor, so I want to follow any updates on this issue. And I will share if I find any solution ^_^

@Cherishty
Author

cc @mindprince

@rohitagarwal003
Contributor

https://github.com/google/cadvisor/blob/master/docs/running.md#hardware-accelerator-monitoring has all the details. You need to make sure that however you are running cAdvisor, it satisfies the two conditions mentioned there: access to the NVML library and access to the GPU devices.

If you are running cAdvisor embedded into the kubelet, then the kubelet should have access to the NVML library (i.e. its LD_LIBRARY_PATH should contain the location where NVML is present). Similarly, it should have access to the GPU devices.

If you are running cAdvisor/kubelet inside a container, things are more complicated, but the two requirements are the same. The link above explains how to satisfy these two requirements when running cAdvisor inside the container.
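For illustration, a minimal sketch of running standalone cAdvisor under Docker with both requirements satisfied; the library path /usr/lib64/nvidia and the device names are assumptions and will differ between nodes:

# Sketch only: give cAdvisor the NVML library and the GPU device nodes.
docker run -d --name=cadvisor -p 8080:8080 \
  --device=/dev/nvidiactl --device=/dev/nvidia0 --device=/dev/nvidia-uvm \
  -e LD_LIBRARY_PATH=/usr/lib64/nvidia \
  -v /usr/lib64/nvidia:/usr/lib64/nvidia:ro \
  -v /:/rootfs:ro -v /var/run:/var/run:rw \
  -v /sys:/sys:ro -v /var/lib/docker:/var/lib/docker:ro \
  k8s.gcr.io/cadvisor:v0.30.2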

It's hard to debug individual cases without access to the environment.

@dashpole
Collaborator

dashpole commented Nov 9, 2018

I got a container running using NVML a couple weeks ago: https://github.com/dashpole/example-gpu-monitor/blob/master/deploy/kubernetes/daemonset.yaml. It probably isn't a great example in terms of security, as it just makes the pod privileged, but it is somewhere to start.

@Cherishty
Author

I really appreciate your attention and help!
@dashpole I gave your solution a try; checking the pod's log, I believe it has detected the GPU. Where does it expose the metrics, and how can I check them?

@mindprince Yes, for the embedded cAdvisor I have set the right LD_LIBRARY_PATH. I am not sure what you mean by "access to GPU devices", but it should have it, since TensorFlow GPU training runs successfully on my GPU node. Additionally, could you double-check which metrics represent the GPU?
Following the context in issue 1912, it seems they should be container_accelerator_xxx, but I didn't find anything like that at <gpu_node_ip>:10255/metrics/cadvisor

Thanks !!!

@dashpole
Collaborator

@Cherishty just exec into the pod, wget or curl localhost:8080/metrics, and look for the container_accelerator... metrics. It is also worth noting that cAdvisor only monitors GPUs being used by containers, so make sure a container that uses the GPU is running when you are looking for the metrics.
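For example (namespace and pod name are placeholders):

kubectl -n <namespace> exec <pod-name> -- wget -qO- localhost:8080/metrics | grep container_accelerator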

@Cherishty
Author

Cherishty commented Nov 16, 2018

@dashpole Thanks for your guidance.
However, it returns nothing when I run wget localhost:8080/metrics. I checked the log, which reports:

[root@stcal-102 ~]# kubectl log nvidia-gpu-monitoring-daemonset-4fgp6 -n monitoring
W1116 10:40:48.336472 105667 cmd.go:353] log is DEPRECATED and will be removed in a future version. Use logs instead.
I1113 15:14:59.103954 1 main.go:41] Starting example-gpu-monitor
I1113 15:14:59.798777 1 stats.go:56] NVML initialized. Number of nvidia devices: 1
W1113 15:14:59.804360 1 util_unix.go:75] Using "/podresources/kubelet.sock" as endpoint is deprecated, please consider using full url format "unix:///podresources/kubelet.sock".
E1116 01:59:52.948493 1 metrics.go:71] error getting devices from the kubelet: &{0xc0002101e0}.Get(_) = _, rpc error: code = Unavailable desc = grpc: the connection is unavailable

As I mentioned before, I am using --feature-gates=Accelerators=true on the kubelet instead of the device plugin, and I have NOT installed nvidia-docker. Is that necessary?
Docker version: 1.12.6
Kubernetes version: 1.9.5
Any clues or ideas?

@dashpole
Collaborator

Please do not use the example GPU monitor. I just shared that as an example of how to get NVML to work from inside a container. I am adding this patch for the cAdvisor daemonset to show how to get cAdvisor working.

@dashpole
Collaborator

It shouldn't matter how your container is consuming GPUs. cAdvisor interacts directly with the cgroup tree.

@Cherishty
Author

@dashpole I cannot thank you enough for your patient clarification and guidance!
Following your advice, I tried Kustomize with your patch and generated the cadvisor.yaml below:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    app: cadvisor
  name: cadvisor
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: docker/default
  labels:
    app: cadvisor
  name: cadvisor
  namespace: cadvisor
spec:
  selector:
    matchLabels:
      app: cadvisor
      name: cadvisor
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app: cadvisor
        name: cadvisor
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --disable_metrics=percpu,disk,network,tcp,udp
        - --docker_only
        env:
        - name: LD_LIBRARY_PATH
          value: /bin/nvidia/lib64/
        image: k8s.gcr.io/cadvisor:v0.30.2
        name: cadvisor
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 300m
          requests:
            cpu: 150m
            memory: 200Mi
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /dev
          name: dev
        - mountPath: /bin/nvidia/lib64/
          name: libnvidia
        - mountPath: /rootfs
          name: rootfs
          readOnly: true
        - mountPath: /var/run
          name: var-run
          readOnly: true
        - mountPath: /sys
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /dev/disk
          name: disk
          readOnly: true
      - command:
        - /monitor
        - --stackdriver-prefix=custom.googleapis.com
        - --source=cadvisor:http://localhost:8080
        - --pod-id=$(POD_NAME)
        - --namespace-id=$(POD_NAMESPACE)
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: gcr.io/google-containers/prometheus-to-sd:v0.2.6
        name: prometheus-to-sd
        ports:
        - containerPort: 6061
          name: profiler
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
      priorityClassName: system-node-critical
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - hostPath:
          path: /dev
        name: dev
      - hostPath:
          path: /usr/lib64/nvidia
        name: libnvidia
      - hostPath:
          path: /
        name: rootfs
      - hostPath:
          path: /var/run
        name: var-run
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /var/lib/docker
        name: docker
      - hostPath:
          path: /dev/disk
        name: disk

However, the pod goes into CrashLoopBackOff, and I collected the error logs as below:

kubectl log -p cadvisor-9pl2f -n cadvisor
I1119 07:49:56.632813       1 manager.go:233] Version: {KernelVersion:3.10.0-862.14.4.el7.x86_64 ContainerOsVersion:Alpine Linux v3.7 DockerVersion:1.12.6 DockerAPIVersion:1.24 CadvisorVersion:v0.30.2 CadvisorRevision:de723a09}
E1119 07:49:56.674977       1 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
I1119 07:49:56.675983       1 factory.go:356] Registering Docker factory
I1119 07:49:56.676199       1 factory.go:54] Registering systemd factory
I1119 07:49:56.678237       1 factory.go:86] Registering Raw factory
I1119 07:49:56.680433       1 manager.go:1205] Started watching for new ooms in manager
I1119 07:49:56.820255       1 nvidia.go:100] NVML initialized. Number of nvidia devices: 1
I1119 07:49:56.827148       1 manager.go:356] Starting recovery of all containers
I1119 07:49:58.522108       1 manager.go:361] Recovery completed
F1119 07:49:58.727346       1 cadvisor.go:159] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory

and

[root@stcal-102 ~]# kubectl log -p cadvisor-9pl2f -n cadvisor  prometheus-to-sd
W1119 15:50:15.588580   69321 cmd.go:353] log is DEPRECATED and will be removed in a future version. Use logs instead.
F1119 07:49:57.880255       1 main.go:82] Failed to get GCE config: Not running on GCE.

It seems securityContext: privileged: true is lost in the generated yaml?
Any clues or suggestions?

@Cherishty
Author

@mindprince Sorry to disturb you, but I still want to clarify how the cAdvisor embedded in the kubelet can be made to expose GPU metrics.
I can't see any metrics like container_accelerator under the default configuration.

Best Regards!

@dashpole
Collaborator

hmmm. I confirmed that the container_accelerator_... metrics are in the 1.9.5 release.

@Cherishty it looks like NVML loaded successfully: nvidia.go:100] NVML initialized. Number of nvidia devices: 1. What OS are you using? It looks like you are running into #1444
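For reference, the workaround usually cited for that error on CentOS/RHEL hosts is a host-side cgroup symlink; the exact commands below are an assumption, so verify against #1444 before applying them:

# Assumed workaround: the hierarchy is mounted as cpu,cpuacct while the watch is attempted on cpuacct,cpu.
sudo mount -o remount,rw /sys/fs/cgroup
sudo ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu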

@rohitagarwal003
Contributor

Yes, for the embedded cAdvisor I have set the right LD_LIBRARY_PATH. I am not sure what you mean by "access to GPU devices", but it should have it, since TensorFlow GPU training runs successfully on my GPU node. Additionally, could you double-check which metrics represent the GPU?

By access to GPU devices, I meant that the process running cAdvisor should have access to the GPU devices in /dev/ (/dev/nvidiactl, /dev/nvidia0, and so on).

You should see a log line like "NVML initialized. Number of nvidia devices: N" from the process running cAdvisor.
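For example, two quick checks on the node (device names are the usual defaults and may differ):

ls -l /dev/nvidiactl /dev/nvidia0                      # device nodes the cAdvisor process must be able to reach
sudo journalctl -u kubelet --no-pager | grep -i nvml   # the embedded cAdvisor logs its NVML initialization here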

@Cherishty
Author

@dashpole After fixing the issue you mentioned, the only remaining error is from the prometheus-to-sd container:

Failed to get GCE config: Not running on GCE.

So I removed that container and re-created cAdvisor, and now it exports the container_accelerator_* metrics as you said, whenever a container is using the GPU!

Again, thanks a lot for your contribution and kindness 💯

@Cherishty
Author

@mindprince Sorry to say that I am not very familiar with how cAdvisor is embedded in k8s, so I assume cAdvisor runs inside the kubelet; checking its log, I find nothing about NVML:

journalctl -r -u kubelet

kubelet[25630]: I1120 16:55:05.979606   25630 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "libnvidia" (UniqueName: "kubernetes.io/host-path/2a711dea-ec91-11e8-85b6-000d3af9a998-libnvidia") pod "cadvisor-5j7mh" (UID: "2a711dea-ec91-11e8-85b6-000d3af9a998")
kubelet[25630]: E1120 16:55:04.678992   25630 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
kubelet[25630]: I1120 16:55:04.180135   25630 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
kubelet[25630]: I1120 16:55:04.113884   25630 kubelet_node_status.go:792] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2018-11-20 16:55:04.113844472 +0800 CST m=+1.748154141 LastTransitionTime:2018-11-20 16:55:04.113844472 +0800 CST m=+1.748154141 Reason:KubeletNotReady Message:container runtime is down}
kubelet[25630]: I1120 16:55:04.078260   25630 kubelet_network.go:196] Setting Pod CIDR:  -> 10.244.1.0/24
kubelet[25630]: I1120 16:55:04.077938   25630 docker_service.go:343] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.244.1.0/24,},}
kubelet[25630]: I1120 16:55:04.077617   25630 kuberuntime_manager.go:918] updating runtime config through cri with podcidr 10.244.1.0/24
kubelet[25630]: I1120 16:55:03.891016   25630 kubelet_node_status.go:85] Successfully registered node suzlab1080-001
kubelet[25630]: I1120 16:55:03.890976   25630 kubelet_node_status.go:127] Node suzlab1080-001 was previously registered
kubelet[25630]: E1120 16:55:03.678373   25630 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
kubelet[25630]: I1120 16:55:03.379983   25630 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
kubelet[25630]: I1120 16:55:02.979777   25630 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
kubelet[25630]: I1120 16:55:02.817838   25630 kubelet_node_status.go:82] Attempting to register node suzlab1080-001
kubelet[25630]: I1120 16:55:02.779384   25630 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
kubelet[25630]: I1120 16:55:02.779257   25630 kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
kubelet[25630]: E1120 16:55:02.707977   25630 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
kubelet[25630]: I1120 16:55:02.679263   25630 kubelet.go:1789] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
kubelet[25630]: I1120 16:55:02.679204   25630 volume_manager.go:247] Starting Kubelet Volume Manager
kubelet[25630]: I1120 16:55:02.679128   25630 kubelet.go:1772] Starting kubelet main sync loop.
kubelet[25630]: I1120 16:55:02.679077   25630 status_manager.go:140] Starting to sync pod status with apiserver
kubelet[25630]: I1120 16:55:02.679015   25630 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
kubelet[25630]: E1120 16:55:02.678152   25630 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
kubelet[25630]: I1120 16:55:02.637304   25630 kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
kubelet[25630]: I1120 16:55:02.635290   25630 server.go:299] Adding debug handlers to kubelet server.
kubelet[25630]: I1120 16:55:02.633370   25630 server.go:129] Starting to listen on 0.0.0.0:10250
kubelet[25630]: E1120 16:55:02.633281   25630 kubelet.go:1281] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data for container /
kubelet[25630]: I1120 16:55:02.633158   25630 server.go:755] Started kubelet
kubelet[25630]: I1120 16:55:02.632266   25630 client.go:109] Start docker client with request timeout=2m0s
kubelet[25630]: I1120 16:55:02.632239   25630 client.go:80] Connecting to docker on unix:///var/run/docker.sock
kubelet[25630]: I1120 16:55:02.630339   25630 kuberuntime_manager.go:186] Container runtime docker initialized, version: 1.12.6, apiVersion: 1.24.0
kubelet[25630]: I1120 16:55:02.628326   25630 remote_runtime.go:43] Connecting to runtime service unix:///var/run/dockershim.sock
kubelet[25630]: I1120 16:55:02.606446   25630 docker_service.go:250] Setting cgroupDriver to systemd
kubelet[25630]: I1120 16:55:02.606282   25630 docker_service.go:237] Docker Info: &{ID:E2YI:PBFW:JOI7:TW3T:XIEA:GWCS:BCHK:RCTR:MREP:6TPW:MMUJ:S2SO Containers:40 ContainersRunning:40 ContainersPaused:0 ContainersStopped:0 Images:21 Driver:devicemapper DriverStatus:[[Pool Name docker-thinpool] [Pool Blocksize 524.3 kB] [Base Device Size 21.47 GB] [Backing Filesystem xfs] [Data file ] [Metadata file ] [Data Space Used 7.034 GB] [Data Space Total 2.089 TB] [Data Space Available 2.082 TB] [Metadata Space Used 3.588 MB] [Metadata Space Total 16.98 GB] [Metadata Space Available 16.97 GB] [Thin Pool Minimum Free Space 208.9 GB] [Udev Sync Supported true] [Deferred Removal Enabled true] [Deferred Deletion Enabled true] [Deferred Deleted Device Count 0] [Library Version 1.02.146-RHEL7 (2018-01-22)]] SystemStatus:[] Plugins:{Volume:[local] Network:[null host bridge overlay] Authorization:[] Log:[]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:238 OomKillDisable:true NGoroutines:190 SystemTime:2018-11-20T16:55:02.605262641+08:00 LoggingDriver:json-file CgroupDriver:systemd NEventsListener:0 KernelVersion:3.10.0-862.14.4.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc42052a5b0 NCPU:8 MemTotal:67038867456 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:suzlab1080-001 Labels:[] ExperimentalBuild:false ServerVersion:1.12.6 ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[--systemd-cgroup=true]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:0xc420648500} LiveRestoreEnabled:false Isolation: InitBinary: ContainerdCommit:{ID: Expected:} RuncCommit:{ID: Expected:} InitCommit:{ID: Expected:} SecurityOptions:[seccomp]}
kubelet[25630]: I1120 16:55:02.587108   25630 docker_service.go:232] Docker cri networking managed by cni
kubelet[25630]: I1120 16:55:02.580430   25630 client.go:109] Start docker client with request timeout=2m0s
kubelet[25630]: I1120 16:55:02.580408   25630 client.go:80] Connecting to docker on unix:///var/run/docker.sock
kubelet[25630]: I1120 16:55:02.580081   25630 kubelet.go:577] Hairpin mode set to "hairpin-veth"
kubelet[25630]: W1120 16:55:02.580040   25630 kubelet_network.go:139] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
kubelet[25630]: I1120 16:55:02.575019   25630 kubelet.go:316] Watching apiserver
kubelet[25630]: I1120 16:55:02.574964   25630 kubelet.go:291] Adding manifest path: /etc/kubernetes/manifests
kubelet[25630]: I1120 16:55:02.574767   25630 container_manager_linux.go:266] Creating device plugin manager: false
kubelet[25630]: I1120 16:55:02.574536   25630 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[memory:{i:{value:1073741824 scale:0} d:{Dec:<nil>} s:1Gi Format:BinarySI}] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:200Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.2} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s}
kubelet[25630]: I1120 16:55:02.574506   25630 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
kubelet[25630]: I1120 16:55:02.573190   25630 server.go:428] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
kubelet[25630]: I1120 16:55:02.497428   25630 certificate_store.go:130] Loading cert/key pair from ("/var/lib/kubelet/pki/kubelet-client.crt", "/var/lib/kubelet/pki/kubelet-client.key").
kubelet[25630]: I1120 16:55:02.467619   25630 plugins.go:101] No cloud provider specified.
kubelet[25630]: I1120 16:55:02.467484   25630 feature_gate.go:226] feature gates: &{{} map[Accelerators:true]}
kubelet[25630]: I1120 16:55:02.467414   25630 server.go:182] Version: v1.9.5
kubelet[25630]: I1120 16:55:02.456538   25630 controller.go:118] kubelet config controller: validating combination of defaults and flags
kubelet[25630]: I1120 16:55:02.456533   25630 controller.go:114] kubelet config controller: starting controller
kubelet[25630]: I1120 16:55:02.456452   25630 feature_gate.go:226] feature gates: &{{} map[Accelerators:true]}
systemd[1]: Starting kubelet: The Kubernetes Node Agent...
systemd[1]: Started kubelet: The Kubernetes Node Agent.
systemd[1]: kubelet.service failed.
systemd[1]: Unit kubelet.service entered failed state.
systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Stopping kubelet: The Kubernetes Node Agent...

@dashpole
Collaborator

Glad you got cAdvisor working. I don't see any unusual errors in the kubelet log you provided. It doesn't look like it is actually the full log from kubelet startup (as it usually starts with the flags provided to the kubelet). Try sudo journalctl -u kubelet --no-pager | grep NVML

@Cherishty
Author

Sorry, but it returns nothing :(

@dashpole
Collaborator

Can you provide the full kubelet log from a run after you restart the kubelet? sudo systemctl restart kubelet

@Cherishty
Author

Sure, here is my environment:

OS kernel: CentOS 7.5, 3.10.0-862.14.4.el7.x86_64
Kubernetes version: 1.9.5
Docker version: 1.12.6
NVIDIA driver version: 390.59
GPU: GTX 1080

I have two VMs with the same configuration and the same behavior.

And below is the log produced by journalctl -u kubelet --no-pager > log after running systemctl restart kubelet:

Nov 22 23:43:10 suzlab1080-001 systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
Nov 22 23:43:10 suzlab1080-001 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 22 23:43:10 suzlab1080-001 systemd[1]: Unit kubelet.service entered failed state.
Nov 22 23:43:10 suzlab1080-001 systemd[1]: kubelet.service failed.
Nov 22 23:43:10 suzlab1080-001 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 22 23:43:10 suzlab1080-001 systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.862651  102275 feature_gate.go:226] feature gates: &{{} map[Accelerators:true]}
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.862749  102275 controller.go:114] kubelet config controller: starting controller
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.862769  102275 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.874132  102275 server.go:182] Version: v1.9.5
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.874201  102275 feature_gate.go:226] feature gates: &{{} map[Accelerators:true]}
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.874325  102275 plugins.go:101] No cloud provider specified.
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.901221  102275 certificate_store.go:130] Loading cert/key pair from ("/var/lib/kubelet/pki/kubelet-client.crt", "/var/lib/kubelet/pki/kubelet-client.key").
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.971050  102275 server.go:428] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.972203  102275 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.972230  102275 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[memory:{i:{value:1073741824 scale:0} d:{Dec:<nil>} s:1Gi Format:BinarySI}] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:200Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.2} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s}
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.972419  102275 container_manager_linux.go:266] Creating device plugin manager: false
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.972590  102275 kubelet.go:291] Adding manifest path: /etc/kubernetes/manifests
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.972634  102275 kubelet.go:316] Watching apiserver
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: W1122 23:43:10.977287  102275 kubelet_network.go:139] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.977319  102275 kubelet.go:577] Hairpin mode set to "hairpin-veth"
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.977617  102275 client.go:80] Connecting to docker on unix:///var/run/docker.sock
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.977635  102275 client.go:109] Start docker client with request timeout=2m0s
Nov 22 23:43:10 suzlab1080-001 kubelet[102275]: I1122 23:43:10.984214  102275 docker_service.go:232] Docker cri networking managed by cni
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.005308  102275 docker_service.go:237] Docker Info: &{ID:O7OI:MPIJ:N47K:LJFI:3PAO:POKH:O67Z:RPLI:D4CH:PIFX:6WVF:GWHU Containers:39 ContainersRunning:38 ContainersPaused:0 ContainersStopped:1 Images:17 Driver:devicemapper DriverStatus:[[Pool Name docker-thinpool] [Pool Blocksize 524.3 kB] [Base Device Size 21.47 GB] [Backing Filesystem xfs] [Data file ] [Metadata file ] [Data Space Used 5.741 GB] [Data Space Total 2.089 TB] [Data Space Available 2.083 TB] [Metadata Space Used 3.457 MB] [Metadata Space Total 16.98 GB] [Metadata Space Available 16.98 GB] [Thin Pool Minimum Free Space 208.9 GB] [Udev Sync Supported true] [Deferred Removal Enabled true] [Deferred Deletion Enabled true] [Deferred Deleted Device Count 1] [Library Version 1.02.146-RHEL7 (2018-01-22)]] SystemStatus:[] Plugins:{Volume:[local] Network:[host bridge overlay null] Authorization:[] Log:[]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:223 OomKillDisable:true NGoroutines:180 SystemTime:2018-11-22T23:43:11.004295354+08:00 LoggingDriver:json-file CgroupDriver:systemd NEventsListener:0 KernelVersion:3.10.0-862.14.4.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc42016a000 NCPU:8 MemTotal:67038867456 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:suzlab1080-001 Labels:[] ExperimentalBuild:false ServerVersion:1.12.6 ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[--systemd-cgroup=true]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:0xc42098c000} LiveRestoreEnabled:false Isolation: InitBinary: ContainerdCommit:{ID: Expected:} RuncCommit:{ID: Expected:} InitCommit:{ID: Expected:} SecurityOptions:[seccomp]}
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.005499  102275 docker_service.go:250] Setting cgroupDriver to systemd
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.027212  102275 remote_runtime.go:43] Connecting to runtime service unix:///var/run/dockershim.sock
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.029500  102275 kuberuntime_manager.go:186] Container runtime docker initialized, version: 1.12.6, apiVersion: 1.24.0
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.031778  102275 client.go:80] Connecting to docker on unix:///var/run/docker.sock
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.031814  102275 client.go:109] Start docker client with request timeout=2m0s
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.032730  102275 server.go:755] Started kubelet
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: E1122 23:43:11.032908  102275 kubelet.go:1281] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data for container /
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.032976  102275 server.go:129] Starting to listen on 0.0.0.0:10250
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.034928  102275 server.go:299] Adding debug handlers to kubelet server.
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.036642  102275 kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: E1122 23:43:11.081316  102275 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.082329  102275 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.082416  102275 status_manager.go:140] Starting to sync pod status with apiserver
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.082486  102275 volume_manager.go:247] Starting Kubelet Volume Manager
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.082500  102275 kubelet.go:1772] Starting kubelet main sync loop.
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.082572  102275 kubelet.go:1789] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: E1122 23:43:11.106867  102275 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.182648  102275 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.182671  102275 kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.224339  102275 kubelet_node_status.go:82] Attempting to register node suzlab1080-001
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.382967  102275 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
Nov 22 23:43:11 suzlab1080-001 kubelet[102275]: I1122 23:43:11.783123  102275 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: E1122 23:43:12.081579  102275 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: I1122 23:43:12.315118  102275 kubelet_node_status.go:127] Node suzlab1080-001 was previously registered
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: I1122 23:43:12.315156  102275 kubelet_node_status.go:85] Successfully registered node suzlab1080-001
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: I1122 23:43:12.505908  102275 kuberuntime_manager.go:918] updating runtime config through cri with podcidr 10.244.1.0/24
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: I1122 23:43:12.506236  102275 docker_service.go:343] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.244.1.0/24,},}
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: I1122 23:43:12.506607  102275 kubelet_network.go:196] Setting Pod CIDR:  -> 10.244.1.0/24
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: I1122 23:43:12.547624  102275 kubelet_node_status.go:792] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2018-11-22 23:43:12.54758975 +0800 CST m=+1.771800734 LastTransitionTime:2018-11-22 23:43:12.54758975 +0800 CST m=+1.771800734 Reason:KubeletNotReady Message:container runtime is down}
Nov 22 23:43:12 suzlab1080-001 kubelet[102275]: I1122 23:43:12.583274  102275 kubelet.go:1789] skipping pod synchronization - [container runtime is down]
Nov 22 23:43:13 suzlab1080-001 kubelet[102275]: E1122 23:43:13.082231  102275 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
Nov 22 23:43:14 suzlab1080-001 kubelet[102275]: I1122 23:43:14.382937  102275 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "proc" (UniqueName: "kubernetes.io/host-path/92568c5d-e4b3-11e8-85b6-000d3af9a998-proc") pod "node-exporter-r6bvv" (UID: "92568c5d-e4b3-11e8-85b6-000d3af9a998")

@Cherishty
Author

Additionally, below is my configuration for 10-kubeadm.conf and docker daemon.json

10-kubeadm.conf

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authentication-token-webhook=true --authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
Environment="KUBELET_EXTRA_ARGS=--feature-gates=Accelerators=true --eviction-hard=memory.available<200Mi,nodefs.available<10%,imagefs.available<20% --eviction-minimum-reclaim=memory.available=100Mi,nodefs.available=5%,imagefs.available=5% --system-reserved=memory=1Gi --root-dir=/var/lib/kubelet"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS

daemon.json

{
    "insecure-registries": [],
    "storage-driver": "devicemapper",
    "storage-opts": [
      "dm.thinpooldev=/dev/mapper/docker-thinpool",
      "dm.basesize=20G",
      "dm.use_deferred_removal=true",
      "dm.use_deferred_deletion=true"
    ],
    "exec-opts": ["native.cgroupdriver=systemd"]
}

Best Regards!

@dashpole
Collaborator

Ah, it looks like the log line in question is emitted at V(4). Can you increase the verbosity of the kubelet to --v=4 in KUBELET_EXTRA_ARGS, and look again?
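For example, a sketch against the 10-kubeadm.conf shown earlier in the thread (the drop-in path is assumed; keep your existing flags):

# Append --v=4 to KUBELET_EXTRA_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, e.g.
#   Environment="KUBELET_EXTRA_ARGS=--feature-gates=Accelerators=true ... --v=4"
sudo systemctl daemon-reload
sudo systemctl restart kubelet
sudo journalctl -u kubelet --no-pager | grep -i nvml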

@Cherishty
Author

Cherishty commented Nov 29, 2018

That's right.
After doing this, reloading the daemon, and restarting docker and the kubelet, it shows the following:

Nov 29 18:44:35 suzlab1080-001 kubelet[17830]: I1129 18:44:35.783570   17830 nvidia.go:98] Could not initialize NVML: could not load NVML library
Nov 29 18:44:35 suzlab1080-001 kubelet[17830]: I1129 18:44:35.783773   17830 nvidia.go:59] Starting goroutine to initialize NVML

I have set LD_LIBRARY_PATH=/usr/lib64/nvidia, which is where the NVML library mentioned above is located.

Any clues or suggestions?

@Cherishty
Author

BTW, for the cAdvisor running in a container, which has been shown to work, I can only find the container_accelerator_memory_total_bytes and container_accelerator_memory_used_bytes metrics.

Should any other metrics reflecting GPU utilization be provided?

@dashpole
Collaborator

container_accelerator_duty_cycle is the name of the utilization metric.
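For example, against the containerized cAdvisor from earlier in the thread (port 8080 assumed):

# Only present while a container is actively using the GPU.
curl -s localhost:8080/metrics | grep container_accelerator_duty_cycle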

@ghost

ghost commented Mar 3, 2019

@dashpole: I am using a KOPS cluster in an AWS cloud environment, with 1 master and 2 worker nodes (p2.xlarge and t2.medium).

I am trying to collect GPU metrics using cAdvisor but am unable to gather them. Below is my cadvisor.yaml manifest.

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitoring
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: cadvisor
        image: google/cadvisor:v0.24.1 # k8s.gcr.io/cadvisor:v0.30.2 # google/cadvisor:v0.24.1
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
          readOnly: false
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        - name: sysfs
          mountPath: /sys
          readOnly: true
        - name: cgroup
          mountPath: /sys/fs/cgroup
          readOnly: true
        - name: libnvidia
          mountPath: /usr/local/cuda-9.0/lib64
          readOnly: true
        - name: kubelet-podresources
          mountPath: /var/lib/kubelet/
        - name: dev-nvidiactl
          mountPath: /dev/nvidiactl
        - name: dev-nvidia0
          mountPath: /dev/nvidia0
        - name: dev-nvidia-uvm
          mountPath: /dev/nvidia-uvm
        - name: dev-cgroup
          mountPath: /dev/cgroup
        - name: dev
          mountPath: /dev/
        securityContext:
          privileged: true
        ports:
        - name: http
          containerPort: 8899
          protocol: TCP
        env:
        - name: LD_LIBRARY_PATH
          value: "/usr/local/cuda-9.0/lib64/"
        args:
        - --housekeeping_interval=10s
        - --port=8899
        - --storage_driver=influxdb
        - --storage_driver_host=172.20.60.112:31116
        - --storage_driver_db=cadvisor
        - --device-cgroup-rule 'c 195:* mrw'
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: docker
        hostPath:
          path: /var/lib/docker
      - name: sysfs
        hostPath:
          path: /sys
      - name: cgroup
        hostPath:
          path: /cgroup
      - name: libnvidia
        hostPath:
          path: /usr/local/cuda-9.0/lib64
      - name: kubelet-podresoures
        hostPath:
          path: /var/lib/kubelet/
      - name: dev-nvidiactl
        hostPath:
          path: /dev/nvidiactl
      - name: dev-nvidia0
        hostPath:
          path: /dev/nvidia0
      - name: dev-nvidia-uvm
        hostPath:
          path: /dev/nvidia-uvm
      - name: dev-cgroup
        hostPath:
          path: /cgroup
      - name: dev
        hostPath:
          path: /dev/
---

It would be great if you could please check and let me know what I missed here, as it would help me resolve the issue at the earliest. Thanks a ton in advance.

Regards,
Selva

@dashpole
Collaborator

dashpole commented Mar 4, 2019

@reachmeselva At first glance it looks like everything you need is present. Can you check the prometheus API at /metrics? I think the problem might just be that the influxdb storage plugin doesn't push accelerator metrics...
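For example, using the --port=8899 from the manifest above (the node IP is a placeholder):

curl -s http://<node-ip>:8899/metrics | grep container_accelerator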

@dashpole
Collaborator

dashpole commented Mar 4, 2019

Yeah, you can see all of the series that get pushed to influxDB. If the prometheus endpoint returns the correct metrics, then all that needs to be done is to add the accelerator metrics there.

@ghost

ghost commented Mar 7, 2019

@dashpole: Thanks for your reply. It would be great if you could please provide the GPU metric names and a sample link showing how to add metrics to influxDB.

Thanks in advance.

@dashpole
Collaborator

dashpole commented Mar 7, 2019

In storage/influxdb/influxdb.go#L224, the influxDB "points" are built from the v1 container stats. You just need to take the stats in the Accelerators portion of the container stats (info/v1/container.go#L583) and convert them to the influxdb format.

@GuiBin2013

@dashpole hi,
I am trying to monitor GPU metrics (per container) with cAdvisor. I run these containers without k8s.
But I find the GPU metrics of every container are the same, such as container_accelerator_memory_total_bytes, container_accelerator_memory_used_bytes and so on.
I think these metrics are the sum over all containers, not per container. Is that right?
Thanks in advance.

@dashpole
Collaborator

@GuiBin2013 cAdvisor monitors GPUs in two steps:

  1. Look at the devices cgroup to get container <-> gpu mappings
  2. Monitor each GPU found using NVML.

Step (2) can only give us metrics for the entire GPU, as NVIDIA GPUs aren't natively supported in linux cgroups. So if you are sharing a GPU between two containers, you will get identical metrics for each of them.
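For example, you can inspect the mapping from step (1) directly in the devices cgroup (cgroup v1 path and container ID are placeholders; 195 is the NVIDIA character-device major number):

grep '195:' /sys/fs/cgroup/devices/docker/<container-id>/devices.list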

I am closing this issue, as the original question has been answered. For further questions, please open a new issue.
