
"unable to get disk metrics" when deployed to kubernetes #1961

Closed
techdragon opened this issue Jul 9, 2018 · 15 comments

techdragon commented Jul 9, 2018

Describe what happened:
I have the same issue as #1730, with the container deployed to Kubernetes. The "solution" to issue #1730 does not apply to a Kubernetes deployment. The issue appears to happen on a subset of my Kubernetes nodes.

[ AGENT ] 2018-07-09 00:39:31 UTC | WARN | (datadog_agent.go:135 in LogMessage) | (disk.py:105) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc'

Describe what you expected:
No errors.

Steps to reproduce the issue:
Currently it's happening on 2 out of 7 nodes, so direct reproduction steps are uncertain.

Additional environment details (Operating System, Cloud provider, etc):
Kubernetes deployed to AWS with kops, running the latest Datadog agent container via a DaemonSet.
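(For context on the /host prefix in the error path: the stock Datadog DaemonSet mounts the node's /proc into the agent container, roughly like the fragment below. The names follow the public manifest, but treat this as a sketch rather than the exact spec.)

        volumeMounts:
          - name: procdir
            mountPath: /host/proc
            readOnly: true
[...]
      volumes:
        - name: procdir
          hostPath:
            path: /proc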


amineo commented Jul 10, 2018

I was also encountering the same issue with datadog/agent:latest (6.3.2)

Downgrading to datadog/agent:6.3.1 seems to have fixed it for me so there might be a bug somewhere in 6.3.2.

Is it possible that your other nodes are running a different version of the dd-agent?

Hope this helps!

coreypobrien commented:

I'm seeing this with 6.3.0, so I don't think it is version-related (unless it was fixed in 6.3.1 and then regressed in 6.3.2).


sudermanjr commented Jul 19, 2018

I had this issue with 6.3.0 and updated the image to 6.3.1 and the issue went away.

I then updated to 6.3.2 and the newest version of the chart (1.0.0), and the issue is still gone. Either the redeployment fixed something, or there is a change in the chart that fixed it.

Environment

Kube 1.9.9 deployed with kops to AWS

steinnes commented:

I'm having this issue as well with agent 6.3.2 on k8s 1.9.8, deployed on AWS via kops. I have only three nodes in this cluster, and the host nodes are running the kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11 AMI.

steinnes commented:

Not sure if it matters, but this is not the standard AMI; it's one that supports hvm and rootVolumeOptimization.

j-vizcaino (Contributor) commented:

Tuning the disk check solves this for me. In the conf.d/disk.d/conf.yaml file, make sure the autofs and binfmt_misc filesystems are blacklisted.

Linux distributions using systemd usually have an automount unit enabled for /proc/sys/fs/binfmt_misc. Blacklisting it prevents the agent from touching that mount point at all; see the sketch below.
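A minimal conf.d/disk.d/conf.yaml along those lines might look like the sketch below (this uses the pre-6.6 option names; later comments in this thread cover the renames in newer agents):

init_config:

instances:
  - use_mount: false
    excluded_filesystems:
      - autofs
      - binfmt_misc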


nerdinand commented Oct 2, 2018

For the record: To fix this in a Kubernetes deployment I followed this guide: https://docs.datadoghq.com/agent/kubernetes/integrations/#configmap

Leading to these changes in the DaemonSet:

        volumeMounts:
          - name: datadog-agent-config
            mountPath: /conf.d
[...]
      volumes:
        - name: datadog-agent-config
          configMap:
            name: datadog-agent
            items:
            - key: disk-config
              path: disk_check.yaml

Along with this new ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: datadog-agent
  namespace: monitoring
data:
  disk-config: |-
    init_config:

    instances:
      - use_mount: false
        excluded_filesystems:
          - autofs
          - /proc/sys/fs/binfmt_misc

This seems to get rid of the warnings...
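One caveat (an assumption based on how file-based check configs are loaded): the agent reads conf.d at startup, so after changing the ConfigMap the DaemonSet pods have to be recreated, e.g. with kubectl -n monitoring rollout restart daemonset/datadog-agent (substitute your actual DaemonSet name).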

ofek (Contributor) commented Oct 31, 2018

Superseded by DataDog/integrations-core#2492

ofek closed this as completed Oct 31, 2018

svrist commented Dec 13, 2018

For anyone using the Helm chart who is seeing this issue, I use a values.yaml like this:

datadog:
  apiKey: ...
  appKey: ....
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          excluded_filesystems:
            - autofs
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc

Turns out the file is called disk.yaml now (as of 6.6.0 / 6.7.0).
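For reference, a values file like this is applied in the usual Helm way, e.g. helm upgrade --install datadog datadog/datadog -f values.yaml, with the chart from the https://helm.datadoghq.com repo (the release name here is illustrative).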


assafcoh commented Jan 5, 2020

This fix works, but it is a workaround.
These warnings still appear with Datadog agent 6.15.0 and also with the latest image.
/proc/sys/fs/binfmt_misc should be excluded by the Datadog agent by default.
Is there a version in which this is fixed by default?


grv231 commented May 5, 2020

I recently ran into this issue in our agents as well, using container agent 7.17.0 and still encountering the same behavior. I'll be putting in the workaround.

ofek (Contributor) commented May 5, 2020

Please comment here instead: DataDog/integrations-core#2492

gaffneyd4 (Contributor) commented Mar 1, 2022

For anyone else who finds this issue and is looking to exclude these locations from the disk check: the option names have since been renamed in this PR.

Quoting @svrist's earlier comment:

For anyone using the Helm chart who is seeing this issue, I use a values.yaml like this:

datadog:
  apiKey: ...
  appKey: ....
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          excluded_filesystems:
            - autofs
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc

Turns out the file is called disk.yaml now (as of 6.6.0 / 6.7.0).

This is the configuration I used in my Helm values file:

datadog:
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          file_system_exclude:
            - autofs$
          mount_point_exclude:
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc


orhuidobro commented Jul 3, 2023

In my case (Agent v7.42.2), the problem was solved by monitoring only the device mounted at the root /:

  disk.yaml: |-
    init_config:

    instances:
        ## Instruct the check to collect using mount points instead of volumes.
      - use_mount: true

        ## Collect data from root mountpoint (regex)
        mount_point_include:
          - ^/$

I took the idea from here.
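This flips the approach of the earlier workarounds: instead of excluding the noisy mounts one by one, mount_point_include with the ^/$ regex restricts the check to exactly the root mount point, so binfmt_misc and other automounts are never inspected, at the cost of losing disk metrics for every other mount.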

sarcasticadmin commented:

Ran into this as well on EKS. Deploying a new cluster with the Datadog agent on the hosts in the EKS node group, plus the Helm chart, resulted in the host agent's logs being constantly spammed with the following error:

# journalctl -fu proc-sys-fs-binfmt_misc.automount
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3349533 (agent)
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Automount point already active?
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3349533 (agent)
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Automount point already active?
...

Incorporating the following changes into my Helm chart removed the automount requests on the hosts in the EKS node group: #1961 (comment)

There have been attempts to ignore this in the agent already (DataDog/integrations-core#7650), but they don't seem to be working in this case.

Config

Helm chart values before the fix:
datadog:
  apiKeyExistingSecret: datadog
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
  clusterAgent:
    replicas: 2
    createPodDisruptionBudget: true

Helm chart values after the fix:

datadog:
  apiKeyExistingSecret: datadog
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
  clusterAgent:
    replicas: 2
    createPodDisruptionBudget: true
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          file_system_exclude:
            - autofs$
          mount_point_exclude:
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc

Version info


kube version:

$  kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.2-eks-7f9249a

Helm chart version:

helm repo: https://helm.datadoghq.com
chart: datadog/datadog
version: 3.77.2

datadog-agent version in pod:

# agent version
Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7

datadog-agent version on EKS host in node group:

# datadog-agent version
Agent 7.59.1 - Commit: 3638fcd32d - Serialization version: v5.0.132 - Go version: go1.22.8
