
"unable to get disk metrics" when deployed to kubernetes #1961

Closed
techdragon opened this issue Jul 9, 2018 · 15 comments

techdragon commented Jul 9, 2018

Describe what happened:
I have the same issue as #1730, with the container deployed to Kubernetes. The "solution" to issue #1730 does not apply to a Kubernetes deployment. The issue appears to happen on a subset of my Kubernetes nodes.

[ AGENT ] 2018-07-09 00:39:31 UTC | WARN | (datadog_agent.go:135 in LogMessage) | (disk.py:105) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc'

Describe what you expected:
No errors.

Steps to reproduce the issue:
Currently it's happening on 2 out of 7 nodes, so direct reproduction steps are uncertain.

Additional environment details (Operating System, Cloud provider, etc):
Kubernetes deployed to AWS with kops, running the latest Datadog agent container via a DaemonSet.
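(For context on the /host prefix in the error path: the stock Datadog DaemonSet mounts the node's /proc into the agent container, roughly like the fragment below. The names follow the public manifest, but treat this as a sketch rather than the exact spec.)

        volumeMounts:
          - name: procdir
            mountPath: /host/proc
            readOnly: true
[...]
      volumes:
        - name: procdir
          hostPath:
            path: /proc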


amineo commented Jul 10, 2018

I was also encountering the same issue with datadog/agent:latest (6.3.2)

Downgrading to datadog/agent:6.3.1 seems to have fixed it for me so there might be a bug somewhere in 6.3.2.

Is it possible that your other nodes are running a different version of the dd-agent?

Hope this helps!

coreypobrien commented:

I'm seeing this with 6.3.0, so I don't think it is version-related (unless it was fixed in 6.3.1 and then regressed in 6.3.2).


sudermanjr commented Jul 19, 2018

I had this issue with 6.3.0 and updated the image to 6.3.1 and the issue went away.

I then updated to 6.3.2 and the newest version of the chart (1.0.0), and the issue is still gone. Either the redeployment fixed something, or there is a change in the chart that fixed it.

Environment

Kube 1.9.9 deployed with kops to AWS

steinnes commented:

I'm having this issue as well with agent 6.3.2 on k8s 1.9.8, deployed on AWS via kops. I have only three nodes in this cluster, and the host nodes are running the kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11 AMI.

steinnes commented:

Not sure if it matters, but this is not the standard AMI; it's one that supports hvm and rootVolumeOptimization.

j-vizcaino (Contributor) commented:

Tuning the disk check solves this for me. In the conf.d/disk.d/conf.yaml file, make sure the autofs and binfmt_misc filesystems are blacklisted.

Linux distributions using systemd usually have an automount unit enabled for /proc/sys/fs/binfmt_misc. Blacklisting it prevents the agent from touching that mount point at all; see the sketch below.
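A minimal conf.d/disk.d/conf.yaml along those lines might look like the sketch below (this uses the pre-6.6 option names; later comments in this thread cover the renames in newer agents):

init_config:

instances:
  - use_mount: false
    excluded_filesystems:
      - autofs
      - binfmt_misc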


nerdinand commented Oct 2, 2018

For the record: To fix this in a Kubernetes deployment I followed this guide: https://docs.datadoghq.com/agent/kubernetes/integrations/#configmap

Leading to these changes in the DaemonSet:

        volumeMounts:
          - name: datadog-agent-config
            mountPath: /conf.d
[...]
      volumes:
        - name: datadog-agent-config
          configMap:
            name: datadog-agent
            items:
            - key: disk-config
              path: disk_check.yaml

Along with this new ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: datadog-agent
  namespace: monitoring
data:
  disk-config: |-
    init_config:

    instances:
      - use_mount: false
        excluded_filesystems:
          - autofs
          - /proc/sys/fs/binfmt_misc

This seems to get rid of the warnings...
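One caveat (an assumption based on how file-based check configs are loaded): the agent reads conf.d at startup, so after changing the ConfigMap the DaemonSet pods have to be recreated, e.g. with kubectl -n monitoring rollout restart daemonset/datadog-agent (substitute your actual DaemonSet name).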

ofek (Contributor) commented Oct 31, 2018

Superseded by DataDog/integrations-core#2492

ofek closed this as completed Oct 31, 2018

svrist commented Dec 13, 2018

For anyone using the Helm chart who is seeing this issue, I use a values.yaml like this:

datadog:
  apiKey: ...
  appKey: ....
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          excluded_filesystems:
            - autofs
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc

Turns out the file is called disk.yaml now (as of 6.6.0 / 6.7.0).
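For reference, a values file like this is applied in the usual Helm way, e.g. helm upgrade --install datadog datadog/datadog -f values.yaml, with the chart from the https://helm.datadoghq.com repo (the release name here is illustrative).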


assafcoh commented Jan 5, 2020

This fix works, but it is a workaround.
These warnings still appear with Datadog agent 6.15.0 and also with the latest image.
/proc/sys/fs/binfmt_misc should be excluded by the Datadog agent by default.
Is there a version in which this is fixed by default?


grv231 commented May 5, 2020

I recently ran into this issue in our agents as well, using container agent 7.17.0 and still encountering the same behavior. I'll be putting in the workaround.

ofek (Contributor) commented May 5, 2020

Please comment here instead: DataDog/integrations-core#2492

gaffneyd4 (Contributor) commented Mar 1, 2022

For anyone else who finds this issue and is looking to exclude these locations from the disk check: the option names have since been renamed in this PR.

Quoting @svrist's earlier comment:

For anyone using the Helm chart who is seeing this issue, I use a values.yaml like this:

datadog:
  apiKey: ...
  appKey: ....
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          excluded_filesystems:
            - autofs
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc

Turns out the file is called disk.yaml now (as of 6.6.0 / 6.7.0).

This is the configuration I used in my Helm values file:

datadog:
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          file_system_exclude:
            - autofs$
          mount_point_exclude:
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc


orhuidobro commented Jul 3, 2023

In my case (Agent v7.42.2), the problem was solved by monitoring only the device mounted at the root /:

  disk.yaml: |-
    init_config:

    instances:
        ## Instruct the check to collect using mount points instead of volumes.
      - use_mount: true

        ## Collect data from root mountpoint (regex)
        mount_point_include:
          - ^/$

I took the idea from here.
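This flips the approach of the earlier workarounds: instead of excluding the noisy mounts one by one, mount_point_include with the ^/$ regex restricts the check to exactly the root mount point, so binfmt_misc and other automounts are never inspected, at the cost of losing disk metrics for every other mount.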

sarcasticadmin commented:

Ran into this as well on EKS. Deploying a new cluster with the Datadog agent on the hosts in the EKS node group, plus the Helm chart, resulted in the host agent's logs being constantly spammed with the following error:

# journalctl -fu proc-sys-fs-binfmt_misc.automount
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3349533 (agent)
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Automount point already active?
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3349533 (agent)
Dec 09 18:36:59 ip-5-5-5-5.us-west-2.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Automount point already active?
...

Incorporating the following changes into my Helm chart removed the automount requests on the hosts in the EKS node group: #1961 (comment)

There have been attempts to ignore this in the agent already (DataDog/integrations-core#7650), but they don't seem to be working in this case.

Config

Helm chart values before the fix:
datadog:
  apiKeyExistingSecret: datadog
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
  clusterAgent:
    replicas: 2
    createPodDisruptionBudget: true

Helm chart values after the fix:

datadog:
  apiKeyExistingSecret: datadog
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
  clusterAgent:
    replicas: 2
    createPodDisruptionBudget: true
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          file_system_exclude:
            - autofs$
          mount_point_exclude:
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc

Version info


kube version:

$  kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.2-eks-7f9249a

Helm chart version:

helm repo: https://helm.datadoghq.com
chart: datadog/datadog
version: 3.77.2

datadog-agent version in pod:

# agent version
Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7

datadog-agent version on EKS host in node group:

# datadog-agent version
Agent 7.59.1 - Commit: 3638fcd32d - Serialization version: v5.0.132 - Go version: go1.22.8
