
[disk] Blacklist certain partitions by default #2492

Closed
ofek opened this issue Oct 31, 2018 · 6 comments

Comments

@ofek
Contributor

ofek commented Oct 31, 2018

Continuation of DataDog/datadog-agent#1961
Advanced filtering logic introduced in #2483

Adding new things to blacklist by default will be a breaking change and will likely require a major Agent release. As such, let's take the time to compile everything that should be excluded by default.

So far we have:

It should also be noted that since blacklists take precedence over whitelists, users would need to update both to re-enable something. Therefore, reducing how often that is necessary should be a goal. (A better way was later implemented in #7648.)
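
For reference, a minimal sketch of what the #2483 filtering options look like in a disk.yaml instance (option names as documented for the disk check at the time; the values are purely illustrative):

```yaml
init_config:

instances:
  - use_mount: false
    # Each option is a list of regular expressions.
    # Blacklist entries take precedence over whitelist entries when both match.
    file_system_whitelist:
      - ext[234]$
      - xfs$
    file_system_blacklist:
      - tmpfs$
    device_blacklist:
      - /dev/loop\d+
    mount_point_blacklist:
      - /proc/sys/fs/binfmt_misc
```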

cc @DataDog/agent-integrations @DataDog/agent-core @DataDog/container-integrations

cc original participants @techdragon @amineo @coreypobrien @sudermanjr @steinnes @j-vizcaino @nerdinand

@j-vizcaino
Contributor

j-vizcaino commented Oct 31, 2018

As far as filesystem metrics are concerned, using a whitelist should be the easiest and safest way to achieve what a user wants. I suspect the list of FS to consider this way would be pretty small (ext*, xfs, zfs, btrfs).
Given the myriad of pseudo filesystems currently available and mounted on Linux, building an exhaustive blacklist would be a daunting task. Even worse, since the family of pseudo filesystems is in constant evolution, that list would require maintenance over time.

It's also worth noting that the mount_point_blacklist default only addresses the non-containerized agent deployment. Again, FS whitelisting should be the answer here, automagically skipping the binfmt_misc pseudo FS.
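
A sketch of what that whitelist-only approach could look like in disk.yaml, assuming the file_system_whitelist option from #2483 (the regexes are illustrative):

```yaml
init_config:

instances:
  - use_mount: false
    # Only collect metrics for "real" file systems; everything else,
    # including pseudo file systems such as binfmt_misc, is skipped.
    file_system_whitelist:
      - ext[234]$
      - xfs$
      - zfs$
      - btrfs$
```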

> It should also be noted that since blacklists take precedence over whitelists, users would need to update both to re-enable something.

As a user, this behaviour seems a bit awkward: could we make it blacklist first, then whitelist entries after that? That would allow us to blacklist every FS by default, then "punch holes" by whitelisting a small set of FS that provide meaningful stats.

@zippolyte
Contributor

> As a user, this behaviour seems a bit awkward: could we make it blacklist first, then whitelist entries after that? That would allow us to blacklist every FS by default, then "punch holes" by whitelisting a small set of FS that provide meaningful stats.

Well, if you just want a specific set of FS, you only need to whitelist them, and only the whitelisted ones will be considered; there is no need to explicitly blacklist the ones you don't want.
The precedence of the blacklist over the whitelist only applies when there is an intersection between whitelisted elements and blacklisted ones. In that case, the elements that match both the whitelist and the blacklist will be blacklisted. Does that make sense?

This is the way we have implemented blacklist/whitelist in other integrations as well.
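
To illustrate that precedence rule with a concrete (assumed) configuration: below, ext4 and tmpfs both match the whitelist, but tmpfs also matches the blacklist, so only ext4 file systems end up being collected:

```yaml
instances:
  - use_mount: false
    file_system_whitelist:
      - ext4$
      - tmpfs$
    file_system_blacklist:
      - tmpfs$
```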

@sandstrom

sandstrom commented Apr 4, 2019

I was bitten by this recently. Mine looked something like this:

```
udev             1977284       0   1977284   0% /dev
tmpfs             397864     744    397120   1% /run
/dev/nvme0n1p1  24329532 7438244  16874904  31% /
tmpfs            1989304       0   1989304   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            1989304       0   1989304   0% /sys/fs/cgroup
/dev/loop5         16896   16896         0 100% /snap/amazon-ssm-agent/784
/dev/loop2         18432   18432         0 100% /snap/amazon-ssm-agent/930
…
```

And the end result in Datadog was that the value of system.disk.in_use was the average over several devices (some are already filtered out by the Datadog agent by default, but not all of them). Since some were at 0%, they deflated the actual value, so our trigger at 80% usage wouldn't fire even though the main disk was at ~99% and the system crashed.
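
For illustration, counting only the eight mounts shown above and assuming the root volume later reached ~99%, an unfiltered average would be roughly (0 + 1 + 99 + 0 + 0 + 0 + 100 + 100) / 8 ≈ 37.5%, still well under an 80% threshold even though the only disk that matters is nearly full.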

For a tool such as DataDog, I don't think I'm alone in assuming that a monitor with something like warn if disk.in_use > 80 would "do the right thing" out of the box.

Just wanted to share this example.

@grv231

grv231 commented May 5, 2020

Faced similar issues (related to certain partitions) with our agent (version 7.17). Had to go with the workaround of adding a disk.yaml ConfigMap to our Kube DD agent DaemonSet manifests. I would really like this feature to be there by default so that the Kube manifest files are much cleaner. In addition, the documentation on DD does mention the issue, but never really says what has to be done (the actual steps). It could have easily saved a lot of time yesterday if at least the GitHub issue links were mentioned (just my 2 cents)
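
For anyone looking for the same workaround, a rough sketch of such a ConfigMap (the name and the mount_point_blacklist value are illustrative; the file then needs to be mounted into the Agent's conf.d directory via a volume in the DaemonSet):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: datadog-disk-check
data:
  disk.yaml: |
    init_config:
    instances:
      - use_mount: false
        mount_point_blacklist:
          - /host/proc/sys/fs/binfmt_misc
```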

@KIVagant

KIVagant commented May 27, 2020

Does anyone have a ready-to-use example of Helm values that one can use to ignore /host/proc/sys/fs/binfmt_misc?
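
A minimal sketch of what such values could look like with the official chart, assuming its datadog.confd mechanism for inline check configs and the disk check's mount_point_blacklist option (both taken from the docs of the time, not confirmed in this thread):

```yaml
datadog:
  confd:
    disk.yaml: |-
      init_config:
      instances:
        - use_mount: false
          mount_point_blacklist:
            - /host/proc/sys/fs/binfmt_misc
```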

@mx-psi
Member

mx-psi commented Sep 24, 2020

#7378 adds an option to ignore non-physical file systems, which is relevant for this goal. We still track all devices by default to avoid breaking changes.
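
A sketch of how opting in might look in disk.yaml, assuming the option added by #7378 is the include_all_devices flag (name assumed here, not quoted from the PR; it defaults to true so existing behaviour is preserved):

```yaml
init_config:

instances:
  - use_mount: false
    # Assumed flag: when set to false, only physical devices are collected,
    # skipping pseudo, memory, duplicate or inaccessible file systems.
    include_all_devices: false
```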

hithwen closed this as completed May 6, 2021
DataDog locked and limited conversation to collaborators May 6, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
