[receiver/hostmetrics] Resilient process scraping on Linux #18923

Closed
jskiba opened this issue Feb 27, 2023 · 3 comments

jskiba commented Feb 27, 2023

Component(s)

receiver/hostmetrics

Is your feature request related to a problem? Please describe.

Currently (v0.71.0), the process scraper of the hostmetrics receiver does not scrape all processes on Linux machines.

2023-03-01T10:36:09.270Z	error	scraperhelper/scrapercontroller.go:197	Error scraping metrics	{"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading process name for pid 2: readlink /proc/2/exe: no such file or directory; error reading process name for pid 3: readlink /proc/3/exe: no such file or directory; error reading process name for pid 4: readlink /proc/4/exe: no such file or directory; error reading process name for pid 5: readlink /proc/5/exe: no such file or directory; error reading process name for pid 6: readlink /proc/6/exe: no such file or directory; error reading process name for pid 8: readlink /proc/8/exe: no such file or directory; error reading process name for pid 10: readlink /proc/10/exe: no such file or directory; error reading process name for pid 11: readlink /proc/11/exe: no such file or directory; error reading process name for pid 12: readlink /proc/12/exe: no such file or directory;

So basically we are able to scrape all of these processes, but each one is completely discarded only because we cannot get its executable path. That seems wrong to me, as we can still collect a lot of data for them.

Describe the solution you'd like

I would suggest adding two additional flags in process config.
Example:

      process:
        mute_process_exe_error: true
        mute_process_io_error: true

mute_process_exe_error - continue scraping a process despite not having its executable path (this is the read that currently fails)
mute_process_io_error - mute io permission denied errors; if the hostmetrics receiver is not running with root privileges, it is unable to read /proc/<pid>/io for many processes, so it reports an error for each of them
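
To make the intended behaviour of the two flags concrete, here is a minimal sketch in Go. It is not the receiver's actual code: `MuteConfig`, `scrapeStage`, `shouldReport` and `sampleErrors` are hypothetical names made up for illustration, and muting only "no such file" errors for the exe read and only "permission denied" errors for the io read is my assumption about how narrow the flags should be.

```go
package main

import (
	"errors"
	"io/fs"
)

// MuteConfig mirrors the two flags proposed above.
// Hypothetical type: the field names follow the YAML keys, not real receiver code.
type MuteConfig struct {
	MuteProcessExeError bool
	MuteProcessIOError  bool
}

// scrapeStage says which per-process read produced an error (hypothetical).
type scrapeStage int

const (
	stageExe scrapeStage = iota // readlink /proc/<pid>/exe
	stageIO                     // open /proc/<pid>/io
)

// shouldReport returns true when err should be added to the scrape errors.
// Assumption: each flag mutes only the specific failure described in the issue,
// so any other kind of error is still reported.
func shouldReport(cfg MuteConfig, stage scrapeStage, err error) bool {
	switch {
	case err == nil:
		return false
	case stage == stageExe && cfg.MuteProcessExeError && errors.Is(err, fs.ErrNotExist):
		return false
	case stage == stageIO && cfg.MuteProcessIOError && errors.Is(err, fs.ErrPermission):
		return false
	}
	return true
}

// sampleErrors builds errors shaped like the ones in the logs in this issue.
func sampleErrors() (exeErr, ioErr error) {
	exeErr = &fs.PathError{Op: "readlink", Path: "/proc/2/exe", Err: fs.ErrNotExist}
	ioErr = &fs.PathError{Op: "open", Path: "/proc/1/io", Err: fs.ErrPermission}
	return exeErr, ioErr
}
```

With both flags set, `shouldReport` drops the two sample errors and the scraper could keep the remaining metrics for the process; with the flags unset, both errors are reported as they are today.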

Describe alternatives you've considered

No response

Additional context

Example errors after making some tweaks in my fork and implementing mute_process_exe_error option:

2023-02-22T14:30:45.059Z	error	scraperhelper/scrapercontroller.go:197	Error scraping metrics	{"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading disk usage for process \"systemd\" (pid 1): open /proc/1/io: permission denied; error reading disk usage for process \"kthreadd\" (pid 2): open /proc/2/io: permission denied; error reading disk usage for process \"rcu_gp\" (pid 3): open /proc/3/io: permission denied; error reading disk usage for process \"rcu_par_gp\" (pid 4): open /proc/4/io: permission denied; error reading disk usage for process \"slub_flushwq\" (pid 5): open /proc/5/io: permission denied; error reading disk usage for process \"netns\" (pid 6): open /proc/6/io: permission denied; error reading disk usage for process \"kworker/0:0H-events_highpri\" (pid 8): open /proc/8/io: permission denied; error reading disk usage for process \"mm_percpu_wq\" (pid 10): open /proc/10/io: permission denied; error reading disk usage for process \"rcu_tasks_rude_\" (pid 11): open /proc/11/io: permission denied; error reading disk usage for process \"rcu_tasks_trace\" (pid 12): open /proc/12/io: permission denied; error reading disk usage for process \"ksoftirqd/0\" (pid 13): open /proc/13/io: permission denied; error reading disk usage for process \"rcu_sched\" (pid 14): open /proc/14/io: permission denied; error reading disk usage for process \"migration/0\" (pid 15): open /proc/15/io: permission denied;

These could be further muted with the mute_process_io_error option.

jskiba added the "enhancement" and "needs triage" labels Feb 27, 2023

jskiba commented Feb 27, 2023

Here is my proposal for how it could be solved.

jskiba changed the title from "[receiver/hostmetrics] Scrape all processes without root privileges on Linux" to "[receiver/hostmetrics] Resilient process scraping on Linux" Mar 1, 2023

jskiba commented Mar 1, 2023

I diagnosed the problem incorrectly at first, so I changed the title and the description.
