
Inconsistent EINVAL errors from KERN_PROCARGS2 in beats self-monitoring on MacOS #47

Open
fearful-symmetry opened this issue Jul 26, 2022 · 1 comment
Labels
bug Something isn't working Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team


@fearful-symmetry
Contributor

fearful-symmetry commented Jul 26, 2022

Recently, we've seen a few beats errors like this on systems running MacOS:

Error while getting memory usage: error fetching PID 222: FillPidMetrics: error fetching string data from process: error in sysctl: invalid argument
Error retrieving CPU percentages: error fetching PID 222: FillPidMetrics: error fetching string data from process: error in sysctl: invalid argument

Oddly enough, these errors originate from the self-monitoring subsystem that's set up here: https://github.com/elastic/elastic-agent-system-metrics/blob/main/report/setup.go

The error itself comes from a sysctl call with the MIB []C.int{C.CTL_KERN, C.KERN_PROCARGS2, C.int(pid)}, made in:

func getProcArgs(pid int, filter func(string) bool) ([]string, string, mapstr.M, error) {
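The line above is only the signature of getProcArgs. To make clear what "string data from process" means here, the following is a minimal sketch of parsing the raw KERN_PROCARGS2 buffer, assuming the commonly documented layout (an int32 argc, the executable path, NUL padding, then argc NUL-terminated arguments, followed by the environment). parseProcArgs2 is a hypothetical helper, not the library's implementation, and it runs on a synthetic buffer so it works on any OS; the real sysctl call only exists on macOS.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// parseProcArgs2 extracts the executable path and argv from a raw
// KERN_PROCARGS2 buffer. Assumed layout:
//
//	int32 argc | exec path NUL | NUL padding | argv[0] NUL ... argv[argc-1] NUL | env ...
//
// argc is read as little-endian, which matches x86_64 and arm64 Macs.
// The environment and any trailing data are left unparsed.
func parseProcArgs2(buf []byte) (string, []string, error) {
	if len(buf) < 4 {
		return "", nil, errors.New("procargs2: buffer shorter than argc word")
	}
	argc := int(binary.LittleEndian.Uint32(buf[:4]))
	rest := buf[4:]

	// The executable path is the first NUL-terminated string.
	end := bytes.IndexByte(rest, 0)
	if end < 0 {
		return "", nil, errors.New("procargs2: unterminated executable path")
	}
	execPath := string(rest[:end])

	// Skip the NUL padding before argv[0]. Caveat: this would also
	// swallow an empty argv[0], which is rare but possible.
	rest = bytes.TrimLeft(rest[end:], "\x00")

	// Read argc NUL-terminated strings; the loop is bounded by the buffer,
	// so a garbage argc cannot cause a huge allocation.
	var argv []string
	for len(argv) < argc && len(rest) > 0 {
		end = bytes.IndexByte(rest, 0)
		if end < 0 {
			return "", nil, errors.New("procargs2: unterminated argument")
		}
		argv = append(argv, string(rest[:end]))
		rest = rest[end+1:]
	}
	if len(argv) < argc {
		return "", nil, fmt.Errorf("procargs2: argc is %d but found %d arguments", argc, len(argv))
	}
	return execPath, argv, nil
}

func main() {
	// Synthetic buffer for illustration: argc=2, exec path, padding, argv, env.
	buf := []byte{2, 0, 0, 0}
	buf = append(buf, "/usr/local/bin/metricbeat\x00\x00\x00"...)
	buf = append(buf, "metricbeat\x00-e\x00HOME=/var/root\x00"...)

	exe, argv, err := parseProcArgs2(buf)
	fmt.Println(exe, argv, err) // /usr/local/bin/metricbeat [metricbeat -e] <nil>
}
```

Parsing errors like these are separate from the EINVAL in this issue: the sysctl fails before any buffer is returned.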

Based on some research, there are four main situations in which this sysctl can return EINVAL:

  • The PID no longer exists
  • The process is in a zombie state
  • The buffer size passed to the sysctl is wrong
  • The caller is denied permission

However, none of these situations seem to apply to us: the error comes from a process monitoring itself via os.Getpid, so it can't be a zombie, and its PID must exist.

The buffer size doesn't seem to be the issue: we size the buffer (rather inefficiently) from ARG_MAX, which, depending on the OS version, ranges from a few hundred KB to 1 MB, far more than a typical command line needs. We've also confirmed in at least one case that the process's argument size is well below ARG_MAX.

That leaves a permissions issue, which also doesn't make sense: the process is reading its own arguments, which rules out any user-level permission error, and in some of these cases it is running as root. As far as I can tell, that leaves two options:

  • Some other permissions issue, involving something like SIP or sandboxing. In both known cases, Agent appears to be installed and managed by some kind of device-management tool such as JAMF, so some security configuration may be involved.
  • Another edge case in KERN_PROCARGS2 that I haven't figured out yet.

Note that I haven't been able to reproduce this error, despite a great deal of effort, which makes me suspect this is a fairly unusual edge case.

fearful-symmetry added the bug and Team:Elastic-Agent-Data-Plane labels Jul 26, 2022
fearful-symmetry changed the title from "Strange, inconsistant EINVAL errors from KERN_PROCARGS2 in beats self-monitoring" to "Strange, inconsistent EINVAL errors from KERN_PROCARGS2 in beats self-monitoring" Jul 26, 2022
fearful-symmetry changed the title from "Strange, inconsistent EINVAL errors from KERN_PROCARGS2 in beats self-monitoring" to "Inconsistent EINVAL errors from KERN_PROCARGS2 in beats self-monitoring on MacOS" Jul 26, 2022
@cmacknz
Member

cmacknz commented Jul 27, 2022

#46 is a way to suppress the errors, but does not fix the underlying problem.
