-
Notifications
You must be signed in to change notification settings - Fork 814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[process] prettify & cache AccessDenied failures #1595
Conversation
b77e27b
to
fb811e9
Compare
Nice cleanup! We should probably cache the pid list too, i'm not sure it's useful to refresh it at every iteration. According to the dev mode ( 👍 @talwai ) this is by far the most intensive operation of the check. We could maybe cache it for a few iterations ? |
@remh it's a bit hard to make this design choice, where people could potentially put no-data monitors on top of those metrics. Or maybe hide it behind a feature flag? |
fb811e9
to
ed86f0a
Compare
+1 for the feature flag / a TTL parameter for the pid list. |
@remh implemented, let me know what you think? |
self.last_pid_cache_ts = {} | ||
self.pid_cache = {} | ||
self.pid_cache_duration = int( | ||
init_config.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't that be an instance parameter instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in most cases, this shouldn't be changed so I don't think it's worth the trouble for an instance parameter.
Refactor a bit this check, two noticeable changes: 1. on Windows especially, we get randomly some `AccessDenied` exceptions when trying to look at some processes. Because we iterate on all the processes at every run, it would trigger as many failures *every run*. Once we get this error and `ignore_denied_access` is set to true, we'll not look at it again until DEFAULT_AD_CACHE_DURATION or `access_denied_cache_duration` in the init config. 2. instead of doing some sketchy try/catch over the code, all the psutil-related code that actually gets metric values is wrapped into `psutil_wrapper` which knows how to handle exceptions and known failures. Last remark: before we were setting some metrics to 0 even if we had no values parsed, this is not the case anymore.
The new init_config `pid_cache_duration` is the period during which the agent won't try to refresh the "matching" PID list for each instance. It avoids having to iterate over all the processes every run, the cache is invalidated when: * the deadline expire * or one of the cached process can't be read anymore One would want to lower the cache deadline if he wants to alert on no-data.
8166577
to
cc5a67b
Compare
Changes Unknown when pulling cc5a67b on leo/processcheck into ** on master**. |
Refactor a bit this check, two noticeable changes:
AccessDenied
exceptionswhen trying to look at some processes. Because we iterate on all the
processes at every run, it would trigger as many failures every run.
Once we get this error and
ignore_denied_access
is set to true, we'llnot look at it again until DEFAULT_AD_CACHE_DURATION or
access_denied_cache_duration
in the init config.psutil-related code that actually gets metric values is wrapped into
psutil_wrapper
which knows how to handle exceptions and knownfailures.
Last remark: before we were setting some metrics to 0 even if we had no
values parsed, this is not the case anymore.