[process] Improve sampling of cpu.pct #1928

Merged
merged 3 commits into from
Sep 18, 2015

olivielpeau
Member

See #1660

Improvements:

  • Non-blocking call to cpu_percent after the first sample
  • After the first sample, retrieves an averaged value since the last call (instead of an averaged value over the fraction of a second defined by cpu_check_interval)

The first sample of cpu.pct on every process still uses a blocking call so that the check can retrieve a value as soon as it finds new processes. Setting the cpu_check_interval to 0 removes that first blocking call.
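
The difference between the two sampling modes can be sketched without any dependency. This is only an illustration of "averaged since the last call", not the check's code and not psutil's implementation; `CpuSampler`, `read_cpu_seconds` and `clock` are hypothetical names. Note that the first call has no baseline to compare against, which is why the first sample needs special handling.

```python
class CpuSampler:
    """Report CPU usage averaged over the time elapsed since the previous call."""

    def __init__(self, read_cpu_seconds, clock):
        # Both callables are hypothetical stand-ins, injected for illustration:
        # one returns cumulative CPU seconds, the other returns wall-clock seconds.
        self._read_cpu_seconds = read_cpu_seconds
        self._clock = clock
        self._last = None  # (wall_time, cpu_seconds) of the previous sample

    def sample(self):
        """Return CPU percent since the last call, or None on the first call."""
        now = (self._clock(), self._read_cpu_seconds())
        previous, self._last = self._last, now
        if previous is None:
            # No baseline yet: nothing meaningful to report, and no blocking wait.
            return None
        elapsed = now[0] - previous[0]
        if elapsed <= 0:
            return 0.0
        return 100.0 * (now[1] - previous[1]) / elapsed
```

With the standard library, `CpuSampler(time.process_time, time.monotonic)` would track the current Python process; calling `sample()` once per check run then yields a value averaged over the whole interval between runs rather than over a short blocking window.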

Also fixed the handling of the cache of AccessDenied PIDs (it was previously refreshed at every run of every instance).

Improvements:
- Non-blocking call to `cpu_percent` after the first sample
- Retrieves an averaged value since the last call (instead of an
averaged value over the fraction of a second defined by
`cpu_check_interval`)

To use the non-blocking call we keep Process instances through
multiple check runs.

Also updated a test on the check so that it no longer uses real PIDs:
the test previously relied on `find_pids` returning a list of
identical PIDs, whereas `find_pids` actually returns a set of PIDs.
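
Why a set breaks that assumption can be shown in a few lines. `find_pids_sketch` below is a hypothetical stand-in, not the check's `find_pids`; it only assumes that the result is a set.

```python
def find_pids_sketch(matches):
    """Hypothetical stand-in: collapse matched PIDs into a set."""
    return set(matches)


# A test that expects the same PID several times cannot work with a set,
# because duplicates are collapsed into a single element.
repeated = [1234, 1234, 1234]
unique = find_pids_sketch(repeated)
```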
@olivielpeau olivielpeau added this to the 5.6.0 milestone Sep 18, 2015
@yannmh yannmh self-assigned this Sep 18, 2015
@yannmh
Member

yannmh commented Sep 18, 2015

> Setting the cpu_check_interval to 0 removes that first blocking call.

As we discussed, I think that's the right thing to do.

@olivielpeau
Member Author

Thanks @yannmh for the review!

I've implemented the changes you suggested (see commit message of my last commit: 1193212).

@olivielpeau olivielpeau force-pushed the olivielpeau/better-process-cpu-sample branch 2 times, most recently from 29f6d66 to 2b19032 Compare September 18, 2015 20:31
@yannmh
Member

yannmh commented Sep 18, 2015

Nice improvement. Looks good to merge to me ! 🚢

The check was never setting `last_ad_cache_ts` so the cache
was being refreshed at every run of the check.

This fixes the issue and updates the ad cache by instance so that
every instance refreshes the cache (and not only the first one).
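
A minimal sketch of a per-instance cache with a refresh timestamp, under stated assumptions: only `last_ad_cache_ts` is a name taken from the commit message, while `AccessDeniedCache`, `ad_cache`, `duration` and the method names are hypothetical, and the 120-second default is arbitrary.

```python
import time


class AccessDeniedCache:
    """Remember PIDs that raised AccessDenied, separately for each instance."""

    def __init__(self, duration=120, clock=time.time):
        self._duration = duration   # assumed lifetime of the cache, in seconds
        self._clock = clock
        self.ad_cache = {}          # instance name -> set of PIDs to skip
        self.last_ad_cache_ts = {}  # instance name -> time of the last reset

    def should_refresh(self, name):
        last = self.last_ad_cache_ts.get(name)
        return last is None or self._clock() - last > self._duration

    def pids_to_skip(self, name):
        if self.should_refresh(name):
            self.ad_cache[name] = set()
            # Recording the timestamp is the step the bug left out: without it,
            # should_refresh() is true on every run and the cache never persists.
            self.last_ad_cache_ts[name] = self._clock()
        return self.ad_cache[name]

    def mark_denied(self, name, pid):
        self.ad_cache.setdefault(name, set()).add(pid)
```

Keying both dictionaries by instance name means each instance ages its own cache, instead of the first instance's refresh deciding for all of them.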
Implement suggestions on the PR:
1. The first sample of `cpu_percent` is not sent at all, instead of
being retrieved in a blocking call. This keeps the first run of the
check fast even when a lot of processes are monitored.
2. As a consequence, we drop the `cpu_check_interval` parameter, which
is no longer needed.

Also, initialize self.process_cache with a defaultdict for simplicity.
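
The combination of a process cache kept between runs and a skipped first sample could look roughly like the sketch below. It is an illustration only: `FakeProcess`, `CheckSketch` and `run` are hypothetical, the inner structure of `process_cache` is an assumption, and the fake handle returns `None` on its first call purely to mark the sample that has no baseline.

```python
from collections import defaultdict


class FakeProcess:
    """Hypothetical stand-in for a process handle with a non-blocking CPU reading."""

    def __init__(self, pid):
        self.pid = pid
        self.calls = 0

    def cpu_percent(self):
        # The first call has no baseline, so it yields nothing meaningful.
        self.calls += 1
        return None if self.calls == 1 else 12.5


class CheckSketch:
    def __init__(self, process_factory=FakeProcess):
        self._process_factory = process_factory
        # check name -> {pid: process handle}, kept between runs (assumed layout)
        self.process_cache = defaultdict(dict)

    def run(self, name, pids):
        cache = self.process_cache[name]
        # Forget processes that have disappeared since the previous run.
        for pid in set(cache) - set(pids):
            del cache[pid]
        samples = {}
        for pid in pids:
            if pid not in cache:
                cache[pid] = self._process_factory(pid)
            value = cache[pid].cpu_percent()
            if value is not None:  # the first sample of a new process is not sent
                samples[pid] = value
        return samples
```

Because the handles survive between runs, no call ever has to block: a newly discovered process simply reports nothing on its first run and a full-interval average on every later one.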
@olivielpeau olivielpeau force-pushed the olivielpeau/better-process-cpu-sample branch from 2b19032 to 1193212 Compare September 18, 2015 21:02
@olivielpeau
Member Author

Thanks again for the thorough review ;)

I've rebased the PR and changed the Exception type tested in test_ad_cache (per your comment). I'll merge once the CI passes.

@olivielpeau
Member Author

The CI is failing on unrelated tests, merging.

olivielpeau added a commit that referenced this pull request Sep 18, 2015
…sample

[process] Improve sampling of cpu.pct
@olivielpeau olivielpeau merged commit 28d4c09 into master Sep 18, 2015
@olivielpeau olivielpeau deleted the olivielpeau/better-process-cpu-sample branch September 18, 2015 22:17
@remh

remh commented Sep 21, 2015

👍

olivielpeau added a commit that referenced this pull request Sep 23, 2015
PR #1928 improved the sampling of `cpu.pct` but its value
is not accurate for short-lived processes (it was even less
accurate before that PR but it's still worth mentioning).
urosgruber pushed a commit to urosgruber/dd-agent that referenced this pull request Dec 23, 2015
PR DataDog#1928 improved the sampling of `cpu.pct` but its value
is not accurate for short-lived processes (it was even less
accurate before that PR but it's still worth mentioning).

[skip ci]