Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more "core" Linux metrics #2030

Closed
wants to merge 4 commits into from

Conversation

gphat
Copy link
Contributor

@gphat gphat commented Nov 2, 2015

What's this PR do?

Adds some new system metrics:

  • context switch count (for rates!)
  • processes created count (for rates!)
  • interrupts
  • counts of processes by state, with tags
  • counts of processes by priority, with tags

Motivation

These metrics are used in our existing setup and, from what I can tell, are not available today in Datadog agent. We can get them from specific processes via process.yaml, but not for the whole system!

Notes

I originally tried to add this stuff to unix.py but I'm unable to determine how those metrics end up as things like sys.mem.free and the like. So I settled on making a check that added the metrics I as after. I expect to add more.

Most of these metrics feel very "core" to Linux and therefore seem suitable to be included in something like unix.py. But for now I'll opt to take a conservative approach and assume this will just be a plugin. If you are not interested in having it in the agent, we'll just use it locally.

Lemme know and thanks for looking!

@irabinovitch
Copy link
Contributor

@gphat Apologies for the delay here. We are mid prepping for the 5.6 release. We will get back to you shortly with feedback on the change. In the mean time would you mind having a gander at the failing tests? It looks like at least some of them are just style issues that should be quick to address.

./checks/system/unix.py:580:55: F812 list comprehension redefines 'line' from line 572

Thank you for being part of the Datadog community!

@gphat
Copy link
Contributor Author

gphat commented Nov 5, 2015

Done! Fyi, that change was in existing code, not in something I touched. But a fix nonetheless!

@gphat gphat changed the title Add context switching and process count. Add more "core" Linux metrics Nov 6, 2015
@gphat gphat force-pushed the add-some-new-system-metrics branch from 4bf0672 to 734465b Compare November 6, 2015 18:44
@remh remh added this to the 5.7.0 milestone Nov 19, 2015
@gphat gphat force-pushed the add-some-new-system-metrics branch from 9e82180 to 77b488a Compare November 24, 2015 17:48
@remh remh self-assigned this Dec 22, 2015
init_config:

instances:
- tags: []
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comment here to say that the configuration it doesn't need any specific parameters ?

@gphat
Copy link
Contributor Author

gphat commented Dec 22, 2015

Done! Waiting on the linux question above and ready to go.

tags = instance.get('tags', [])

state_counts = collections.Counter(
uninterruptible=0,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good thing about the Counter is that you don't need to initialize your keys at 0 :)

state_counts = Counter()

would give you the same result

@gphat
Copy link
Contributor Author

gphat commented Dec 22, 2015

Fixed! :)

@remh
Copy link

remh commented Dec 22, 2015

Thanks Cory.
I think the metrics should be named system.linux
Can you squash your commits as well please ?
Thanks!

@gphat gphat force-pushed the add-some-new-system-metrics branch from 15c2bf9 to feaa9fd Compare December 22, 2015 22:32
@gphat
Copy link
Contributor Author

gphat commented Dec 22, 2015

Squashed! Omg I just learned how to squash merge when I've already done merges in to the branch. It's an early Christmas!

from checks import AgentCheck
from utils.subprocess_output import get_subprocess_output
from collections import Counter
import subprocess
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used anymore btw :)

@gphat gphat force-pushed the add-some-new-system-metrics branch from 6ba6645 to 17ba278 Compare December 22, 2015 22:45
@gphat
Copy link
Contributor Author

gphat commented Dec 22, 2015

Fixed and rebased for brevity.

def check(self, instance):
tags = instance.get('tags', [])

state_counts = Counter()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know i'm the one who told you to use a Counter but i forgot that Counter were not available on python 2.6. Can you use a defaultdict(int) instead ?

@remh
Copy link

remh commented Jan 5, 2016

Thanks @gphat ! Can you rebase against the latest version of master again please ? It should fix the zookeeper failing test.

Also could you write a test for this integration please ? It should be pretty straigthforward.

@gphat
Copy link
Contributor Author

gphat commented Jan 7, 2016

Hey @remh, does all that look right? I'm still getting failures, but they seem to be elsewhere. I will clean up the commits once you think it's ok.

@gphat
Copy link
Contributor Author

gphat commented Jan 12, 2016

Friendly ping, @remh!

@remh
Copy link

remh commented Jan 14, 2016

Looks good to me @gphat ! Can you squash the commits and then i'll merge

@gphat gphat closed this Jan 14, 2016
@gphat gphat deleted the add-some-new-system-metrics branch January 14, 2016 19:13
@gphat
Copy link
Contributor Author

gphat commented Jan 14, 2016

See #2202, I broke it when squash merging. Ugh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants