
Improve check performance by filtering its input before parsing #1875

Merged: 2 commits into master on Jul 16, 2018

Conversation

@xvello xvello (Contributor) commented Jul 12, 2018

This PR is a port of #1872 to master

What does this PR do?

The kubelet's /metrics/cadvisor payload contains statistics on all cgroups, including system slices that are of no use for the kubelet check.

The kubelet check currently filters these samples by looking for a non-empty container_name label, but this happens after we have already incurred the parsing, conversion and lookup costs.

On systems running a lot of system slices, this makes the kubelet check take more than 15 seconds and use a lot of memory. One "pathological" host goes up to 40 seconds and over 1 GB of memory used; 99.5% of its kubelet payload is system slices.
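For illustration (these lines are representative, not copied from the PR), system-slice samples in the /metrics/cadvisor payload carry an empty container_name label, which is the property the check currently filters on after parsing:

# Illustrative only: roughly what cadvisor samples look like in the Prometheus text payload.
# System-slice samples have an empty container_name label; container samples do not.
SAMPLE_PAYLOAD = [
    'container_cpu_usage_seconds_total{container_name="",id="/system.slice/sshd.service"} 1.23',
    'container_cpu_usage_seconds_total{container_name="agent",id="/kubepods/pod42/abc"} 4.56',
]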

This PR injects a simple text filtering component before the prometheus_client parsing logic, removing these lines before the parsing / conversion / lookup costs are incurred. For simplicity and performance, it is implemented as a list of blacklisted substrings instead of regexps. If no blacklist is set up, the filtering logic is bypassed completely.
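As a rough sketch of the idea (not the PR's actual code; the function name and the blacklist entry are illustrative), the filtering stage sits between the raw payload and the prometheus_client parser and does plain substring matching:

def filter_lines(input_gen, blacklist):
    """Yield only the lines that contain none of the blacklisted substrings."""
    # Bypass filtering entirely when no blacklist is configured.
    if not blacklist:
        for line in input_gen:
            yield line
        return
    for line in input_gen:
        if not any(item in line for item in blacklist):
            yield line

# Assumed blacklist entry: drop samples whose container_name label is empty.
payload = 'container_cpu_usage_seconds_total{container_name="",id="/system.slice/sshd.service"} 1.23'
filtered = list(filter_lines(iter([payload]), ['container_name=""']))  # -> []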

The average kubelet check run on this test payload goes from 47784ms down to 842ms. There is some CPU overhead from the filtering (with a pre-filtered payload, the check run time is ~500ms), but it is quickly amortised with even a few system slices. More info in this test notebook.

On a "regular" host with 15 containers and a dozen system slices, the patch lowers the CPU usage, while keeping the memory usage constant.

We still need to optimise the processing pipeline to handle a large number of containers; this PR does not address that.

This fix is backported on top of 6.3.2 in the datadog/agent-dev:xvello-kubelet-input-filter image (with its jmx variant too).

Motivation

Make the kubelet check usable on hosts with lots of system slices.

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo

xvello added 2 commits July 12, 2018 17:26
* implement input payload filtering to increase kubelet check performance

* add unit test
        :output: generator of filtered lines
        """
        for line in input_gen:
            for item in self._text_filter_blacklist:
@ofek ofek (Contributor) commented Jul 13, 2018

How large might this blacklist get? You might be able to achieve increased perf if you compile it to regex then a simple compiled_re.search.

compiled_re = re.compile('|'.join(filters))
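For context, a hedged sketch of what that suggestion could look like (the re.escape call is an addition here, to keep literal blacklist strings from being interpreted as regex syntax):

import re

filters = ['container_name=""']  # illustrative blacklist
compiled_re = re.compile('|'.join(re.escape(f) for f in filters))

def filter_lines_re(input_gen):
    # One compiled search per line instead of one substring scan per blacklist entry.
    for line in input_gen:
        if not compiled_re.search(line):
            yield line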

@xvello xvello (Contributor, Author) replied:

This is a pretty drastic feature that we should only use in case of obviously spammy content, not in place of the existing metric filtering. The kubelet check only needs one string, but I went with a list in case we need two or three (tops) in the future.

@xvello xvello added this to the 6.4.0 milestone Jul 16, 2018
@nmuesch nmuesch removed this from the 6.4.0 milestone Jul 16, 2018
@xvello xvello merged commit b39a0f2 into master Jul 16, 2018
@xvello xvello deleted the xvello/port-1872 branch July 16, 2018 14:22