Improve check performance by filtering its input before parsing #1875
This PR is a port of #1872 to `master`.
What does this PR do?
The kubelet's `/metrics/cadvisor` payload contains statistics on all cgroups, including system slices that are of no use to the `kubelet` check. The `kubelet` check currently filters these samples by looking for a non-empty `container_name` label, but this happens after we have already incurred the parsing, conversion and lookup costs.

On systems running a lot of system slices, this makes the `kubelet` check run for more than 15 seconds and use a lot of memory. One "pathological" host goes up to 40 seconds and more than 1 GB of memory used; 99.5% of its kubelet payload is system slices.

This PR injects a simple text-filtering component before the `prometheus_client` parsing logic, to remove these lines before incurring the parsing / conversion / lookup costs. For simplicity and performance, it is implemented as a list of blacklisted substrings instead of regexps. If no blacklist is set up, the filtering logic is bypassed completely.

The average `kubelet` check run on this test payload goes from 47784 ms down to 842 ms. There is some CPU overhead to the filtering (with a pre-filtered payload, the check run time is ~500 ms), but it is amortised significantly with even a few system slices. More info in this test notebook.

On a "regular" host with 15 containers and a dozen system slices, the patch lowers CPU usage while keeping memory usage constant.
We still need to optimise the processing pipeline to handle a large number of containers; this PR does not address that.
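The pre-parse filtering described above can be sketched as a substring blacklist applied per payload line. This is a minimal illustration, not the PR's actual code: the function name, the blacklist contents, and the sample metric lines are all assumptions.

```python
def filter_payload(text, blacklist):
    """Drop metric lines containing any blacklisted substring.

    With an empty blacklist, filtering is bypassed entirely and the
    payload is returned unchanged, so there is no per-line cost.
    (Illustrative sketch; not the actual PR implementation.)
    """
    if not blacklist:
        return text
    return "\n".join(
        line for line in text.split("\n")
        if not any(pattern in line for pattern in blacklist)
    )

# Example: dropping system-slice samples before the prometheus_client
# parser ever sees them (the "system.slice" pattern is an assumption).
payload = (
    'container_cpu_usage_seconds_total{id="/system.slice/sshd.service"} 1.2\n'
    'container_cpu_usage_seconds_total{container_name="web",id="/docker/abc"} 3.4'
)
filtered = filter_payload(payload, ["system.slice"])
```

Plain substring matching keeps the per-line cost to a cheap `in` check, which is why a blacklist of strings was preferred over regexps here.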
This fix is backported on top of `6.3.2` on the `datadog/agent-dev:xvello-kubelet-input-filter` image (with its jmx variant too).

Motivation
Make the `kubelet` check usable on hosts with lots of system slices.

Review checklist
- [ ] `no-changelog` label attached
- [ ] If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo