feat(languagedetection): use `dd_language_detected` if available #33711

stanistan · 2025-02-04T18:14:48Z

What does this PR do?

This PR adds a privileged detector that uses a memfd file if one is available. The injection mechanism writes a memfd file when it detects a language, and that file is named `dd_language_detected`.
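As a rough illustration of the lookup described above, the following sketch scans a process's fd directory for a descriptor whose link target is a memfd named `dd_language_detected`. It is not the code from this PR: the function names `isLanguageMemfd` and `findLanguageMemfd` are hypothetical, and it assumes only the Linux convention that memfd descriptors appear under `/proc/<pid>/fd` with a link target of the form `/memfd:<name> (deleted)`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// memfdName is the memfd name the PR description refers to.
const memfdName = "dd_language_detected"

// isLanguageMemfd reports whether a /proc/<pid>/fd link target refers to a
// memfd with the expected name. Linux renders such targets as
// "/memfd:<name> (deleted)". (Hypothetical helper, not from the PR.)
func isLanguageMemfd(target string) bool {
	rest, ok := strings.CutPrefix(target, "/memfd:")
	if !ok {
		return false
	}
	return strings.TrimSuffix(rest, " (deleted)") == memfdName
}

// findLanguageMemfd scans fdDir (for example /proc/<pid>/fd) and returns the
// path of the first entry whose link target matches.
func findLanguageMemfd(fdDir string) (string, bool) {
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		return "", false
	}
	for _, e := range entries {
		p := filepath.Join(fdDir, e.Name())
		target, err := os.Readlink(p)
		if err == nil && isLanguageMemfd(target) {
			return p, true
		}
	}
	return "", false
}

func main() {
	if p, ok := findLanguageMemfd("/proc/self/fd"); ok {
		fmt.Println("found", p)
	} else {
		fmt.Println("no dd_language_detected memfd on this process")
	}
}
```

Reading other processes' fd directories generally requires elevated privileges, which is consistent with this being described as a privileged detector.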

Motivation

As we mature Kubernetes auto-instrumentation, we want to leverage language detection to minimize the number of init containers we load at pod startup time. This mechanism might not help the first time the process runs, but it will help on subsequent pods.

Describe how you validated your changes

apm-inject
- this version (ghcr.io/datadog/apm-inject:6babf1ba57cd2b1ca5943c99b5eab9ed653529a6) supports writes to memfd
- this version (ghcr.io/datadog/apm-inject:0.29.0) does not

I'm using https://github.com/DataDog/k8s-ssi-v2-testing to set up two deployments for an application and the built image from this PR.

agent: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/793848449
cluster-agent: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/793848468
Both re-published here and here.

Manual

On an agent container we can run this script which looks for the memfd file and makes sure that it matches what's in the workload-list.

# get a process with the memfd associated with it
filename=$( (ls -lRa /proc/*/fd/* 2>/dev/null) | grep 'memfd:dd_language_detected' | awk '{ print $9 }')
pid=$(echo $filename | awk -F/ '{ print $3 }')

# output the written file as well as what's in workload
agent workload-list | grep $pid -A4
echo "/memfd:dd_language_detected: $(cat $filename)"

Automated

Added unit tests for memfd behavior when it's enabled.

Testing Log

(ls -lRa /proc/*/fd/* 2>/dev/null) | grep 'memfd:dd_language_detected' | awk '{ print $9 }' | xargs cat

This shows that the file is being written, and that without any language detection enabled we're not changing anything on the deployment.

We can also find the process in workloads being correctly tagged.

Then the deployment gets annotated.

Delete the pod; the next one has only the js container.

By default the js app uses node, which language detection already matches without memfd, but we can run it under a different name: `cp $(which node) app-runner && ./app-runner index.js`. This should give us the same behavior.

After turning on system-probe and language-detection by setting up env vars manually on the helm chart: ✅

Additional Notes

There is test coverage for different languages having their language_detection memfd files present in the auto-inject repository.
This factors out memfd code to util/kernel since it's also used in servicediscovery (even though auto-inject has not wired this up yet).

agent-platform-auto-pr · 2025-02-04T19:07:40Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv aws.create-vm --pipeline-id=55192587 --os-family=ubuntu

Note: This applies to commit 54cea15

agent-platform-auto-pr · 2025-02-04T19:08:39Z

Uncompressed package size comparison

Comparison with ancestor c741f9294366fdf079a225799173e81b1d521365

Diff per package

package	diff	status	size	ancestor	threshold
datadog-agent-amd64-deb	0.01MB	⚠️	873.46MB	873.45MB	0.50MB
datadog-agent-x86_64-rpm	0.01MB	⚠️	883.20MB	883.19MB	0.50MB
datadog-agent-x86_64-suse	0.01MB	⚠️	883.20MB	883.19MB	0.50MB
datadog-iot-agent-arm64-deb	0.00MB	✅	82.50MB	82.50MB	0.50MB
datadog-iot-agent-aarch64-rpm	0.00MB	✅	82.57MB	82.57MB	0.50MB
datadog-iot-agent-x86_64-rpm	0.00MB	✅	86.30MB	86.30MB	0.50MB
datadog-iot-agent-x86_64-suse	0.00MB	✅	86.30MB	86.30MB	0.50MB
datadog-agent-arm64-deb	0.00MB	✅	861.34MB	861.34MB	0.50MB
datadog-dogstatsd-amd64-deb	0.00MB	✅	59.09MB	59.09MB	0.50MB
datadog-dogstatsd-x86_64-rpm	0.00MB	✅	59.16MB	59.16MB	0.50MB
datadog-dogstatsd-x86_64-suse	0.00MB	✅	59.16MB	59.16MB	0.50MB
datadog-dogstatsd-arm64-deb	0.00MB	✅	56.56MB	56.56MB	0.50MB
datadog-heroku-agent-amd64-deb	0.00MB	✅	445.70MB	445.70MB	0.50MB
datadog-iot-agent-amd64-deb	0.00MB	✅	86.23MB	86.23MB	0.50MB
datadog-agent-aarch64-rpm	-0.00MB	✅	871.06MB	871.06MB	0.50MB

Decision

⚠️ Warning

cit-pr-commenter · 2025-02-04T19:36:01Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 5f00fc23-1fd8-4f67-9914-1fbdcb2e9003

Baseline: c741f92
Comparison: 54cea15
Diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_logs	% cpu utilization	+3.12	[+0.03, +6.21]	1	Logs
➖	quality_gate_idle	memory utilization	+0.40	[+0.36, +0.43]	1	Logs bounds checks dashboard
➖	file_tree	memory utilization	+0.30	[+0.23, +0.36]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	+0.17	[+0.09, +0.24]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	+0.16	[+0.08, +0.24]	1	Logs bounds checks dashboard
➖	file_to_blackhole_0ms_latency_http1	egress throughput	+0.10	[-0.75, +0.94]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	+0.08	[-0.70, +0.87]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	+0.06	[-0.72, +0.85]	1	Logs
➖	file_to_blackhole_0ms_latency_http2	egress throughput	+0.05	[-0.80, +0.89]	1	Logs
➖	file_to_blackhole_300ms_latency	egress throughput	+0.04	[-0.60, +0.68]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	+0.01	[-0.84, +0.86]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.01, +0.02]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.02	[-0.30, +0.27]	1	Logs
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	-0.07	[-0.94, +0.79]	1	Logs
➖	file_to_blackhole_1000ms_latency_linear_load	egress throughput	-0.13	[-0.60, +0.34]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	-1.01	[-1.79, -0.23]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	links
✅	file_to_blackhole_0ms_latency	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency	memory_usage	10/10
✅	file_to_blackhole_0ms_latency_http1	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency_http1	memory_usage	10/10
✅	file_to_blackhole_0ms_latency_http2	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency_http2	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency_linear_load	memory_usage	10/10
✅	file_to_blackhole_100ms_latency	lost_bytes	10/10
✅	file_to_blackhole_100ms_latency	memory_usage	10/10
✅	file_to_blackhole_300ms_latency	lost_bytes	10/10
✅	file_to_blackhole_300ms_latency	memory_usage	10/10
✅	file_to_blackhole_500ms_latency	lost_bytes	10/10
✅	file_to_blackhole_500ms_latency	memory_usage	10/10
✅	quality_gate_idle	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10
✅	quality_gate_logs	lost_bytes	10/10
✅	quality_gate_logs	memory_usage	10/10

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
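The three criteria above can be written as a single predicate. This is only a sketch of the rule as stated in this comment, not the Regression Detector's implementation; the `experiment` type and `isRegression` function are hypothetical names, and the sample rows are copied from the results table.

```go
package main

import (
	"fmt"
	"math"
)

// experiment holds the fields of one row in the results table.
// (Hypothetical type for illustration.)
type experiment struct {
	name      string
	deltaMean float64 // "Δ mean %"
	ciLow     float64 // lower bound of "Δ mean % CI"
	ciHigh    float64 // upper bound of "Δ mean % CI"
	erratic   bool    // whether the configuration marks it "erratic"
}

// effectSizeTolerance is the |Δ mean %| threshold quoted in the comment.
const effectSizeTolerance = 5.0

// isRegression applies the three stated criteria: the effect is large
// enough, the confidence interval excludes zero, and the experiment is
// not marked erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMean) >= effectSizeTolerance
	ciExcludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && ciExcludesZero && !e.erratic
}

func main() {
	rows := []experiment{
		{name: "quality_gate_logs", deltaMean: 3.12, ciLow: 0.03, ciHigh: 6.21},
		{name: "file_to_blackhole_1000ms_latency", deltaMean: -1.01, ciLow: -1.79, ciHigh: -0.23},
	}
	for _, r := range rows {
		fmt.Printf("%s: regression=%v\n", r.name, isRegression(r))
	}
}
```

Both sample rows have confidence intervals that exclude zero, but neither reaches the 5.00% effect size, which matches the "no significant change" marker they carry in the table.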

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.

agent-platform-auto-pr · 2025-02-05T13:10:54Z

Static quality checks ✅

Please find below the results from static quality gates

Info

Result	Quality gate	On disk size	On disk size limit	On wire size	On wire size limit
✅	static_quality_gate_agent_deb_amd64	844.72MiB	858.45MiB	203.56MiB	214.3MiB
✅	static_quality_gate_docker_agent_amd64	929.12MiB	942.69MiB	310.66MiB	321.56MiB

vitkyrka · 2025-02-05T13:25:21Z

system-probe is not enabled by default; you have to be doing discovery, network monitoring, etc. for this to be on, so we're not getting as much from this as I had hoped.
Not only that, but there is no user-facing option to turn on system-probe language detection, so this change is basically dark.

When service discovery is enabled (it's on when USM is enabled currently), it will do language detection in system-probe so this code will be exercised in that case.

pkg/languagedetection/internal/detectors/injector.go

pkg/languagedetection/detector.go

pkg/languagedetection/internal/detectors/injector.go

vitkyrka · 2025-02-05T13:20:53Z

pkg/languagedetection/internal/detectors/injector.go

+	fdsPath := kernel.HostProc(strconv.Itoa(pid), "fd")
+	// quick path, the shadow file is the first opened file by the process
+	// unless there are inherited fds
+	path := filepath.Join(fdsPath, "3")


Is dd_process_inject_info going to be implemented? If so, that and the language memfd can't both be fd 3, right? (If dd_process_inject_info is not going to be implemented, we can get rid of that code in servicediscovery and don't have to worry about code duplication.)

I think the language file will be first, and then we will add the other one. The implementation will work either way, and I was thinking that we could add an "fd hint" in the future as an optimization if needed.

drichards-87

Left a very small suggestion from Docs and approved the PR.

releasenotes/notes/injector-language-detection-e2691e28c6273286.yaml

pkg/util/kernel/proc.go

This commit resolves an issue where reading a file of exactly the max size would be an error condition. It does so by reading an extra byte and returning an error if the size is larger than the max size.

betterengineering · 2025-02-07T14:22:43Z

/merge

dd-devflow · 2025-02-07T14:22:50Z

Devflow running: `/merge`

View all feedbacks in Devflow UI.

2025-02-07 14:22:50 UTC ℹ️ MergeQueue: pull request added to the queue

The median merge time in main is 30m.

2025-02-07 14:50:22 UTC ℹ️ MergeQueue: This merge request was merged

stanistan requested review from a team as code owners February 4, 2025 18:14

github-actions bot added team/container-intake fka Processes team/agent-discovery medium review PR review might take time labels Feb 4, 2025

stanistan added the qa/rc-required Only for a PR that requires validation on the Release Candidate label Feb 4, 2025

stanistan changed the title ~~feat(languagedetection): use dd-injector-detected-langauge if available.~~ feat(languagedetection): use dd_language_detected if available Feb 4, 2025

stanistan added 2 commits February 5, 2025 07:42

feat(languagedetection): use dd-injector-detected-langauge if available.

562d40b

add package docs

8bfb717

stanistan force-pushed the stanistan/INPLAT-464/propagate-injector-detected-language branch from ef7499c to 8bfb717 Compare February 5, 2025 12:42

vitkyrka reviewed Feb 5, 2025

stanistan added 2 commits February 5, 2025 10:23

remove unrelated changes

64db0ee

move reading memfd file to util/kernel

019519c

stanistan requested a review from a team as a code owner February 5, 2025 16:27

github-actions bot added component/system-probe long review PR is complex, plan time to review it and removed medium review PR review might take time labels Feb 5, 2025

stanistan added 4 commits February 5, 2025 11:42

fix lint

50bfed1

w/ tests for injector language detection

6d2a24d

fix lint

8c16be8

with release notes

a511fb4

stanistan requested a review from a team as a code owner February 5, 2025 19:07

drichards-87 approved these changes Feb 5, 2025

releasenotes/notes/injector-language-detection-e2691e28c6273286.yaml (outdated, resolved)

Update releasenotes/notes/injector-language-detection-e2691e28c6273286.yaml

cc7a631

Co-authored-by: DeForest Richards

vitkyrka approved these changes Feb 6, 2025

pkg/util/kernel/proc.go (outdated, resolved)

brycekahle reviewed Feb 6, 2025

pkg/util/kernel/proc.go (outdated, resolved)

fix bug in mamFdMaxSize

4c00f3c

This commit resolves an issue where reading a file of exactly the max size would be an error condition. It does so by reading an extra byte and returning an error if the size is larger than the max size.

betterengineering requested a review from brycekahle February 6, 2025 22:11

Remove commented out log line

54cea15

brycekahle approved these changes Feb 6, 2025

wiyu approved these changes Feb 7, 2025

dd-mergequeue bot merged commit 583ca6c into main Feb 7, 2025
306 of 307 checks passed

dd-mergequeue bot deleted the stanistan/INPLAT-464/propagate-injector-detected-language branch February 7, 2025 14:50

github-actions bot added this to the 7.64.0 milestone Feb 7, 2025
