-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Metricbeat k8s metadata sometimes not being present at startup #41216
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
|
943e070
to
09a940b
Compare
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
09a940b
to
b0878d4
Compare
I took a look and the code changes look ok to me. The part of the enricher initialization code was the most tricky one when we implemented the shared watchers logic. Maybe we overcomplicated it. I am not sure still about the problem it tries to solve, as we had tested these scenarios a lot. |
delete(k8sMeta, "pod") | ||
} | ||
ecsMeta := meta.Clone() | ||
ecsMeta := meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not required since the Clone is now handled by the e.getMetadata(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just renaming the variable for clarity. Maps are reference types in Go, so this really just copies a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that maps are built-in references ty 🙂 I have to admit that this is the first time I witness a new variable assignment in the same scope for clarity reasons; however, I have seen comments added or renaming the variable at the creation time, for clarity reasons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real meaning of this series of statements is "ecsMeta
is meta
without the kubernetes
key". It was clearer when we were actually cloning meta
, but we don't need to do that anymore. I can change it to only use meta
if you think this is confusing.
Sure @swiatekm , I did not have time to reproduce tbh. But as long I understood the visualisation and why the issue was raised I am ok |
…41216) (#41284) * Fix Metricbeat k8s metadata sometimes not being present at startup * Clone the whole metadata map when fetching (cherry picked from commit 4e62fa5) Co-authored-by: Mikołaj Świątek <[email protected]>
…41216) (#41285) * Fix Metricbeat k8s metadata sometimes not being present at startup * Clone the whole metadata map when fetching (cherry picked from commit 4e62fa5) Co-authored-by: Mikołaj Świątek <[email protected]>
…ng present at startup (#41283) * Fix Metricbeat k8s metadata sometimes not being present at startup (#41216) * Fix Metricbeat k8s metadata sometimes not being present at startup * Clone the whole metadata map when fetching (cherry picked from commit 4e62fa5) * Update CHANGELOG.next.asciidoc --------- Co-authored-by: Mikołaj Świątek <[email protected]>
…lastic#41216) * Fix Metricbeat k8s metadata sometimes not being present at startup * Clone the whole metadata map when fetching
Proposed commit message
Fix Metricbeat k8s metadata sometimes not being present at startup
In the metricbeat k8s module, each metricset has an enricher, which is used to add k8s metadata to events. Enrichers use watchers to watch k8s resource changes, and these watchers can be shared between enrichers. This is implemented by each enricher maintaining a writethrough cache of metadata - when a K8s event indicating a change to a resource is received, the metadata cache is updated.
The problem with this is that a watcher is only started once, so enrichers which are created after it was started, have to catch up on the resource change events. This is a great opportunity for race conditions, and is in general unnecessarily complicated.
Instead, we turn the writethrough cache into a readthrough cache. The change event handlers now only invalidate the cache, and the enricher computes the metadata when it is first requested. This completely sidesteps the catch up problem, as we only compute the metadata lazily.
I've also refactored the code a bit:
TestBuildMetadataEnricher_EventHandler_PastObjects
, as it doesn't seem to check anything not covered by other tests.createWatcher
function, so actual watcher creation (as opposed to skipping because it's already there) happens on the main path.Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
Install elastic-agent in K8s using the standalone manifests and track the metadata.
Related issues