Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Metricbeat k8s metadata sometimes not being present at startup #41216

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

swiatekm
Copy link
Contributor

@swiatekm swiatekm commented Oct 14, 2024

Proposed commit message

Fix Metricbeat k8s metadata sometimes not being present at startup

In the metricbeat k8s module, each metricset has an enricher, which is used to add k8s metadata to events. Enrichers use watchers to watch k8s resource changes, and these watchers can be shared between enrichers. This is implemented by each enricher maintaining a writethrough cache of metadata - when a K8s event indicating a change to a resource is received, the metadata cache is updated.

The problem with this is that a watcher is only started once, so enrichers which are created after it was started, have to catch up on the resource change events. This is a great opportunity for race conditions, and is in general unnecessarily complicated.

Instead, we turn the writethrough cache into a readthrough cache. The change event handlers now only invalidate the cache, and the enricher computes the metadata when it is first requested. This completely sidesteps the catch up problem, as we only compute the metadata lazily.

I've also refactored the code a bit:

  • Resource event handlers are now added to a watcher when it is created, as opposed to when an enricher is created. They're always the same, so there's no reason to do it multiple times.
  • Removed TestBuildMetadataEnricher_EventHandler_PastObjects, as it doesn't seem to check anything not covered by other tests.
  • Rearranged the createWatcher function, so actual watcher creation (as opposed to skipping because it's already there) happens on the main path.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Install elastic-agent in K8s using the standalone manifests and track the metadata.

Related issues

@swiatekm swiatekm added the bug label Oct 14, 2024
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 14, 2024
@swiatekm swiatekm added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 14, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 14, 2024
Copy link
Contributor

mergify bot commented Oct 14, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @swiatekm? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Oct 14, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Oct 14, 2024
@swiatekm swiatekm added the backport-8.15 Automated backport to the 8.15 branch with mergify label Oct 14, 2024
@swiatekm swiatekm force-pushed the fix/metricbeats-metadata branch 3 times, most recently from 943e070 to 09a940b Compare October 15, 2024 09:09
@swiatekm swiatekm marked this pull request as ready for review October 15, 2024 09:59
@swiatekm swiatekm requested a review from a team as a code owner October 15, 2024 09:59
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert pierrehilbert added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Oct 15, 2024
@swiatekm swiatekm force-pushed the fix/metricbeats-metadata branch from 09a940b to b0878d4 Compare October 15, 2024 11:19
@MichaelKatsoulis
Copy link
Contributor

I took a look and the code changes look ok to me. The part of the enricher initialization code was the most tricky one when we implemented the shared watchers logic. Maybe we overcomplicated it.

I am not sure still about the problem it tries to solve, as we had tested these scenarios a lot.
But this new approach looks simpler.

delete(k8sMeta, "pod")
}
ecsMeta := meta.Clone()
ecsMeta := meta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not required since the Clone is now handled by the e.getMetadata(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just renaming the variable for clarity. Maps are reference types in Go, so this really just copies a pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that maps are built-in references ty 🙂 I have to admit that this is the first time I witness a new variable assignment in the same scope for clarity reasons; however, I have seen comments added or renaming the variable at the creation time, for clarity reasons

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real meaning of this series of statements is "ecsMeta is meta without the kubernetes key". It was clearer when we were actually cloning meta, but we don't need to do that anymore. I can change it to only use meta if you think this is confusing.

@swiatekm
Copy link
Contributor Author

@gizas are you ok with me merging this? I know you wanted to try reproducing #41213 first.

@gizas
Copy link
Contributor

gizas commented Oct 17, 2024

Sure @swiatekm , I did not have time to reproduce tbh. But as long I understood the visualisation and why the issue was raised I am ok

@swiatekm swiatekm merged commit 4e62fa5 into main Oct 17, 2024
38 checks passed
@swiatekm swiatekm deleted the fix/metricbeats-metadata branch October 17, 2024 13:41
@swiatekm swiatekm added the backport-8.16 Automated backport with mergify label Oct 17, 2024
mergify bot pushed a commit that referenced this pull request Oct 17, 2024
…41216)

* Fix Metricbeat k8s metadata sometimes not being present at startup

* Clone the whole metadata map when fetching

(cherry picked from commit 4e62fa5)
mergify bot pushed a commit that referenced this pull request Oct 17, 2024
…41216)

* Fix Metricbeat k8s metadata sometimes not being present at startup

* Clone the whole metadata map when fetching

(cherry picked from commit 4e62fa5)
mergify bot pushed a commit that referenced this pull request Oct 17, 2024
…41216)

* Fix Metricbeat k8s metadata sometimes not being present at startup

* Clone the whole metadata map when fetching

(cherry picked from commit 4e62fa5)
pierrehilbert pushed a commit that referenced this pull request Oct 17, 2024
…41216) (#41284)

* Fix Metricbeat k8s metadata sometimes not being present at startup

* Clone the whole metadata map when fetching

(cherry picked from commit 4e62fa5)

Co-authored-by: Mikołaj Świątek <[email protected]>
MichaelKatsoulis pushed a commit that referenced this pull request Oct 18, 2024
…41216) (#41285)

* Fix Metricbeat k8s metadata sometimes not being present at startup

* Clone the whole metadata map when fetching

(cherry picked from commit 4e62fa5)

Co-authored-by: Mikołaj Świątek <[email protected]>
swiatekm added a commit that referenced this pull request Oct 18, 2024
…ng present at startup (#41283)

* Fix Metricbeat k8s metadata sometimes not being present at startup (#41216)

* Fix Metricbeat k8s metadata sometimes not being present at startup

* Clone the whole metadata map when fetching

(cherry picked from commit 4e62fa5)

* Update CHANGELOG.next.asciidoc

---------

Co-authored-by: Mikołaj Świątek <[email protected]>
belimawr pushed a commit to belimawr/beats that referenced this pull request Oct 18, 2024
…lastic#41216)

* Fix Metricbeat k8s metadata sometimes not being present at startup

* Clone the whole metadata map when fetching
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.15 Automated backport to the 8.15 branch with mergify backport-8.16 Automated backport with mergify bug Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

K8s metadata for metricbeat Kubernetes module missing at startup
6 participants