Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic in the metrics-generator when using multiple tenants with default overrides #2786

Merged
merged 4 commits into from
Aug 17, 2023

Conversation

yvrhdn
Copy link
Member

@yvrhdn yvrhdn commented Aug 11, 2023

What this PR does:
We have a concurrent read/write panic while reading from desiredProcessors, this is the only place we modify it really. Let's copy it before modifying it.

Which issue(s) this PR fixes:
Fixes #2785

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Sorry, something went wrong.

@@ -151,8 +152,11 @@ func (i *instance) updateSubprocessors(desiredProcessors map[string]struct{}, de
_, latencyOk := desiredProcessors[spanmetrics.Latency.String()]
_, sizeOk := desiredProcessors[spanmetrics.Size.String()]

newDesiredProcessors := map[string]struct{}{}
maps.Copy(newDesiredProcessors, desiredProcessors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if desiredProcessors was capable of racing in the previous code, I don't see how this fixes it. it could still race here while copying to newDesiredProcessors

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only reads the map, afaik concurrent map reads are fine. Later in this function we would modify desiredProcessors but with this change we only modify the copy newDesiredProcessors.

Copy link
Member

@joe-elliott joe-elliott Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. A little googling and a quick test confirms what you're saying. If multiple goroutines have access to this map though there is the threat in the future of someone writing to it in a different goroutine and breaking this assumption.

I'm fine with the fix, but let's add a comment explaining why we are copying the map and what our expectations are for other goroutines holding a reference to the map.

…ocessors calls
@yvrhdn yvrhdn force-pushed the kvrhdn/metrics-generator-panic branch from 11c95df to 32bd393 Compare August 16, 2023 13:40
@yvrhdn
Copy link
Member Author

yvrhdn commented Aug 16, 2023

The modified test fails on a data race ☝🏻

@yvrhdn yvrhdn marked this pull request as ready for review August 16, 2023 14:06
@yvrhdn yvrhdn force-pushed the kvrhdn/metrics-generator-panic branch from a3c5661 to f956733 Compare August 16, 2023 14:07
@joe-elliott joe-elliott added type/bug Something isn't working backport release-v2.2 labels Aug 17, 2023
@yvrhdn yvrhdn changed the title Copy desiredProcessors before configuring subprocessors to avoid concurrent read and writes Fix panic in the metrics-generator when using multiple tenants with default overrides Aug 17, 2023
@yvrhdn yvrhdn enabled auto-merge (squash) August 17, 2023 14:13
@yvrhdn yvrhdn merged commit 9af6bb2 into grafana:main Aug 17, 2023
@github-actions
Copy link
Contributor

The backport to release-v2.2 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-2786-to-release-v2.2 origin/release-v2.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 9af6bb2eccd7c04c2f06097278f6b92f8d74860b
# When the conflicts are resolved, stage and commit the changes
git add . && git cherry-pick --continue

If you have the GitHub CLI installed:

# Create the PR body template
PR_BODY=$(gh pr view 2786 --json body --template 'Backport 9af6bb2eccd7c04c2f06097278f6b92f8d74860b from #2786{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Push the branch to GitHub and a PR
echo "${PR_BODY}" | gh pr create --title "[release-v2.2] Fix panic in the metrics-generator when using multiple tenants with default overrides" --body-file - --label "type/bug" --label "backport" --base release-v2.2 --milestone release-v2.2 --web

Or, if you don't have the GitHub CLI installed (we recommend you install it!):

# If you don't have the GitHub CLI installed: Push the branch to GitHub and manually create a PR:
git push --set-upstream origin backport-2786-to-release-v2.2
# Remove the local backport branch
git switch main
git branch -D backport-2786-to-release-v2.2

Unless you've used the GitHub CLI above, now create a pull request where the base branch is release-v2.2 and the compare/head branch is backport-2786-to-release-v2.2.

@yvrhdn yvrhdn deleted the kvrhdn/metrics-generator-panic branch August 17, 2023 17:41
joe-elliott pushed a commit that referenced this pull request Aug 18, 2023
…efault overrides (#2786)

* Modify concurrency test to consider multitenant / concurrent updateProcessors calls

* Copy desiredProcessors map before modifying it

* Update CHANGELOG.md

(cherry picked from commit 9af6bb2)
joe-elliott added a commit that referenced this pull request Aug 18, 2023
…efault overrides (#2786) (#2806)

* Modify concurrency test to consider multitenant / concurrent updateProcessors calls

* Copy desiredProcessors map before modifying it

* Update CHANGELOG.md

(cherry picked from commit 9af6bb2)

Co-authored-by: Koenraad Verheyden <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metrics-generator in 2.2 panics with: fatal error: concurrent map iteration and map write
2 participants