Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove use of thread_local from ModuleAllocMonitor #46606

Merged
merged 3 commits into from
Nov 8, 2024

Conversation

Dr15Jones
Copy link
Contributor

PR description:

Problems were seen related to thread local algorithm being called recursively. Replaced thread local with static allocation assignment using thread ids.

PR validation:

Tested using step1 of workflow 11834.21 which previously had a segmentation fault when used with ModuleAllocMonitor and multiple threads. Now it worked.

fixes #45964

Problems were seen related to thread local algorithm being called
recursively. Replaced thread local with static allocation assignment
using thread ids.
@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 5, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 5, 2024

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 5, 2024

A new Pull Request was created by @Dr15Jones for master.

It involves the following packages:

  • PerfTools/AllocMonitor (core)

@Dr15Jones, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@felicepantaleo, @makortel this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 6, 2024

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-991153/42605/summary.html
COMMIT: 11ad172
CMSSW: CMSSW_14_2_X_2024-11-05-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46606/42605/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@makortel
Copy link
Contributor

makortel commented Nov 7, 2024

I see the approach is similar to #45150, but not exactly the same. Why does ModuleAllocMonitor need more complex ThreadTracker? (are e.g. collisions more probable?)

I also wonder if it would feasible to share some of these complicated codes between here and memory_proxies.cc? (although I wouldn't be surprised our requirements for the separation of these components could prohibit straightforward ways of using common code)

@Dr15Jones
Copy link
Contributor Author

The AllocMonitorPreload uses a lock to guarantee that if there is a hash collision, only one thread at a time can be in the critical section (i.e. stop reporting of allocation/deallocation requests on that thread). The lock is expected to be held only for a tiny amount of time and no subsequent need for data after leaving that critical section. Therefore threads are allowed to share the same index, it just makes the job a bit slower if they do because of the possible lock contention.

In this PR, each thread must have its own info for a long period of time (i.e. the time period for a module processing an event). If I were to use a mutex to resolve any possible hash collisions of the thread ids, then we could potentially slow down the job a great deal as threads would have to wait a long time. Also, if for some reason an end service signal did not fire so the lock was never released the job would probably deadlock. Therefore assigning permanent indices into the allocation info containers for each thread id seemed to be the potentially faster and safer option. In this code the first event to hash into the container gets the 'O(1)' access and any subsequent threads that have the same hash get relegated to the linear lookup access.

Copy link
Contributor

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a note in the README.md that the AllocMonitor implementations should not use thread_local, maybe with a pointer to ModuleAllocMonitor if per-thread storage is needed? (although if that happens I'd like to consider stronger a shared implementation)

v = kUnusedEntry;
}
for (auto& v : hashed_threads_) {
v = ++entry;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these free "hashed threads" are marked differently from the "extra threads"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this special value for a free "hashed thread" mean there is an assumption the thread ID can not be smaller than kHashedEntries? (probably it is a good assumption, but nevertheless)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the hashed thread all the thread ids have been modulo-ed by kHashedEntries. The value initially assigned to the hashed_threads_ entry is actually a value that is impossible to be allowed in that spot with such a modulo (i.e. it is always 1 larger than the smallest allowed value in that spot).

Given the extra thread is linearly search, there is notvalue that can be safely inserted as the initial value for an entry. I assumed that no thread would be assigned ~entry_type(0).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, got it. Could you add these comments to code itself (for the next reader)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comments.

@Dr15Jones
Copy link
Contributor Author

README.md was updated with a warning about thread_local and I added an entry for ModuleAllocMonitor which was missing.

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2024

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2024

Pull request #46606 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2024

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2024

Pull request #46606 was updated. @Dr15Jones, @makortel, @smuzaffar can you please check and sign again.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 8, 2024

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-991153/42668/summary.html
COMMIT: 1f1fdcd
CMSSW: CMSSW_14_2_X_2024-11-07-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46606/42668/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@makortel
Copy link
Contributor

makortel commented Nov 8, 2024

Comparison differences are related to #46416

@makortel
Copy link
Contributor

makortel commented Nov 8, 2024

+core

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 8, 2024

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

@makortel
Copy link
Contributor

makortel commented Nov 8, 2024

I ran the same setup that resulted the failure reported in #45964 description with this PR a few times, and saw no failures.

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 6b84ba6 into cms-sw:master Nov 8, 2024
11 checks passed
@Dr15Jones Dr15Jones deleted the fixModuleAllocMonitor branch November 11, 2024 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory corruption with AllocMonitors
4 participants