-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate to tbb::task_group #305
Conversation
Thanks @fwyzard for the check. Chris measured at a time the performance of this approach with results in https://indico.cern.ch/event/1005030/#6-adapting-cmssw-to-tbbs-inter. E.g. on slide 9 this setup corresponds the green line (single It could be that the setup here lies somewhere in between Chris' toy framework and CMSSW. I'm looking into this. |
The culprit appears to be this spinloop pixeltrack-standalone/src/alpaka/bin/EventProcessor.cc Lines 41 to 43 in 796f790
which is needed to guarantee the program won't terminate while there is no other work than waiting for an asynchronous work launched from ExternalWork acquire() to complete. perf showed high load on this, and the "best" way to hit to this code path is one-thread (or low-thread) jobs, where the impact on throughput is visible.
I gave a try to the "deferred task" preview feature of recent oneTBB versions (as suggested to us in uxlfoundation/oneTBB#346 (comment)), and that seemed to restore the performance (at least within ~2 % in my limited tests). I used oneTBB v2021.4.0 (which we use in CMSSW_12_3_X) for that test. While exploring this option I noticed GCC 8.4.0 to crash when compiling oneTBB v2021.4.0, while GCC 9.3.0 worked fine. I'd suggest we just bump the minimum GCC version to 9; I suppose we don't have hard requirement to keep supporting GCC 8 (I've kept that because that was used in the CMSSW version the I had been thinking to update the oneTBB anyway after this PR. The main question is then if it would be better to move to recent oneTBB update and use of deferred task already in this PR, or in two subsequent PRs. I don't have strong feelings at this point. @fwyzard, which would you prefer? |
@makortel thanks for the investigation ! I would make one PR to bump the GCC requirement, and them update this PR to include the newer version of TBB and the use of deferred tasks. |
Sounds good, I'll prepare another PR to bump the GCC requirement. |
…e task_arena for runToCompletion() Propagating changes from cms-patatrack#242 cms-patatrack#245 For consistency make --numerOfThreads 0 to use all CPU cores, and reorder #includes.
796f790
to
d3b4171
Compare
Updates
|
m_task->increment_ref_count(); | ||
m_handle = std::make_shared<tbb::task_handle>(m_group->defer([task = m_task]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not terribly happy with the approach I took for tbb::task_handle
here, e.g. there are now three members doing reference counting (two shared_ptr
s implicitly and m_task
explicitly), but I'd leave improvements (likely a redesign) to a later time.
d3b4171
to
1a6fc1f
Compare
m_group->run([task]() { | ||
TaskSentry s{task}; | ||
task->execute(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand what this pattern does:
m_group
is a pointer to thetbb::task_group
that will run the taskm_task
is the a pointer to theWaitingTask
to be runtask
is a copy ofm_task
(the comment above explains why the extra copy is needed)m_group
takes a functor (here a lambda) to be run- the lambda makes a copy of
task
- the
TaskSentry
takes ownership of thetask
, guaranteeing that it will be deleted after it has run (also in case an exception is thrown) - the task is executed inside the lambda inside the
task_group
The only confusing aspect is that TaskSentry
will delete the task once the lambda is complete, except if it is a FinalWaitingTask
, in which case the destructor of the TaskSentry
effectively does not do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That difference comes from objects of all other task types than FinalWaitingTask
being allocated with new
and intended to be deleted right after the task execution has finished. The FinalWaitingTask
object is allocated on the stack of the function that initiates the "asynchronous region" and does a wait on the tbb::task_group
. It must be kept alive until all the asynchronous tasks of the region have finished (e.g. to check if exception was thrown).
@makortel do you prefer to keep the individual commits or squash them ? |
Thanks a lot Andrea for the detailed checks!
In general I've been trying to touch only one test program code in one commit, and I'd like to keep that. I'd like to also separate "migrate to |
1a6fc1f
to
47e4d32
Compare
Done now. |
Resolves #295. Following the approach in cms-sw/cmssw#32804.