
Migrate to tbb::task_group #305

Merged (31 commits, Feb 15, 2022)
Conversation

makortel
Collaborator

@makortel makortel commented Feb 4, 2022

Resolves #295. Following the approach in cms-sw/cmssw#32804.

@fwyzard
Contributor

fwyzard commented Feb 9, 2022

All the backends I tested (all but hip and kokkos) compile, run, and pass the validation.

With 8 or more threads, the performance is the same as before.
With fewer than 8 threads, the performance is visibly worse than before:

[throughput plot not included]

Is this expected?

@makortel
Collaborator Author

Thanks @fwyzard for the check. Chris measured the performance of this approach a while ago; the results are in https://indico.cern.ch/event/1005030/#6-adapting-cmssw-to-tbbs-inter. On slide 9, this setup corresponds to the green line (single tbb::task_group). With very fast tasks a significant degradation would be expected, but in CMSSW with the "trivial configuration" (slide 13) the degradation turned out to be small (at throughputs orders of magnitude higher than here).

It could be that the setup here lies somewhere in between Chris' toy framework and CMSSW. I'm looking into this.

@makortel
Collaborator Author

The culprit appears to be this spin loop

```cpp
do {
  group.wait();
} while (not globalWaitTask.done());
```

which is needed to guarantee that the program won't terminate while the only remaining work is waiting for asynchronous work, launched from an ExternalWork acquire(), to complete. perf showed a high load on this loop, and the "best" way to hit this code path is one-thread (or low-thread) jobs, where the impact on throughput is visible.

I gave the "deferred task" preview feature of recent oneTBB versions a try (as suggested to us in uxlfoundation/oneTBB#346 (comment)), and it seemed to restore the performance (at least to within ~2 % in my limited tests). I used oneTBB v2021.4.0 (which we use in CMSSW_12_3_X) for that test. While exploring this option I noticed that GCC 8.4.0 crashes when compiling oneTBB v2021.4.0, while GCC 9.3.0 worked fine. I'd suggest we just bump the minimum GCC version to 9; I suppose we have no hard requirement to keep supporting GCC 8 (I had kept it only because it was used in the CMSSW version from which the CUDA code was originally exported).

I had been thinking of updating oneTBB anyway after this PR. The main question is whether it would be better to move to a recent oneTBB and use deferred tasks already in this PR, or in two subsequent PRs. I don't have strong feelings at this point. @fwyzard, which would you prefer?

@fwyzard
Contributor

fwyzard commented Feb 11, 2022

@makortel thanks for the investigation!

I would make one PR to bump the GCC requirement, and then update this PR to include the newer version of TBB and the use of deferred tasks.

@makortel
Collaborator Author

I would make one PR to bump the GCC requirement, and then update this PR to include the newer version of TBB and the use of deferred tasks.

Sounds good, I'll prepare another PR to bump the GCC requirement.

…e task_arena for runToCompletion()

Propagating changes from
cms-patatrack#242
cms-patatrack#245

For consistency, make --numberOfThreads 0 use all CPU cores, and reorder #includes.
@makortel
Collaborator Author

makortel commented Feb 14, 2022

Updates

```cpp
m_task->increment_ref_count();
m_handle = std::make_shared<tbb::task_handle>(m_group->defer([task = m_task]() {
```
Collaborator Author

I'm not terribly happy with the approach I took for tbb::task_handle here, e.g. there are now three members doing reference counting (two shared_ptrs implicitly and m_task explicitly), but I'd leave improvements (likely a redesign) to a later time.

Comment on lines +105 to +108

```cpp
m_group->run([task]() {
  TaskSentry s{task};
  task->execute();
});
```
Contributor

Trying to understand what this pattern does:

  • m_group is a pointer to the tbb::task_group that will run the task
  • m_task is a pointer to the WaitingTask to be run
  • task is a copy of m_task (the comment above explains why the extra copy is needed)
  • m_group takes a functor (here a lambda) to be run
  • the lambda makes a copy of task
  • the TaskSentry takes ownership of the task, guaranteeing that it will be deleted after it has run (also in case an exception is thrown)
  • the task is executed inside the lambda inside the task_group

The only confusing aspect is that TaskSentry deletes the task once the lambda completes, except when it is a FinalWaitingTask, in which case the destructor of the TaskSentry effectively does nothing.

Collaborator Author

That difference comes from the fact that objects of all task types other than FinalWaitingTask are allocated with new and are intended to be deleted right after their execution has finished. The FinalWaitingTask object is allocated on the stack of the function that initiates the "asynchronous region" and waits on the tbb::task_group; it must be kept alive until all the asynchronous tasks of the region have finished (e.g. to check whether an exception was thrown).

@fwyzard
Contributor

fwyzard commented Feb 15, 2022

With these latest changes, the performance using tbb::task_group is the same as using tbb::task:
[overall performance plot not included]

Zoomed-in version of the CUDA performance:
[plot not included]

Zoomed-in version of the CPU performance:
[plot not included]

Note that these plots were made using the tbb::spin_mutex in the caching allocators.

@fwyzard
Contributor

fwyzard commented Feb 15, 2022

@makortel do you prefer to keep the individual commits or squash them?

@makortel
Collaborator Author

Thanks a lot Andrea for the detailed checks!

do you prefer to keep the individual commits or squash them ?

In general I've been trying to touch only one test program's code per commit, and I'd like to keep that. I'd also like to keep "migrate to tbb::task_group", "TBB update", and "use of deferred tasks" as separate sets of commits. But I see now that I have some spurious "ScopedContext" etc. commits that I intended to squash, so let me fix those.

@makortel
Collaborator Author

But I see now that I have some spurious "ScopedContext" etc commits that I intended to squash, so let me fix those.

Done now.


Successfully merging this pull request may close these issues: TBB removed tbb::task