[Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting. #9802
Conversation
[Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting.
if (args.num_args == 3) {
  Array<String> cpu_array = args[2];
  for (auto cpu : cpu_array) {
    cpus.push_back(std::stoi(cpu));
Verify that the string represents a valid integer?
fixed.
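The actual fix lives in the C++ runtime and is not shown in this thread. As a rough illustration of the check the reviewer asked for (std::stoi will throw on, or partially parse, non-numeric input), here is a hedged Python sketch of validating CPU-id strings before conversion; the function name is hypothetical:

```python
def parse_cpu_ids(cpu_strings):
    """Convert a list of CPU-id strings to ints, rejecting invalid entries.

    Hypothetical helper mirroring the concern above: each string is
    validated before conversion instead of being passed straight to
    an int parser.
    """
    cpus = []
    for s in cpu_strings:
        if not s.isdigit():  # rejects '', '-1', '1a', etc.
            raise ValueError(f"invalid CPU id: {s!r}")
        cpus.append(int(s))
    return cpus
```

For example, `parse_cpu_ids(['1', '2', '3'])` returns `[1, 2, 3]`, while `parse_cpu_ids(['1a'])` raises a ValueError instead of silently truncating.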
What you want to do is restrict TVM threads to run on specific CPU core ids? If so, how do you handle Python's API interface? For example:
from tvm._ffi import get_global_func
config_threadpool = get_global_func('runtime.config_threadpool')
core_ids = (0, 1, 2)
config_threadpool(0, 1, *core_ids)
?
And how do you handle the case where the number of specified core ids is greater than the 2nd argument (i.e. how many threads to launch)?
Thanks @FrozenGene for the follow-up. If the number of core ids is greater than the number of threads, all the threads will be set affinity with all of the CPUs in the CPU list; in the said case, thread 0 will have affinity with CPUs (0, 1, 2). The related logic is in threading_backend.cc, lines 120-129.
What I asked is how to handle Python's API pack syntax. If I write the code as in the previous example, your current code can not handle it, because the unpacked arguments will not be of a list type as in C++. @huajsj
Sorry for the misunderstanding. Following are the answers.

What you want to do is to restrict tvm thread run on specific cpu core ids?

Yes, restrict the worker threads to run on a specific CPU core or CPU core group.

how to handle Python's API interface?

The supported use case is like the following:
config_threadpool('-1', 2, ['1', '2', '3'])

And how to handle the specific core ids is greater than the 2nd argument (i.e. how many threads to launch)?

In the existing logic, the second parameter is not used to determine how many worker threads to launch; it is used as a default value for how many tasks a parallel run should use when the task number is not set, and the final task number is the minimum of 'max_concurrency' and this value. The number of threads launched is determined by 'max_concurrency'; in our solution, this value will be the size of the CPU id list.
Under this solution, for example, when the cpu list is ['2', '8', '9'], nthreads is 2, and exclude_worker0 is true (the default), the following will happen:

1. mode is 'kSpecifyOneCorePerThread':
1.1. 3 worker threads get launched, with cpu affinity as follows:
T1 (2, 8, 9)
T2 (8)
T3 (9)
1.2. when running the tasks:
task1 --> T1
task2 --> T2

2. mode is 'kSpecifyThreadShareAllCore':
2.1. 3 worker threads get launched, with cpu affinity as follows:
T1 (2, 8, 9)
T2 (2, 8, 9)
T3 (2, 8, 9)
2.2. when running the tasks:
task1 --> T1
task2 --> T2
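The distribution rule described above can be modeled in a few lines. This is only a sketch of the assignment as explained in this thread (the first thread is bound to the whole list and each remaining thread to one core in the per-core mode; every thread is bound to the whole list in the shared mode), not the actual C++ implementation, and the mode names here are illustrative strings:

```python
def plan_affinity(cpu_list, mode):
    """Return a per-thread CPU-affinity plan for a worker pool.

    One worker thread is assumed per entry in cpu_list. In
    'one_core_per_thread' mode, thread 0 is bound to the full list and
    each remaining thread to a single core; in 'share_all_core' mode,
    every thread is bound to the full list.
    """
    if mode == "one_core_per_thread":
        return [list(cpu_list)] + [[c] for c in cpu_list[1:]]
    elif mode == "share_all_core":
        return [list(cpu_list) for _ in cpu_list]
    raise ValueError(f"unknown mode: {mode}")
```

With the example above, `plan_affinity([2, 8, 9], "one_core_per_thread")` yields `[[2, 8, 9], [8], [9]]`, matching the T1/T2/T3 table.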
Please help review @tqchen @junrushao1994 @FrozenGene
cc @yidawang @FrozenGene @yongwww, would be great if you can help to take a look
[Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting. (apache#9802)
Follow-up fixes: fixed Windows build issues, fixed other build issues, fixed a pylint issue, polished comments, and addressed review comments.
Co-authored-by: hua jiang <[email protected]>
PR: #7892
Tracking issue: #8596
Issue:
1. There are multiple affinity functions using "LINUX" and "ANDROID" macro checks, and the duplicated checks make the logic complex to maintain and change.
2. The current logic of the tvm [Runtime][ThreadPool] assumes all of the CPU resources are available for a single backend runtime to do the data-flow computation. But that assumption may not hold when a user runs multiple tasks on the system and does not want a tvm task to exhaust all of the CPU resources, or when a user runs multiple tvm backend runtimes on the system, in which case each backend runtime should use a different CPU affinity setting to achieve the best performance.

Solution:
1. Refactor the affinity functions to move the "LINUX" and "ANDROID" checks into one function.
2. Introduce a new CPU AffinityMode type named "kSpecify". By using "kSpecify" and the function "tvm::runtime::threading::Configure", a user can specify the CPU list for the CPU affinity of a backend runtime. This solution reuses the existing per-thread thread pool logic of [Runtime][ThreadPool], which creates a worker thread pool for the current thread so that it can run a particular runtime. For a multiple-runtime use case, a user can first launch multiple threads, then call "tvm::runtime::threading::Configure" with a CPU list in each thread to create the tvm data-flow worker thread pool; after doing this, the runtimes executing on those threads will use different CPU resource lists.