[Runtime][ThreadPool]Refactor affinity function and support CPU affinity list setting. #9802

Merged
merged 11 commits into from
Mar 1, 2022

@huajsj huajsj commented Dec 24, 2021

PR: #7892
Tracking issue: #8596

Issue:

  1. Several affinity functions each check the "LINUX" and "ANDROID" macros, and the repeated checks make the logic hard to maintain and change.

  2. The current TVM [Runtime][ThreadPool] logic assumes that all CPU resources are available to
    a single backend runtime for data-flow computation. That assumption may not
    hold when the user runs multiple tasks on the system and does not want the TVM task
    to exhaust all CPU resources, or when the user runs multiple TVM backend
    runtimes on the system, in which case each backend runtime should use a different CPU
    affinity setting to achieve the best performance.

Solution:

  1. Refactor the affinity functions so that the "LINUX" and "ANDROID" checks live in one function.

  2. Introduce a new CPU AffinityMode type named "kSpecify". By using "kSpecify" with the function "tvm::runtime::threading::Configure", the user can specify the CPU list for the CPU affinity of a backend runtime. This solution reuses the existing per-thread thread pool logic of [Runtime][ThreadPool], which creates a worker thread pool for the current thread so that the thread can run a particular runtime. For a multiple-runtime use case, the user first launches multiple threads and then calls "tvm::runtime::threading::Configure" with a CPU list in each of them to create the TVM data-flow worker thread pool. After that, the runtimes executing on those threads use different CPU resource lists.
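
For the multiple-runtime use case, each runtime needs its own CPU list. The following is a minimal sketch of one way to derive disjoint lists, not part of this PR: `split_cpu_list` is a hypothetical helper, it does not call TVM, and it returns the ids as strings only because the Python example later in this conversation passes them that way.

```python
from typing import List


def split_cpu_list(cpu_ids: List[int], n_runtimes: int) -> List[List[str]]:
    """Split cpu_ids into n_runtimes disjoint, contiguous groups of string ids.

    Hypothetical helper for illustration; it only prepares the lists that
    each runtime thread could pass on when configuring its thread pool.
    """
    if n_runtimes <= 0 or n_runtimes > len(cpu_ids):
        raise ValueError("need between 1 and len(cpu_ids) runtimes")
    base, extra = divmod(len(cpu_ids), n_runtimes)
    groups: List[List[str]] = []
    start = 0
    for i in range(n_runtimes):
        # The first `extra` groups take one additional CPU each.
        size = base + (1 if i < extra else 0)
        groups.append([str(c) for c in cpu_ids[start:start + size]])
        start += size
    return groups


groups = split_cpu_list([0, 1, 2, 3, 4, 5], 2)
```

Each thread that hosts a runtime would then use one of the groups as its CPU list, so no two runtimes compete for the same cores.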

@huajsj huajsj force-pushed the threadaffinity branch 2 times, most recently from d69d1f5 to a69aa58 Compare December 24, 2021 08:48
@huajsj huajsj changed the title [Runtime][ThreadPool]Supporting parallel execution of multiple backend runtime by specifying a cpu affinity list for each runtime. [Runtime][ThreadPool]Refactor affinity function and support CPU affinity list setting. Dec 28, 2021
@huajsj huajsj force-pushed the threadaffinity branch 2 times, most recently from 9ed8806 to b73fc0e Compare December 30, 2021 07:34
huajsj and others added 6 commits January 6, 2022 20:58
…nity list setting.

huajsj commented Feb 2, 2022

@comaniac, @masahi, @liangfu, please take a look.

if (args.num_args == 3) {
  Array<String> cpu_array = args[2];
  for (auto cpu : cpu_array) {
    cpus.push_back(std::stoi(cpu));
Member

Verify that the string represents a valid integer?

Contributor Author

fixed.
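
The check requested above can be sketched as follows. This is an illustration in Python rather than the C++ change made in the PR, and `parse_cpu_ids` is a hypothetical name: each entry is rejected unless it is a plain non-negative decimal string, so a bad entry produces a clear error instead of an exception from deep inside the conversion.

```python
from typing import Iterable, List


def parse_cpu_ids(raw_ids: Iterable[str]) -> List[int]:
    """Convert CPU id strings to ints, rejecting anything that is not
    a non-negative decimal number (hypothetical helper)."""
    cpus: List[int] = []
    for raw in raw_ids:
        text = raw.strip()
        # isascii() guards against Unicode digits that int() would reject.
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"invalid CPU id: {raw!r}")
        cpus.append(int(text))
    return cpus


cpus = parse_cpu_ids(["1", "2", "3"])
```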

Member

Is the goal to restrict TVM threads to run on specific CPU core ids? If so, how is the Python API handled? For example:

from tvm._ffi import get_global_func
config_threadpool = get_global_func('runtime.config_threadpool')
core_ids = (0, 1, 2)
config_threadpool(0, 1, *core_ids)

Also, what happens when the number of specified core ids is greater than the 2nd argument (i.e. how many threads to launch)?

Contributor Author

Thanks @FrozenGene for the follow-up. If there are more core ids than threads, every thread has its affinity set to all of the CPUs in the CPU list. In the case above, thread 0 gets affinity with CPUs (0, 1, 2). The related logic is in threading_backend.cc, lines 120-129.

Member

@FrozenGene FrozenGene Feb 25, 2022

What I asked is how to handle Python's argument-unpacking syntax. If I write the code as in my previous comment, your current code cannot handle it, because the unpacked arguments do not arrive as a single list-typed argument as the C++ side expects. @huajsj
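
The difference between the two call styles can be seen without TVM. In this minimal sketch, `config_threadpool_stub` is a hypothetical stand-in that only reports what it receives; it is not the registered `runtime.config_threadpool` function.

```python
def config_threadpool_stub(*args):
    """Hypothetical stand-in: report argument count and argument types."""
    return len(args), [type(a).__name__ for a in args]


core_ids = (0, 1, 2)

# Unpacking spreads the ids into separate positional arguments.
unpacked = config_threadpool_stub(0, 1, *core_ids)

# Passing a list keeps the ids together as a single third argument.
as_list = config_threadpool_stub(0, 1, list(core_ids))
```

Here `unpacked` reports five integer arguments, while `as_list` reports three arguments with a list in the last position. Only the second shape matches a callee that checks for exactly three arguments and reads the third as an array.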

Contributor Author

Sorry for the misunderstanding. Here are the answers.

What you want to do is to restrict tvm thread run on specific cpu core ids?

Yes, the goal is to restrict the worker threads to run on specific CPU cores or CPU core groups.

how to handle Python's API interface?

The supported use case looks like the following:
config_threadpool('-1', 2, ['1', '2', '3'])

And how to handle the specific core ids is greater than the 2nd argument (i.e. how many threads to launch).

In the existing logic, the second parameter does not determine how many worker threads to launch. It is the default
number of tasks to use in a parallel run when the task number is not set,
and the final task number is the minimum of 'max_concurrency' and this value.
The number of threads launched is determined by 'max_concurrency'; in our solution, this value is the size of the CPU id list.
Under this solution, for example, when the CPU list is ['2', '8', '9'], nthreads is 2, and exclude_worker0 is true (the default), the following happens.

  1. mode is 'kSpecifyOneCorePerThread':
    1.1. 3 worker threads are launched, with CPU affinity as follows:
    T1 (2-9)
    T2 (8)
    T3 (9)
    1.2. when the tasks run:
    task1 --> T1
    task2 --> T2
  2. mode is 'kSpecifyThreadShareAllCore':
    2.1. 3 worker threads are launched, with CPU affinity as follows:
    T1 (2-9)
    T2 (2-9)
    T3 (2-9)
    2.2. when the tasks run:
    task1 --> T1
    task2 --> T2
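
The two layouts above can be reproduced with a small model. This is a sketch of the behaviour as described in this comment, not the code in threading_backend.cc: `plan_affinity` is a hypothetical function, and it reads "(2-9)" as shorthand for the whole list [2, 8, 9], which is an assumption.

```python
from typing import Dict, List


def plan_affinity(cpus: List[int], nthreads: int,
                  one_core_per_thread: bool) -> Dict[str, dict]:
    """Model the worker layout described above (hypothetical helper)."""
    max_concurrency = len(cpus)                 # one worker per listed CPU id
    num_tasks = min(max_concurrency, nthreads)  # tasks used per parallel run

    affinity = {}
    for i in range(max_concurrency):
        name = f"T{i + 1}"
        if one_core_per_thread and i > 0:
            affinity[name] = [cpus[i]]          # pinned to its own core
        else:
            # T1 in either mode, and every thread in share-all mode,
            # may run on any listed core (assumed meaning of "2-9").
            affinity[name] = list(cpus)
    tasks = {f"task{i + 1}": f"T{i + 1}" for i in range(num_tasks)}
    return {"affinity": affinity, "tasks": tasks}


per_core = plan_affinity([2, 8, 9], nthreads=2, one_core_per_thread=True)
shared = plan_affinity([2, 8, 9], nthreads=2, one_core_per_thread=False)
```

In both modes three workers exist but only two tasks are dispatched, because the task count is the smaller of the CPU list size and nthreads.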

masahi commented Feb 15, 2022

Please help review, @tqchen @junrushao1994 @FrozenGene.

tqchen commented Feb 17, 2022

cc @yidawang @FrozenGene @yongwww, it would be great if you could help take a look.

@huajsj huajsj requested a review from masahi February 18, 2022 06:07
@masahi masahi merged commit 5e353d5 into apache:main Mar 1, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…ity list setting. (apache#9802)

* [Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting.

* fix windows build issue.

* fix build issue.

* fix build issue.

* fix windows build issue.

* fix plint issue

* polish comments.

* address review comments.

* address reivew comments.

* address review comments.

* address review comments.

Co-authored-by: hua jiang <[email protected]>