[Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting. #9802
Conversation
[Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting.
if (args.num_args == 3) {
  Array<String> cpu_array = args[2];
  for (auto cpu : cpu_array) {
    cpus.push_back(std::stoi(cpu));
Verify that the string represents a valid integer?
fixed.
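The actual fix lives in the C++ runtime and is not shown in this thread. As a rough illustration of the check the reviewer asked for (std::stoi will throw on, or partially parse, non-numeric input), here is a hedged Python sketch of validating CPU-id strings before conversion; the function name is hypothetical:

```python
def parse_cpu_ids(cpu_strings):
    """Convert a list of CPU-id strings to ints, rejecting invalid entries.

    Hypothetical helper mirroring the concern above: each string is
    validated before conversion instead of being passed straight to
    an int parser.
    """
    cpus = []
    for s in cpu_strings:
        if not s.isdigit():  # rejects '', '-1', '1a', etc.
            raise ValueError(f"invalid CPU id: {s!r}")
        cpus.append(int(s))
    return cpus
```

For example, `parse_cpu_ids(['1', '2', '3'])` returns `[1, 2, 3]`, while `parse_cpu_ids(['1a'])` raises a ValueError instead of silently truncating.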
What you want to do is restrict TVM threads to run on specific CPU core ids? If so, how do you handle Python's API interface? For example:
from tvm._ffi import get_global_func
config_threadpool = get_global_func('runtime.config_threadpool')
core_ids = (0, 1, 2)
config_threadpool(0, 1, *core_ids)
?
And how do you handle the case where the number of specified core ids is greater than the 2nd argument (i.e. how many threads to launch)?
Thanks @FrozenGene for the follow-up. If the number of core ids is greater than the number of threads, all the threads will be set affinity with all of the CPUs in the CPU list; in the said case, thread 0 will have affinity with CPUs (0, 1, 2). The related logic is in threading_backend.cc, lines 120-129.
What I asked is how to handle Python's API pack syntax. If I write the code as in the previous example, your current code can not handle it, because the unpacked arguments will not be of a list type as in C++. @huajsj
Sorry for the misunderstanding. Following are the answers.

What you want to do is to restrict tvm thread run on specific cpu core ids?

Yes, restrict the worker threads to run on a specific CPU core or CPU core group.

how to handle Python's API interface?

The supported use case is like the following:
config_threadpool('-1', 2, ['1', '2', '3'])

And how to handle the specific core ids is greater than the 2nd argument (i.e. how many threads to launch)?

In the existing logic, the second parameter is not used to determine how many worker threads to launch; it is used as a default value for how many tasks a parallel run should use when the task number is not set, and the final task number is the minimum of 'max_concurrency' and this value. The number of threads launched is determined by 'max_concurrency'; in our solution, this value will be the size of the CPU id list.
Under this solution, for example, when the cpu list is ['2', '8', '9'], nthreads is 2, and exclude_worker0 is true (the default), the following will happen:

1. mode is 'kSpecifyOneCorePerThread':
1.1. 3 worker threads get launched, with cpu affinity as follows:
T1 (2, 8, 9)
T2 (8)
T3 (9)
1.2. when running the tasks:
task1 --> T1
task2 --> T2

2. mode is 'kSpecifyThreadShareAllCore':
2.1. 3 worker threads get launched, with cpu affinity as follows:
T1 (2, 8, 9)
T2 (2, 8, 9)
T3 (2, 8, 9)
2.2. when running the tasks:
task1 --> T1
task2 --> T2
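The distribution rule described above can be modeled in a few lines. This is only a sketch of the assignment as explained in this thread (the first thread is bound to the whole list and each remaining thread to one core in the per-core mode; every thread is bound to the whole list in the shared mode), not the actual C++ implementation, and the mode names here are illustrative strings:

```python
def plan_affinity(cpu_list, mode):
    """Return a per-thread CPU-affinity plan for a worker pool.

    One worker thread is assumed per entry in cpu_list. In
    'one_core_per_thread' mode, thread 0 is bound to the full list and
    each remaining thread to a single core; in 'share_all_core' mode,
    every thread is bound to the full list.
    """
    if mode == "one_core_per_thread":
        return [list(cpu_list)] + [[c] for c in cpu_list[1:]]
    elif mode == "share_all_core":
        return [list(cpu_list) for _ in cpu_list]
    raise ValueError(f"unknown mode: {mode}")
```

With the example above, `plan_affinity([2, 8, 9], "one_core_per_thread")` yields `[[2, 8, 9], [8], [9]]`, matching the T1/T2/T3 table.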
Please help review @tqchen @junrushao1994 @FrozenGene
cc @yidawang @FrozenGene @yongwww, would be great if you can help to take a look
[Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting. (apache#9802)
Follow-up fixes: fixed Windows build issues, fixed other build issues, fixed a pylint issue, polished comments, and addressed review comments.
Co-authored-by: hua jiang <[email protected]>
PR: #7892
Tracking issue: #8596
Issue:
1. There are multiple affinity functions using "LINUX" and "ANDROID" macro checks, and the duplicated checks make the logic complex to maintain and change.
2. The current logic of the tvm [Runtime][ThreadPool] assumes all of the CPU resources are available for a single backend runtime to do the data-flow computation. But that assumption may not hold when a user runs multiple tasks on the system and does not want a tvm task to exhaust all of the CPU resources, or when a user runs multiple tvm backend runtimes on the system, in which case each backend runtime should use a different CPU affinity setting to achieve the best performance.

Solution:
1. Refactor the affinity functions to move the "LINUX" and "ANDROID" checks into one function.
2. Introduce a new CPU AffinityMode type named "kSpecify". By using "kSpecify" and the function "tvm::runtime::threading::Configure", a user can specify the CPU list for the CPU affinity of a backend runtime. This solution reuses the existing per-thread thread pool logic of [Runtime][ThreadPool], which creates a worker thread pool for the current thread so that it can run a particular runtime. For a multiple-runtime use case, a user can first launch multiple threads, then call "tvm::runtime::threading::Configure" with a CPU list in each thread to create the tvm data-flow worker thread pool; after doing this, the runtimes executing on those threads will use different CPU resource lists.