-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RUNTIME] Fix the manual determination of cores in FillDataForMeasure #13849
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot |
void Run(int i) { | ||
int64_t chunk_size = size / num_threads; | ||
void Run(int i, int num_threads) { | ||
int64_t chunk_size = ceil(size / num_threads); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case when penv->num_task
is a pretty big number, then is it still correct to use this number as a divider for size
? And how prev->num_task
is correlate with number of threads?
Is it possible that size
is less than num_threads
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it still correct to use this number as a divider for size.
Yes and no simultaneously. With current implementation it is incorrect. There is missed check at line below int64_t st = std::min(i * chunk_size, size);
. With adding this line it will be correct.
how prev->num_task is correlate with number of threads?
It's one and the same. TVMBackendParallelLaunch API reference
Is it possible that size is less than num_threads?
Yes, it's possible.
return 0; | ||
} | ||
|
||
void Run(int i) { | ||
int64_t chunk_size = size / num_threads; | ||
void Run(int i, int num_threads) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void Run(int i, int num_threads) { | |
void Run(int i, int num_tasks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a unit test?
@tvm-bot rerun |
1 similar comment
@tvm-bot rerun |
57cc953
to
7a7a8f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks
Motivation: Assertion failed during tuning
Error message from thread_pool.cc:295:
Check failed: num_task <= num_workers_used_ (8 vs. 1) : Request parallel sync task larger than number of threads used workers=1 request=8
Main problem description:
Tuning of the ARM Snapdragon 888 CPU architecture ends with an error above.
Suspected reason:
Incorrect (manual) determination of the number of threads. The number of threads is determined using MaxConcurrency and returns 8 threads for this architecture, but the number of actually used threads is 4. This fix urges to use automatic determination of the number of threads by passing 'zero' as 'num_threads' attribute in 'TVMBackendParallelLaunch' to avoid the abovementioned discrepancy.