-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix][Runtime] Fix sched_setaffinity in Android #11599
[Bugfix][Runtime] Fix sched_setaffinity in Android #11599
Conversation
i think @huajsj might have context around this. @sunjiweiswift could you update the PR title and body as well similar to the other PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@supersat or @Lunderberg maybe either of you wanna look at this too?
@@ -164,9 +172,13 @@ class ThreadGroup::Impl { | |||
CPU_SET(id, &cpuset); | |||
} | |||
#if defined(__ANDROID__) | |||
sched_setaffinity(thread, sizeof(cpu_set_t), &cpuset); | |||
if (sched_setaffinity(tid, sizeof(cpu_set_t), &cpuset) == -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per the man page we should check for != 0 here, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The man page also specifies that it returns -1 on failure, but that's for glibc. I looked at the bionic (Android C runtime) sources, and it also returns -1 on failure. Bionic tests seem to use a mix of != 0 and == -1.
src/runtime/threading_backend.cc
Outdated
@@ -164,9 +172,13 @@ class ThreadGroup::Impl { | |||
CPU_SET(id, &cpuset); | |||
} | |||
#if defined(__ANDROID__) | |||
sched_setaffinity(thread, sizeof(cpu_set_t), &cpuset); | |||
if (sched_setaffinity(tid, sizeof(cpu_set_t), &cpuset) == -1) { | |||
LOG(WARNING) << "SetThreadAffinity fail!"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible to obtain more detail from strerror() or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good advice.
It feels a little weird doing something different on regular Linux vs. Android. Can we get rid of the pthread_setaffinity_np call and just use sched_setaffinity on Linux? I think we could get rid of the duplicate thread argument that way. The only reason to keep it is if we supported SetThreadAffinity on platforms other than Linux (and had to use std::thread::native_handle_type) but we don't. |
@supersat Can we get rid of the pthread_setaffinity_np call and just use sched_setaffinity on Linux? I think we can just use seched_setaffinity on Linux and Android |
src/runtime/threading_backend.cc
Outdated
@@ -220,7 +235,7 @@ class ThreadGroup::Impl { | |||
} else { | |||
core_id = sorted_order_[i + exclude_worker0]; | |||
} | |||
SetThreadAffinity(threads_[i].native_handle(), {core_id}); | |||
SetThreadAffinity(threads_[i].native_handle(), threads_tid_[i], {core_id}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part seems like have a timing issue, threads_tid_ shared by main thread and worker threads, current logic not guarantee that the threads_tid_[i] is valid when main thread to read it. I think we should can reproduce such issue by adding a debug "sleep" between line 118 and line 119
looks like some of the hexagon tests are segfaulting now, e.g.:
anyone able to look? |
src/runtime/threading_backend.cc
Outdated
threads_.emplace_back([worker_callback, i] { worker_callback(i); }); | ||
threads_.emplace_back([worker_callback, i, this] { | ||
#ifndef __hexagon__ | ||
SetTid(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetTid running in a worker thread, that means the execution order is random, but currently logic assume it follow a order to execution and this would cause logic issue.
we may need to figure out if there is any better and light weight solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to unify Linux and Android. Providing pid_t and p_thread at the same time is not a good modification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that we should unify the interface ,but we may need to fix the bugs first then step in the unify interface change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add yeild to wait settid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should bind tid to i. Push_back should not be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#endif | ||
} | ||
void SetTid(size_t index) { threads_tid_[index] = Tid(); } | ||
pid_t GetTid(size_t thread_index) { return threads_tid_[thread_index]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This not fix the timing issue, that means SetTid may happen after GetTid, and threads_tid_[thread_index] will point to a random value or a '0' value depend on the compiler settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If getid() is called before settid(). Gettid will return 0. Call sched_setaffinity will return -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns -1, indicating that the affinity setting failed. In line with expectations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns -1, indicating that the affinity setting failed. In line with expectations
this sounds not make sense, why "affinity setting failed" is " In line with expectations", why need to expect "affinity setting failed"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetTid after Gettid means we set affinity for a nonexistent thread. I think should return -1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"SetTid after Gettid" not means "we set affinity for a nonexistent thread", thread already there it just not yet execute the "SetTid" function.
sched_setaffinity should use pid_t instread of pthread_t
https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html
cc @areusch