Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix][Runtime] Fix sched_setaffinity in Android #11599

Closed

Conversation

sunjiweiswift
Copy link
Contributor

@sunjiweiswift sunjiweiswift commented Jun 7, 2022

sched_setaffinity should use pid_t instread of pthread_t
https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html

cc @areusch

@github-actions github-actions bot requested a review from areusch June 7, 2022 09:59
@sunjiweiswift
Copy link
Contributor Author

@areusch

@areusch
Copy link
Contributor

areusch commented Jun 9, 2022

i think @huajsj might have context around this. @sunjiweiswift could you update the PR title and body as well similar to the other PR?

@sunjiweiswift sunjiweiswift changed the title [Runtime][threadpool]bugfix [bugfix] sched_setaffinity in Android Jun 10, 2022
@sunjiweiswift sunjiweiswift changed the title [bugfix] sched_setaffinity in Android [Bugfix] sched_setaffinity in Android Jun 10, 2022
@sunjiweiswift sunjiweiswift changed the title [Bugfix] sched_setaffinity in Android [Bugfix][Runtime] sched_setaffinity in Android Jun 10, 2022
@sunjiweiswift sunjiweiswift changed the title [Bugfix][Runtime] sched_setaffinity in Android [Bugfix][Runtime] Fix sched_setaffinity in Android Jun 10, 2022
Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@supersat or @Lunderberg maybe either of you wanna look at this too?

@@ -164,9 +172,13 @@ class ThreadGroup::Impl {
CPU_SET(id, &cpuset);
}
#if defined(__ANDROID__)
sched_setaffinity(thread, sizeof(cpu_set_t), &cpuset);
if (sched_setaffinity(tid, sizeof(cpu_set_t), &cpuset) == -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per the man page we should check for != 0 here, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The man page also specifies that it returns -1 on failure, but that's for glibc. I looked at the bionic (Android C runtime) sources, and it also returns -1 on failure. Bionic tests seem to use a mix of != 0 and == -1.

@@ -164,9 +172,13 @@ class ThreadGroup::Impl {
CPU_SET(id, &cpuset);
}
#if defined(__ANDROID__)
sched_setaffinity(thread, sizeof(cpu_set_t), &cpuset);
if (sched_setaffinity(tid, sizeof(cpu_set_t), &cpuset) == -1) {
LOG(WARNING) << "SetThreadAffinity fail!";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible to obtain more detail from strerror() or similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good advice.

@supersat
Copy link
Contributor

It feels a little weird doing something different on regular Linux vs. Android. Can we get rid of the pthread_setaffinity_np call and just use sched_setaffinity on Linux? I think we could get rid of the duplicate thread argument that way. The only reason to keep it is if we supported SetThreadAffinity on platforms other than Linux (and had to use std::thread::native_handle_type) but we don't.

@sunjiweiswift
Copy link
Contributor Author

@supersat Can we get rid of the pthread_setaffinity_np call and just use sched_setaffinity on Linux? I think we can just use seched_setaffinity on Linux and Android

@@ -220,7 +235,7 @@ class ThreadGroup::Impl {
} else {
core_id = sorted_order_[i + exclude_worker0];
}
SetThreadAffinity(threads_[i].native_handle(), {core_id});
SetThreadAffinity(threads_[i].native_handle(), threads_tid_[i], {core_id});
Copy link
Contributor

@huajsj huajsj Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part seems like have a timing issue, threads_tid_ shared by main thread and worker threads, current logic not guarantee that the threads_tid_[i] is valid when main thread to read it. I think we should can reproduce such issue by adding a debug "sleep" between line 118 and line 119

@areusch
Copy link
Contributor

areusch commented Jun 21, 2022

looks like some of the hexagon tests are segfaulting now, e.g.:

tests/python/contrib/test_hexagon/topi/test_avg_pool2d_slice.py::TestAvgPool2dSlice::test_avg_pool2d_slice[nhwc-8h2w32c2w-2d-stride6-kernel6-float16-dilation6-padding6-True-nhwc-8h2w32c2w-2d-output_shape6-False]

anyone able to look?

threads_.emplace_back([worker_callback, i] { worker_callback(i); });
threads_.emplace_back([worker_callback, i, this] {
#ifndef __hexagon__
SetTid();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetTid running in a worker thread, that means the execution order is random, but currently logic assume it follow a order to execution and this would cause logic issue.

we may need to figure out if there is any better and light weight solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to unify Linux and Android. Providing pid_t and p_thread at the same time is not a good modification.

Copy link
Contributor

@huajsj huajsj Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that we should unify the interface ,but we may need to fix the bugs first then step in the unify interface change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add yeild to wait settid

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How the "yield" can help disorder issue in threads_tid_ ? there are two issue in this solution #1 threads_tid_[i] invalid, #2 threads_tid_[i] is not the tid of i thread(disorder issue). the line 112 problem is #2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should bind tid to i. Push_back should not be used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunjiweiswift sunjiweiswift requested a review from areusch July 1, 2022 08:41
#endif
}
void SetTid(size_t index) { threads_tid_[index] = Tid(); }
pid_t GetTid(size_t thread_index) { return threads_tid_[thread_index]; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This not fix the timing issue, that means SetTid may happen after GetTid, and threads_tid_[thread_index] will point to a random value or a '0' value depend on the compiler settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If getid() is called before settid(). Gettid will return 0. Call sched_setaffinity will return -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns -1, indicating that the affinity setting failed. In line with expectations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns -1, indicating that the affinity setting failed. In line with expectations

this sounds not make sense, why "affinity setting failed" is " In line with expectations", why need to expect "affinity setting failed"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetTid after Gettid means we set affinity for a nonexistent thread. I think should return -1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SetTid after Gettid" not means "we set affinity for a nonexistent thread", thread already there it just not yet execute the "SetTid" function.

@areusch areusch added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Oct 19, 2022
@tqchen tqchen closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants