Consider using std::hint::spin_loop, at least for Windows #11

Closed
ishitatsuyuki opened this issue Jan 11, 2022 · 5 comments

Comments

@ishitatsuyuki

The resolution of #1 was to use yield_now, which calls Sleep(0) on Windows. However, from https://randomascii.wordpress.com/2012/06/05/in-praise-of-idleness/:

Sleep(0)

If Sleep(0) switches to another thread then we have lost the CPU, paid all the cost of context switches, but we have failed to tell the operating system what we are waiting for. Thus, there is no way for the operating system to wake us up promptly when the lock is released. This means that we will likely sleep for an entire quantum, which could be 10-20 ms, and then be given a chance to try again. Meanwhile the lock we wanted will probably have been released and acquired hundreds of times and we might have to repeat the process again!

Not all of the above is relevant here, but the point is that Sleep(0) behaves either as a busy wait that pays for a syscall on every iteration, or like Sleep(1) (and sometimes even worse). It's strictly worse than simply spinning: the syscall takes longer than a spin iteration, and if it hands the CPU to another thread, we will miss the deadline.
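To make the comparison concrete, here is a minimal sketch of the two busy-wait styles under discussion. The function names `wait_with_yield` and `wait_with_spin_hint` are hypothetical and are not taken from the crate; the only difference between them is what the loop body does while the deadline has not yet passed.

```rust
use std::time::{Duration, Instant};

/// Waits until `deadline`, asking the OS scheduler to run another ready
/// thread on every iteration. Returns how far past the deadline we woke up.
/// (Hypothetical helper, for illustration only.)
pub fn wait_with_yield(deadline: Instant) -> Duration {
    while Instant::now() < deadline {
        // May give up the rest of the time slice to another thread.
        std::thread::yield_now();
    }
    Instant::now().saturating_duration_since(deadline)
}

/// Waits until `deadline` without leaving user space: the loop body only
/// emits a CPU-level "I am spinning" hint. Returns the overshoot.
/// (Hypothetical helper, for illustration only.)
pub fn wait_with_spin_hint(deadline: Instant) -> Duration {
    while Instant::now() < deadline {
        // Hint to the processor only; the thread stays scheduled.
        std::hint::spin_loop();
    }
    Instant::now().saturating_duration_since(deadline)
}
```

Both helpers return the overshoot, so calling each with `Instant::now() + Duration::from_millis(1)` gives a rough first impression of how late each strategy wakes up on a given machine.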

For Linux, spinning is questionable because hrtimer seems to actually provide sub-microsecond accuracy, and the wakeup delay is only affected by timer slack, which defaults to 50 µs. IMO we should not spin at all on Linux, but that would be a separate discussion.

@alexheretic
Owner

Perhaps we could construct a scenario test that demonstrates the spin-loop hint being better than yield on Windows. If so, I'd probably be happy to change that on that platform.

Regarding Linux, I would say the better sleep accuracy just means we need to spin less. The point of the crate is avoiding oversleeping and that still applies.
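The "spin less" idea can be sketched as a two-phase wait: hand most of the duration to the OS sleep, then spin only for the final stretch that the OS cannot be trusted with. This is a rough illustration and not the crate's actual code; `sleep_then_spin` and its `accuracy` parameter are hypothetical names. A smaller `accuracy` (as might suit Linux) means less time spent spinning.

```rust
use std::time::{Duration, Instant};

/// Waits for `total`, trusting the OS sleep only up to `accuracy` before the
/// deadline and spinning for the remainder. Returns the time actually waited.
/// (`accuracy` is an assumed tuning knob, not a value from the thread.)
pub fn sleep_then_spin(total: Duration, accuracy: Duration) -> Duration {
    let start = Instant::now();
    let deadline = start + total;

    // Phase 1: coarse OS sleep, stopping `accuracy` short of the deadline.
    // If `accuracy` exceeds `total`, skip the sleep and spin the whole time.
    if let Some(coarse) = total.checked_sub(accuracy) {
        if !coarse.is_zero() {
            std::thread::sleep(coarse);
        }
    }

    // Phase 2: spin for whatever is left, so the wait never ends early.
    while Instant::now() < deadline {
        std::hint::spin_loop();
    }
    start.elapsed()
}
```

With better native sleep accuracy the second phase shrinks, but it still guards against the sleep overshooting the deadline.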

@ishitatsuyuki
Author

I made a crude benchmark that basically creates the worst-case scenario for yield. It generates all-core load on the computer, which makes yield consistently give up the time slice to other threads.

ishitatsuyuki@c59197c

(This will bring the computer to a crawl.)
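For readers who do not want to open the linked commit, the following is a rough sketch of what a benchmark of this shape might look like. It is not the code from that commit: the function name, the `relax` parameter, and the choice of thread count, iteration count, and target duration are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `iterations` waits of `target` each while `load_threads` busy threads
/// compete for the CPU, and returns the mean overshoot past the deadline.
/// `relax` is the loop body under test, e.g. `std::thread::yield_now` or
/// `std::hint::spin_loop`. (Hypothetical harness, for illustration only.)
pub fn mean_overshoot_under_load(
    load_threads: usize,
    iterations: u32,
    target: Duration,
    relax: fn(),
) -> Duration {
    // Background threads that burn CPU until told to stop.
    let stop = Arc::new(AtomicBool::new(false));
    let workers: Vec<_> = (0..load_threads)
        .map(|_| {
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                while !stop.load(Ordering::Relaxed) {
                    std::hint::spin_loop();
                }
            })
        })
        .collect();

    // Measure how late each wait finishes relative to its deadline.
    let mut sum = Duration::ZERO;
    for _ in 0..iterations {
        let deadline = Instant::now() + target;
        while Instant::now() < deadline {
            relax();
        }
        sum += Instant::now().saturating_duration_since(deadline);
    }

    // Shut the load down before reporting.
    stop.store(true, Ordering::Relaxed);
    for worker in workers {
        worker.join().expect("load thread panicked");
    }
    sum / iterations.max(1)
}
```

Calling it once with `std::thread::yield_now` and once with `std::hint::spin_loop`, using as many load threads as the machine has cores, would compare the two strategies under saturation. As the warning above says, expect the machine to become unresponsive while it runs.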

As expected, yield will miss the deadline frequently:

[src\main.rs:24] sum / 100 = 14.370365ms
[src\main.rs:24] sum / 100 = 9.190529ms
[src\main.rs:24] sum / 100 = 4.389404ms
[src\main.rs:24] sum / 100 = 10.933956ms
[src\main.rs:24] sum / 100 = 6.806247ms
[src\main.rs:24] sum / 100 = 4.691241ms
[src\main.rs:24] sum / 100 = 9.860078ms
[src\main.rs:24] sum / 100 = 6.923361ms
[src\main.rs:24] sum / 100 = 11.93099ms
[src\main.rs:24] sum / 100 = 11.204364ms

By contrast, the spin_loop hint wakes up on time in a lot of cases: the waiting thread has been running for the least amount of time, and most schedulers would prioritize such a thread and allow it to preempt others.

[src\main.rs:24] sum / 100 = 171.669µs
[src\main.rs:24] sum / 100 = 347.165µs
[src\main.rs:24] sum / 100 = 147ns
[src\main.rs:24] sum / 100 = 4.653037ms
[src\main.rs:24] sum / 100 = 8.866196ms
[src\main.rs:24] sum / 100 = 15.580745ms
[src\main.rs:24] sum / 100 = 6.150587ms
[src\main.rs:24] sum / 100 = 1.15737ms
[src\main.rs:24] sum / 100 = 158ns
[src\main.rs:24] sum / 100 = 1.025694ms
[src\main.rs:24] sum / 100 = 528.963µs
[src\main.rs:24] sum / 100 = 1.360286ms
[src\main.rs:24] sum / 100 = 319ns
[src\main.rs:24] sum / 100 = 2.271µs

@alexheretic
Owner

Thanks. From these results it does look worth switching to spin loop hint. I'll try running your test on Linux too.

@alexheretic
Owner

I've formalised some latency experiments in #12. They support the conclusion that Windows would benefit greatly from using std::hint::spin_loop instead of yielding under load, though when not under load yielding is more efficient.

On Linux yielding seems to work well all the time. Do the changes in #12 look good to you?
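One way a per-platform choice like this could be expressed is with conditional compilation. The sketch below is only an illustration of that idea and is not the change made in #12; `relax` and `spin_until` are hypothetical names.

```rust
use std::time::Instant;

/// Loop body for the spin phase on Windows: stay on the CPU and only emit
/// a spin hint. (Hypothetical helper, for illustration only.)
#[cfg(windows)]
#[inline]
fn relax() {
    std::hint::spin_loop();
}

/// Loop body for the spin phase elsewhere: let the scheduler run another
/// ready thread. (Hypothetical helper, for illustration only.)
#[cfg(not(windows))]
#[inline]
fn relax() {
    std::thread::yield_now();
}

/// Busy-waits until `deadline` using the platform's chosen loop body.
pub fn spin_until(deadline: Instant) {
    while Instant::now() < deadline {
        relax();
    }
}
```

Because the selection happens at compile time, the loop itself carries no runtime branch for the platform check.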

@alexheretic
Owner

I'll publish a new version around the end of the week.


2 participants