fix(mpsc): fix a deadlock in async send_ref #20

hawkw · 2021-12-10T19:26:11Z

This fixes a deadlock issue in the async MPSC's send_ref method. The
deadlock occurs when a new waker needs to be registered for a task whose
wait node is already in the wait queue. Previously, the new waker would
not be registered because the waker registering closure was only called
when the node was being enqueued. If the node was already in the queue,
polling the future would never touch the waker. This means that if the
task was polled with a new waker, it would leave its old waker in the
queue, and might never be notified again.

This branch fixes that by separating pushing the task and registering
the waker. We check whether the node already has a waker prior to
registering, and if it does, we don't push it again.

Signed-off-by: Eliza Weisman <[email protected]>
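The separation described above can be illustrated with a small, self-contained sketch. This is not the code from this branch: `Node`, `WaitQueue`, `CountingWaker`, and `demo` are hypothetical names, and a locked `VecDeque` stands in for the real wait list. It only shows the idea that the waker is refreshed on every poll, while the node is pushed only if it was not already queued.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// Hypothetical wait node: stores the waker of the task that is waiting.
#[derive(Default)]
struct Node {
    waker: Mutex<Option<Waker>>,
}

/// Hypothetical wait queue (a locked VecDeque, purely for illustration).
#[derive(Default)]
struct WaitQueue {
    nodes: Mutex<VecDeque<Arc<Node>>>,
}

impl WaitQueue {
    /// Called on every poll of a pending send.
    fn wait(&self, node: &Arc<Node>, waker: &Waker) {
        // Step 1: always (re)register the waker, even if the node is queued.
        let was_queued = {
            let mut slot = node.waker.lock().unwrap();
            // Assumption of this sketch: a stored waker means "already queued".
            let was_queued = slot.is_some();
            let stale = slot.as_ref().map_or(true, |old| !old.will_wake(waker));
            if stale {
                *slot = Some(waker.clone());
            }
            was_queued
        };
        // Step 2: push the node only if it was not already in the queue.
        if !was_queued {
            self.nodes.lock().unwrap().push_back(node.clone());
        }
    }

    /// Wakes the task at the front of the queue, if there is one.
    fn notify(&self) -> bool {
        let node = self.nodes.lock().unwrap().pop_front();
        if let Some(node) = node {
            let waker = node.waker.lock().unwrap().take();
            if let Some(waker) = waker {
                waker.wake();
                return true;
            }
        }
        false
    }
}

/// Test waker that counts how many times it has been woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls one node with two different wakers, then notifies once.
/// Returns the wake counts of (old waker, new waker).
fn demo() -> (usize, usize) {
    let queue = WaitQueue::default();
    let node = Arc::new(Node::default());
    let old = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let new = Arc::new(CountingWaker(AtomicUsize::new(0)));

    queue.wait(&node, &Waker::from(old.clone()));
    queue.wait(&node, &Waker::from(new.clone())); // same node, new waker
    assert_eq!(queue.nodes.lock().unwrap().len(), 1); // not pushed twice

    queue.notify();
    (old.0.load(Ordering::SeqCst), new.0.load(Ordering::SeqCst))
}
```

In this sketch, only the most recently registered waker is woken. If `wait` registered the waker only while pushing the node, the second poll would leave the old waker in place, which is the situation the description above identifies as the cause of the deadlock.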

This fixes a deadlock issue in the async MPSC's `send_ref` method. The deadlock occurs when a new waker needs to be registered for a task whose wait node is already in the wait queue. Previously, the new waker would not be registered because the waker registering closure was only called when the node was being enqueued. If the node was already in the queue, polling the future would never touch the waker. This means that if the task was polled with a new waker, it would leave its old waker in the queue, and might never be notified again. This branch fixes that by separating pushing the task and registering the waker. The closure that registers the waker now returns a boolean indicating if the node needs to be re-queued.

Signed-off-by: Eliza Weisman <[email protected]>

This reverts commit 29f56f6.

Signed-off-by: Eliza Weisman <[email protected]>

This branch rewrites the MPSC channel wait queue implementation (again), in order to improve performance. This undoes a decently large amount of the perf regression from PR #20.

In particular, I've made the following changes:

* Simplified the design a bit, and reduced the number of CAS loops in both the notify and wait paths
* Factored out fast paths (which touch the state variable without locking) from the notify and wait operations into separate functions, and marked them as `#[inline(always)]`. If we weren't able to perform the operation without actually touching the linked list, we call into a separate `#[inline(never)]` function that actually locks the list and performs the slow path. This means that code that uses these functions still has a function call in it, but a few instructions for performing a CAS can be inlined and the function call avoided in some cases. This *significantly* improves performance!
* Separated the `wait` function into `start_wait` (called the first time a node waits) and `continue_wait` (called if the node is woken, to handle spurious wakeups). This allows simplifying the code for modifying the waker so that we don't have to pass big closures around.
* Other miscellaneous optimizations, such as cache padding some variables that should have been cache padded.

## Performance Comparison

These benchmarks were run against the current `main` branch (f77d534).

### async/mpsc_reusable

```
async/mpsc_reusable/ThingBuf/10
                        time:   [43.953 us 44.522 us 45.057 us]
                        change: [+0.0419% +1.7594% +3.5099%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
async/mpsc_reusable/ThingBuf/50
                        time:   [140.91 us 142.24 us 143.53 us]
                        change: [-31.201% -29.539% -27.824%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
async/mpsc_reusable/ThingBuf/100
                        time:   [250.31 us 255.03 us 259.68 us]
                        change: [-18.966% -17.190% -15.202%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
```

### async/mpsc_integer

```
async/mpsc_integer/ThingBuf/10
                        time:   [208.99 us 215.30 us 221.32 us]
                        change: [+0.6957% +3.8603% +6.9400%] (p = 0.02 < 0.05)
                        Change within noise threshold.
async/mpsc_integer/ThingBuf/50
                        time:   [407.46 us 412.74 us 418.31 us]
                        change: [-39.128% -36.567% -33.267%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe
async/mpsc_integer/ThingBuf/100
                        time:   [534.35 us 541.42 us 548.91 us]
                        change: [-44.820% -41.502% -37.120%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe
```

### async/spsc/try_send_reusable

```
async/spsc/try_send_reusable/ThingBuf/100
                        time:   [12.310 us 12.353 us 12.398 us]
                        thrpt:  [8.0656 Melem/s 8.0952 Melem/s 8.1236 Melem/s]
                 change:
                        time:   [-7.5146% -7.1996% -6.8566%] (p = 0.00 < 0.05)
                        thrpt:  [+7.3613% +7.7582% +8.1252%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
async/spsc/try_send_reusable/ThingBuf/500
                        time:   [46.691 us 46.778 us 46.871 us]
                        thrpt:  [10.668 Melem/s 10.689 Melem/s 10.709 Melem/s]
                 change:
                        time:   [-9.4767% -9.2760% -9.0811%] (p = 0.00 < 0.05)
                        thrpt:  [+9.9881% +10.224% +10.469%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
async/spsc/try_send_reusable/ThingBuf/1000
                        time:   [89.763 us 90.757 us 91.843 us]
                        thrpt:  [10.888 Melem/s 11.018 Melem/s 11.140 Melem/s]
                 change:
                        time:   [-9.4302% -8.8637% -8.2018%] (p = 0.00 < 0.05)
                        thrpt:  [+8.9346% +9.7257% +10.412%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe
async/spsc/try_send_reusable/ThingBuf/5000
                        time:   [415.34 us 417.89 us 420.42 us]
                        thrpt:  [11.893 Melem/s 11.965 Melem/s 12.038 Melem/s]
                 change:
                        time:   [-13.113% -12.774% -12.411%] (p = 0.00 < 0.05)
                        thrpt:  [+14.170% +14.644% +15.093%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
async/spsc/try_send_reusable/ThingBuf/10000
                        time:   [847.35 us 848.63 us 849.98 us]
                        thrpt:  [11.765 Melem/s 11.784 Melem/s 11.802 Melem/s]
                 change:
                        time:   [-11.345% -10.820% -10.388%] (p = 0.00 < 0.05)
                        thrpt:  [+11.592% +12.133% +12.796%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
```

### async/spsc/try_send_integer

```
async/spsc/try_send_integer/ThingBuf/100
                        time:   [7.2254 us 7.2467 us 7.2690 us]
                        thrpt:  [13.757 Melem/s 13.799 Melem/s 13.840 Melem/s]
                 change:
                        time:   [-13.292% -12.912% -12.520%] (p = 0.00 < 0.05)
                        thrpt:  [+14.312% +14.826% +15.330%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
async/spsc/try_send_integer/ThingBuf/500
                        time:   [34.358 us 34.477 us 34.582 us]
                        thrpt:  [14.458 Melem/s 14.503 Melem/s 14.553 Melem/s]
                 change:
                        time:   [-18.539% -18.312% -18.072%] (p = 0.00 < 0.05)
                        thrpt:  [+22.058% +22.417% +22.758%]
                        Performance has improved.
async/spsc/try_send_integer/ThingBuf/1000
                        time:   [69.107 us 69.273 us 69.434 us]
                        thrpt:  [14.402 Melem/s 14.436 Melem/s 14.470 Melem/s]
                 change:
                        time:   [-17.759% -17.604% -17.444%] (p = 0.00 < 0.05)
                        thrpt:  [+21.130% +21.365% +21.594%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
async/spsc/try_send_integer/ThingBuf/5000
                        time:   [349.44 us 353.41 us 357.81 us]
                        thrpt:  [13.974 Melem/s 14.148 Melem/s 14.309 Melem/s]
                 change:
                        time:   [-14.832% -14.252% -13.447%] (p = 0.00 < 0.05)
                        thrpt:  [+15.537% +16.621% +17.415%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
async/spsc/try_send_integer/ThingBuf/10000
                        time:   [712.89 us 732.58 us 754.24 us]
                        thrpt:  [13.258 Melem/s 13.650 Melem/s 14.027 Melem/s]
                 change:
                        time:   [-16.082% -15.161% -14.129%] (p = 0.00 < 0.05)
                        thrpt:  [+16.454% +17.870% +19.164%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
```

I'm actually not really sure why this also improved the `try_send` benchmarks, which don't touch the wait queue...but I'll take it!

Signed-off-by: Eliza Weisman <[email protected]>
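The fast-path/slow-path split described in the list of changes above might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual wait queue: the three-state `state` variable, the `u32` waiter ids, the locked `VecDeque`, and every function name here are hypothetical. It only illustrates the general technique of an inlined lock-free CAS that falls back to a non-inlined, locking slow path.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical states for this sketch.
const EMPTY: usize = 0; // no waiters and no stored wakeup
const WAITING: usize = 1; // at least one waiter is in the list
const WAKING: usize = 2; // a wakeup is stored for the next waiter

struct WaitQueue {
    state: AtomicUsize,
    list: Mutex<VecDeque<u32>>, // waiter ids stand in for wait nodes
}

impl WaitQueue {
    fn new() -> Self {
        Self { state: AtomicUsize::new(EMPTY), list: Mutex::new(VecDeque::new()) }
    }

    /// Returns `true` if a stored wakeup was consumed, `false` if queued.
    #[inline(always)]
    fn wait(&self, id: u32) -> bool {
        // Fast path: consume a stored wakeup with one CAS, no lock.
        if self.state
            .compare_exchange(WAKING, EMPTY, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return true;
        }
        self.wait_slow(id)
    }

    #[inline(never)]
    fn wait_slow(&self, id: u32) -> bool {
        let mut list = self.list.lock().unwrap();
        loop {
            match self.state
                .compare_exchange(EMPTY, WAITING, Ordering::AcqRel, Ordering::Acquire)
            {
                // WAITING only changes under the lock, so pushing is safe.
                Ok(_) | Err(WAITING) => {
                    list.push_back(id);
                    return false;
                }
                // A wakeup was stored concurrently: try to consume it.
                Err(_) => {
                    if self.state
                        .compare_exchange(WAKING, EMPTY, Ordering::AcqRel, Ordering::Acquire)
                        .is_ok()
                    {
                        return true;
                    }
                }
            }
        }
    }

    /// Returns the id of the woken waiter, or `None` if a wakeup was stored.
    #[inline(always)]
    fn notify(&self) -> Option<u32> {
        // Fast path: nobody is waiting, so store the wakeup with one CAS.
        if self.state
            .compare_exchange(EMPTY, WAKING, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return None;
        }
        self.notify_slow()
    }

    #[inline(never)]
    fn notify_slow(&self) -> Option<u32> {
        let mut list = self.list.lock().unwrap();
        loop {
            match self.state.load(Ordering::Acquire) {
                WAITING => {
                    let id = list.pop_front();
                    if list.is_empty() {
                        self.state.store(EMPTY, Ordering::Release);
                    }
                    return id;
                }
                WAKING => return None, // a wakeup is already stored
                _ => {
                    if self.state
                        .compare_exchange(EMPTY, WAKING, Ordering::AcqRel, Ordering::Acquire)
                        .is_ok()
                    {
                        return None;
                    }
                }
            }
        }
    }
}

/// Exercises both slow paths, then both fast paths.
fn demo() -> (bool, Option<u32>, bool) {
    let q = WaitQueue::new();
    let consumed_first = q.wait(1); // slow path: waiter 1 is queued
    let woken = q.notify(); // slow path: pops waiter 1
    q.notify(); // fast path: stores a wakeup
    let consumed_second = q.wait(2); // fast path: consumes the stored wakeup
    (consumed_first, woken, consumed_second)
}
```

The design point is that callers only ever inline the few instructions of the CAS. The lock acquisition and list manipulation stay out of line, so they do not bloat every call site.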

hawkw added 9 commits December 10, 2021 10:09

test: fix missing test_dbg macros

b327f5e

Signed-off-by: Eliza Weisman <[email protected]>

cleanup (impls don't need to know about requeues)

733cbe5

Signed-off-by: Eliza Weisman <[email protected]>

fix nodes getting re-pushed when they shouldn't be

29f56f6

Signed-off-by: Eliza Weisman <[email protected]>

Revert "fix nodes getting re-pushed when they shouldn't be"

567f5a5

This reverts commit 29f56f6.

this fixes both the deadlock *and* dangling nodes

f51a6f6

Signed-off-by: Eliza Weisman <[email protected]>

shorten notify critical section a bit

e70c2d4

Signed-off-by: Eliza Weisman <[email protected]>

shorten notify critical sections, but safely

f6a270d

Signed-off-by: Eliza Weisman <[email protected]>

simplify unneeded CASes in wait

8788df1

Signed-off-by: Eliza Weisman <[email protected]>

hawkw merged commit c58c620 into main Dec 11, 2021

hawkw mentioned this pull request Dec 24, 2021

perf(mpsc): rewrite and optimize wait queue #22

Merged
