Prevent pending `run_pending_tasks` of `future::Cache` from causing busy loop in `schedule_write_op` #415

tatsuya6502 · 2024-04-10T11:15:38Z

Fixes #412.

Summary

This PR prevents the async runtime's schedulers from going into infinite busy loops in an internal schedule_write_op method when there are pending run_pending_tasks calls on the future::Cache.

This bug was introduced in v0.12.0 when the background threads were removed from future::Cache.
This bug can occur when run_pending_task method is called by user code while cache is receiving a high number of concurrent cache write operations such as insert, get_with or invalidate.
When it occurs, the schedule_write_op method will be spinning in a busy loop forever, causing high CPU usage and all other async tasks to be starved..

The fix is to replace a async_lock::RwLock used in schedule_write_op method with an event notification mechanism provided by the event-listener crate. (Note that event-listener is used by async_lock::RwLock too.)

Other Changes

This PR also does the followings:

Bump the moka version to v0.12.6.
Upgrades async-lock crate used by future::Cache to the latest version 3.3. (This is not related to the bug)

The Root Cause

When a cache write operation such as insert is performed on future::Cache, schedule_write_op method is called internally to send a write operation log to a channel. Later, an internal do_run_pending_tasks method will be called and it will receive the log from the channel. The channel is bounded so it can only hold up to a fixed number of items (logs). When the channel gets full, schedule_write_op method will fail to send a log to the channel, so it will retry to send until it succeeds.

future::Cache used async_lock::RwLock to prevent the retry loop to take all CPU time. It will wait for a read lock becomes available, so that the scheduler threads of the async runtime can do other stuff including running do_run_pending_tasks. do_run_pending_tasks acquires an exclusive write lock at the beginning, so the retry loop will be kept waiting while do_run_pending_tasks is running.

The problem with the current implementation is the retry loop will keep spinning when do_run_pending_tasks is not running because no write lock is taken. So the retry loop can occupy the scheduler thread and never yield to other async tasks until do_run_pending_tasks is started. If there are enough number of retry loop spinning in parallel, all the scheduler threads will be occupied by them, preventing do_run_pending_tasks to start! This causes the retry loop to keep spinning forever by occupied schedulers.

The Fix

This PR fixes the problem by replacing the RwLock with an event notification mechanism provided by the event-listener crate. The retry loop in schedule_write_op will wait for an event to arrive no matter if do_run_pending_tasks is running. This ensures async schedulers not to be occupied by the retry loop in schedule_write_op, so the schedulers can run do_run_pending_tasks whenever needed.

The event will be sent when one of the following conditions meet:

The do_run_pending_tasks method has removed some logs from the channel.
The Housekeeper's run_pending_tasks or try_run_pending_tasks method has freed a lock on a Mutex called current_task.
- current_task is used to ensure only one run_pending_tasks or try_run_pending_tasks method to run at a time.
- run_pending_tasks and try_run_pending_tasks call do_run_pending_tasks.

The latter is needed to make the retry loop to spin once more. This may start do_run_pending_tasks again because the retry loop will call try_run_pending_tasks if necessary before resending the log to the channel.

NOTE: You may wonder why the spinning retry loop cannot start do_run_pending_tasks if the loop itself is calling try_run_pending_tasks. This is because if there are any run_pending_task calls waiting for current_task lock to be available, one of them will be given the right to acquire the lock, making no other to do so. Unless an async scheduler runs exactly the one having the right to acquire, do_run_pending_tasks will not start. See the starved lock in the description of #412 for the details.

Tests

Tested by the program attached to Pending run_pending_tasks may cause busy loop in schedule_write_op #412 (comment) and verified that the problem was not reproduced.
Tested by running Mokabench with cargo run --release and it completed without any new issues.
- NOTE: Mokabench does not call run_pending_tasks so it will not reproduce the problem. The purpose of this test was to find any regressions.

Prevent the async runtime's schedulers from going into infinite busy loops when there are pending `run_pending_tasks` calls on the `future::Cache`. This is done by replacing a `RwLock` used in an internal `schedule_write_op` method with an event notification.

tatsuya6502 · 2024-04-10T11:20:11Z

The CI for Miri tests are failing. It is not related to this PR and we will track it by #414.

src/future/base_cache.rs

Simplify the implementation of an internal ` notify_write_op_ch_is_ready` method.

tatsuya6502

Merging.

tatsuya6502 self-assigned this Apr 10, 2024

tatsuya6502 added the bug Something isn't working label Apr 10, 2024

tatsuya6502 added this to the v0.12.6 milestone Apr 10, 2024

tatsuya6502 mentioned this pull request Apr 10, 2024

Pending run_pending_tasks may cause busy loop in schedule_write_op #412

Closed

tatsuya6502 commented Apr 10, 2024

View reviewed changes

src/future/base_cache.rs Outdated Show resolved Hide resolved

Fix typos in comments

dc960af

tatsuya6502 changed the title ~~Prevent the busy loop of async schedulers~~ Prevent pending run_pending_tasks of future::Cache from causing busy loop in schedule_write_op Apr 10, 2024

tatsuya6502 commented Apr 11, 2024

View reviewed changes

src/future/base_cache.rs Outdated Show resolved Hide resolved

tatsuya6502 added 3 commits April 11, 2024 09:21

Prevent the busy loop of async schedulers

8805940

Simplify the implementation of an internal ` notify_write_op_ch_is_ready` method.

Bump the version to v0.12.6

6c4f0ac

Update the changelog

c50d186

tatsuya6502 commented Apr 11, 2024

View reviewed changes

tatsuya6502 merged commit 2f23e5c into main Apr 11, 2024
46 of 47 checks passed

tatsuya6502 deleted the avoid-async-scheduler-busy-loop branch April 11, 2024 10:22

Swatinem mentioned this pull request Apr 16, 2024

Do a cargo update getsentry/symbolicator#1439

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent pending `run_pending_tasks` of `future::Cache` from causing busy loop in `schedule_write_op` #415

Prevent pending `run_pending_tasks` of `future::Cache` from causing busy loop in `schedule_write_op` #415

tatsuya6502 commented Apr 10, 2024 •

edited

Loading

tatsuya6502 commented Apr 10, 2024

tatsuya6502 left a comment

Prevent pending run_pending_tasks of future::Cache from causing busy loop in schedule_write_op #415

Prevent pending run_pending_tasks of future::Cache from causing busy loop in schedule_write_op #415

Conversation

tatsuya6502 commented Apr 10, 2024 • edited Loading

Summary

Other Changes

The Root Cause

The Fix

Tests

tatsuya6502 commented Apr 10, 2024

tatsuya6502 left a comment

Choose a reason for hiding this comment

Prevent pending `run_pending_tasks` of `future::Cache` from causing busy loop in `schedule_write_op` #415

Prevent pending `run_pending_tasks` of `future::Cache` from causing busy loop in `schedule_write_op` #415

tatsuya6502 commented Apr 10, 2024 •

edited

Loading