Prevent pending `run_pending_tasks` of `future::Cache` from causing busy loop in `schedule_write_op` #415
Fixes #412.

Summary

This PR prevents the async runtime's schedulers from going into infinite busy loops in an internal `schedule_write_op` method when there are pending `run_pending_tasks` calls on the `future::Cache`.

The bug was introduced in `v0.12.0`, when the background threads were removed from `future::Cache`. It can occur when the `run_pending_tasks` method is called by user code while the cache is receiving a high number of concurrent cache write operations such as `insert`, `get_with` or `invalidate`. When it occurs, the `schedule_write_op` method spins in a busy loop forever, causing high CPU usage and starving all other async tasks.

The fix is to replace an `async_lock::RwLock` used in the `schedule_write_op` method with an event notification mechanism provided by the `event-listener` crate. (Note that `event-listener` is used by `async_lock::RwLock` too.)

Other Changes
This PR also does the following:

- Bumps the `moka` version to `v0.12.6`.
- Upgrades the `async-lock` crate used by `future::Cache` to the latest version 3.3. (This is not related to the bug.)

The Root Cause
When a cache write operation such as `insert` is performed on `future::Cache`, the `schedule_write_op` method is called internally to send a write operation log to a channel. Later, an internal `do_run_pending_tasks` method is called and receives the log from the channel. The channel is bounded, so it can only hold up to a fixed number of items (logs). When the channel is full, `schedule_write_op` fails to send the log, so it retries until it succeeds.

`future::Cache` used an `async_lock::RwLock` to prevent the retry loop from taking all the CPU time. The loop waits for a read lock to become available, so that the scheduler threads of the async runtime can do other work, including running `do_run_pending_tasks`. `do_run_pending_tasks` acquires an exclusive write lock at the beginning, so the retry loop is kept waiting while `do_run_pending_tasks` is running.

The problem with the current implementation is that the retry loop keeps spinning when `do_run_pending_tasks` is not running, because no write lock is taken. So the retry loop can occupy the scheduler thread and never yield to other async tasks until `do_run_pending_tasks` is started. If enough retry loops are spinning in parallel, they occupy all the scheduler threads, preventing `do_run_pending_tasks` from starting! The retry loops then keep spinning forever on the occupied schedulers.

The Fix
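To make the failure mode from the root cause analysis concrete before describing the change, here is a minimal sketch using only the standard library. It is not moka's code: `std::sync::RwLock` stands in for `async_lock::RwLock`, `std::sync::mpsc::sync_channel` stands in for the internal bounded channel, and the function name and the `max_spins` safety limit are hypothetical, added only so the demonstration terminates.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::sync::RwLock;

/// Counts how many times a lock-guarded retry loop goes around while the
/// channel is full and nobody is draining it.
/// `max_spins` is a hypothetical safety limit so the demo terminates.
fn count_spins_on_full_channel(max_spins: u64) -> u64 {
    // A bounded channel with room for one log; `_rx` is kept alive but never read,
    // which models `do_run_pending_tasks` not running.
    let (tx, _rx) = sync_channel::<u32>(1);
    tx.try_send(0).expect("the first log fits"); // the channel is now full

    // Stand-in for the lock that the maintenance task would hold for writing.
    let maintenance_lock = RwLock::new(());

    let mut spins = 0;
    while spins < max_spins {
        match tx.try_send(1) {
            Ok(()) => break,
            Err(TrySendError::Full(_)) => {
                // No writer holds the lock, so the read lock is granted at once
                // and the loop never pauses: this is the busy loop.
                let _read_guard = maintenance_lock.read().unwrap();
                spins += 1;
            }
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    spins
}
```

Calling `count_spins_on_full_channel(10_000)` returns `10_000`: every retry acquires the read lock immediately, so the loop only stops because of the artificial limit.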
This PR fixes the problem by replacing the `RwLock` with an event notification mechanism provided by the `event-listener` crate. The retry loop in `schedule_write_op` now waits for an event to arrive regardless of whether `do_run_pending_tasks` is running. This ensures that the async schedulers are not occupied by the retry loop in `schedule_write_op`, so they can run `do_run_pending_tasks` whenever needed.

The event is sent when one of the following conditions is met:

- The `do_run_pending_tasks` method has removed some logs from the channel.
- `Housekeeper`'s `run_pending_tasks` or `try_run_pending_tasks` method has released a lock on a `Mutex` called `current_task`.
  - `current_task` is used to ensure that only one `run_pending_tasks` or `try_run_pending_tasks` method runs at a time.
  - `run_pending_tasks` and `try_run_pending_tasks` call `do_run_pending_tasks`.

The latter is needed to make the retry loop spin once more. This may start `do_run_pending_tasks` again, because the retry loop calls `try_run_pending_tasks` if necessary before resending the log to the channel.

NOTE: You may wonder why the spinning retry loop cannot start `do_run_pending_tasks` if the loop itself is calling `try_run_pending_tasks`. This is because if there are any `run_pending_tasks` calls waiting for the `current_task` lock to become available, one of them is given the right to acquire the lock, and no other caller can acquire it. Unless an async scheduler runs exactly the one that has the right to acquire the lock, `do_run_pending_tasks` will not start. See the starved lock in the description of #412 for the details.

Tests
- Ran the reproduction from "`run_pending_tasks` may cause busy loop in `schedule_write_op` #412 (comment)" and verified that the problem was not reproduced.
- [...] `cargo run --release` and it completed without any new issues.
  - This test does not call `run_pending_tasks`, so it will not reproduce the problem. The purpose of this test was to find any regressions.
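
The wait-for-notification retry loop described under The Fix can be sketched with the standard library alone. This is a thread-based analogy under stated assumptions, not moka's async implementation: the `Notifier` type (a generation counter with a `Condvar`) is a hypothetical stand-in for the `event-listener` mechanism, `sync_channel` stands in for the internal bounded channel, and `run_demo` is a hypothetical driver. The `schedule_write_op` name is reused only to mirror the description.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for an event notifier: a generation counter
/// that waiters compare against the value they saw earlier.
struct Notifier {
    generation: Mutex<u64>,
    changed: Condvar,
}

impl Notifier {
    fn new() -> Self {
        Notifier { generation: Mutex::new(0), changed: Condvar::new() }
    }

    /// Snapshot taken *before* trying to send, so a notification that arrives
    /// between the failed send and the wait is not lost.
    fn listen(&self) -> u64 {
        *self.generation.lock().unwrap()
    }

    /// Blocks (without spinning) until the generation differs from `seen`.
    fn wait(&self, seen: u64) {
        let guard = self.generation.lock().unwrap();
        let _guard = self
            .changed
            .wait_while(guard, |generation| *generation == seen)
            .unwrap();
    }

    /// Called by the consumer after it has removed a log from the channel.
    fn notify(&self) {
        *self.generation.lock().unwrap() += 1;
        self.changed.notify_all();
    }
}

/// Retry loop: when the channel is full, sleep until notified, then retry.
fn schedule_write_op(tx: &SyncSender<u32>, notifier: &Notifier, log: u32) {
    loop {
        let seen = notifier.listen();
        match tx.try_send(log) {
            Ok(()) => return,
            Err(TrySendError::Full(_)) => notifier.wait(seen),
            Err(TrySendError::Disconnected(_)) => panic!("receiver was dropped"),
        }
    }
}

/// Hypothetical driver: several writers push logs through a small bounded
/// channel while the calling thread drains it. Returns the number of logs received.
fn run_demo(writers: u32, logs_per_writer: u32, capacity: usize) -> u32 {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let notifier = Arc::new(Notifier::new());

    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let tx = tx.clone();
            let notifier = Arc::clone(&notifier);
            thread::spawn(move || {
                for log in 0..logs_per_writer {
                    schedule_write_op(&tx, &notifier, log);
                }
            })
        })
        .collect();
    drop(tx); // only the writers hold senders now

    let mut received = 0;
    // `recv` fails once every writer has finished and the channel is empty.
    while rx.recv().is_ok() {
        received += 1;
        notifier.notify(); // a slot was freed, so wake any waiting writers
    }
    for handle in handles {
        handle.join().unwrap();
    }
    received
}
```

With these definitions, `run_demo(4, 100, 8)` returns `400`: every log is delivered even though the channel holds only eight at a time, and blocked writers sleep instead of spinning. Taking the snapshot in `listen` before the send attempt mirrors the usual pattern for event notification, where the listener is registered before the condition is checked.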