-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock race condition on compression shutdown #616
Fix deadlock race condition on compression shutdown #616
Conversation
1c3da2a
to
1a6dc0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to wrap sections where compression_is_running_
is assigned with a lock on the same mutex?
rosbag2_compression/src/rosbag2_compression/sequential_compression_writer.cpp
Show resolved
Hide resolved
9971460
to
a9e435c
Compare
rosbag2_compression/src/rosbag2_compression/sequential_compression_writer.cpp
Show resolved
Hide resolved
f09345a
to
32519d6
Compare
Signed-off-by: Emerson Knapp <[email protected]>
Signed-off-by: Emerson Knapp <[email protected]>
Signed-off-by: Emerson Knapp <[email protected]>
Signed-off-by: Emerson Knapp <[email protected]>
32519d6
to
3bf0fc2
Compare
Gist: https://gist.githubusercontent.com/emersonknapp/1d3e63e26efdc9f1596e750454471f19/raw/198dbe7ad4255058e89eefccdfb923da7ff52b7e/ros2.repos |
* Synchronize compression shutdown correctly, avoiding occasional deadlock Signed-off-by: Emerson Knapp <[email protected]>
* Synchronize compression shutdown correctly, avoiding occasional deadlock Signed-off-by: Emerson Knapp <[email protected]>
When running the test
test_sequential_compression_writer
repeatedly, the test sometimes hangs on shutdown, due to a gap in the threading synchronization between the compressor threads and the main thread.The order of operations is:
compressor_thread_fn
readswhile(compression_is_running_)
to betrue
, so it enters the while blockstop_compressor_threads
runscompression_is_running_=false; compressor_condition_.notify_all()
, notifying any currently waiting compression threads to wake up, which lets them exit after finding no work to do.join()
on all the threadscompressor_thread_fn
, having entered the while loop successfully, initializes some variables, acquires thecompressor_queue_mutex_
, and callscompressor_condition_.wait()
wait
after the finalnotify_all
, it will never receive a notification, and the.join()
call waits forever for this waiting thread to exitThis PR fixes the problem by synchronizing the checks of both exit conditions so that the main thread can't interleave the notifications in a way that causes the deadlock.
NOTE: I am not sure how to add a repeatable regression test for this - the way I can reproduce it is by running the following, which reliably makes it happen quite quickly (within a second). After my change it'll run indefinitely without failure.