[exporterhelper] fix deadlock when initializing persistent queue #7400
+59
−8
Description:
Fixing a potential deadlock in persistent queue initialization.
The queue maintains a channel of dummy items whose size should be equal to the queue size at all times. This makes it easier for consumers to wait when the queue is empty. During initialization, we simply add a dummy item to the channel for each queue item.
However, before this step we also attempt to requeue items that were previously dispatched but not sent, and requeueing adds dummy items to the channel as well. As a result, the channel can hold more dummy items than there are actual items in storage. This is normally not a big problem, as the queue quietly discards the extraneous dummy items when it's empty.
It is a problem, however, if this causes the dummy item channel to go over capacity during initialization. If `number_of_dispatched_items + queue_size > queue_capacity`, then we try to put more dummy items in the channel than it has capacity for, resulting in a deadlock. The reason this problem doesn't appear in practice very often is that if the queue is full, the dispatched items are simply discarded. I have seen it happen in combination with other storage-related problems, where it muddies the waters and makes troubleshooting more difficult.
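The overflow condition can be illustrated with a minimal, standalone Go sketch. It is not the exporterhelper code: `fillDummyItems` and its parameters are hypothetical names, and a non-blocking send stands in for the blocking send so that the example reports where initialization would hang instead of actually deadlocking.

```go
package main

import "fmt"

// fillDummyItems is a hypothetical stand-in for queue initialization.
// It tries to place one dummy item per requeued (dispatched) item and
// one per stored item into a buffered channel of the given capacity.
// It returns how many items fit and whether a blocking send would
// have hung because the channel was already full.
func fillDummyItems(capacity, dispatched, queueSize int) (placed int, wouldBlock bool) {
	dummy := make(chan struct{}, capacity)
	for i := 0; i < dispatched+queueSize; i++ {
		select {
		case dummy <- struct{}{}:
			placed++
		default:
			// With a plain `dummy <- struct{}{}` and no consumer
			// running yet, initialization would block here forever.
			return placed, true
		}
	}
	return placed, false
}

func main() {
	// 2 dispatched + 4 stored > capacity 5: the sixth send would block.
	placed, blocked := fillDummyItems(5, 2, 4)
	fmt.Println(placed, blocked)

	// 1 dispatched + 4 stored == capacity 5: everything fits.
	placed, blocked = fillDummyItems(5, 1, 4)
	fmt.Println(placed, blocked)
}
```

In the first call the sum of dispatched and stored items exceeds the capacity, which is the case described above; in the second call the sum equals the capacity and no send would block.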
Testing: Added a test that deadlocks without the change.