-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth: Fix various deadlock issues when running low on buffers #16870
Conversation
Added DNM label until it's verified that this actually fixes the issue and nobody is unhappy with the added HCI driver API for allocating event buffers. |
All checks are passing now. Review history of this comment for details about previous failed status. |
405e82d
to
a9bd744
Compare
} | ||
|
||
/* Insert non-discarded packets back to the RX queue */ | ||
k_fifo_put_slist(&rx.fifo, &list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andyross asking you since you seem to have touched the k_queue code most recently. The above code lines, starting with line 205, are iterating the contents of a k_fifo with the aim to remove one element from there. In this place of the code we have a guarantee that no other thread or ISR will attempt to access the k_fifo while we do this, however in the absence of any more suitable public APIs I'm having to remove each element one-by one and then re-insert the non-discarded elements using k_fifo_put_slist()
. What I'd really want to do is the equivalent of SYS_SFLIST_FOR_EACH_NODE()
and sys_sflist_remove()
and stop iterating once a matching element is found and removed, however currently that requires accessing what I assume is private data of k_fifo (and k_queue). Would that still be ok, or any cleaner way to solve this?
This event is a priority one, so it's not safe to have it use the RX buffer pool which may be depleted due to non-priority events (e.g. advertising events). Since the event is consumed synchronously it's safe to have a single-buffer pool for it. Also introduce a new bt_buf_get_evt() API for HCI drivers to simplify the driver-side code, this effectively also deprecates bt_buf_get_cmd_complete() which now has no in-tree HCI driver users anymore. Fixes zephyrproject-rtos#16864 Signed-off-by: Johan Hedberg <[email protected]>
@jukkar I've added a net_buf patch here to make it officially ok to use some bits of buf->flags as extended user data, could you take a look? @carlescufi @joerchan since you use the H:4 driver on the nRF91 it'd be good if you also check at least the changes I'm doing to that driver. |
Through my tests, the latest patches work well on my system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to net_buf changes.
include/net/buf.h
Outdated
@@ -474,6 +474,12 @@ static inline void net_buf_simple_restore(struct net_buf_simple *buf, | |||
*/ | |||
#define NET_BUF_EXTERNAL_DATA BIT(1) | |||
|
|||
/** | |||
* Bitmask of the ower bits of buf->flags that are reserved for net_buf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: ower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix, thanks
* events from the RX queue. | ||
*/ | ||
if (IS_ENABLED(CONFIG_BT_HCI_ACL_FLOW_CONTROL) && rx.type == H4_ACL) { | ||
return get_rx(K_FOREVER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the nature of such patches is to make sure that buffer-getting functions are never called with K_FOREVER.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario here is that we're in the RX thread with the UART RX interrupt disabled, i.e. there's nothing else that can be done than to sit and wait for a buffer. With ACL flow control this should be guaranteed to succeed since otherwise the controller is sending us an ACL packet when we haven't given it permission (credits) to do so. Another option would be K_NO_WAIT + ASSERT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. there's nothing else that can be done than to sit and wait for a buffer.
There's always something to do, e.g. to report a warning after some period of time, that this extended wait happens.
That said, that's just a random comment from someone not familiar with the BT subsys. (In IP networking, I believe we decided that we don't want any K_FOREVER's, and it was even implemented IIRC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For debugging purposes something that might be useful, but it should be made for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option, which when enabled causes any threads using K_FOREVER sleep to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return, rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For debugging purposes something that might be useful, but it should be made for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option, which when enabled causes any threads using K_FOREVER sleep to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return, rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.
Interrestingm I was thinking something similar sometime ago, though if we just wake up the threads just to make them wait again that perhaps would change the order in the waiters list so we it might have to be done without touching the waiters list.
"Make less likely", perhaps. "Eliminate", unlikely. |
Currently net_buf only needs the lower two bits. It may be useful to let users of the API take advantage of some bits as a form of extended user data (if using user data isn't appropriate or there's no space there). Signed-off-by: Johan Hedberg <[email protected]>
Add support to iterate the RX queue and discard (discardable) packets in case the RX buffer pool is empty. This will make potential deadlock scenarios less likely. Signed-off-by: Johan Hedberg <[email protected]>
@carlescufi @Vudentz @joerchan this PR has been left hanging around - it has an informal approval from the person who reported the original issue, as well as one formal approval, however nothing from actual Bluetooth maintainers yet. |
I've read through the original issue and then through the code. I see that the new patch for number of completed packets addresses the issue with unblocking the TX thread that might be waiting on those. But what is the need for |
BT_WARN("Attempting to discard packets from RX queue"); | ||
|
||
while ((buf = net_buf_get(&rx.fifo, K_NO_WAIT))) { | ||
if (!discarded && (buf->flags & BUF_DISCARDABLE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this seems to discard only one buffer at time? If that is the correct behavior perhaps you can check if there have been anything added to the list, if there isn't anything that means the order is preserved and we can return interrupting the while loop since you don't have to call k_fifo_put_slist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. Even if we don't find anything to discard we'll have emptied rx.fifo through the net_buf_get calls. So we anyway have to do the put_slist(). And this code is discarding at most one buffer to minimise packet loss for something like mesh that's waiting for advertising reports.
@carlescufi IIRC I explained this on Slack already, but I'll repeat here: There may be multiple pending HCI packets that the controller wants to send to the host. At the same time the host might be stuck waiting for something like a Number of Completed Packets event. If that event is the next one in the queue for the controller to send to the host then all is fine (as guaranteed by the other patch in this PR), but if there are other, non-discardable, packets before it then we have to be able to receive those packets in order to eventually receive the Number of Completed Packets event. By digging into the host RX queue we're able to free up buffers to receive the pending packets from the controller. |
@carlescufi seems I failed to address this specific question: the purpose of this initial wait is that digging into the RX queue is expensive. So we give the other threads in the system a chance to process and free up buffers before commencing our own expensive free-up operation. I.e. this wait isn't necessary, but I was thinking it might give better performance. If the wait is successful it also means we don't have to loose any data from the controller. |
Right, but where is the actual deadlock? why is a pending Number of Completed Packets, preventing us from processing other, unrelated events? Since the memory pools are separate for events and data anyway, that's the bit I don't understand. Sorry if you already explained this and I missed it. |
@carlescufi |
Yes, but how is that a deadlock? The currently allocated event buffers will be processed and freed up by the RX thread regardless of what is happening to the TX thread? |
They don't have to be processed and freed up by the RX thread. E.g. both in Mesh and L2CAP cases they may be passed to the system workqueue for processing. net_buf's in general have the whole ref/unref feature and the ability to be passed through FIFOs to other contexts. So there's no guarantee that once you've done a bt_recv() call that the buffer will have gotten freed up. Disclaimer: In all honesty I'm not 100% clear on what the precise deadlock is in #16864 or the various theoretical ways things can deadlock, but I'm fairly confident that we have to try to avoid the H:4 driver from halting the flow of packets from the controller so that we eventually get to process any packets that may unblock threads that are waiting for them. |
Thanks for explaining. I agree that this is safe and reasonable. |
@carlescufi Mesh stack sends two ACL packets in Rx thread synchronously. This action will block Rx thread before numCompleted is received. Because of the continuous reporting of advertising data packets, the Rx buffers is exhausted. Therefore, Tx thread cannot continue to report numCompleted events, resulting in deadlock. This is the call stack for Rx thread: |
@xiaoliang314 this is something I don't understand. In #16864 you said you're using the H4 driver, which enables CONFIG_BT_RECV_IS_RX_THREAD. This option in turn disables hci_rx_thread in hci_core.c, so I don't understand how you have that in your backtrace? |
@jhedberg Sorry, I may not have told you more details. On our chip, we transplanted the Bluetooth controller. The HCI driver of our controller reference the H4 driver. But the actual H4 interface is not used. |
@@ -191,7 +255,7 @@ static void rx_thread(void *p1, void *p2, void *p3) | |||
* original command buffer (if available). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this comment need updating now that there is a separate pool for num complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. To be honest, I don't fully understand how the host stack works.
The above comment brings up an important point though: if it's the bt_recv() call that's blocking there's still an issue since it's in the same thread that we try to free up buffers, i.e. the freeing would need to be moved to some independent thread. Since the problem seems to be fairly complex I'm thinking I'll move the less controversial patches to a separate PR that could be merged first (i.e. the new one-buffer pool and the net_buf flags usage) |
Closing this PR - I'll open a new one for the num_completed pool and some patches to a different approach I'm planning to take. This also means that no change is needed to the net_buf API (the flags stuff) for now. |
The first and third patches are for fixing potential deadlocks when running low on buffers. The Number of Completed Packets patch (the first one) affects pretty much all HCI drivers, including the native controller one. The last patch only affects the H:4 HCI driver. The middle patch (second one) is to make it valid to use some of the upper bits of buf->flags as extended user data.
Fixes #16864