-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth: fix HCI command / data deadlock #52536
Bluetooth: fix HCI command / data deadlock #52536
Conversation
5bdfaef
to
38bc060
Compare
09de2ec
to
805e2a3
Compare
/* Get next ACL packet for connection. The buffer will only get dequeued | ||
* if there is a free controller buffer to put it in. | ||
*/ | ||
buf = k_fifo_peek_head(&conn->tx_queue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that you're skipping the fragment handling that net_buf_get()
takes care of. We don't (AFAIK) currently take advantage of the net_buf fragment feature in the Bluetooth subsystem, but at least we've used the proper net_buf APIs until now that should keep the code working if we did use fragments.
At the very least, in addition to asserting that buf != NULL, you should also BT_ASSERT((buf->flags & NET_BUF_FRAGS) == 0);
. Though even that's pretty unclean, since these are really net_buf-internal features.
A cleaner way would be to introduce a new net_buf_peek()
API that does the same as net_buf_get()
but also fetches any pending fragments.
Yet another, perhaps more long term solution, would be to finally move the net_buf fragments list into it's own independent list that's not mangled together with the linked list pointer as part of the same union (which is the reason for the NET_BUF_FRAGS flag and why fragments are placed as independent objects into the FIFO). Currently the net_buf definition starts off like this:
struct net_buf {
union {
/** Allow placing the buffer into sys_slist_t */
sys_snode_t node;
/** Fragments associated with this buffer. */
struct net_buf *frags;
};
The penalty of moving frags
out from the union would be that the struct size increases by the size of a pointer, but the benefit would be that we don't have this hassle of serializing & deserializing fragments whenever buffers pass through LIFO or FIFO objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least, in addition to asserting that buf != NULL, you should also
BT_ASSERT((buf->flags & NET_BUF_FRAGS) == 0);
. Though even that's pretty unclean, since these are really net_buf-internal features.
Oh, and in addition to the assert, you need to explicitly set buf->frags = NULL
since the k_fifo implementation leaves the linked list pointer untouched when removing from the list.
Also see #38830 (review) and related discussion in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that really a problem since I'm using the _get
before passing it to bt_send
? That takes care of the fragments, and the peek is a non-destructive operation afaik.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least you need to clear the buf->frags
pointer since k_fifo_peek_head()
doesn't do that for you. You have the EIO
branch below where e.g. net_buf_unref()
is called, and net_buf_unref()
will also unref the frags pointer if it is non-NULL. Same goes for any other net_buf API calls you might do on the buffer before calling net_buf_get()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for correctness, it would be better to get a proper reference to the buffer when peeking, i.e. call net_buf_ref()
on the pointer. Then, when you finally call net_buf_ger()
the code would look something like the following (i.e.no need for the dequeued
variable):
net_buf_unref(buf);
buf = net_buf_get(...);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and in addition to the assert, you need to explicitly set buf->frags = NULL since the k_fifo implementation leaves the linked list pointer untouched when removing from the list.
I just tried and it didn't work. I think it's because buf->frags
is in a union with the node ptr.
So if I overwrite it with NULL
while it is still on the queue, and it ends up that I don't pull it from the fifo (e.g. the sem_take fails), the fifo list is effectively broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good point. And this is another reason why using peak() together with the union is bad - you end up with the buffer in an inconsistent state until you actually remove it from from the FIFO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short term solution would then probably be to document this clearly and try to make the code flow path from peek() to get() as short and obvious as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep agreed. I will place a comment above the peek
saying this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I reverted the assert since #52760 got merged.
subsys/bluetooth/host/conn.c
Outdated
/* De-queue the buffer now that we know we can send it. | ||
* Only applies if the buffer to be sent is the original buffer, | ||
* and not one of its fragments. | ||
*/ | ||
struct net_buf *dequeued = net_buf_get(&conn->tx_queue, K_NO_WAIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code comment should explicitly mention that the buffer was fetched from the FIFO using a "peek" operation.
subsys/bluetooth/host/conn.c
Outdated
#endif /* CONFIG_BT_CONN */ | ||
} | ||
|
||
static int _send_frag(struct bt_conn *conn, struct net_buf *buf, uint8_t flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be static int do_send_frag()
1de1b7e
to
33185db
Compare
33185db
to
d813310
Compare
d813310
to
6e49d67
Compare
@jhedberg could you review again? |
BT_ASSERT(buf); | ||
if (!send_buf(conn, buf)) { | ||
|
||
buf = net_buf_ref(buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a code comment above this, something along the lines of "Since we used peek(), the queue still owns the reference to the buffer, so we need to take an explicit additional reference here.
6e49d67
to
4850cce
Compare
@runegrunnet could you please take a look? |
Make the ACL fragmentation asynchronous, freeing the TX thread for sending commands when the ACL buffers are full. Signed-off-by: Jonathan Rico <[email protected]>
When there are no buffers, it doesn't make sense to repeatedly try to send the host TX queue. Signed-off-by: Jonathan Rico <[email protected]>
Nice, glad to hear! Sadly, I am currently not working with BLE and unable to review this in detail. Thanks for mentioning me though, I am pleased to see the progress :) |
Only copy the data from the 'parent' buf into the segment buf if we successfully acquired the HCI semaphore (ie there are available controller buffers). This way we avoid pulling and pushing back the data in the case where there aren't any controller buffers available. Signed-off-by: Jonathan Rico <[email protected]>
Do these things: - use the proper macros for reserving the SDU header - make every L2CAP PDU fragment into 3 ACL packets - set tx data and verify rx matches the pattern - measure segment pool usage Signed-off-by: Jonathan Rico <[email protected]>
4850cce
to
e205bd9
Compare
@@ -372,6 +386,11 @@ static void test_central_main(void) | |||
LOG_DBG("*L2CAP STRESS Central started*"); | |||
int err; | |||
|
|||
/* Prepare tx_data */ | |||
for (size_t i = 0; i < sizeof(tx_data); i++) { | |||
tx_data[i] = (uint8_t)i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of the variable being assigned to (tx_data[i]
) is already uint8_t, so truncation happens either way, ie the explicit typecast is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least with the cast it's visible that it's on purpose and not a mistake
See commits for more info.
TLDR: makes the ACL fragmentation asynchronous, freeing the TX thread for sending commands when the ACL buffers are full.
Fixes #25917