Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth: fix HCI command / data deadlock #52536
Bluetooth: fix HCI command / data deadlock #52536
Changes from all commits
562293a
76deedb
a538cc3
e205bd9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that you're skipping the fragment handling that
net_buf_get()
takes care of. We don't (AFAIK) currently take advantage of the net_buf fragment feature in the Bluetooth subsystem, but at least we've used the proper net_buf APIs until now that should keep the code working if we did use fragments.At the very least, in addition to asserting that buf != NULL, you should also
BT_ASSERT((buf->flags & NET_BUF_FRAGS) == 0);
. Though even that's pretty unclean, since these are really net_buf-internal features.A cleaner way would be to introduce a new
net_buf_peek()
API that does the same asnet_buf_get()
but also fetches any pending fragments.Yet another, perhaps more long term solution, would be to finally move the net_buf fragments list into it's own independent list that's not mangled together with the linked list pointer as part of the same union (which is the reason for the NET_BUF_FRAGS flag and why fragments are placed as independent objects into the FIFO). Currently the net_buf definition starts off like this:
The penalty of moving
frags
out from the union would be that the struct size increases by the size of a pointer, but the benefit would be that we don't have this hassle of serializing & deserializing fragments whenever buffers pass through LIFO or FIFO objects.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and in addition to the assert, you need to explicitly set
buf->frags = NULL
since the k_fifo implementation leaves the linked list pointer untouched when removing from the list.Also see #38830 (review) and related discussion in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that really a problem since I'm using the
_get
before passing it tobt_send
? That takes care of the fragments, and the peek is a non-destructive operation afaik.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least you need to clear the
buf->frags
pointer sincek_fifo_peek_head()
doesn't do that for you. You have theEIO
branch below where e.g.net_buf_unref()
is called, andnet_buf_unref()
will also unref the frags pointer if it is non-NULL. Same goes for any other net_buf API calls you might do on the buffer before callingnet_buf_get()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for correctness, it would be better to get a proper reference to the buffer when peeking, i.e. call
net_buf_ref()
on the pointer. Then, when you finally callnet_buf_ger()
the code would look something like the following (i.e.no need for thedequeued
variable):There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried and it didn't work. I think it's because
buf->frags
is in a union with the node ptr.So if I overwrite it with
NULL
while it is still on the queue, and it ends up that I don't pull it from the fifo (e.g. the sem_take fails), the fifo list is effectively broken.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good point. And this is another reason why using peak() together with the union is bad - you end up with the buffer in an inconsistent state until you actually remove it from from the FIFO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short term solution would then probably be to document this clearly and try to make the code flow path from peek() to get() as short and obvious as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep agreed. I will place a comment above the
peek
saying this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I reverted the assert since #52760 got merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a code comment above this, something along the lines of "Since we used peek(), the queue still owns the reference to the buffer, so we need to take an explicit additional reference here.