ATT Notification buffer not released after reconnection #47649
Comments
@carlst thanks for reporting. Is this a regression, or have you not tried older Zephyr versions?
I have not tried older Zephyr versions. I have reviewed parts of the […]
I added further logging to […]. The added log entries (see […]). Please let me know if I can provide any further information.
Yes, most likely missing a call to […].
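For readers following along, here is a minimal sketch of the kind of cleanup that appears to be missing: on disconnect, any buffers still queued for transmission need their TX metadata released before the buffer itself is unreferenced. The `purge_unsent_tx()` helper, its `tx_queue` argument, and the `tx_meta_data_free()` prototype below are illustrative assumptions only, not the actual Zephyr host code (the actual fix is the one later opened as #48395).

```c
/* Minimal sketch only: names and signatures here are assumptions for
 * illustration, not the actual Zephyr host implementation.
 */
#include <zephyr/kernel.h>
#include <zephyr/net/buf.h>

/* Hypothetical prototype matching the tx_meta_data_free() mentioned in
 * this report's debug logging; the real signature may differ. */
void tx_meta_data_free(struct net_buf *buf);

/* Drain buffers that were queued for a connection but never sent,
 * releasing both the per-buffer TX metadata and the buffer itself. */
static void purge_unsent_tx(struct k_fifo *tx_queue)
{
	struct net_buf *buf;

	while ((buf = k_fifo_get(tx_queue, K_NO_WAIT)) != NULL) {
		tx_meta_data_free(buf); /* release the TX metadata */
		net_buf_unref(buf);     /* return the buffer to its pool */
	}
}
```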
@carlst I'm having some trouble reproducing it, do you mind uploading the config? Also, do you do anything on (re)connection before sending the notifications? E.g. MTU exchange, encryption, stuff like that.
@jori-nordic Uploaded as config_zephyr.log (to avoid OS and browser limitations). Let me know if you need anything else.

Yes, the issue is difficult to reproduce. It requires Notification data to be in flight when a disconnect occurs, i.e. data has been handed over from the ATT/L2CAP buffer to the Bluetooth host, but the network buffer(s) have not been fully sent. Using long Notifications (e.g. 150 bytes) with short LL data lengths (e.g. the default of 27) seems to help trigger it (multiple transfers per connection event). To increase the odds, I also run with the external antenna of the sender disconnected so the signal is very weak.

Upon connection, there is a request to change the MTU (to 512). No encryption. The peripheral also makes a Connection Parameter Update Request over the L2CAP signaling channel, but only after the connection has existed for ~5 seconds.
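For anyone setting up a similar reproduction, this is a rough sketch of the client-side MTU exchange described above, assuming CONFIG_BT_GATT_CLIENT is enabled and the 512-byte ATT MTU comes from the Kconfig MTU/buffer settings on both sides; the callback names are placeholders, not the reporter's code.

```c
#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/bluetooth/conn.h>
#include <zephyr/bluetooth/gatt.h>
#include <zephyr/sys/printk.h>

static struct bt_gatt_exchange_params mtu_params;

static void mtu_exchanged(struct bt_conn *conn, uint8_t err,
			  struct bt_gatt_exchange_params *params)
{
	/* e.g. 512 when CONFIG_BT_L2CAP_TX_MTU is 512 on both sides */
	printk("MTU exchange %s, ATT MTU %u\n",
	       err ? "failed" : "done", bt_gatt_get_mtu(conn));
}

static void connected(struct bt_conn *conn, uint8_t err)
{
	if (err) {
		return;
	}

	/* Request the larger ATT MTU right after (re)connection. */
	mtu_params.func = mtu_exchanged;
	(void)bt_gatt_exchange_mtu(conn, &mtu_params);
}

BT_CONN_CB_DEFINE(conn_callbacks) = {
	.connected = connected,
};
```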
@carlst thanks to the config, I think I managed to reproduce it (using babblesim, which is pretty neat btw) here: main...jori-nordic:zephyr:att-buffer-leak. I opened a proper PR to fix it. Thanks for the bug report!
Edit: just thought about it, could you try to reproduce your issue with the patch from #48395 applied? Then you can add a comment to the PR directly if that really fixes the issue.
@jori-nordic I can no longer reproduce the issue with #48395 applied, and I verified that the patch code was called (and the in-flight buffer freed by ATT). A nice, clean fix. Babblesim is indeed quite good; I have not used it much where timing is important, but may do so in the future.
Symptoms were that a disconnect happening when a buffer was enqueued but not sent resulted in a meta-data memory leak. This is problematic because it seemed (as per the OP) that it resulted in the whole host getting locked up when a device sending long L2CAP packets was prone to disconnect-reconnect cycles (e.g. in a bad RF environment).

Fixes zephyrproject-rtos#47649

Signed-off-by: Jonathan Rico <[email protected]>
Describe the bug
A buffer allocated by Bluetooth ATT (and passed on to conn) but not yet sent when a disconnect occurs is never freed.
Please also mention any information which could help others to understand the problem you're facing:
- Added BT_DBG messages to tx_meta_data_alloc and tx_meta_data_free to track the buffer allocation count (see the sketch below).
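A minimal sketch of that kind of instrumentation, assuming a simple atomic counter bumped from inside the two functions; the helper names are made up for illustration, and BT_DBG stands in for whatever logging macro the surrounding host file uses.

```c
#include <zephyr/sys/atomic.h>

/* Illustrative counter of outstanding ATT TX metadata allocations. */
static atomic_t tx_meta_outstanding;

/* Called from tx_meta_data_alloc() after a successful allocation. */
static inline void tx_meta_trace_alloc(void)
{
	atomic_inc(&tx_meta_outstanding);
	BT_DBG("tx meta alloc, outstanding %ld",
	       (long)atomic_get(&tx_meta_outstanding));
}

/* Called from tx_meta_data_free() when an entry is returned. */
static inline void tx_meta_trace_free(void)
{
	atomic_dec(&tx_meta_outstanding);
	BT_DBG("tx meta free, outstanding %ld",
	       (long)atomic_get(&tx_meta_outstanding));
}
```

A leak then shows up as the "outstanding" count ratcheting upward across disconnect/reconnect cycles, as described in the attached logs.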
To Reproduce
The issue occurs in a custom application that sends several Notifications per second in an environment where disconnects occur every few minutes (perhaps an electrically noisy environment).
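A rough sketch of that traffic pattern, assuming a hypothetical characteristic attribute and connection reference provided by the application (this is not the reporter's actual code): long 150-byte notifications sent from a periodic work item.

```c
#include <zephyr/kernel.h>
#include <zephyr/bluetooth/conn.h>
#include <zephyr/bluetooth/gatt.h>

#define NOTIFY_LEN      150          /* long value: several LL fragments at the default DLE of 27 */
#define NOTIFY_INTERVAL K_MSEC(200)  /* several notifications per second */

/* Placeholders supplied by the real application. */
extern const struct bt_gatt_attr *notify_attr;
extern struct bt_conn *current_conn;

static uint8_t payload[NOTIFY_LEN];

static void notify_work_handler(struct k_work *work)
{
	if (current_conn != NULL) {
		/* Queue a long notification; errors such as -ENOMEM or
		 * -ENOTCONN are simply dropped until the next period. */
		(void)bt_gatt_notify(current_conn, notify_attr,
				     payload, sizeof(payload));
	}

	k_work_schedule(k_work_delayable_from_work(work), NOTIFY_INTERVAL);
}

static K_WORK_DELAYABLE_DEFINE(notify_work, notify_work_handler);
```

The work item would be kicked off from the connected callback with `k_work_schedule(&notify_work, K_NO_WAIT)`; a disconnect mid-stream then leaves Notification data in flight, which is the condition the bug needs.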
Expected behavior
Either the buffer should be freed upon disconnect, or else freed after reconnection once the send operation completes.
Impact
Annoyance. The current workaround is to reboot the system on disconnect.
Logs and console output
In boflo2.log, the first two disconnects occur when no Notification data is pending. In this case the buffer alloc and free counts are either 0 or 1, as expected (see the log lines after 'Client MTU 512'). The subsequent two disconnects occur when Notification data is pending; in these cases, the buffer alloc count increases.
In boflo3.log (after more disconnects), the system eventually hangs (no response to ATT Client MTU request).
Environment (please complete the following information):
boflo2.log
boflo3.log