Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLE Connection fails to establish between two nRF52840-USB Dongles with Zephyr controller #29008

Closed
RoyAnupam opened this issue Oct 7, 2020 · 72 comments
Assignees
Labels
area: Bluetooth Controller area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@RoyAnupam
Copy link

RoyAnupam commented Oct 7, 2020

Describe the bug
Observation: LE Connection fails to establish between two nRF52840-USB Dongles
Device Roles:

  • Device a: GATT Server
  • Device b: GATT Client
    Reason: Host receives Disconnect Complete with Reason: Connection Timeout

To Reproduce
Steps to reproduce the behavior:
Device A: Create GATT server using bluetoothctl tool (add a GATT server 0x1834, characteristic: 0x2834, Descriptor: 0x2902)
Device B: Scan & connect Device A from bluetoothctl tool

Expected behavior
Connection should not drop.

Impact
Not able to proceed further tests

Logs and console output
HCI Dump log in GATT Central Peripheral (Advertiser)

> HCI Event: LE Meta Event (0x3e) plen 19                                                                                                                                                                            #108 [hci0] 1243.692651
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 0
        Role: Slave (0x01)
        Peer address type: Random (0x01)
        Peer address: F8:F4:B9:51:03:46 (Static)
        Connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 13                                                                                                                                                                  {0x0001} [hci0] 1243.692706
        LE Address: F8:F4:B9:51:03:46 (Static)
        Flags: 0x00000000
        Data length: 0
< HCI Command: LE Read Remote Used Features (0x08|0x0016) plen 2                                                                                                                                                     #109 [hci0] 1243.692810
        Handle: 0
> HCI Event: LE Meta Event (0x3e) plen 4                                                                                                                                                                             #110 [hci0] 1243.693564
      LE Channel Selection Algorithm (0x14)
        Handle: 0
        Algorithm: #2 (0x01)
> HCI Event: Command Status (0x0f) plen 4                                                                                                                                                                            #111 [hci0] 1243.694562
      LE Read Remote Used Features (0x08|0x0016) ncmd 1
        Status: Success (0x00)
@ RAW Open: hciconfig (privileged) version 2.22                                                                                                                                                                         {0x0002} 1243.697054
@ RAW Close: hciconfig                                                                                                                                                                                                  {0x0002} 1243.697066
> HCI Event: LE Meta Event (0x3e) plen 12                                                                                                                                                                            #112 [hci0] 1243.893602
      LE Read Remote Used Features (0x04)
        Status: Success (0x00)
        Handle: 0
        Features: 0xff 0x59 0x01 0x00 0x00 0x00 0x00 0x00
          LE Encryption
          Connection Parameter Request Procedure
          Extended Reject Indication
          Slave-initiated Features Exchange
          LE Ping
          LE Data Packet Length Extension
          LL Privacy
          Extended Scanner Filter Policies
          LE 2M PHY
          LE Coded PHY
          LE Extended Advertising
          Channel Selection Algorithm #2
          Minimum Number of Used Channels Procedure
= bluetoothd: src/device.c:load_gatt_db() No cache for F8:F4:B9:51:03:46                                                                                                                                                         1243.894308
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                                                                                                                                                            #113 [hci0] 1243.894349
      ATT: Exchange MTU Request (0x02) len 2
        Client RX MTU: 517
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #114 [hci0] 1244.144576
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #115 [hci0] 1244.144721
      ATT: Error Response (0x01) len 4
        Exchange MTU Request (0x02)
        Handle: 0x0000
        Error: Request Not Supported (0x06)
< ACL Data TX: Handle 0 flags 0x00 dlen 11                                                                                                                                                                           #116 [hci0] 1244.144950
      ATT: Read By Type Request (0x08) len 6
        Handle range: 0x0001-0xffff
        Attribute type: Unknown (0x2b3a)
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #117 [hci0] 1244.394578
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #118 [hci0] 1244.394731
      ATT: Error Response (0x01) len 4
        Read By Type Request (0x08)
        Handle: 0x0000
        Error: Request Not Supported (0x06)
< ACL Data TX: Handle 0 flags 0x00 dlen 11                                                                                                                                                                           #119 [hci0] 1244.394869
      ATT: Read By Group Type Request (0x10) len 6
        Handle range: 0x0001-0xffff
        Attribute group type: Primary Service (0x2800)
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #120 [hci0] 1244.544561
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #121 [hci0] 1244.544715
      ATT: Error Response (0x01) len 4
        Read By Group Type Request (0x10)
        Handle: 0x0000
        Error: Request Not Supported (0x06)
....
........
........
> HCI Event: Disconnect Complete (0x05) plen 4                                                                                                                                                                       #122 [hci0] 1287.642010
        Status: Success (0x00)
        Handle: 0
        Reason: Connection Timeout (0x08)
@ MGMT Event: Device Disconnected (0x000c) plen 8                                                                                                                                                                {0x0001} [hci0] 1287.642056
        LE Address: F8:F4:B9:51:03:46 (Static)
        Reason: Connection timeout (0x01)

Environment (please complete the following information):

  1. Host BT Stack: Bluez 5.55 (latest master branch)
  2. Host OS: Ubuntu 16.04 ( x86_64 x86_64 x86_64 GNU/Linux
  3. Host kernel: 4.15.0-041500-generic
  4. BT chip: nRF52840-USB dongle
  5. BT Controller Code: Latest Zephyr (master branch)
  6. Build: west build -b nrf52840dongle_nrf52840 zephyr/samples/bluetooth/hci_usb
  7. Flash: using nRF Connect v3.5.0 Windows Application

Additional context
Observation: Sometimes connection drops with few seconds and sometimes in few minutes, without any host involvement or any command being sent to Controller from host. Also, it is observed that, the BLE connection between one nRF52840-USB dongle and another device (Android P) never terminates automatically by Connection Timeout, with nRF as GATT Server and Android as GATT client.

@RoyAnupam RoyAnupam added the bug The issue is a bug, or the PR is fixing a bug label Oct 7, 2020
@cvinayak cvinayak self-assigned this Oct 8, 2020
@cvinayak
Copy link
Contributor

cvinayak commented Oct 8, 2020

@RoyAnupam please provide the Zephyr commit hash you are building the hci_usb sample.

@RoyAnupam
Copy link
Author

RoyAnupam commented Oct 8, 2020

Hello @cvinayak
Following is tip of my Zephyr git:-
commit adb62ea

BR,

@nashif nashif added the priority: medium Medium impact/importance bug label Oct 13, 2020
@RoyAnupam
Copy link
Author

Hello @cvinayak ,
I am currently blocked due to this Connection Drop issue described above. Please let me know if there is any additional info, which I can provide, which may help?

BR,

@cvinayak
Copy link
Contributor

cvinayak commented Oct 19, 2020

@RoyAnupam I have been busy. But I have tested one hci_usb nRF52840 dongle connected to my virtualbox with linux, and setup gatt services as stated by you. I do not see any connection supervision timeouts when I connect to it from a phone and when I connect back from Linux to phone, too.

Also, it is observed that, the BLE connection between one nRF52840-USB dongle and another device (Android P) never terminates automatically by Connection Timeout, with nRF as GATT Server and Android as GATT client.

Do you mean, hci_usb as a peripheral and a phone as central, you do not have connection loss?

Could you try with different nRF52840 dongles? (in case you may have damaged the crystals on your dongles).

You can contact me on slack and we can have a screen sharing, if you want me to debug live.

@RoyAnupam
Copy link
Author

RoyAnupam commented Oct 20, 2020

@cvinayak , I appreciate your time to test & respond about this issue.

Do you mean, hci_usb as a peripheral and a phone as central, you do not have connection loss?

Yes, I don't face any connection loss between hci_usb as peripheral & android phone as client & vice versa. Basically, when only one hci_usb(dongle) is involved and other device is commercial product like phone, there is no connection drop. But, whenever, I use hci_usb in both sides, connection drop issue is occurring.

Could you try with different nRF52840 dongles? (in case you may have damaged the crystals on your dongles).

Yes, I have tried with two different dongles, behavior is same. Could you get a chance to check the scenario with two hci_usb dongles once?

You can contact me on slack and we can have a screen sharing, if you want me to debug live.

I have never used slack, but will check if my company policy allows to use it for screen sharing. Thank You

@cvinayak
Copy link
Contributor

cvinayak commented Oct 20, 2020

I have tested connection between two dongles, I do not see any disconnections.

image

Please attach btmon logs for both the dongles, the exact commands you have issued in console and in bluetoothctl.

@joerchan
Copy link
Contributor

@RoyAnupam Just an FYI, I edited your comment, you need a blank line after quoting someone.

> Quote

response

@RoyAnupam
Copy link
Author

hello @cvinayak ,

I have tested connection between two dongles, I do not see any disconnections.

oh, I will check your btmon logs in detail and try to compare with mine. Thank you very much for sharing the logs.
By the way, can you please share which version of bluez you are using in Ubuntu? bluetoothd --version?
Although currently I am not sure, if that difference will make any difference.

@cvinayak
Copy link
Contributor

cvinayak commented Oct 20, 2020

bluetoothd --version?

You meant bluetoothctl --version
bluetoothctl: 5.55

My bluez version is 5.55. BlueZ should not cause connection timeout.

If you can provide detailed step by step commandline or a script, it will make it easy to reproduce your issue.
I am currently guessing on how you have setup your client and server etc. And I am using single virtual machine to connect between two controllers etc.

@RoyAnupam
Copy link
Author

Hello @cvinayak

My bluez version is 5.55. BlueZ should not cause connection timeout.

Okay, sure

If you can provide detailed step by step commandline or a script, it will make it easy to reproduce your issue.
I am currently guessing on how you have setup your client and server etc. And I am using single virtual machine to connect between two controllers etc.

Okay, sure. Let me work on this and share with you details.

@RoyAnupam
Copy link
Author

Hello @cvinayak
I am preparing detailed steps on each side with nRF dongle. I will be able to share very soon (with btmon logs, bluetoothctl steps & bluez logs on both sides)

From your description, if I understood correctly,,, you are using single PC with Virtual Box with two nRF dongles with hci_usb attached to the same PC?

In the mean time, just for your info, my setup is following.
Setup1:

  • Ubuntu PC 16.04
  • Bluez 5.55 (latest)
  • nRF dongle with hci_usb (git pulled to latest)
  • Role (GATT Server)
  • test tool (bluetoothctl)

Setup2:

  • Ubuntu PC 16.04
  • Bluez 5.55 (latest)
  • nRF dongle with hci_usb (git pulled to latest)
  • Role (GATT Central)
  • test tool (bluetoothctl)

@cvinayak
Copy link
Contributor

you are using single PC with Virtual Box with two nRF dongles with hci_usb attached to the same PC?

Yes. and I used select command in bluetoothctl to advertise on one controller and connect to from another.

@RoyAnupam
Copy link
Author

Hello @cvinayak ,

Yes. and I used select command in bluetoothctl to advertise on one controller and connect to from another.

Okay, got it. I will also try this scenario at my end.

I reproduced the issue with controllers in separate Ubuntu Host's(with each running bluez upstream v5.55).
I am attaching detailed steps and logs from both hosts (Client & Server)

Attachments contain following

  • Linux_Client_Setups & Linux_Peripheral_Setups (Steps with comments in bluetoothctl operations)
  • Linux_Peripheral_btmon (It shows how LE Adv is enabled, Connection is established, few data communication over ATT & then Disconnection with Timeout)
  • Linux_Client_btmon (It shows how LE Scan and find peripheral device, Connection initiation & estabilishment, few data communication over ATT & then Disconnection with Timeout)
  • Linux_Client_bluetoothd & Linux_Server_bluetoothd (bluez logs for both hosts)

I hope these information's will be useful to reproduce the scenario. By the way, I would also like to share hci_usb prj.cnf file changes.
diff --git a/samples/bluetooth/hci_usb/prj.conf b/samples/bluetooth/hci_usb/prj.conf
index 9a86ef3..009cf4a 100644
--- a/samples/bluetooth/hci_usb/prj.conf
+++ b/samples/bluetooth/hci_usb/prj.conf
@@ -5,6 +5,9 @@ CONFIG_UART_INTERRUPT_DRIVEN=y
CONFIG_BT=y
CONFIG_BT_HCI_RAW=y
+CONFIG_BT_CTLR_ADV_EXT=y
+CONFIG_BT_BROADCASTER=y
+CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_USB=y
CONFIG_USB_DEVICE_STACK=y

Please let me know if any further information would be helpful. Thank You

@cvinayak
Copy link
Contributor

@RoyAnupam Did you forget to upload files? (you can drag/drop or paste zip files here in the comments)

@RoyAnupam
Copy link
Author

Hello @cvinayak ,
I am extremely sorry. Seems, file uploading got by default blocked due to some reason. I will check with IT and get back. Thank You very much for pointing it out.

@cvinayak cvinayak added the Waiting for response Waiting for author's response label Oct 22, 2020
@carlescufi
Copy link
Member

@RoyAnupam you can also upload the files to dropbox or any other service and give us a link to them here.

@RoyAnupam
Copy link
Author

@carlescufi ,
Yes, I figured it out. Actually, My company IT policy blocked uploading to github(firewall issue). Will fix it ASAP and upload the files. Requesting you to please allow me some time for it. Thank you very much!

@RoyAnupam
Copy link
Author

Hello @cvinayak @carlescufi
I think, it should work now. I am uploading the files.

Attachments contain following

  • Linux_Client_Setups & Linux_Peripheral_Setups (Steps with comments in bluetoothctl operations)
  • Linux_Peripheral_btmon (It shows how LE Adv is enabled, Connection is established, few data communication over ATT & then Disconnection with Timeout)
  • Linux_Client_btmon (It shows how LE Scan and find peripheral device, Connection initiation & estabilishment, few data communication over ATT & then Disconnection with Timeout)
  • Linux_Client_bluetoothd & Linux_Server_bluetoothd (bluez logs for both hosts)

Please let me know if you face any issue in accessing these files. Thanks

Linux_Client_bluetoothd.txt
Linux_Client_btmon.txt
Linux_Client_Setups.txt
Linux_Peripheral_bluetoothd.txt
Linux_Peripheral_btmon.txt
Linux_Peripheral_Setups.txt

@cvinayak cvinayak removed the Waiting for response Waiting for author's response label Oct 25, 2020
@cvinayak
Copy link
Contributor

From Linux_Client_btmon.txt:

> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #1354 [hci0] 664.352961
      ATT: Write Request (0x12) len 4
        Handle: 0x0009
          Data: 0200
< ACL Data TX: Handle 0 flags 0x00 dlen 5                                                                                                                                                                            #1355 [hci0] 664.353123
      ATT: Write Response (0x13) len 0
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #1356 [hci0] 664.402772
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #1357 [hci0] 664.402946
      Channel: 65 len 5 sdu 3 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Request (0x0a) len 2
        Handle: 0x0003
< ACL Data TX: Handle 0 flags 0x00 dlen 12                                                                                                                                                                           #1358 [hci0] 664.403095
      Channel: 65 len 8 sdu 6 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Response (0x0b) len 5
        Value: 6162686179		

From Linux_Peripheral_btmon.txt:

< ACL Data TX: Handle 0 flags 0x00 dlen 9                                                                                                                                                                             #89 [hci0] 2805.994446
      ATT: Write Request (0x12) len 4
        Handle: 0x0009
          Data: 0200
< ACL Data TX: Handle 0 flags 0x00 dlen 9                                                                                                                                                                             #90 [hci0] 2805.994447
      Channel: 65 len 5 sdu 3 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Request (0x0a) len 2
        Handle: 0x0003
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                                #91 [hci0] 2806.356975
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 7                                                                                                                                                                             #92 [hci0] 2806.357048
      Channel: 64 len 3 [PSM 0 mode Basic (0x00)] {chan 65535}
        01 00 13                                         ...
< ACL Data TX: Handle 0 flags 0x00 dlen 9                                                                                                                                                                             #93 [hci0] 2806.357178
      Channel: 64 len 5 [PSM 0 mode Basic (0x00)] {chan 65535}
        03 00 0a 05 00                                   .....
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                                #94 [hci0] 2806.406966
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 5                                                                                                                                                                             #95 [hci0] 2806.407058
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                                                                                                                                                             #96 [hci0] 2806.407215
      ATT: Read Request (0x0a) len 2
        Handle: 0x0012
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                                #97 [hci0] 2806.456969
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 12                                                                                                                                                                            #98 [hci0] 2806.457037
      Channel: 65 len 8 sdu 6 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Response (0x0b) len 5
        Value: 6162686179

The last ATT Read Request handle (0x0012 sent, and 0x0003 received) do not match, are you sure this log is from a connection between the two dongles?

@RoyAnupam
Copy link
Author

Hello @cvinayak ,

are you sure this log is from a connection between the two dongles?

Yes, I cross-checked, the logs are from two dongles running on Bluez 5.55 host of two separate Ubuntu Machines.

I have captured logs again. This time, along with btmon logs, I am also sharing cfa files which can be directly viewed in Frontline's Comprobe Protocol Analyzer system (CPAS) file viewer software.

Below are the Random addresses of the dongles:-

  • Peripheral Random Address: C9:82:D7:F4:00:EF (Static)
  • Client Random address: F8:F4:B9:51:03:46 (Static)

Client connection initiation (can be found in Client_btmon.txt)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25 #2776 [hci0] 1634.967793
Scan interval: 60.000 msec (0x0060)
Scan window: 60.000 msec (0x0060)
Filter policy: White list is not used (0x00)
Peer address type: Random (0x01)
Peer address: C9:82:D7:F4:00:EF (Static)
Own address type: Random (0x01)
Min connection interval: 30.00 msec (0x0018)
Max connection interval: 50.00 msec (0x0028)
Connection latency: 0 (0x0000)
Supervision timeout: 420 msec (0x002a)
Min connection length: 0.000 msec (0x0000)
Max connection length: 0.000 msec (0x0000)

Peripheral connection Complete (can be found in Peripheral_btmon.txt)

HCI Event: LE Meta Event (0x3e) plen 19 #2710 411376.250074
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Random (0x01)
Peer address: F8:F4:B9:51:03:46 (Static)
Connection interval: 50.00 msec (0x0028)
Connection latency: 0 (0x0000)
Supervision timeout: 420 msec (0x002a)
Master clock accuracy: 0x07

Please find attached the btmon and cfa logs
Client_btmon.txt
Peripheral_btmon.txt
client_cfa.log
peripheral_cfa.log

@carlescufi
Copy link
Member

@RoyAnupam could you please try to capture a sniffer trace using the Nordic sniffer, if you have spare Nordic Development Kit. You can find the instructions here:

https://infocenter.nordicsemi.com/index.jsp?topic=%2Fug_sniffer_ble%2FUG%2Fsniffer_ble%2Fintro.html&cp=9_4

@carlescufi
Copy link
Member

@cvinayak FYI this is Connection Timeout (0x08) on both sides, so likely a controller issue.

@RoyAnupam
Copy link
Author

Hello @carlescufi ,

@RoyAnupam could you please try to capture a sniffer trace using the Nordic sniffer, if you have spare Nordic Development Kit. You can find the instructions here:

Unfortunately, I don't have a Nordic Sniffer. I am sorry about it. However, I did try to use a Frontline Comprobe Air Sniffer. In the logs, I remember, there was not much communication happening After CONN_IND PDU is sent by the initiator.

@cvinayak
Copy link
Contributor

cvinayak commented Nov 9, 2020

Please give a try with the follow in prj.conf of the hci_usb sample.
CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM=y

@cvinayak
Copy link
Contributor

cvinayak commented Dec 4, 2020

@RoyAnupam select in the Wireshark menus->view->Interface Toolbars-> "nRF Sniffer for Bluetooth LE"

Now you have a toolbar with "Device" as "All advertising devices", drop this down and select the advertising device you which to capture the connection for.

@RoyAnupam
Copy link
Author

Hello @cvinayak ,
Thanks, it works now~
Now, I can see Empty PDU's being exchanged between master & slave devices in the Wireshark. I have attached it for your reference. Please let me know if you are expecting same data from Sniffer tool capture for our current Disconnection issue?

nRFSniffer

@cvinayak
Copy link
Contributor

cvinayak commented Dec 4, 2020

@RoyAnupam please attach the sniffer log file with information from connection to disconnection. From the screenshot of few seconds of connection, there is nothing much I can analyze. Did you get the disconnection with supervision timeout (packets stop in the sniffer after only packet from master for the duration of supervision timeout)?

@RoyAnupam
Copy link
Author

Hello @cvinayak
My intention of sharing the screenshot was to just confirm with you if you are looking for similar log from wireshark sniffer dump. I will certainly test and share the whole log file. Sorry for confusion!

While I was testing yesterday, I got disconnection once after around 3-4~ mins of connection, but at that time, wireshark filter settings was not correct, so unfortunately that log is not useful. I will test again and share my observation and correct sniffer log.

@RoyAnupam
Copy link
Author

RoyAnupam commented Dec 5, 2020

Hello @cvinayak ,
The issue is reproduced with correct Wireshark setting this time. But, facing one issue with uploading the zipped dump file (6 MB zip file). I am trying to figure out what is going wrong. In the mean time, lease find below some observations.

Did you get the disconnection with supervision timeout (packets stop in the sniffer after only packet from master for the duration of supervision timeout)?

Yes, the last packet was from Master at time : 14399.221, after that, there was nothing from Slave, due to which Connection terminated. I can see Peripheral device switched to Advertisement state after connection was terminated at time:14401.047. Anyways, please refer screenshot for the same.
DisConnection_From_Supervision_Timeout

@cvinayak
Copy link
Contributor

cvinayak commented Dec 8, 2020

It just looks like Peripheral did a proper local initiated disconnect to which the Central acknowledged in packet no. 476411, except that the Info says L2CAP Fragment (is the connection encrypted?). Without the sniffer logs with details to look at, I cant be certain.... Can you email me the sniffer log zip file? My email address is in any of controller commit messages in Zephyr.

@RoyAnupam
Copy link
Author

DC_Latest.zip

Hello @cvinayak ,
Please find attached the complete sniffer log file.

@cvinayak
Copy link
Contributor

cvinayak commented Dec 9, 2020

@RoyAnupam Thank you for the sniffer log, and following up with the requests on this issue.

From the time reference set between CRC errors in the master's packets, it appears the 1.28 s advertising interval when colliding with the connection event, may have caused the event counter of the master to have been corrupted.

Please take in the fixes in here: #30119
or try with the latest master, and please share the results whether the connection is stable with the fixes in the PR.

CC @mtpr-ot

@RoyAnupam
Copy link
Author

Hello @cvinayak

Please take in the fixes in here: #30119
or try with the latest master, and please share the results whether the connection is stable with the fixes in the PR.

Sure, I will check ASAP & update my observation. Thank you for sharing the info~

BR,
-Anupam Roy

@RoyAnupam
Copy link
Author

Hello @cvinayak ,
Today I git pulled project to latest and tried reproducing the scenario. However, the issue happens to reproduce still with same setup & scenario. Sharing some information's below to cross-check with you if I am missing something. Kindly check

git project: [email protected]:zephyrproject-rtos/zephyr.git

$git branch

  • master

git tip commit hash : 4f4dd9f

Following Commits are part of latest master
2d0107a
8bf3ee9

git diff output
$ git diff
diff --git a/samples/bluetooth/hci_usb/prj.conf b/samples/bluetooth/hci_usb/prj.conf
index 9a86ef3..634b681 100644
--- a/samples/bluetooth/hci_usb/prj.conf
+++ b/samples/bluetooth/hci_usb/prj.conf
@@ -5,8 +5,14 @@ CONFIG_UART_INTERRUPT_DRIVEN=y

CONFIG_BT=y
CONFIG_BT_HCI_RAW=y
+CONFIG_BT_CTLR_ADV_EXT=y
+CONFIG_BT_BROADCASTER=y
+CONFIG_BT_CTLR_DATA_LENGTH_MAX=251

CONFIG_USB=y
CONFIG_USB_DEVICE_STACK=y
CONFIG_USB_DEVICE_BLUETOOTH=y
CONFIG_USB_DEVICE_BLUETOOTH_VS_H4=y
+
+CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM=y
+CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y

Please find below screenshot of the AirSniffer log. Also, please find the Wireshark capture file as attachment.
Kindly suggest if I am missing something. Thank you~

Disconnection.zip

Disconnection

BR,
-Anupam Roy

@cvinayak
Copy link
Contributor

Could you do a scan for number of WiFI access points in the vicinity of your setup? From the sniffer log review, there it is a wide spread of CRC errors and packet drops in the range of almost 1-13 WiFI channels overlapping the BLE radio channels.

@RoyAnupam
Copy link
Author

Hello @cvinayak ,

Could you do a scan for number of WiFI access points in the vicinity of your setup?

I can see list of ~25 AP found in my Android S8+ phone.

BR,
-Anupam

@joerchan
Copy link
Contributor

@RoyAnupam From what I can see both devices in the Connection reach supervision timeout. This also happens in the sniffer (the sniffer reaches supervision timeout and starts scanning advertiser channels, confirmed by actually seeing them).
The supervision timeout appears to be set quite low at 420 ms, with 40 ms connection interval that is just 11 connection intervals.

Can you try with a longer supervision timeout? See if the connection can resynchronize then?

@RoyAnupam
Copy link
Author

Connection1.zip

Hello @cvinayak & @joerchan ,
I have tested Connection between one nF52840 USB dongle (same dongle which I am using for Peripheral) and one commerical Android Phone (S20+) as Central/Initiating device.

Result: Connection seems to be rock solid! Its almost 23+ hours now FYI, & connection is still ON as I write this email. I am testing under same environment with many AP's as mentioned in trailing thread. Could you suggest why the connection is not stable when testing between two nRF dongles, while it is rock solid in case of nRF + Android Phone, considering same test environment?

Note: I have attached sniffer log for the same for the above connection. Please note I could share log only for 12 hours as Ihad to stop capture after ~12 hours, as file size was getting huge.
I am also attaching btmon log of the peripheral device (nRF dongle)
btmon_peripheral.txt

I would be happy to share more info, if needed. Thank You.

BR,
-Anupam Roy

@cvinayak
Copy link
Contributor

cvinayak commented Dec 15, 2020

The phone does channel map update, see the number of channels it has disabled:
image

You can do the same using the channel map update from BlueZ, this will either need information from WiFi controller or implement channel utilization statistics in the Zephyr controller. This will be proprietary feature, and will need host support too.

@RoyAnupam
Copy link
Author

Hello @cvinayak,
Thank you for quick reply. Does it mean, connection instability between 2 nRF dongles is normal/known issue with certain number of AP's in nearby environment?

@cvinayak
Copy link
Contributor

@RoyAnupam No, it is not an issue between dongles, it is the responsibility of the host to perform periodic channel map updates or to increase the connection supervision timeout. As you can see, the phone's host uses periodic channel map updates and is using a supervision timeout of 5 seconds compared to your setup that does not perform any channel map update and uses 420 ms as supervision timeout.

@RoyAnupam
Copy link
Author

Hello @cvinayak ,
Okay, I see. Thanks for explaining~
May i request you to suggest any quick workaround in Zephyr Controller code by which this blocker can be resolved? I am not so familiar with Zephyr Controller code & making host changes might need a bit of time & analysis. Sorry about it~

BR,
-Anupam Roy

@cvinayak
Copy link
Contributor

@carlescufi
Copy link
Member

@RoyAnupam you can also use the bt_le_set_chan_map existing API call instead to avoid having to modify the code at all.

@RoyAnupam
Copy link
Author

RoyAnupam commented Dec 15, 2020

Hello @cvinayak ,

you can hard code your desired channel map in this file: https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/bluetooth/controller/ll_sw/ull_chan.c

Thank you for your suggestion I will look into the code & try it.

Hello @carlescufi ,

you can also use the bt_le_set_chan_map existing API call instead to avoid having to modify the code at all.

Is this API part of BT HCI USB controller code? Actually, I am using Zephyr HCI USB controller for my nRF52840 USB dongle & Bluez host (Ubuntu PC). Please help me understand if I am missing something. Thank You

@carlescufi
Copy link
Member

Is this API part of BT HCI USB controller code? Actually, I am using Zephyr HCI USB controller for my nRF52840 USB dongle & Bluez host (Ubuntu PC). Please help me understand if I am missing something. Thank You

No, but it's an HCI command, so you can use BlueZ to send that command. The command is "LE Set Host Channel Classification command"

@RoyAnupam
Copy link
Author

RoyAnupam commented Dec 16, 2020

Hello @carlescufi ,
Sure, got it~ Thank you for the clarification. I will try to make some host modification for channel map update, but as I understand, it apparently seems to be not so straightforward & perhaps dependent on run-time wi-fi channel states. Will give it a try though!

@cvinayak & @carlescufi ,
Just FYI...today, I tried a small test whereby I just attempted to change Supervision timeout from 420 msec to 6000 msec & with default Channel Map, set during connection by Zephyr Controller.
I issued HCI LE Connection Update command from bluez's hcitool utility with following values.

$hcitool cmd 0x08 0x0013 0x00 0x00 0x18 0x00 0x1C 0x00 0x00 0x00 0x58 0x02 0x09 0x00 0x09 0x00

< HCI Command: LE Connection Update (0x08|0x0013) plen 14
Handle: 0
Min connection interval: 30.00 msec (0x0018)
Max connection interval: 35.00 msec (0x001c)
Connection latency: 0 (0x0000)
Supervision timeout: 6000 msec (0x0258)
Min connection length: 5.625 msec (0x0009)
Max connection length: 5.625 msec (0x0009)

Observation: Connection is stable between two nRF dongles and still active after ~5 hours, while I write this post.
I also did some PHY updates in between the active connection to check the connection stability (I did random change like 1M -> 2M -> 1M-> CODED ->2M etc.)
It seems, connection is stable irrespective of that. However, I am not sure, if making supervision timeout to large value alone is sufficient to guarantee a stable connection. Do you have any recommendation?

One observation(off-topic): The nRF sniffer dongle seemed to just die when I updated PHY to use CODED PHY between the nRF dongles. Somehow sniffer stopped working completely and stopped capturing. It was working fine until PHY was 1M & 2M. I was wondering, can it not listen to CODED PHY channel?

@joerchan
Copy link
Contributor

One observation(off-topic): The nRF sniffer dongle seemed to just die when I updated PHY to use CODED PHY between the nRF dongles. Somehow sniffer stopped working completely and stopped capturing. It was working fine until PHY was 1M & 2M. I was wondering, can it not listen to CODED PHY channel?

As of latest release 3.1.0, the sniffer does not support LE Coded PHY.

@RoyAnupam
Copy link
Author

RoyAnupam commented Dec 24, 2020

@joerchan ,

As of latest release 3.1.0, the sniffer does not support LE Coded PHY.

Ok, thanks for confirming. Does the nRF52840 dongle (sniffer_nrf52840dongle_nrf52840_7cc811f.hex) support Extended Advertising & Extended Connection over 2M PHY?

@cvinayak
Copy link
Contributor

cvinayak commented Jan 6, 2021

Observation: Connection is stable between two nRF dongles and still active after ~5 hours, while I write this post.
I also did some PHY updates in between the active connection to check the connection stability (I did random change like 1M -> 2M -> 1M-> CODED ->2M etc.)
It seems, connection is stable irrespective of that. However, I am not sure, if making supervision timeout to large value alone is sufficient to guarantee a stable connection.

@RoyAnupam Based on the conclusion please close this issue. You may create new question issue for further discussions.

@RoyAnupam
Copy link
Author

RoyAnupam commented Jan 7, 2021

Hello @cvinayak,
Thank You very much for your tremendous help & guidance to solve the BLE connection issue. After increasing Supervision timeout to 6000 msec, I am able to now establish connection between two nRF-52840 dongles. Will close this issue. Thanks

Based on the conclusion please close this issue. You may create new question issue for further discussions.

Sure, will do. Thanks.

@joerchan
Copy link
Contributor

joerchan commented Jan 8, 2021

Ok, thanks for confirming. Does the nRF52840 dongle (sniffer_nrf52840dongle_nrf52840_7cc811f.hex) support Extended Advertising & Extended Connection over 2M PHY

No, this is planned for the next release (along with LE Coded PHY).

@RoyAnupam
Copy link
Author

Hello @joerchan ,

No, this is planned for the next release (along with LE Coded PHY).

Allright, thank you very much for this valuable information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Controller area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

5 participants