
pkg/nimble: migrate to ztimer #16317

Merged · 6 commits · Apr 15, 2021

Conversation

haukepetersen (Contributor)

Contribution description

With #15782 merged, all prerequisites for migrating NimBLE from xtimer to ztimer are in. As a next step, I PRed the required NPL changes to NimBLE (apache/mynewt-nimble#967). These are needed in conjunction with the changes of this PR and finally allow for xtimer-free RIOT/NimBLE builds :-)

As stated in apache/mynewt-nimble#967, apart from simplifying the NPL code in NimBLE, this change allows for significant energy savings. E.g. when running the examples/nimble_gatt application without STDIO (USEMODULE += stdio_null) while the node is in connected state with a peer, the average current consumption drops from ~420µA to 72µA.

Testing procedure

Verify that NimBLE is behaving as before. I suggest running at least the following on any nrf52x board of your choice:

  • examples/nimble_gatt -> connect with a smartphone (e.g. using the nRF Connect app on Android) and verify the GATT services are working
  • examples/nimble_scanner -> run the scan shell command and verify it lists some discovered devices
  • tests/nimble_l2cap/tests/nimble_l2cap_server -> connect two boards and make sure flooding and inctest are running as expected
  • examples/gnrc_networking -> connect two IP-over-BLE devices using the ble shell command and make sure they can ping each other

Issues/PRs references

depends on apache/mynewt-nimble#967

@haukepetersen added the "Type: enhancement" and "Area: BLE" labels on Apr 12, 2021
@@ -32,6 +32,7 @@
#include "thread.h"
#include "thread_flags.h"
#include "net/bluetil/ad.h"
#include "xtimer.h"
Contributor

Seems like a shame to keep xtimer here... but I understand that this is unrelated

Contributor Author

Exactly. Also, for this use case it does not really matter, as energy consumption is not really the focus of this test :-)

@haukepetersen (Contributor Author)

Removed the TODO label from the NimBLE pkg update commit.

Although apache/mynewt-nimble#967 is ready to be merged, we have the problem that merging it prior to this PR will make NimBLE's CI fail, as the RIOT build will not succeed. So my strategy is the following:

* we merge this PR as is into RIOT, updating the NimBLE package to use my feature branch as PRed in apache/mynewt-nimble#967 temporarily
* merge apache/mynewt-nimble#967 -> the NimBLE CI will be happy then
* update NimBLE package once more to switch back to NimBLE's upstream master

@fjmolinas (Contributor)

examples/gnrc_networking OK
  • node 1
2021-04-14 08:59:59,135 # ifconfig
2021-04-14 08:59:59,139 # Iface  8  HWaddr: CA:5E:99:AE:11:63
2021-04-14 08:59:59,143 #           L2-PDU:1280  MTU:1280  HL:64  RTR
2021-04-14 08:59:59,146 #           6LO  IPHC
2021-04-14 08:59:59,148 #           Source address length: 6
2021-04-14 08:59:59,152 #           Link type: wireless
2021-04-14 08:59:59,157 #           inet6 addr: fe80::ca5e:99ff:feae:1163  scope: link  VAL
2021-04-14 08:59:59,160 #           inet6 group: ff02::2
2021-04-14 08:59:59,163 #           inet6 group: ff02::1
2021-04-14 08:59:59,166 #           inet6 group: ff02::1:ffae:1163
2021-04-14 08:59:59,167 #
2021-04-14 08:59:59,170 #           Statistics for Layer 2
2021-04-14 08:59:59,173 #             RX packets 0  bytes 0
2021-04-14 08:59:59,178 #             TX packets 0 (Multicast: 0)  bytes 9320
2021-04-14 08:59:59,181 #             TX succeeded 0 errors 0
2021-04-14 08:59:59,184 #           Statistics for IPv6
2021-04-14 08:59:59,187 #             RX packets 0  bytes 0
2021-04-14 08:59:59,191 #             TX packets 2 (Multicast: 2)  bytes 112
2021-04-14 08:59:59,194 #             TX succeeded 2 errors 0
2021-04-14 08:59:59,195 #
> ble adv
2021-04-14 09:00:15,112 # ble adv
2021-04-14 09:00:15,116 # success: advertising this node as 'bleRIOT'
> 2021-04-14 09:00:24,137 # event: handle 0 -> CONNECTED as SLAVE (D3:59:5D:10:4C:F8)

  • node 2
2021-04-14 09:00:16,472 # ble scan
2021-04-14 09:00:16,474 # scanning (for 500ms) ...
2021-04-14 09:00:16,975 # done
2021-04-14 09:00:16,984 # [ 0] ca:5e:99:ae:11:63 (RANDOM) [IND] "bleRIOT", adv_msg_cnt: 8, adv_int: 58117us, last_rssi: -51
> ble connect  ca:5e:99:ae:11:63
2021-04-14 09:00:23,971 # ble connect  ca:5e:99:ae:11:63
2021-04-14 09:00:23,976 # initiated connection procedure with CA:5E:99:AE:11:63
> 2021-04-14 09:00:24,186 # event: handle 0 -> CONNECTED as MASTER (CA:5E:99:AE:11:63)
ping6 fe80::ca5e:99ff:feae:1163
2021-04-14 09:00:34,419 # ping6 fe80::ca5e:99ff:feae:1163
2021-04-14 09:00:34,488 # 12 bytes from fe80::ca5e:99ff:feae:1163%8: icmp_seq=0 ttl=64 time=62.945 ms
2021-04-14 09:00:35,488 # 12 bytes from fe80::ca5e:99ff:feae:1163%8: icmp_seq=1 ttl=64 time=74.729 ms
2021-04-14 09:00:36,488 # 12 bytes from fe80::ca5e:99ff:feae:1163%8: icmp_seq=2 ttl=64 time=86.479 ms
2021-04-14 09:00:36,488 #
2021-04-14 09:00:36,492 # --- fe80::ca5e:99ff:feae:1163 PING statistics ---
2021-04-14 09:00:36,498 # 3 packets transmitted, 3 packets received, 0% packet loss
2021-04-14 09:00:36,502 # round-trip min/avg/max = 62.945/74.717/86.479 ms

examples/nimble_gatt OK

Can connect on phone as well

2021-04-14 09:03:38,648 # service 'device info: manufacturer' callback triggered
2021-04-14 09:03:38,653 # This is RIOT! (Version: 2021.04-devel-1344-g398c3-pr-16317)
2021-04-14 09:03:38,653 #
2021-04-14 09:03:38,653 #
2021-04-14 09:03:38,745 # service 'device info: manufacturer' callback triggered
2021-04-14 09:03:38,750 # This is RIOT! (Version: 2021.04-devel-1344-g398c3-pr-16317)
2021-04-14 09:03:38,751 #
2021-04-14 09:03:38,751 #
2021-04-14 09:03:38,843 # service 'device info: manufacturer' callback triggered
2021-04-14 09:03:38,848 # This is RIOT! (Version: 2021.04-devel-1344-g398c3-pr-16317)
2021-04-14 09:03:38,848 #
2021-04-14 09:03:38,848 #
2021-04-14 09:03:41,036 # service 'device info: model' callback triggered
2021-04-14 09:03:41,042 # You are running RIOT on a(n) dwm1001 board, which features a(n) nrf52 MCU.
2021-04-14 09:03:41,042 #
2021-04-14 09:03:41,133 # service 'device info: model' callback triggered
2021-04-14 09:03:41,140 # You are running RIOT on a(n) dwm1001 board, which features a(n) nrf52 MCU.
2021-04-14 09:03:41,140 #
2021-04-14 09:03:41,231 # service 'device info: model' callback triggered
2021-04-14 09:03:41,237 # You are running RIOT on a(n) dwm1001 board, which features a(n) nrf52 MCU.
2021-04-14 09:03:41,237 #
2021-04-14 09:03:41,328 # service 'device info: model' callback triggered
2021-04-14 09:03:41,335 # You are running RIOT on a(n) dwm1001 board, which features a(n) nrf52 MCU.
2021-04-14 09:03:41,335 #
2021-04-14 09:03:48,201 # service 'rw demo' callback triggered
2021-04-14 09:03:48,205 # access to characteristic 'rw demo (write)'
2021-04-14 09:03:48,207 # read from characteristic
2021-04-14 09:03:48,214 # current value of rm_demo_write_data: 'This characteristic is read- and writeable!'
2021-04-14 09:03:48,214 #
2021-04-14 09:03:48,298 # service 'rw demo' callback triggered
2021-04-14 09:03:48,302 # access to characteristic 'rw demo (write)'
2021-04-14 09:03:48,304 # read from characteristic
2021-04-14 09:03:48,312 # current value of rm_demo_write_data: 'This characteristic is read- and writeable!'
2021-04-14 09:03:48,312 #
2021-04-14 09:03:49,663 # service 'rw demo' callback triggered
2021-04-14 09:03:49,668 # access to characteristic 'rw demo (read-only)'
2021-04-14 09:03:49,669 # new random number: 3

@fjmolinas (Contributor)

With nimble/scanner I sometimes get a hard fault:

2021-04-14 09:06:42,320 # scan
2021-04-14 09:06:42,478 # Scanning for 1000ms now ...Stack pointer corrupted, reset to top of stack
2021-04-14 09:06:42,478 # FSR/FAR:
2021-04-14 09:06:42,480 #  CFSR: 0x00000092
2021-04-14 09:06:42,482 #  HFSR: 0x40000000
2021-04-14 09:06:42,483 #  DFSR: 0x00000000
2021-04-14 09:06:42,485 #  AFSR: 0x00000000
2021-04-14 09:06:42,486 # MMFAR: 0x20000018
2021-04-14 09:06:42,487 # Misc
2021-04-14 09:06:42,488 # EXC_RET: 0xfffffff1
2021-04-14 09:06:42,747 # main(): This is RIOT! (Version: 2021.04-devel-1344-g398c3-pr-16317)
2021-04-14 09:06:42,750 # NimBLE Scanner Example Application
2021-04-14 09:06:42,753 # Type `scan help` for more information
> scan
2021-04-14 09:06:51,407 # scan
2021-04-14 09:06:52,410 # Scanning for 1000ms now ... done
2021-04-14 09:06:52,411 #
2021-04-14 09:06:52,411 # Results:
2021-04-14 09:06:52,412 #
> scan
2021-04-14 09:07:38,870 # scan
2021-04-14 09:07:39,439 # Scanning for 1000ms now ...Stack pointer corrupted, reset to top of stack
2021-04-14 09:07:39,439 # FSR/FAR:
2021-04-14 09:07:39,441 #  CFSR: 0x00000092
2021-04-14 09:07:39,442 #  HFSR: 0x40000000
2021-04-14 09:07:39,444 #  DFSR: 0x00000000
2021-04-14 09:07:39,445 #  AFSR: 0x00000000
2021-04-14 09:07:39,447 # MMFAR: 0x20000018
2021-04-14 09:07:39,448 # Misc
2021-04-14 09:07:39,449 # EXC_RET: 0xfffffff1
2021-04-14 09:07:39,707 # main(): This is RIOT! (Version: 2021.04-devel-1344-g398c3-pr-16317)
2021-04-14 09:07:39,710 # NimBLE Scanner Example Application
2021-04-14 09:07:39,713 # Type `scan help` for more information
> scan
2021-04-14 09:08:17,516 # scan
2021-04-14 09:08:18,081 # Scanning for 1000ms now ...Stack pointer corrupted, reset to top of stack
2021-04-14 09:08:18,082 # FSR/FAR:
2021-04-14 09:08:18,083 #  CFSR: 0x00000092
2021-04-14 09:08:18,085 #  HFSR: 0x40000000
2021-04-14 09:08:18,086 #  DFSR: 0x00000000
2021-04-14 09:08:18,088 #  AFSR: 0x00000000
2021-04-14 09:08:18,090 # MMFAR: 0x20000018
2021-04-14 09:08:18,090 # Misc
2021-04-14 09:08:18,092 # EXC_RET: 0xfffffff1
2021-04-14 09:08:18,350 # main(): This is RIOT! (Version: 2021.04-devel-1344-g398c3-pr-16317)
2021-04-14 09:08:18,353 # NimBLE Scanner Example Application
2021-04-14 09:08:18,356 # Type `scan help` for more information
> scan
2021-04-14 09:08:35,270 # scan
2021-04-14 09:08:36,273 # Scanning for 1000ms now ... done
2021-04-14 09:08:36,274 #
2021-04-14 09:08:36,275 # Results:
2021-04-14 09:08:36,283 # [ 0] f8:da:0c:58:5e:1c (PUBLIC) [SCAN_IND] "undefined", adv_msg_cnt: 8, adv_int: 75232us, last_rssi: -81
2021-04-14 09:08:36,284 #
> scan
2021-04-14 09:08:38,093 # scan
2021-04-14 09:08:39,096 # Scanning for 1000ms now ... done
2021-04-14 09:08:39,096 #
2021-04-14 09:08:39,097 # Results:
2021-04-14 09:08:39,106 # [ 0] f8:da:0c:58:5e:1c (PUBLIC) [SCAN_IND] "undefined", adv_msg_cnt: 11, adv_int: 68181us, last_rssi: -78
2021-04-14 09:08:39,107 #

@fjmolinas (Contributor)

With nimble/scanner I sometimes get a hard fault:

I can't reliably reproduce yet...

@fjmolinas (Contributor)

tests/l2cap_% OK
  • node 1
2021-04-14 09:14:25,101 # # Received: len   100, seq     1
2021-04-14 09:14:25,105 # # Received: len   100, seq     2
2021-04-14 09:14:25,108 # # Received: len   100, seq     3
2021-04-14 09:14:25,112 # # Received: len   100, seq     4
2021-04-14 09:14:25,115 # # Received: len   100, seq     5
2021-04-14 09:14:25,119 # # Received: len   100, seq     6
2021-04-14 09:14:25,122 # # Received: len   100, seq     7
2021-04-14 09:14:25,126 # # Received: len   100, seq     8
2021-04-14 09:14:25,129 # # Received: len   100, seq     9
2021-04-14 09:14:25,133 # # Received: len   100, seq    10
2021-04-14 09:14:25,136 # # Received: len   100, seq    11
2021-04-14 09:14:25,139 # # Received: len   100, seq    12
2021-04-14 09:14:25,143 # # Received: len   100, seq    13
2021-04-14 09:14:25,146 # # Received: len   100, seq    14
2021-04-14 09:14:25,149 # # Received: len   100, seq    15
2021-04-14 09:14:25,153 # # Received: len   100, seq    16
2021-04-14 09:14:25,156 # # Received: len   100, seq    17
2021-04-14 09:14:25,159 # # Received: len   100, seq    18
2021-04-14 09:14:25,162 # # Received: len   100, seq    19
2021-04-14 09:14:25,166 # # Received: len   100, seq    20
2021-04-14 09:14:25,169 # # Received: len   100, seq    21
2021-04-14 09:14:25,172 # # Received: len   100, seq    22
2021-04-14 09:14:25,175 # # Received: len   100, seq    23
2021-04-14 09:14:25,179 # # Received: len   100, seq    24
2021-04-14 09:14:25,182 # # Received: len   100, seq    25
2021-04-14 09:14:25,185 # # Received: len   100, seq    26
2021-04-14 09:14:25,188 # # Received: len   100, seq    27
2021-04-14 09:14:25,191 # # Received: len   100, seq    28
2021-04-14 09:14:25,201 # # Received: len   100, seq    29
2021-04-14 09:14:25,204 # # Received: len   100, seq    30
2021-04-14 09:14:25,207 # # Received: len   100, seq    31
2021-04-14 09:14:25,211 # # Received: len   100, seq    32
2021-04-14 09:14:25,214 # # Received: len   100, seq    33
2021-04-14 09:14:25,217 # # Received: len   100, seq    34
2021-04-14 09:14:25,222 # # Received: len   100, seq    35
2021-04-14 09:14:25,224 # # Received: len   100, seq    36
2021-04-14 09:14:25,227 # # Received: len   100, seq    37
2021-04-14 09:14:25,230 # # Received: len   100, seq    38
2021-04-14 09:14:25,234 # # Received: len   100, seq    39
2021-04-14 09:14:25,237 # # Received: len   100, seq    40
2021-04-14 09:14:25,240 # # Received: len   100, seq    41
2021-04-14 09:14:25,243 # # Received: len   100, seq    42
2021-04-14 09:14:25,247 # # Received: len   100, seq    43
2021-04-14 09:14:25,254 # # Received: len   100, seq    44
2021-04-14 09:14:25,258 # # Received: len   100, seq    45
2021-04-14 09:14:25,261 # # Received: len   100, seq    46
2021-04-14 09:14:25,265 # # Received: len   100, seq    47
2021-04-14 09:14:25,268 # # Received: len   100, seq    48
2021-04-14 09:14:25,271 # # Received: len   100, seq    49
2021-04-14 09:14:25,275 # # Received: len   100, seq    50

  • node 2
> flood
2021-04-14 09:14:25,062 # flood
2021-04-14 09:14:25,065 # # Sending: size   100 seq     1
2021-04-14 09:14:25,068 # # Sending: size   100 seq     2
2021-04-14 09:14:25,071 # # Sending: size   100 seq     3
2021-04-14 09:14:25,074 # # Sending: size   100 seq     4
2021-04-14 09:14:25,077 # # Sending: size   100 seq     5
2021-04-14 09:14:25,080 # # Sending: size   100 seq     6
2021-04-14 09:14:25,083 # # Sending: size   100 seq     7
2021-04-14 09:14:25,086 # # Sending: size   100 seq     8
2021-04-14 09:14:25,089 # # Sending: size   100 seq     9
2021-04-14 09:14:25,092 # # Sending: size   100 seq    10
2021-04-14 09:14:25,095 # # Sending: size   100 seq    11
2021-04-14 09:14:25,098 # # Sending: size   100 seq    12
2021-04-14 09:14:25,102 # # Sending: size   100 seq    13
2021-04-14 09:14:25,105 # # Sending: size   100 seq    14
2021-04-14 09:14:25,109 # # Sending: size   100 seq    15
2021-04-14 09:14:25,112 # # Sending: size   100 seq    16
2021-04-14 09:14:25,116 # # Sending: size   100 seq    17
2021-04-14 09:14:25,119 # # Sending: size   100 seq    18
2021-04-14 09:14:25,123 # # Sending: size   100 seq    19
2021-04-14 09:14:25,126 # # Sending: size   100 seq    20
2021-04-14 09:14:25,130 # # Sending: size   100 seq    21
2021-04-14 09:14:25,133 # # Sending: size   100 seq    22
2021-04-14 09:14:25,137 # # Sending: size   100 seq    23
2021-04-14 09:14:25,140 # # Sending: size   100 seq    24
2021-04-14 09:14:25,144 # # Sending: size   100 seq    25
2021-04-14 09:14:25,147 # # Sending: size   100 seq    26
2021-04-14 09:14:25,150 # # Sending: size   100 seq    27
2021-04-14 09:14:25,153 # # Sending: size   100 seq    28
2021-04-14 09:14:25,156 # # Sending: size   100 seq    29
2021-04-14 09:14:25,159 # # Sending: size   100 seq    30
2021-04-14 09:14:25,162 # # Sending: size   100 seq    31
2021-04-14 09:14:25,165 # # Sending: size   100 seq    32
2021-04-14 09:14:25,168 # # Sending: size   100 seq    33
2021-04-14 09:14:25,171 # # Sending: size   100 seq    34
2021-04-14 09:14:25,204 # # Sending: size   100 seq    35
2021-04-14 09:14:25,208 # # Sending: size   100 seq    36
2021-04-14 09:14:25,211 # # Sending: size   100 seq    37
2021-04-14 09:14:25,215 # # Sending: size   100 seq    38
2021-04-14 09:14:25,218 # # Sending: size   100 seq    39
2021-04-14 09:14:25,222 # # Sending: size   100 seq    40
2021-04-14 09:14:25,225 # # Sending: size   100 seq    41
2021-04-14 09:14:25,229 # # Sending: size   100 seq    42
2021-04-14 09:14:25,232 # # Sending: size   100 seq    43
2021-04-14 09:14:25,236 # # Sending: size   100 seq    44
2021-04-14 09:14:25,239 # # Sending: size   100 seq    45
2021-04-14 09:14:25,242 # # Sending: size   100 seq    46
2021-04-14 09:14:25,253 # # Sending: size   100 seq    47
2021-04-14 09:14:25,257 # # Sending: size   100 seq    48
2021-04-14 09:14:25,261 # # Sending: size   100 seq    49
2021-04-14 09:14:25,264 # # Sending: size   100 seq    50
2021-04-14 09:14:25,310 # # TEST COMPLETE
2021-04-14 09:14:25,312 # -> runtime: 247ms
2021-04-14 09:14:25,314 # -> ~ 20647 bytes/s

@fjmolinas (Contributor)

Removed the TODO label from the NimBLE pkg update commit.

Although apache/mynewt-nimble#967 is ready to be merged, we have the problem that merging it prior to this PR will make NimBLE's CI fail, as the RIOT build will not succeed. So my strategy is the following:

* we merge this PR as is into RIOT, updating the NimBLE package to use my feature branch as PRed in [apache/mynewt-nimble#967](https://github.com/apache/mynewt-nimble/pull/967) temporarily

* merge [apache/mynewt-nimble#967](https://github.com/apache/mynewt-nimble/pull/967) -> the NimBLE CI will be happy then

* update NimBLE package once more to switch back to NimBLE's upstream master

Makes sense. @kaspar030, is this OK to merge during soft feature freeze?

@fjmolinas (Contributor)

With nimble/scanner I sometimes get a hard fault:

I can't reliably reproduce yet...

I seem to be getting it quite reliably after a fresh flash.

@haukepetersen (Contributor Author)

I seem to be getting it quite reliably after a fresh flash.

I have not noticed this before, but I will try some boards to see if I can reproduce it.

@haukepetersen (Contributor Author)

Seems like there was an issue with the shell command still using xtimer. I simply pulled in the commit from #16324 that migrates the nimble_netif shell command to use ztimer as well. At least locally here everything runs as expected now.

@haukepetersen (Contributor Author)

2021-04-14 09:27:26,863 # help
2021-04-14 09:27:26,866 # Command              Description
2021-04-14 09:27:26,870 # ---------------------------------------
2021-04-14 09:27:26,873 # scan                 trigger a BLE scan
> scan 1000
2021-04-14 09:27:28,872 # scan 1000
2021-04-14 09:27:29,875 # Scanning for 1000ms now ... done
2021-04-14 09:27:29,875 # 
2021-04-14 09:27:29,876 # Results:
2021-04-14 09:27:29,885 # [ 0] fc:76:a8:c4:1d:cf (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -74
2021-04-14 09:27:29,895 # [ 1] 28:39:5e:32:5a:30 (PUBLIC) [NONCONN_IND] "undefined", adv_msg_cnt: 6, adv_int: 126519us, last_rssi: -68
2021-04-14 09:27:29,904 # [ 2] 04:2c:2f:89:8f:a9 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 3, adv_int: 182079us, last_rssi: -78
2021-04-14 09:27:29,913 # [ 3] 70:51:39:c2:62:4e (RANDOM) [IND] "undefined", adv_msg_cnt: 8, adv_int: 114217us, last_rssi: -76
2021-04-14 09:27:29,921 # [ 4] c0:28:8d:fa:af:5e (PUBLIC) [IND] "", adv_msg_cnt: 3, adv_int: 111665us, last_rssi: -85
2021-04-14 09:27:29,930 # [ 5] 46:bc:91:11:93:dd (RANDOM) [IND] "undefined", adv_msg_cnt: 3, adv_int: 91455us, last_rssi: -77
2021-04-14 09:27:29,940 # [ 6] d0:d0:03:70:8a:a9 (PUBLIC) [NONCONN_IND] "undefined", adv_msg_cnt: 4, adv_int: 192659us, last_rssi: -85
2021-04-14 09:27:29,949 # [ 7] a7:74:a9:ea:28:0a (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 2, adv_int: 2224us, last_rssi: -72
2021-04-14 09:27:29,958 # [ 8] a5:c9:39:e0:3b:b6 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -77
2021-04-14 09:27:29,967 # [ 9] 46:8b:fe:58:3a:2e (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -81
2021-04-14 09:27:29,977 # [10] 00:db:22:ae:dc:9e (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 5, adv_int: 81750us, last_rssi: -81
2021-04-14 09:27:29,985 # [11] c9:1e:0e:ea:35:80 (RANDOM) [IND] "undefined", adv_msg_cnt: 2, adv_int: 359us, last_rssi: -76
2021-04-14 09:27:29,994 # [12] 60:51:99:03:4c:97 (RANDOM) [IND] "undefined", adv_msg_cnt: 2, adv_int: 306us, last_rssi: -78
2021-04-14 09:27:30,003 # [13] cc:ac:6b:4a:ae:df (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -53
2021-04-14 09:27:30,003 # 
> scan 3000
2021-04-14 09:27:42,979 # scan 3000
2021-04-14 09:27:45,982 # Scanning for 3000ms now ... done
2021-04-14 09:27:45,982 # 
2021-04-14 09:27:45,983 # Results:
2021-04-14 09:27:45,992 # [ 0] 60:09:b8:81:36:1c (RANDOM) [IND] "undefined", adv_msg_cnt: 4, adv_int: 275148us, last_rssi: -80
2021-04-14 09:27:46,001 # [ 1] 70:51:39:c2:62:4e (RANDOM) [IND] "undefined", adv_msg_cnt: 29, adv_int: 102025us, last_rssi: -83
2021-04-14 09:27:46,010 # [ 2] 46:bc:91:11:93:dd (RANDOM) [IND] "undefined", adv_msg_cnt: 8, adv_int: 341813us, last_rssi: -85
2021-04-14 09:27:46,019 # [ 3] a7:74:a9:ea:28:0a (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -71
2021-04-14 09:27:46,029 # [ 4] 00:db:22:ae:dc:9e (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 17, adv_int: 147607us, last_rssi: -83
2021-04-14 09:27:46,038 # [ 5] 28:39:5e:32:5a:30 (PUBLIC) [NONCONN_IND] "undefined", adv_msg_cnt: 13, adv_int: 214596us, last_rssi: -70
2021-04-14 09:27:46,046 # [ 6] c0:28:8d:fa:af:5e (PUBLIC) [IND] "", adv_msg_cnt: 15, adv_int: 183744us, last_rssi: -84
2021-04-14 09:27:46,056 # [ 7] f6:ef:c4:b9:23:11 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 2, adv_int: 996774us, last_rssi: -88
2021-04-14 09:27:46,066 # [ 8] a5:c9:39:e0:3b:b6 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 3, adv_int: 668448us, last_rssi: -79
2021-04-14 09:27:46,075 # [ 9] d0:d0:03:70:8a:a9 (PUBLIC) [NONCONN_IND] "undefined", adv_msg_cnt: 12, adv_int: 207165us, last_rssi: -85
2021-04-14 09:27:46,085 # [10] 46:8b:fe:58:3a:2e (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 3, adv_int: 668650us, last_rssi: -65
2021-04-14 09:27:46,094 # [11] 60:51:99:03:4c:97 (RANDOM) [IND] "undefined", adv_msg_cnt: 7, adv_int: 349097us, last_rssi: -80
2021-04-14 09:27:46,103 # [12] 04:2c:2f:89:8f:a9 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 9, adv_int: 249693us, last_rssi: -83
2021-04-14 09:27:46,113 # [13] 3a:70:ca:cd:35:85 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 6, adv_int: 317535us, last_rssi: -78
2021-04-14 09:27:46,122 # [14] cc:ac:6b:4a:ae:df (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 4, adv_int: 500739us, last_rssi: -67
2021-04-14 09:27:46,132 # [15] fc:76:a8:c4:1d:cf (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 3, adv_int: 667458us, last_rssi: -76
2021-04-14 09:27:46,141 # [16] bc:7e:8b:c3:a9:bb (PUBLIC) [IND] "undefined", adv_msg_cnt: 2, adv_int: 170025us, last_rssi: -83
2021-04-14 09:27:46,141 # 
> scan 500
2021-04-14 09:27:49,065 # scan 500
2021-04-14 09:27:49,568 # Scanning for 500ms now ... done
2021-04-14 09:27:49,568 # 
2021-04-14 09:27:49,569 # Results:
2021-04-14 09:27:49,577 # [ 0] bc:7e:8b:c3:a9:bb (PUBLIC) [IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -83
2021-04-14 09:27:49,585 # [ 1] c0:28:8d:fa:af:5e (PUBLIC) [IND] "", adv_msg_cnt: 4, adv_int: 121746us, last_rssi: -84
2021-04-14 09:27:49,594 # [ 2] a7:74:a9:ea:28:0a (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -72
2021-04-14 09:27:49,604 # [ 3] 28:39:5e:32:5a:30 (PUBLIC) [NONCONN_IND] "undefined", adv_msg_cnt: 3, adv_int: 105677us, last_rssi: -70
2021-04-14 09:27:49,613 # [ 4] 04:2c:2f:89:8f:a9 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 2, adv_int: 135002us, last_rssi: -81
2021-04-14 09:27:49,623 # [ 5] a5:c9:39:e0:3b:b6 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -72
2021-04-14 09:27:49,632 # [ 6] d0:d0:03:70:8a:a9 (PUBLIC) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -84
2021-04-14 09:27:49,640 # [ 7] 46:bc:91:11:93:dd (RANDOM) [IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -87
2021-04-14 09:27:49,649 # [ 8] 3a:70:ca:cd:35:85 (RANDOM) [NONCONN_IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -80
2021-04-14 09:27:49,658 # [ 9] 70:51:39:c2:62:4e (RANDOM) [IND] "undefined", adv_msg_cnt: 2, adv_int: 306us, last_rssi: -73
2021-04-14 09:27:49,658 # 

@@ -1,6 +1,6 @@
PKG_NAME = nimble
PKG_URL = https://github.com/apache/mynewt-nimble.git
Contributor

If we are temporarily switching, the upstream needs to be changed

Contributor Author

I don't think so; as the fork is PRed, git seems to be fine finding the correct commit by the provided hash

Contributor

Not for me:

make -C examples/nimble_scanner/ distclean clean flash term
rm -rf /home/francisco/workspace/RIOT/examples/nimble_scanner/bin/nrf52dk/pkg-build/nimble
rm -rf /home/francisco/workspace/RIOT/build/pkg/nimble
rm -rf /home/francisco/workspace/RIOT/examples/nimble_scanner/bin/nrf52dk/pkg-build/nimble
Building application "nimble_scanner" for "nrf52dk" with MCU "nrf52".

[INFO] cloning nimble
Cloning into '/home/francisco/workspace/RIOT/build/pkg/nimble'...
remote: Enumerating objects: 140, done.
remote: Counting objects: 100% (140/140), done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 41218 (delta 54), reused 66 (delta 29), pack-reused 41078
Receiving objects: 100% (41218/41218), 12.28 MiB | 10.42 MiB/s, done.
Resolving deltas: 100% (25153/25153), done.
fatal: reference is not a tree: 7a2bf2420a1a2bcd8bbb5bf8f5927882b3c499c0
make[1]: *** [/home/francisco/workspace/RIOT/pkg/pkg.mk:123: /home/francisco/workspace/RIOT/build/pkg/nimble/.git] Error 128
make: [/home/francisco/workspace/RIOT/examples/nimble_scanner/../../Makefile.include:696: pkg-prepare] Error 2 (ignored)

Contributor

This still needs addressing.

Contributor

Once this is addressed, please trigger the CI.

@fjmolinas (Contributor)

Seems like there was an issue with the shell command still using xtimer. I simply pulled in the commit from #16324 that migrates the nimble_netif shell command to use ztimer as well. At least locally here everything runs as expected now.

I'm still getting it:

2021-04-14 09:39:44,831 # help
2021-04-14 09:39:44,833 # Command              Description
2021-04-14 09:39:44,837 # ---------------------------------------
2021-04-14 09:39:44,840 # scan                 trigger a BLE scan
> scan
2021-04-14 09:39:46,439 # scan
2021-04-14 09:39:46,456 # Scanning for 1000ms now ...Stack pointer corrupted, reset to top of stack
2021-04-14 09:39:46,456 # FSR/FAR:
2021-04-14 09:39:46,458 #  CFSR: 0x00000092
2021-04-14 09:39:46,460 #  HFSR: 0x40000000
2021-04-14 09:39:46,461 #  DFSR: 0x00000000
2021-04-14 09:39:46,463 #  AFSR: 0x00000000
2021-04-14 09:39:46,464 # MMFAR: 0x20000018
2021-04-14 09:39:46,465 # Misc
2021-04-14 09:39:46,467 # EXC_RET: 0xfffffff1
2021-04-14 09:39:46,724 # main(): This is RIOT! (Version: 2021.04-devel-1345-g22d644-pr-16317)
2021-04-14 09:39:46,727 # NimBLE Scanner Example Application
2021-04-14 09:39:46,730 # Type `scan help` for more information

@haukepetersen (Contributor Author)

Yes, just noticed this myself. I also noticed that the newly added commit is not even related to this example application, as the nimble_netif shell command is not used here... Will dig a little deeper.

@haukepetersen (Contributor Author)

Haven't found the reason yet, but it seems some changes in RIOT could be the cause:

2021-04-14 10:23:15,677 # Scanning for 500ms now ...
2021-04-14 10:23:15,698 # sys/malloc_thread_safe/malloc_wrappers.c:30 => 0x265
2021-04-14 10:23:15,700 # *** RIOT kernel panic:
2021-04-14 10:23:15,702 # FAILED ASSERTION.
2021-04-14 10:23:15,702 # 
2021-04-14 10:23:15,703 # *** halted.
2021-04-14 10:23:15,703 # 
2021-04-14 10:23:15,704 # Inside isr 1
2021-04-14 10:23:15,708 # Stack pointer corrupted, reset to top of stack
2021-04-14 10:23:15,709 # FSR/FAR:
2021-04-14 10:23:15,710 #  CFSR: 0x00000000
2021-04-14 10:23:15,712 #  HFSR: 0x80000000
2021-04-14 10:23:15,713 #  DFSR: 0x00000002
2021-04-14 10:23:15,715 #  AFSR: 0x00000000
2021-04-14 10:23:15,715 # Misc
2021-04-14 10:23:15,717 # EXC_RET: 0xfffffff1
2021-04-14 10:23:15,975 # main(): This is RIOT! (Version: 2021.04-devel-1344-g398c3-opt_nimble_ztimer)
2021-04-14 10:23:15,978 # NimBLE Scanner Example Application
2021-04-14 10:23:15,981 # Type `scan help` for more information

A malloc problem would also explain why the hard fault is only visible right after a system reboot. Will keep digging.

@haukepetersen (Contributor Author)

Findings:
a) the hard fault is unrelated to the changes in this PR -> it can also be reproduced in master
b) it seems that under specific circumstances (can't pinpoint it yet), NimBLE is indirectly calling malloc from interrupt context
c) since #15606 was merged, this leads to a crashing nimble_scanner application

@fjmolinas (Contributor)

Findings:
a) the hard fault is unrelated to the changes in this PR -> it can also be reproduced in master
b) it seems that under specific circumstances (can't pinpoint it yet), NimBLE is indirectly calling malloc from interrupt context
c) since #15606 was merged, this leads to a crashing nimble_scanner application

If this is reproducible in master, then I won't consider it a blocker for this PR as long as an issue is opened.

@haukepetersen (Contributor Author)

It is :-)

I will open the issue so we can progress with the ztimer migration for NimBLE. Even after 2 hours of debugging I have not closed in on the malloc issue, so it seems to be something more complicated...

@fjmolinas (Contributor)

Findings:
a) the hard fault is unrelated to the changes in this PR -> it can also be reproduced in master
b) it seems that under specific circumstances (can't pinpoint it yet), NimBLE is indirectly calling malloc from interrupt context
c) since #15606 was merged, this leads to a crashing nimble_scanner application

But it seems it was introduced after #15606 since I can't seem to reproduce from that commit.

@haukepetersen (Contributor Author)

I had the same idea just minutes ago :-) I'm in the middle of bisecting it as we speak.

@haukepetersen (Contributor Author)

Took a while but I think I found the issue:

  • in apache/mynewt-nimble#883 (nimble/ll: Use jrand48 for pseudo random generator), the pseudo random number generator for NimBLE was switched from the stdlib's rand() to jrand48()
  • the function wrapping this call (ble_ll_rand()) is called not only from thread context, but as it seems under some circumstances also from interrupt context
  • the seeding for jrand48() is done lazily during late initialization, the first time ble_ll_rand() is called
  • it seems that the first ever call to jrand48() triggers a malloc call
  • if this initial call happens to be run in interrupt context, we get the failed assertion triggered by malloc_thread_safe

So much for the observed hard fault, now towards verifying and fixing it :-)

@kaspar030
Contributor

Took a while but I think I found the issue:

nice catch!

@fjmolinas
Contributor

Took a while but I think I found the issue:

Nice catch indeed.

@haukepetersen
Contributor Author

Fix is on its way :-) Will propose something to NimBLE upstream so the fix is not tied to the RIOT specific initialization...

@fjmolinas
Contributor

Fix is on its way :-) Will propose something to NimBLE upstream so the fix is not tied to the RIOT specific initialization...

Yep, offline @kaspar030 mentioned he is OK with merging during soft-feature freeze. So let's go ahead!

@haukepetersen
Contributor Author

Fix is PRed, let's see if my approach is accepted...

@fjmolinas Awesome, I just need to remove the last commit and this PR should be good to go, right?!

@haukepetersen haukepetersen added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Apr 14, 2021
@haukepetersen
Contributor Author

@fjmolinas you might be right about the upstream nimble repo:

[INFO] cloning nimble
git-cache: updating remote 599d7830fe58fc2956d866b36dc3c27fb7ad754d
git-cache: trying checkout from source
fatal: reference is not a tree: 7a2bf2420a1a2bcd8bbb5bf8f5927882b3c499c0

seems Murdock's git cache can't find the needed commit... So for now I changed the upstream repo to link directly to my nimble fork.
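For reference, pointing the RIOT package at a fork is a small change in the package Makefile. The values below are placeholders (fork URL and commit hash are illustrative, not the actual ones used in this PR):

```makefile
# pkg/nimble/Makefile (excerpt) -- URL and hash are placeholders
PKG_NAME    = nimble
PKG_URL     = https://github.com/<your-fork>/mynewt-nimble
PKG_VERSION = <commit-hash-with-the-npl-changes>
PKG_LICENSE = Apache-2.0

include $(RIOTBASE)/pkg/pkg.mk
```

Once the upstream PR is merged, PKG_URL and PKG_VERSION can be switched back to the apache/mynewt-nimble repository.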

@haukepetersen
Contributor Author

Also included the ztimer migration for the nimble_netif shell command again. Without it Murdock fails to build some examples, so let's simply include it in this PR to make life easier...

@haukepetersen
Contributor Author

Alright, Murdock is happy now.

@fjmolinas do you want to give this one more try/final review? Just verified that gnrc_networking (ping between two nodes) and the GATT example are still working as expected, besides the known hard-fault on scan issue that is...

@fjmolinas
Contributor

Alright, Murdock is happy now.

@fjmolinas do you want to give this one more try/final review? Just verified that gnrc_networking (ping between two nodes) and the GATT example are still working as expected, besides the known hard-fault on scan issue that is...

Other than the sc_nimble_netif there have not been any changes since my last test, so I will just perform the test procedure described in #16324 one last time.

@fjmolinas
Contributor

All good

> ble scan
2021-04-15 10:16:08,434 # ble scan
2021-04-15 10:16:08,436 # scanning (for 500ms) ...
2021-04-15 10:16:08,936 # done
2021-04-15 10:16:08,944 # [ 0] 88:c6:26:c4:70:eb (PUBLIC) [IND] "", adv_msg_cnt: 7, adv_int: 35470us, last_rssi: -74
2021-04-15 10:16:08,953 # [ 1] f8:da:0c:58:5e:1c (PUBLIC) [SCAN_IND] "undefined", adv_msg_cnt: 8, adv_int: 47886us, last_rssi: -70
2021-04-15 10:16:08,962 # [ 2] 7a:39:56:35:44:92 (RANDOM) [IND] "undefined", adv_msg_cnt: 2, adv_int: 307us, last_rssi: -75
> ble scan 1000
2021-04-15 10:16:14,387 # ble scan 1000
2021-04-15 10:16:14,389 # scanning (for 1000ms) ...
2021-04-15 10:16:15,389 # done
2021-04-15 10:16:15,398 # [ 0] 7a:39:56:35:44:92 (RANDOM) [IND] "undefined", adv_msg_cnt: 6, adv_int: 93862us, last_rssi: -76
2021-04-15 10:16:15,408 # [ 1] f8:da:0c:58:5e:1c (PUBLIC) [SCAN_IND] "undefined", adv_msg_cnt: 13, adv_int: 49495us, last_rssi: -70
2021-04-15 10:16:15,416 # [ 2] 88:c6:26:c4:70:eb (PUBLIC) [IND] "", adv_msg_cnt: 17, adv_int: 48753us, last_rssi: -73
> ble scan 5000
2021-04-15 10:16:18,892 # ble scan 5000
2021-04-15 10:16:18,894 # scanning (for 5000ms) ...
2021-04-15 10:16:23,896 # done
2021-04-15 10:16:23,904 # [ 0] 6a:28:09:fc:81:d4 (RANDOM) [IND] "undefined", adv_msg_cnt: 1, adv_int: 0us, last_rssi: -86
2021-04-15 10:16:23,914 # [ 1] f8:da:0c:58:5e:1c (PUBLIC) [SCAN_IND] "undefined", adv_msg_cnt: 86, adv_int: 54647us, last_rssi: -70
2021-04-15 10:16:23,922 # [ 2] 88:c6:26:c4:70:eb (PUBLIC) [IND] "", adv_msg_cnt: 67, adv_int: 66267us, last_rssi: -85
2021-04-15 10:16:23,930 # [ 3] 7a:39:56:35:44:92 (RANDOM) [IND] "undefined", adv_msg_cnt: 15, adv_int: 260085us, last_rssi: -78

Contributor

@fjmolinas fjmolinas left a comment


ACK!

@fjmolinas fjmolinas merged commit ae6ffa3 into RIOT-OS:master Apr 15, 2021
@haukepetersen haukepetersen deleted the opt_nimble_ztimer branch April 15, 2021 08:40
@haukepetersen
Contributor Author

Very nice, thanks!

Labels
Area: BLE Area: Bluetooth Low Energy support CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation