-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High bluetooth ISR latency with CONFIG_BT_MAX_CONN=2 #29994
Comments
It could perhaps be this?
@jhedberg Is this irq_lock really needed? We don't lock irq anywhere else where we lookup the conn from the handle. |
Reporter see increasing latencies over time which has no explanation, unless there is some sort of O(n) traversal |
Setting this as high since ISR latency is critical for most users. |
Well, you do display the worst case recorded right? So it could be that over time we hit the worst case scenario? |
Yeah, I don't think it's actually linear growth, just more time to get unlucky. I've seen assertions firing anywhere between 2 and 8 hours of uptime. (edit for clarification: I saw the assertions with my own project, not the sample) |
@jmgao Can you try and comment out the irq-lock/unlock in hci_num_completed_packets around bt_conn_lookup_handle? |
Commenting out the first set of irq_lock/unlocks doesn't seem to help. |
Sidenote: increasing BT_CTLR_ULL_HIGH_PRIO and BT_CTLR_ULL_LOW_PRIO to 1 appears to make the symptoms go away. It's strange to me that the critical radio ISR and the low priority upper level ISR have the same priority by default. Is ULL_LOW_PRIO actually on a hard realtime deadline? If not, that seems like it's just asking for priority inversion... |
I agree, that looks like strange IRQ priorities defaults. |
Oh, whoops, I think I tested with zero latency interrupts in my latest comment: with ZLI off, I'm still seeing elevated latencies with BT_CTLR_ULL_LOW_PRIO increased (and with HIGH_PRIO increased as well), although maybe somewhat better better (max of 65 after 5 minutes) |
Using ZLI, controller interrupts are at highest proirity and hence other interrupts or irq locks do not introduce latency.
As latencies are observed with ULL_LOW and ULL_HIGH increased, indicate that there is some other interrupt at the same priority (ARM interrupt priority value 0) as the Radio interrupt. As this is your own project, what other interrupts/peripherals are you using? |
This is with the peripheral sample.
Or irq_lock in a lower priority interrupt, right? |
During my work in progress tasks, I am notice such gradual increase in reported latency by the controller. I am suspecting logging, hence, please disable logging subsystem, use a printk instead to display the ISR latencies and let us know if there is similar gradual increased latencies. |
|
No added latency seen for > 15 mins of connection with phone as central and simultaneous advertising when using the below:
@koffes you may have interest here. |
I can confirm that I am seeing similar issues.
I tried with |
@koffes What was the result for following in your application: |
Controller currently has an O(n) execution on scheduling conflict resolution and request processing in ULL_HIGH and ULL_LOW execution contexts. When the Radio ISR, LLL, ULL_HIGH, and ULL_LOW execution are configured at the same IRQ priority, these introduce ISR latencies amongst themself. There exists 3 Kconfig options to mitigate controller's self induced ISR Latencies
I will appreciate, if the options 1 to 3 are tested and any observations is noted here in this issue. |
Option 1 appeared to fix things for me, option 2 still had elevated latencies. |
@cvinayak : I tested with the combination you suggested. It then asserts almost right away. The call stack looks like: Let me know and I can show you my debug session. |
@jmgao Have you tried 3 as well? |
any updates on that? |
lowering to medium because this only happens with shell enabled. |
Sorry, missed the questions in my email inbox. I vaguely recall trying option 3 and it increasing the time to failure to a day or two, but it's been a while. I don't believe this requires shell being enabled to happen, although that might increase the chances of a fatal assert happening. |
@carlescufi I am unable to reproduce on recent master. Attempted with Also tried in our custom application with TRACING and SHELL active. |
Sorry for the delay, I also had some problems reproducing, but it was because I tested without connecting to the device, because I was using a different board that connections were failing on, and spent the last few days trying to figure out why. (I was running it on a board that doesn't have a 32kHz crystal, and apparently I tested with particle_xenon on master as of 5 minutes ago: it reproduces without shell enabled. Cherry-picking #30829 does not help. (I was wondering if the problem could be related to RC oscillator calibration, because I vaguely recall skimming across something in an NRF datasheet saying that it would do regular calibration. I checked the board I used in the original report and particle_xenon: both of them are definitely using |
You should not use
Which application did you use, your own or a zephyr sample? |
The Zephyr sample in samples/bluetooth/peripheral |
Do you have a Nordic nRF52840 DK that you could try to reproduce this on? @pabigot you have a Particle Xenon, right? Could you please follow the instructions above and see if you can reproduce? |
On xenon I do get a spike latency around the time service discovery is done. It seems to be a one-time thing, and I don't see any asserts.
edit: I see the same thing on nRF52840 DK, spikes there of 74 then 79. |
Thank you @pabigot I am not able to reproduce the assertion too, do see similar latencies on nrf52840dk. |
I have been adding implementation to in-system profile other controller's ISRs, and have noticed an implementation flaw that executes both ULL_HIGH and ULL_LOW in the same ISR when their priorities are same (instead of tailchaining/yielding so as to permit the naturally higher priority radio ISR exection which is at the same IRQ priority level as the ULL_HIGH and ULL_LOW).
With the new ticker implementation in the split controller, it is not feasible to have LLL and ULL_HIGH/ULL_LOW at same IRQ priority level, hence have submitted #31753 @jmgao please check that you get reduced Radio ISR latency using this PR: #31753 |
Fix the default ULL_HIGH execution priority to be lower than LLL execution priority to reduce Radio ISR service latency. Also, default to Zero Latency IRQ for Radio ISR if enabled in the kernel. Fixes zephyrproject-rtos#29994. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
Fix the default ULL_HIGH execution priority to be lower than LLL execution priority to reduce Radio ISR service latency. Also, default to Zero Latency IRQ for Radio ISR if enabled in the kernel. Fixes #29994. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
Describe the bug
I'm seeing high ISR latencies with CONFIG_BT_MAX_CONN set to 2, which I suspect might be resulting in a
!radio_is_ready()
assertion firing after multiple hours (perhaps due to unlucky collision with another ISR).To Reproduce
Apply the following patch to samples/bluetooth/peripheral:
Build and flash (tested on particle_xenon).
Use nRF Connect to connect to the device, open the vendor-defined service, and wait.
I'm seeing maximum ISR latencies of >50us within a minute or so, and >70us after a few minutes of waiting.
Expected behavior
Lower peak ISR latencies
Impact
!radio_is_ready
assertions firingLogs and console output
Environment (please complete the following information):
Additional context
Add any other context about the problem here.
@cvinayak
The text was updated successfully, but these errors were encountered: