-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
samples: bluetooth: central_iso: fix synchronization of data submission #72622
Conversation
Update: This approach is too brittle. I tested it under different circumstances and it turns out that my initial assumption was wrong that the connected event would be a stable anchor for synchronization. I'll have to implement the timstamp-based approach right away. (Therefore converted back to Draft.) |
Done. Known issue (cf Discord), IMO to be fixed in a separate PR:
If you don't like the surrounding changes (refactoring, reversal of payload length, etc.) just let me know and I'll remove this unrelated stuff. |
struct bt_iso_tx_info iso_tx_info = {0}; | ||
int err; | ||
|
||
err = bt_iso_chan_get_tx_sync(chan, &iso_tx_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fgrandel , thanks for spending time on improving this sample!
I would recommend to not use timestamps when providing data in this sample.
The increase in sequence number gives the controller enough information about when it is to be scheduled on air. See https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrfxlib/softdevice_controller/doc/isochronous_channels.html#providing-data for more information on how the Nordic SoftDevice Controller processing incoming HCI ISO Data.
When using timestamps it is possible to provide the data to the controller right before it is going to be sent on air.
Unfortunately, this timing information cannot be obtained in a controller-agnostic way:
The timestamp returned from the HCI LE Read ISO TX Sync command is ambiguous, the returned value depends on how the controller implementation interprets the spec. Also, the command documentation refers to CIG/BIG reference anchor points which may not align with the SDU generation if ISO_Interval != SDU interval (up to the controller). These things make it hard to rely on this return value.
See the open spec errata https://bluetooth.atlassian.net/browse/ES-23138 for more details.
Another note: The timing of the iso_sent
is also implementation dependent. It is raised when the controller raises the HCI Number of Completed Packets event. This event may be raised when the controller is ready to receive more data. This may be the case when the payload is scheduled for transmission. That is, even before it is sent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR; I see why the sent-callback + HCI LE Read ISO TX Sync + timestamp approach is lacking and depends on controller implementation details. But IMO it is still the only approach that works at all to reliably synchronize ISO channels for the moment being.
Arguments following:
Thanks for the link. Unfortunately I don't have permission to access it - guess that's only for SIG members.
The SDU vs. ISO interval argument is convincing - however as long as all our options are to some degree depending on the controller implementation, I'd still prefer the approach recommended by the core spec. By the time we'll implement more controllers, I guess the spec will have been updated and the errata removed in a somehow backwards compatible way, so we can move on from there.
In Zephyr we can and should decouple from implementation details by wrapping the controller specific parts behind more usable high-level APIs in the meantime. This would also improve developer experience by protecting them from the hacky and error-prone synchronization details. Plus it guarantees a migration path.
The dependence of iso_sent on the controller implementation is also a good argument. But we still need to provide developers with a working solution until the spec does so. That's AFAICS, why Nordic recommends the HCI LE Read ISO TX Sync + timestamp solution for the moment being and uses the send callback as a trigger. Also see the recommended approach in the link you provided:
This is the preferred way of providing data to the SoftDevice Controller and guarantees the highest degree of control.
Re sent-callback latency: Introducing a delay of one ISO interval w/o technical need is not a good way of interpreting the slack in the spec IMO. We should at least provide best-effort latency as far as technically possible - even more as it seems to be the only workable solution for deterministic synchronization for the time being. So the controller really should deliver some sufficient degree of latency guarantees beyond what is required by the spec for this specific callback/HCI command until the spec provides a workable solution itself.
The seq-number and "time-of-arrival" approaches don't work unless you admit at least two ISO intervals of extra latency to avoid systematic package loss (as demonstrated by the bug I'm fixing here). This I find inacceptable because it counters the concept of ISO channels with unframed PDUs and deterministic latency. Also see the latency promises of the CIG config API in Zephyr which will not be met. You need to add at least two sequence numbers all the time to ensure that:
- the initially (but non-deterministically) skipped event gets accounted for.
- source-to-CIG anchor point offset doesn't play a role even if you happen to synchronize too closely to the next anchor point so that the controller will pick up your SDUs too late.
In addition the "time-of-arrival" approach is not consistent with the spec AFAICS as flushing cannot be controlled deterministically by the application. It's the point of the ISO protocol that you can synchronize your source with the sink with deterministic latency and time-to-live of data packets.
On the peripheral both approaches don't work at all because neither the time-of-arrival nor the seq number approach are able to account for drift of the peripheral's clock. This cannot work by design if you want well-synchronized data across devices. This also explains why the (local) timer approach does systematically not work.
In our case for example we do decentralized measurements on peripherals in sync with the central's distributed clock across several devices and do explicitly want those measurements be sent out in the same CIG event with deterministic latency so we don't need timestamps and get cheap syntonization of all peripherals to the master clock w/o having to implement our own syntonization approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the peripheral both approaches don't work at all because neither the time-of-arrival nor the seq number approach are able to account for drift of the peripheral's clock. This cannot work by design if you want well-synchronized data across devices. This also explains why the (local) timer approach does systematically not work.
Fun fact: An ISO peripheral from BT Core spec 5.4 is not aware of the SDU interval and can actually never transmit correctly (reliably) 8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another comment for the use of bt_iso_chan_get_tx_sync
: When we use it like this here, we can use the current response to compare with the previous response. IIRC (and please correct me @rugeGerritsen) the return values of bt_iso_chan_get_tx_sync
are only when an SDUs have been sent (or scheduled for transmission). In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.
e.g.
- App send SDU_1 with TS_1 and PSN_1
- App gets
sent
callback and gets TS_1 and PSN_1 frombt_iso_chan_get_tx_sync
- App sends SDU_2 wit TS_1 and PSN_2 (or TS_2 and PSN_1)
- Controller reject this as invalid (which not all controllers may do), but still triggers a Number of Completed Packets event
- App gets
sent
callback and gets TS_1 and PSN_1 frombt_iso_chan_get_tx_sync
(since it was not transmitted or scheduled for transmission) - App is now aware that the previous TX went wrong and could attempt to fix by increasing PSN or TS accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.
Only when the Controller has received a SDU from Host, the returned Sequence Number and Timestamp correspond to the last transmitted or scheduled SDU are calculated.
- If
bt_iso_chan_get_tx_sync
is called multiple times without a SDU enqueued, it will return same Sequence Number and Timestamp corresponding to the instant the previousbt_iso_chan_send
was called. - Every
bt_iso_chan_get_tx_sync
afterbt_iso_chan_send
call will return the Sequence Number and Timestamp of the last transmitted or scheduled SDU. Hence, for the consecutive call tobt_iso_chan_send
, the next Sequence Number (returned value plus one) and next Timestamp (returned value plus one SDU interval) be provided tobt_iso_chan_send_ts
. - If
bt_iso_chan_get_tx_sync
followed bybt_iso_chan_send_ts
is called in the sent-callback, then the returned Sequence Number and Timestamp could be stale depending on the SDU intervals elapsed from the last call ofbt_iso_chan_send_ts
to when the current sent-callback executes. If sufficient SDU buffers into the future has been already provided to the Controller, thenbt_iso_chan_get_tx_sync
will continue to return future Sequence Number and Timestamp ensuring new SDU enqueued to the Controller are not stale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to implement a drift compensation algorithm now that:
- does no longer rely on the sent callback
- does not rely on the exact point in time that controllers store the TX sync information (even if the sync timestamp lies in the future)
- should continue to work if SDU interval != ISO interval
- uses an algorithm that is able to calculate the SDU interval even on a peripheral
- works with BT > 1 and FT > 1
- tolerates a certain amount of package loss
- compensates for initial non-standard delay of the second SDU (split controller ISOAL bug?)
- uses seq_num only when sending the sdu (no more timestamp dependency)
For obvious reasons I can not compensate for TX timestamp specification ambiguities but as I have to rely on hardware-specific timing anyway as long as the bluetooth clock is not exposed by the API in some hardware-independent way, this is not a problem in practice as the split controller defines the timestamp in the same way as the Nordic SD controller AFAICS. Let's hope that by the time more hardware is being supported by the sample, this ambiguity will have been fixed.
An ISO peripheral from BT Core spec 5.4 is not aware of the SDU interval
Hrm, I think I found a way to derive the SDU interval from channel info that is available both, on the central and the peripheral:
Transport_Latency = CIG_Sync_Delay + FT x ISO_Interval - SDU_Interval, see Bluetooth Core, Version 5.4, Vol 6, Part G, Section 3.2.2.
Therefore:
SDU_Interval = CIG_Sync_Delay + FT x ISO_Interval - Transport_Latency
These parameters are available from the HCI LE CIS Established event.
Zephyr Controller follows a similar design (implementation choice) to ACL Sending Data for Sending ISO data too, refer to Bluetooth Core Specification Version 5.4, Vol 6, Part D, Section 6.1 Sending Data. In this design, the Number of Completed Packets is generated when corresponding buffer space is freed when acknowledged and/or flushed (in the case of CIS in the Zephyr Controller). But in the implementation of ISO ack/flush, the Zephyr Controller implementation reuses the code (control path) that is determining stale PDUs hence the Number of Completed Packets is delayed by one ISO Interval. An optimization is good to have, but not necessary, as the Specification states (Bluetooth Core Specification Version 5.4, Vol 4, Part E, Section 7.7.19 Number Of Completed Packets event): Hence, an application should not rely on the rate at which the sent callback is called. Application will need to ensure there is enough buffers available/configured in the Controller to be able to provide the ISO data with corresponding sequence numbers for the future SDU intervals. Zephyr Controller also needs that flush timeout is considered when determining the buffer number, as the implementation does not generate Number of Complete Packets until they are flushed under bad transmissions and NACK conditions. i.e for FT=2, Zephyr Controller needs 5 HCI ISO Data buffers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As others have pointed out, proper synchronization of ISO data over HCI as a generic host (i.e. not knowing what controller we use) is pretty hard (actually impossible, but can be done "good enough").
The use and requirements of both sequence numbers and timestamps are pretty vague / implementation specific in the core spec as per 5.4 unfortunately. Hopefully this will be improved in the future.
Some comments that need to be addressed
*/ | ||
static void iso_timer_timeout(struct k_work *work) | ||
static void iso_send_sdu(uint32_t seq_num, int32_t next_sdu_ts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static void iso_send_sdu(uint32_t seq_num, int32_t next_sdu_ts) | |
static void iso_send_sdu(uint32_t seq_num, uint32_t next_sdu_ts, bool is_first_sdu) |
I think this will make more sense. There's not really any reason to encode the seq_num and "is_first" into the same variable, and since -1
was only called one place this seems like a better solution to me.
Furthermore, the timestamp is a uint32_t
, so for this to have worked the variable should otherwise have been a int64_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach is now obsolete, so should be resolved. I returned to a timer-based approach as recommended.
static uint8_t buf_data[CONFIG_BT_ISO_TX_MTU]; | ||
static bool data_initialized; | ||
static size_t len_to_send = ARRAY_SIZE(buf_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static size_t len_to_send = ARRAY_SIZE(buf_data); | |
static const size_t len_to_send = ARRAY_SIZE(buf_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len_to_send is now updated again, so this should be obsolete, please check if I'm not being confused here. ;-)
.connected = iso_connected, | ||
.sent = iso_sent, | ||
.disconnected = iso_disconnected, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.connected = iso_connected, | |
.sent = iso_sent, | |
.disconnected = iso_disconnected, | |
.connected = iso_connected, | |
.disconnected = iso_disconnected, | |
.sent = iso_sent, |
or
.connected = iso_connected, | |
.sent = iso_sent, | |
.disconnected = iso_disconnected, | |
.connected = iso_connected, | |
.disconnected = iso_disconnected, | |
.sent = iso_sent, |
Personal opinion, but it kind of stood out with sent
in the middle of the connected and disconnected callbacks, and also not using the same indentation scheme :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sent callback is no longer needed, so I returned to the previous code. This should therefore be obsolete.
Agreed. But there are good arguments why it should be improved regardless of what the spec allows (see my response to @rugeGerritsen above). This is required to cover up for weaknesses in the spec in other places to achieve a workable solution at all. Guess you agree that it is also not against the spec to improve latency on the sent-callback. Gaining one ISO interval should provide us with enough leeway so that the sent-callback latency is deterministic enough under all practical circumstances to synchronize ISO channels even if HCI works over reasonable remote links. Can you agree with that statement? |
struct bt_iso_tx_info iso_tx_info = {0}; | ||
int err; | ||
|
||
err = bt_iso_chan_get_tx_sync(chan, &iso_tx_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another comment for the use of bt_iso_chan_get_tx_sync
: When we use it like this here, we can use the current response to compare with the previous response. IIRC (and please correct me @rugeGerritsen) the return values of bt_iso_chan_get_tx_sync
are only when an SDUs have been sent (or scheduled for transmission). In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.
e.g.
- App send SDU_1 with TS_1 and PSN_1
- App gets
sent
callback and gets TS_1 and PSN_1 frombt_iso_chan_get_tx_sync
- App sends SDU_2 wit TS_1 and PSN_2 (or TS_2 and PSN_1)
- Controller reject this as invalid (which not all controllers may do), but still triggers a Number of Completed Packets event
- App gets
sent
callback and gets TS_1 and PSN_1 frombt_iso_chan_get_tx_sync
(since it was not transmitted or scheduled for transmission) - App is now aware that the previous TX went wrong and could attempt to fix by increasing PSN or TS accordingly
struct bt_iso_tx_info iso_tx_info = {0}; | ||
int err; | ||
|
||
err = bt_iso_chan_get_tx_sync(chan, &iso_tx_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.
Only when the Controller has received a SDU from Host, the returned Sequence Number and Timestamp correspond to the last transmitted or scheduled SDU are calculated.
- If
bt_iso_chan_get_tx_sync
is called multiple times without a SDU enqueued, it will return same Sequence Number and Timestamp corresponding to the instant the previousbt_iso_chan_send
was called. - Every
bt_iso_chan_get_tx_sync
afterbt_iso_chan_send
call will return the Sequence Number and Timestamp of the last transmitted or scheduled SDU. Hence, for the consecutive call tobt_iso_chan_send
, the next Sequence Number (returned value plus one) and next Timestamp (returned value plus one SDU interval) be provided tobt_iso_chan_send_ts
. - If
bt_iso_chan_get_tx_sync
followed bybt_iso_chan_send_ts
is called in the sent-callback, then the returned Sequence Number and Timestamp could be stale depending on the SDU intervals elapsed from the last call ofbt_iso_chan_send_ts
to when the current sent-callback executes. If sufficient SDU buffers into the future has been already provided to the Controller, thenbt_iso_chan_get_tx_sync
will continue to return future Sequence Number and Timestamp ensuring new SDU enqueued to the Controller are not stale.
} | ||
|
||
iso_send_sdu(iso_tx_info.seq_num + 1, iso_tx_info.ts + INTERVAL_US); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iso_send_sdu(iso_tx_info.seq_num + 1, iso_tx_info.ts + INTERVAL_US); | |
iso_send_sdu(iso_tx_info.seq_num + 1U, iso_tx_info.ts + INTERVAL_US); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach has been reverted, so should be resolved.
if (unlikely(is_first_sdu)) { | ||
ret = bt_iso_chan_send(&iso_chan, buf, seq_num); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If is_first_sdu
, call bt_iso_chan_send
once then followed by call to bt_iso_chan_get_tx_sync
and bt_iso_chan_send_ts
consecutively ((iso_tx.rtn + 1U) * 2U)
times, so as to ensure the Controller has sufficient SDUs buffered before sent-callback under the case of retransmissions.
Give this a try under noisy conditions or when devices are far away. Let me know if the Sequence Number and Timestamp get back in sync when the radio condition get better, when devices move back close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe, I now implemented an approach that should tolerate package loss. If it doesn't 100% I hope that my new approach has enough merit that it can be merged anyway. Should I experience problems with synchronization under package loss in practical usage in the future I'll open up an other PR if you don't mind.
@rugeGerritsen @cvinayak @Thalley a quick message to thank you for your reviews. I've just not been able to implement it yet, because I'm short on time. But will do. |
Previously the BT_CTLR_CONN_ISO_SDU_LEN_MAX value defaulted to the "magic" number 251 which actually is the default MTU. We make this dependency explicit (and thereby also transparently configurable) by setting the default to BT_ISO_TX_MTU explicitly. This also reduces the probability of inadvertent configuration errors or inconsistencies. Signed-off-by: Florian Grandel <[email protected]>
Fixes a minor typo in the Zephyr BT controller configuration overlay. Signed-off-by: Florian Grandel <[email protected]>
Introduces reversed xmas-tree format to local variables before making changes to them for better readability of subsequent commits. Signed-off-by: Florian Grandel <[email protected]>
The change replaces magic numbers in the ISO CIG configuration by preprocessor definitions. Signed-off-by: Florian Grandel <[email protected]>
Replaces static constant variables by preprocessor definitions for improved readability and (minor) RAM usage improvement. Signed-off-by: Florian Grandel <[email protected]>
Remove CONFIG_BT_CTRL_SCAN_DATA_LEN_MAX: The sample does not employ extended scanning. Remove CONFIG_BT_CTLR_CONN_ISO_SDU_LEN_MAX: This setting defaults to BT_ISO_TX_MTU which is set in the controller-independent prj.conf file and should default to that value. Signed-off-by: Florian Grandel <[email protected]>
The sample used deferred work to schedule ISO SDUs. The deferred work API does not guarantee deterministic timing, though. This commit switches to kernel timers which allow for deterministic periodic or one-off timing as long as timers are re-scheduled in the timer callback which runs directly in ISR context. Signed-off-by: Florian Grandel <[email protected]>
This change introduces a drift compensation algorithm that keeps SDU preparation synchronized with the bluetooth clock in a role, HCI transport and controller agnostic way based on standard HCI commands. Signed-off-by: Florian Grandel <[email protected]>
This change increases the max SDU size to twice the available max PDU size which results in a burst number (BN) larger than one. The sample thereby demonstrates that the drift compensation algorithm can deal with BN > 1. Signed-off-by: Florian Grandel <[email protected]>
Setting the SDU latency parameter to a value larger than the SDU interval results in a flush timeout (FT) larger than one if at the same time the ISO policy is switched from "low latency" to "reliability". This demonstrates that the synchronization algorithm can deal with FT > 1. Signed-off-by: Florian Grandel <[email protected]>
* | ||
* @brief Bluetooth clock hardware abstraction for nRF5x | ||
* | ||
* @details Currently, there is no way to access the Bluetooth clock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you have seen earlier, the timestamps in the HCI packets are defined by the controller.
For the Zephyr open source link layer, the timestamp maps directly to RTC0 ticks. For the SoftDevice Controller the timestamps also takes wrapping into account. Therefore, the approach taken here will only work until the RTC wraps (8 minutes). Also, the approach taken here will only work when run on a single-core 52 series device. That is, it will not work on the nRF53 app core.
Looking at this it looks like the only feasible way forward:
- Remove reading out the controller clock
- Never issue SDUs with timestamps.
- Not use timers to trigger submission of SDUs, just push them to controller as quickly as possible.
The drawback will be that the total end-to-end latency becomes slightly longer.
If we want to optimize for reduced latency, controller-specific assumptions are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @rugeGerritsen here
As the current BT Core spec works w.r.t. ISO over HCI, we can't really make any (good) assumptions, and the best we really can do is just to always enqueue at least 2 SDUs and TX as fast as possible.
There are no guarantees we can make without knowing the specific controller.
I've recently refactored TX for a similar sample: #73406
Effectively moving a timer-based approach to just sending as fast and much as possible, based on the available buffers.
Keep in mind that the buffer handling will also change as part of #72090
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, see my comment re next steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of good changes in this PR, but not sure if some of the platform specific ones are good to have.
This is, after all, a sample and not a full application.
# Max 2 PDUs per SDU (BN = 2) | ||
CONFIG_BT_ISO_TX_MTU=494 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this.
CONFIG_BT_ISO_TX_MTU effectively reflects the maximum size of the SDUs. The SDUs can be any value below this, and thus increasing this size does not necessarily affect the number of PDUs per SDU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sample now counts up to 494 bytes, so two PDUs will effectively be required to transfer that SDU (as the MAX PDU size payload is limited in the sample to exactly half of this value). This is required to prove that timing will work even if SDU interval != ISO interval.
/* Currently only nRF5 hardware is supported by this sample. */ | ||
#include "bt_clock_hal_nrf5.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in upstream. I don't think we should make too many assumptions of how samples may be used by other projects, and samples should not (without proper guards at least) assume or require any specific hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, see my comment re next steps.
NET_BUF_POOL_FIXED_DEFINE(tx_pool, 1, BT_ISO_SDU_BUF_SIZE(CONFIG_BT_ISO_TX_MTU), | ||
|
||
static struct bt_iso_tx_info last_iso_tx_sync_info; | ||
static inline bool is_synchronized(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static inline bool is_synchronized(void) | |
static bool is_synchronized(void) |
inline
s are rarely an improvement, and should generally be left to the compiler
@@ -945,7 +945,7 @@ config BT_CTLR_CONN_ISO_SDU_LEN_MAX | |||
int "Maximum Connected Isochronous Channel SDU Length" | |||
depends on BT_CTLR_CONN_ISO | |||
range 1 4095 | |||
default 251 | |||
default BT_ISO_TX_MTU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cvinayak is BT_CTLR_CONN_ISO_SDU_LEN_MAX including the size of the bt_hci_iso_sdu_hdr
/bt_hci_iso_sdu_ts_hdr
headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is SDU length (no HCI layer)
@Thalley @rugeGerritsen First of all: Thx again for your extensive and fast review. Highly appreciated!! Before I try to fix remaining test failures let's see how and whether I can progress in a way that fits both, your needs and mine. I totally agree with your feedback re hardware-specific code in the sample (and more or less expected it). OTOH sending data as fast as possible is really a boring solution that wouldn't motivate me to further spend my free time (as I obviously scratch my own itch here and the existing HW specific solution is totally fine for what we need). Plus sending data as fast as possible and relying on (variable) backpressure and non-deterministic latency from the stack defeats the very idea of isochronous data sources. Therefore I'd like to propose something different: I've been thinking about a generic clock syntonization (aka drift compensation) framework for a long time already. See the linked resource for a definition of clock syntonization (as opposed to synchronization). I already verified most elements of this approach within my (currently stalled) 802.15.4 TSCH protocol project. And I still believe that a general clock source (counter management/syntonization/synchronization) framework would really be a great and long overdue addition to Zephyr. And I might be able to make it useful in this case. Basically I propose to buffer local timestamps until I get corresponding TX sync info via HCI. Once I get that info I can syntonize the local clock to the remote clock via a configurable syntonization strategy within the clocksource framework. Initially I'd use Zephyr's existing clock syntonization algorithm internally. But as the syntonization approach will be pluggable, something like a PID controller could be added any time and re-used across the system (as is required for more precise clock syntonization and quicker convergence, e.g. in PTP). Assuming arbitrary latency between data preparation and data sending but limited jitter (which I'd expect to be a reasonable assumption, even in the HCI over UART case) such an approach would make the solution completely hardware / HCI agnostic and still optimize latency and (more importantly) provide stable isochronous timing of measurements across an arbitrary number of peripherals which is the very idea of isochronous channels. The syntonization framework implementation would be kept locally to the network subsystem for the time being and would leverage the existing k_time (ns precision timestamp) framework already introduced there for network timestamps. The sample would only use the API and could therefore be simplified. Would that be a compromise acceptable upstream? |
I think board specific behavior is something that fits in an application, but not necessarily a sample. If in a somewhat generic way that does not make the sample depend on it and does not make the code hard to follow, I would be OK with having board (or platform) dependent code here (so it's not an automatic rejection).
That sounds like a very interesting project, and could be a nice way of doing this. The timestamps generated by the controller is basically "whatever it wants": In my eyes, the main issue is that you cannot do anything "good" (e.g. proper use of timestamps) as a generic host. ISO over HCI is honestly just pretty terribly designed for generic hosts such as Zephyr. If* (and that's a big if) we want the sample here to assume anything about a controller, it should be based on the Zephyr controller. If you want to go the road of improving and optimizing TX, then we should add this sample to a babblesim tests in CI, and run tests there that can verify that such optimizations work, and will also ensure that they keep working by adding some pass criteria |
@Thalley @rugeGerritsen I have to apologize again for being inactive for such a long time. I've not given up on this ticket and will work on it again as soon as I've time. |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
This PR synchronizes preparation and scheduling of isochronous data to the Bluetooth clock.
It uses TX synchronization info to set and maintain a well-defined phase between the application timer and the bluetooth clock including ongoing drift compensation (see https://github.com/nrfconnect/sdk-nrf/blob/main/samples/bluetooth/iso_time_sync/src/iso_tx.c and the "LE Read ISO TX Sync" HCI command in the Bluetooth Core Spec 5.4, Vol 4 Part E, 7.8.96).
The PR introduces a synchronization approach that works on both, central and peripheral, and could therefore be ported in later PRs to showcase sending synchronized data from multiple peripherals to one central.
Fixes #72452