Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: timer: nrf_rtc_timer: Add lockless, non-wraping counter reading #30829

Closed

Conversation

nordic-krch
Copy link
Contributor

@nordic-krch nordic-krch commented Dec 17, 2020

Updates z_timer_cycle_get_32 to use lockless algorithm.

Proposed algorithm is also faster than the previous one.

Additionally, updated clock control to not use locking k_uptime_get():

drivers: clock_control: Use k_cycle_get_32() for start/sop timestamping

When shell command is enabled, start/stop is timestamped to provide
status. Previously, k_uptime_get() was used which locks interrupts and
takes more cycles. It was leading to bluetooth failures because
clock is controlled in critical path and any delays are unaccepted.

Converted to use k_cycle_get_32() which has lockless implementation
but is limited to 32 bits which means that status will be invalid
if clock state change occurs too far in the past (~36 hours). Note was
added to status command help message.

Fixes #29994.

Signed-off-by: Krzysztof Chruscinski [email protected]

@nordic-krch nordic-krch added bug The issue is a bug, or the PR is fixing a bug area: Timer Timer platform: nRF Nordic nRFx labels Dec 17, 2020
@nordic-krch nordic-krch force-pushed the nrf_rtc_timer_tick_get branch from bd46353 to d8818f6 Compare December 17, 2020 15:41
@nordic-krch
Copy link
Contributor Author

I initially thought that it will solve #30777 but it is using k_uptime_get() which does not call z_timer_cycle_get_32 but z_clock_elapsed from locked context.
This is definitely worth adding but i will also look if k_uptime_get/k_tick_get can be optimized.

Copy link
Collaborator

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle this seems OK, though every time we touch this code something breaks, so I'd want to stare at it more and see build results before approving.

If this is intended to support #29994's need to have shell track the time since a clock changed: it's not really going to help much. It does avoid #30777's problem of having the estimated duration wrap at 00:08:32 but still wraps at 36:24:32, so the shell command will still fail to give reliable information.

We'd need a 64-bit cycle counter to handle this sort of thing correctly.

@nordic-krch
Copy link
Contributor Author

We'd need a 64-bit cycle counter to handle this sort of thing correctly.

Shall we pursue adding k_cycle_get() then?

I've modified clock control to use k_cycle_get_32 for timestamping. I think that 32bit limitation (36h) is unrealistic, shell command is enabled for debugging purposes anyway.

@nordic-krch nordic-krch force-pushed the nrf_rtc_timer_tick_get branch from c355ce0 to fb6a656 Compare December 18, 2020 08:09
@nordic-krch nordic-krch requested a review from nashif as a code owner December 18, 2020 08:09
@github-actions github-actions bot added the area: Samples Samples label Dec 18, 2020
@nordic-krch nordic-krch force-pushed the nrf_rtc_timer_tick_get branch 2 times, most recently from 59a4487 to 629afe1 Compare December 18, 2020 08:14
@pabigot
Copy link
Collaborator

pabigot commented Dec 18, 2020

I sympathize with the theory that wrapping at 36 hours won't be noticed, and it's certainly less likely than 8-plus minutes, but I don't like subtle surprises in behavior like that. Since this is Nordic only I don't object, but I also don't really approve. Carry on.

I do think we need 64-bit support not only in the k_cycle API but the counter API, and am myself moving to k_uptime_ticks() as the canonical source for absolute time in Zephyr. Making people deal with non-monotonic absolute time representations is really unfriendly.

@koffes
Copy link
Collaborator

koffes commented Jan 7, 2021

Great nordic-krch. Could other relevant reviewers be added? This PR would solve some major issues for us.

@carlescufi
Copy link
Member

@jmgao could you please test this PR and see if it solves the issue you saw in #29994? If so, please approve the PR.

@jmgao
Copy link
Contributor

jmgao commented Jan 8, 2021

Flashed a build onto a board just now, I'll report back in a few hours.

@carlescufi
Copy link
Member

Flashed a build onto a board just now, I'll report back in a few hours.

Could you please report back @jmgao?

@carlescufi
Copy link
Member

@pabigot looking at this, it seems worthwhile to me to get this in to reduce the introduced latency in users of k_cycle_get_32(), and it solves real problems for users. What sort of tests could we run on this to be confident that it is not breaking anything?

@carlescufi carlescufi added this to the v2.5.0 milestone Jan 13, 2021
@carlescufi
Copy link
Member

@koffes and @jmgao could you please re-test this so that we know if this PR is enough for us to close #29994? We'd like to settle this by 2.5.

@carlescufi
Copy link
Member

Note: This PR helps reducing interrupt latency regardless of whether CONFIG_CLOCK_CONTROL_NRF_SHELL is enabled, so this could be a fix to #29994 even if the shell is disabled.

Copy link
Collaborator

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not rejecting this, just want to understand why the solution for the counter is so complex before approving the timer part. I'm still not on-board with having the shell clock data be limited to 36 hours, but the timer fix seems like a good idea.

It'd also be trivial to support 64 bit cycles by just changing last_count to be uint64_t. All 32-bit uses could cast the value to 32 bits, so there shouldn't be a performance impact except when advancing it wraps.

@@ -222,13 +225,56 @@ void z_nrf_rtc_timer_compare_set(uint32_t chan, uint32_t cc_value,
z_nrf_rtc_timer_compare_int_unlock(chan, key);
}

uint32_t z_timer_cycle_get_32(void)
{
/* Lockless algorithm for reading cycles assumes that there is an
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary for the anchor logic to be so complex?

As far as I can tell, the longest delay that's possible is half the 24-bit span, so unless the RTC interrupt is delayed for over 256 s we know that it will be invoked before the 24-bit counter wraps at 512 s.

We already add the 24-bit delta to last_count in the ISR. Why can't we just use last_count as the anchor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it was a bit too much. I've updated sys_clock_cycle_get_32 to simply attempt to calculate current value based on last_count and repeat until last count is unchanged during the process. It is slightly faster but mainly it is lockless.

@koffes
Copy link
Collaborator

koffes commented Jan 15, 2021

Something else has changed in the meantime, Unable to reproduce.
Let me know if required, and I can check out an earlier version of our project and apply this PR as a patch.

@nashif nashif removed this from the v2.5.0 milestone Feb 14, 2021
@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Apr 16, 2021
Updates z_timer_cycle_get_32 to use lockless algorithm.

Signed-off-by: Krzysztof Chruscinski <[email protected]>
When shell command is enabled, start/stop is timestamped to provide
status. Previously, k_uptime_get() was used which locks interrupts and
takes more cycles. It was leading to bluetooth failures because
clock is controlled in critical path and any delays are unaccepted.

Converted to use k_cycle_get_32() which has lockless implementation
but is limited to 32 bits which means that status will be invalid
if clock state change occurs too far in the past (~36 hours). Note was
added to status command help message.

Signed-off-by: Krzysztof Chruscinski <[email protected]>
@nordic-krch nordic-krch force-pushed the nrf_rtc_timer_tick_get branch from 629afe1 to 0e522cc Compare April 19, 2021 13:11
@nordic-krch nordic-krch removed the Stale label Apr 19, 2021
Copy link
Collaborator

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to know why there's a loop in the snapshot 32-bit cycle display. Otherwise this will be fine, if misleading when the maximum span is exceeded. A 64-bit cycle counter would solve many things.

do {
anchor = *base;
ret = counter_sub(counter(), anchor) + anchor;
} while (anchor != *base);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on what the loop is giving you here. Why not just:

uint32_t last = last_count;

return counter_sub(counter(), last) + last;

This should return a valid timestamp even if last_count is being updated simultaneously. To the caller it's equivalent to the loop when the update happens right after the loop exit condition is met.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @pabigot. There's no need for a loop here.

@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jul 28, 2021
@github-actions github-actions bot closed this Aug 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Clock Control area: Samples Samples area: Timer Timer bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

High bluetooth ISR latency with CONFIG_BT_MAX_CONN=2
8 participants