-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: posix: nanosleep: fix nanosleep for sub microsecond durations #28542
Merged
carlescufi
merged 2 commits into
zephyrproject-rtos:master
from
cfriedt:issue/28483/fix-nanosleep-2-for-sub-microsecond-durations
Oct 2, 2020
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still approving, but this is where it's going to break: when we have a system where the cycle clock is slower than the CPU clock then we'll have to add a fudge factor for lower bounds that aren't met. That'll generally be board-specific; I think one of my reel_board's has exhibited that behavior in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, "break" is debatable. From the spec:
Which clearly applies here because the scheduler is triggered at least once.
The board I'm currently working on is also in the 32k club, so it's also nowhere near accurate.
I considered adding a negative value in
rmtp
to reflect the overrun w.r.t.rqtp
but the Linux & macOS do not do that, and it seems thatrmtp
is only modified whennanosleep(2)
is interrupted by a signal.Ideally, a solution to #6498 would be rolled-out soon-ish and that should help to fix this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try again:
Assume the CPU clock that's used for the nanosleep busy-wait runs at 10 MHz and is precise: there are 10^7 ticks in one second.
There is no access to this as a clock, so a different clock is used for the system clock that counts cycles. Assume it runs at 10 kHz and has a -500 ppm error: it ticks 9995 times in one second.
Then the duration measured for a 1 s sleep taking 10^7 CPU cycles will be 999500 us, which will appear to violate the spec because it's shorter than expected. The test will fail, even though the code does exactly what it's supposed to do. That's what I mean by "break".
This mostly shows up in Nordic because other platforms have synchronous CPU and cycle clocks, but in theory it can apply to any platform. The frequency tolerance for Nordic LFRC is 5%, and for HFRC is 8% so a 13% error between the CPU and cycle clocks can be observed. It's pretty likely at least one board in some test environment somewhere will display a negative ppm error and fail the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. I think that is a definite concern when using one clock domain to try and synchronize a separate clock domain. That is not what is happening here. In the implementation, I took out the busy wait and opted to use
k_usleep()
instead, which synchronises in the same (albeit slow as heck) clock domain.Assumption 1:
k_usleep()
does not use the cycle counter thatk_busy_wait()
uses.Assumption 1 seems to be valid but please correct me if I'm wrong.
Assumption 2: calling
k_usleep()
for any non-zero duration would incur at least 1 expiration of the cycle clockMy assumption was, that by doing so, there would be a guarantee that the cycle clock must expire at least once (it is likely more than that).
It would appear that Assumption 2 also holds.
So I am skeptical that (now - then) will ever be a negative value in this test.
I realize that nanosleep is incredibly inaccurate this way, but it is both compliant with the spec and the tests do not break.
Given that Zephyr is not relying on nanosleep anywhere yet, I feel that the inaccuracy is a lesser evil than requiring all platforms to try and tune their clock-tolerances and to try and do something hackish in the tests. I like keeping things simple-ish.
I think the issue here, is that Nordic put a lot of work into trying to tune nanosleep based on clock tolerances, but it was difficult to get tests to pass consistently. Tests also do not pass consistently in qemu as well, or on other platforms I was using. Using clock tolerances is a bit of a slippery slope.
The posixly-correct way to fix this would be to provide access to a collection of clocks and their resolutions via
clock_getres()
, then callclock_nanosleep()
using the clock with the best resolution.Since we don't (yet) have
clock_getres()
, and I believe we do not have the ability to use an arbitrary clock to un-suspend a thread, and also since we do not have a calibrated per-cpu delay-loop, I'm ok with usingk_usleep()
until other more reliable and more accurate options are available.I realize it's a lot to ask, but can you be ok with that as well for the next release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd missed that the implementation had also changed away from busy-wait. I haven't seen anything here that makes me remove my existing approval, but we need @pfalcon to approve this.
For the other stuff, I just want to point to #19030 and its parent #19282. Clocks and time are very important to me.