-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64 tick timer fixes #13674
Arm64 tick timer fixes #13674
Conversation
[Experimental Bot, please feedback here] Let's break down this pull request (PR) against the NuttX requirements: Summary:
Impact:
Testing:
Conclusion: This PR addresses a significant issue but requires further refinement to meet NuttX's standards:
By addressing these points, the PR will be well-positioned for review and merging into NuttX. |
However, we were unable to reproduce the issue on qemu/x86_64 and qemu/armv8a (commit d47076). Here is the test code: #include <fcntl.h>
#include <nuttx/config.h>
#include <stdio.h>
#include <semaphore.h>
#include <nuttx/clock.h>
/****************************************************************************
* Public Functions
****************************************************************************/
int main(int argc, FAR char *argv[])
{
sem_t sem;
clock_t tick;
uint32_t wait;
/* waittick table */
static const uint32_t waittick_tbl[] =
{
0, 1, 10, 100, 1000, 10000, 100000, 1000000
};
sem_init(&sem, 0, 0);
for (uint32_t i = 0; i < sizeof(waittick_tbl)/sizeof(uint32_t) ; i++)
{
wait = waittick_tbl[i];
tick = clock_systime_ticks();
nxsem_tickwait(&sem, wait);
tick = clock_systime_ticks() - tick;
printf("tick1 %u : %lu\n", wait, tick);
// assert(wait <= tick);
tick = clock_systime_ticks();
nxsem_tickwait_uninterruptible(&sem, wait);
tick = clock_systime_ticks() - tick;
printf("tick2 %u : %lu\n", wait, tick);
// assert(wait <= tick);
}
return 0;
} |
I actually never tested whether this is reproducible in qemu; that would depend on the timer accuracy in qemu/linux. It is really easy to reproduce in real hw within approx 10 minutes of running. I can check the qemu reproducibility tomorrow. Thanks! Oh! And the test above wouldn't even show the issue; it does count the correct number of ticks, just too fast (sometimes two ticks at once). So you need to measure the real time spent over the tickwait. So take the clock_gettime or up_timer_gettime before and after the tickwait and check that the correct amount of time has passed, not the amount of ticks :) |
Thank you for providing more information. However, after changing to the clock_gettime, I still cannot reproduce the issue on qemu/aarch64. I will try to conduct further investigations on a real armv8 machine. Here is my test code: #include <fcntl.h>
#include <nuttx/config.h>
#include <stdio.h>
#include <semaphore.h>
#include <nuttx/clock.h>
#include <nuttx/arch.h>
static inline time_t diff_time(struct timespec *ts_start, struct timespec *ts_end){
return (ts_end->tv_sec - ts_start->tv_sec) * NSEC_PER_SEC + (ts_end->tv_nsec - ts_start->tv_nsec);
}
/****************************************************************************
* Public Functions
****************************************************************************/
int main(int argc, FAR char *argv[])
{
sem_t sem;
clock_t tick;
uint32_t wait;
time_t diff;
struct timespec ts_start;
struct timespec ts_end;
/* waittick table */
static const uint32_t waittick_tbl[] =
{
0, 1, 10, 100, 1000, 10000, 100000, 1000000
};
sem_init(&sem, 0, 0);
for (uint32_t i = 0; i < 10000000; i++)
{
wait = waittick_tbl[i%2];
tick = clock_systime_ticks();
clock_gettime(0, &ts_start);
nxsem_tickwait(&sem, wait);
clock_gettime(0, &ts_end);
tick = clock_systime_ticks() - tick;
diff = diff_time(&ts_start, &ts_end);
printf("tick1 %u : %lu, %ld ns\n", wait, tick, diff);
assert(wait <= tick && diff >= 0);
tick = clock_systime_ticks();
clock_gettime(0, &ts_start);
nxsem_tickwait_uninterruptible(&sem, wait);
clock_gettime(0, &ts_end);
tick = clock_systime_ticks() - tick;
diff = diff_time(&ts_start, &ts_end);
printf("tick2 %u : %lu, %ld ns\n", wait, tick, diff);
assert(wait <= tick && diff >= 0);
}
return 0;
} |
issue re-produces also in qemu, at least with this simple test.
|
…ss build Signed-off-by: Jukka Laitinen <[email protected]>
…k timer This fixes two issues with the tick timer 1) Each tick was longer than the requested period. This is because setting the compare register was done by first reading the current time, and only after that setting the compare register. In addition, when handling the timer interrupts in arch_alarm.c / oneshot_callback, the current_tick is first read, all the tick handling is done and only after that the next tick is started. The whole tick processing time was added to the total tick time. 2) When the compare time is not aligned with tick period, and is drifting, eventually any call to ONESHOT_TICK_CURRENT would either return the current tick, or the next one, depending on the rounding of division by the cycle_per_tick. This again leads to oneshot_callback randomly handling two ticks at a time, which breaks all wdog based timers, causing them to randomly timeout too early. The issues are fixed as follows: Align the compare time register to be evenly divisible by cycle_per_tick. This will lead arm64_tick_current always to return the currently ongoing tick, fixing 2). Also calculating the next tick's start from the aligned current count will fix 1), as there is no time drift in the start cycle. Signed-off-by: Jukka Laitinen <[email protected]>
…d condition nxsem_tickwait correctly sleeps more than 1 tick. But nxsem_tickwait_uninterruptible may wake up to a signal (with -EINTR), in which case the tick + 1 must also be taken into account. Otherwise the nxsem_tickwait_uninterruptible may wake up 1 tick too early. Also fix the nxsem_tickwait to return with -ETIMEDOUT if called with delay 0. This is similar to e.g. posix sem_timedwait. Signed-off-by: Jukka Laitinen <[email protected]>
8268a47
to
a2c0d6f
Compare
I added a minimal test code above, which can be used in both real target and qemu; it shows the major issue typically within 1 minute. |
@Fix-Point Were you able to re-produce the issue with the code I pasted above? |
Summary
It has been observed in our testing with imx9 platform, that nxsem_tickwait_uninterruptible randomly wakes up too early. As this is commonly used in device drivers as a timeout timer, this causes early timeouts resulting many errors. Further investigation revealed that the tick timer in arch_alarm/oneshot_callback sometimes ticks two times at one call. This would then cause the wdog timer to timeout immediately even if it was supposed to wait for one more tick.
The issue was root-caused to these things:
This PR proposes solution to the issues by
The alignment of tick start fixes both the tick timer drift and current tick retrieval issues. It is then not necessary to add an interface for the oneshot alarm to support atomic get&start tick timer.
This patchset also adds a cleanup for arch_alarm oneshot_callback. This shouldn't fix any bug per-se, but as we have had that patch included in all the testing, it is included in this patchset. What it fixes in theory is, that the oneshot_callback could be called more frequently than the tick period without errors - but this is not used for anything atm.
Impact
Fixes random errors in several device drivers for imx9 platform, in longer test runs.
Testing
Tested on i.MX93 platform with some added debug instrumentation.