-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xtimer: timer & target overflow, hang resolved. #9211
Conversation
Looks legit, I'll run the tests on hardware. @kaspar030 can you take a look? It seems a straightforward change. |
sys/xtimer/xtimer_core.c
Outdated
@@ -190,6 +190,11 @@ int _xtimer_set_absolute(xtimer_t *timer, uint32_t target) | |||
|
|||
timer->target = target; | |||
timer->long_target = _long_cnt; | |||
|
|||
/* Ensure timer is fired in right hardware timer periode. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/periode/period/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems legit to me, too. But xtimer is pretty central, we should run the tests on a couple of more architectures.
@Josar can you provide your test example? The current xtimer_hang test succeeds both in master and this PR, though it seems more stable with your patch. Tested on arduino-mega2560. |
Much more stable with this PR in a z1 (msp430). |
@Josar Maybe make both commit message and PR title summarize that is changed instead of the intend of the change? This is not the first resolved hang in xtimer, and it won't be the last. ;) |
@gebart would you take a brief look if we're missing something obviuos? |
Murdock just agreed, however for some reason in my setup I can't compile xtimer_hang test (both in current master and this PR): make: Entering directory '/Users/facosta/git/RIOT-OS/RIOT/tests/xtimer_hang'
Building application "tests_xtimer_hang" for "samr21-xpro" with MCU "samd21".
In file included from /Users/facosta/git/RIOT-OS/RIOT/tests/xtimer_hang/main.c:28:0:
/Users/facosta/git/RIOT-OS/RIOT/sys/include/xtimer.h:34:10: fatal error: msg.h: No such file or directory
#include "msg.h"
^~~~~~~
compilation terminated. |
The xtimer hang is obviously designed to find errors like this but in the Python Skript than allows a deviation of 10% hiding errors like that again. The error can also be found with xtimer sleep short when considering the test time, like introduced in #9037. I tried to introduce probing in #9025 and #8865 so this errors could be be probed in a HIL setup. Also the error can be visually seen in the output stream stutter, but the test neglect this. I use the xtimer_hang test and added probing pins., as I saw the print stutter. |
fc1047f
to
986a636
Compare
3fce999
to
b33175f
Compare
Well now the tests passed. |
I successfully run the
I will re-run all automated tests on them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick spell check!
sys/xtimer/xtimer_core.c
Outdated
* This is crucial when using low CPU frequencies so reaching the '_xtimer_now()' | ||
* call needs multiple xtimer ticks. | ||
* | ||
* '_xtimer_set()' and `_xtimer_periodic_wakeup()` ensure this by allready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling mistake allready->already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time trying to understand what was wrong and how this fixes it.
Also:
Problem is that all test are design in such a way that they do not fail, eg by allowing deviation of 10%.
What's the point of trying to correct for interrupt latency (e.g. XTIMER_OVERHEAD) when the tests allow such a big deviation?
The timer where placed in the wrong list when they where set near the overflow. (line 201 - 247) The next target (32 bit) was compared to a low lovel timer (unknown bit size) (line 536)
IMHO, the test allows such deviation as the underlying problem was not found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks fine. I tested on arduino-mega2560
and z1
and measured with a logic analyzer. It improves the behavior a lot. On the z1 I noticed that in some cases one of the workers isn't executed until a full wrap-around of the timer happens. This doesn't happen very often, once after the first ~2 seconds and then again after ~80 seconds. With your Test supplied in #10174 this is much easier to see if you use two separate pins for both workers. I'd still agree with the others and say this is a huge improvement and we shuld investigate remaining problems with a follow-up.
Please check the typos / wording I commented below.
[edit]: please squash directly so we can merge soon ;)
sys/include/xtimer/implementation.h
Outdated
/** | ||
* @brief Sets the timer to the appropriate timer_list or list_head. | ||
* | ||
* @note The target to set the timer to has to be at least bigger then then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(...) then then
sys/include/xtimer/implementation.h
Outdated
* '_xtimer_now()' call needs multiple xtimer ticks to evaluate. | ||
* | ||
* @param[in] timer pointer to xtimer_t which is added to the list. | ||
* @param[in] target Absolut target value in ticks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolute
sys/xtimer/xtimer_core.c
Outdated
/* Ensure that offset is bigger than 'XTIMER_BACKOFF', | ||
* 'target - now' will allways be the offset no matter if target < or > now. | ||
* | ||
* This expects that target was not set to close to now and overrun now, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not too close to
sys/xtimer/xtimer_core.c
Outdated
* | ||
* This expects that target was not set to close to now and overrun now, so | ||
* from setting target up until the call of '_xtimer_now()' above now has not | ||
* become equal and than bigger target. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equal or bigger than target?
sys/xtimer/xtimer_core.c
Outdated
* call needs multiple xtimer ticks. | ||
* | ||
* '_xtimer_set()' and `_xtimer_periodic_wakeup()` ensure this by already | ||
* backing of for small values. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backing off
sys/xtimer/xtimer_core.c
Outdated
* than 'now', also for values when now will overflow and the value of target | ||
* is smaller then now. | ||
* If `target < XTIMER_OVERHEAD` the new target will be at the end of this | ||
* 32bit period, instead at the beginning of the next.*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new target will not be at the end of this 32bit period but instead at the beginning... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it will be at the end of the running period and not at the beginning of the next.
bdf062a
to
67f4c19
Compare
As i could narrow it down this is caused by setting the timer to the long_list. Line 232 in 67f4c19
I think this is because setting the xtimer is not interrupt save. |
@Josar I am trying to provide an unit-test with mock for this to show the issue (implementation dependent but still a start). I have xtimer using my own mock functions of
Are you able to describe a minimal test that I could do to show this ? I can fake values of I will also try to extract something from your description at the same time and do it myself but you may be faster than me. |
Hmm my branch was based on your changes, I will revert your commits to be able to show an error. |
* | ||
* @note The target to set the timer to has to be at least bigger then the | ||
* ticks needed to jump into the function and calculate '_xtimer_now()'. | ||
* So that 'now' did not pass the target. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is hard to guarantee I think. The problem is that the calling task may be de-scheduled for a few ticks and then you miss it anyway.
As it is a private function, is it not possible to update the API to pass it the now
and offset
value to handle it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't get this requirement. How can it be that for correct operation one needs to know the time it gets to get to a certain line? IMO this is very fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To not pass now
and offset
, another solution could be to just assume (and ensure) that offset < UINT32_MAX - MAX_MARGIN
where MAX_MARGIN
is the maximum time we allow to enter the function and read now
again.
This way, you are sure that a timer loop can be detected if you took less than MAX_MARGIN
to go to now
.
A large enough solution could be to say it fits in an int32_t
so uses 31bits
.
It is an easy thing to ensure over the code base.
With 1Mhz it would still allow working if you take 35 minutes between the first call and being in the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no problem to guarantee. XTIMER_BACKOFF
has to be set to as much ticks as it needs to reach the backing-off case.
I allready started an test which helps to extract the required values.
See: #8990
This requirement is based on the fact that the time is ticking while you try to set a timer. If you set your deadline too short you will miss it. Thats it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look at #8990 as a base.
When you start calling _xtimer_set_absolute
you are still in your context, and you could be de-scheduled, for some time (maybe more than XTIMER_BACKOFF
), so you do not know that the target
is not elapsed, or not too close from your calling function.
Of course if you call the function with something that ends up being in the past, it should _shoot it directly, but you cannot do better as it was asked too late.
All the calling functions of _xtimer_set_absolute
already have a now
value (2 functions + 1 test).
By getting now
and offset
you remove the issues with "calling offset was too short" issue and handle this in the function directly.
However I see another issue also with the current implementation, it is that when now
is read in the function, and offset
is evaluated it is not with masked interrupts. So everything could also go wrong there too, be de-scheduled and get stuck with xtimer_spin_until
.
The only hard limitation we cannot do anything about, is that between the first now
and the now
in the function, there should not be more than UINT32_MAX / Frequency
ticks.
I wanted to say 2min23 for 32Mhz timers, but it looks like there are only sub MHz timers.
Even if there is this comment about cc2538 using a 16Mhz timer and code to handle them.
RIOT/sys/include/xtimer/tick_conversion.h
Line 41 in 77dc923
/* e.g. cc2538 uses a 16 MHz timer */ |
git grep XTIMER_HZ . ':!sys'
boards/common/arduino-atmega/include/board_common.h:#define XTIMER_HZ (250000UL)
boards/frdm-kw41z/include/board.h:#define XTIMER_HZ (32768ul)
boards/hifive1/include/periph_conf.h:#define XTIMER_HZ (32768ul)
boards/ikea-tradfri/include/board.h:#define XTIMER_HZ (250000UL)
boards/jiminy-mega256rfr2/include/board.h:#define XTIMER_HZ (125000UL)
boards/mega-xplained/include/board.h:#define XTIMER_HZ (125000UL)
boards/mulle/include/board.h:#define XTIMER_HZ (32768ul)
boards/slstk3401a/include/board.h:#define XTIMER_HZ (250000UL)
boards/slstk3402a/include/board.h:#define XTIMER_HZ (250000UL)
boards/sltb001a/include/board.h:#define XTIMER_HZ (250000UL)
boards/slwstk6000b/include/board.h:#define XTIMER_HZ (250000UL)
boards/stk3600/include/board.h:#define XTIMER_HZ (250000UL)
boards/stk3700/include/board.h:#define XTIMER_HZ (250000UL)
boards/waspmote-pro/include/board.h:#define XTIMER_HZ (62500UL)
cpu/esp32/periph/timer.c: CHECK_PARAM_RET (freq == XTIMER_HZ_BASE, -1);
cpu/esp32/periph/timer.c: CHECK_PARAM_RET (freq == XTIMER_HZ_BASE, -1);
cpu/esp8266/periph/timer.c: CHECK_PARAM_RET (freq == XTIMER_HZ_BASE, -1);
cpu/esp8266/periph/timer.c: CHECK_PARAM_RET (freq == XTIMER_HZ_BASE, -1);
tests/bench_timers/Makefile:test-xtimer: CFLAGS+=-DTEST_XTIMER -DTIM_TEST_FREQ=XTIMER_HZ -DTIM_TEST_DEV=XTIMER_DEV
Looking at all this really needs a lot of concentration D:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could propose something with my ideas and see if it would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see #7630 regarding passing a reference time as argument to _xtimer_set_absolute?
It needs a rebase but it is related to this discussion I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not, I just jumped in the xtimer issue to provide a test, but it means that I had a similar reasoning. I will provide a minimal test to show an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gebart I think your fix still misses something, from what I get, the function should be calling xtimer_now
from within a masked interrupt context, so that the time between this now
and the periph_timer_set
can be guaranteed to be fixed as you cannot be de-scheduled.
If not the comparison with XTIMER_BACKOFF
does not guarantee anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to show these with tests.
@cladmi i think a low level timer mock is a good idea. I think a test just needs to set the counter register of the lltimer to the end of the period before setting the xtimers. And also set the 32 bit low counter of the xtimer to the end of a period before setting the xtimer. Basically testing is needed for
|
You are talking about Your test description is to show the fix in the interrupt handler right ? I think I will try to show only the issue with |
If the target is at the beginning of the next low level timer period it will end in the long_list, regardless of the fact that
As the test supplied in #10174 enables to see issues, this does not mean that this PR handles them all. It was a first step to even out the most trivial bugs which had a big impact. De-sheduling is a seperate problem. And can happen at multiple times, this is a problem of not being interrupt safe. Basically a thread might be de-sheduled between reading the xtimer value untill it sets its new target. But this is not meant to be handled in this PR and requires a substantial rework. |
I think this is important to underline - this PR is already a huge improvement and we shouldn't try to fix all the problems at once-> IMO: test the PR on various boards/archs as is and merge. We can fix the remaining problems (that are now easier to find) with follow-ups. This PR was hanging around here for long enough... |
I'm running all test with this branch on |
1. When the 32 bit target of the xtimer overflowed the timer was not placed in the right list. 2. When the hardware timer overflowed the comparison was wrong for setting next target. 3. Backoff condition
b9a8fa1
to
6723689
Compare
@leandrolanzieri done. |
These are the results for
|
Here are some measurements I did to compare results of master to this PR.
Arduino DUE
Note: Capture files can be downloaded here |
@leandrolanzieri with a 32bit hardware timer running with 1MHz this proplem may happen once in ca. 71 minutes, if i am not mistaken. To provoke this error you could let the timer run faster then 1MHz. Besides that one of those fixes only affects not 32bit timers. |
I really enjoy the upgrade documentation and improvement. However I do not like that it was merged only on testing without reviewing the documentation and consequence out what was written there. I tried to prove my assumptions with #10321 please show me if my tests are wrong to say my critics are wrong. |
This Problems encountered when testing for #8990
Tested with xtimer_hang on jiminy board.