Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ztimer/periodic: remove timer on init if already running #19806

Merged
merged 1 commit into from
Jul 12, 2023

Conversation

benpicco
Copy link
Contributor

@benpicco benpicco commented Jul 6, 2023

Contribution description

If ztimer_periodic_init() is called on a periodic timer that is already running, the init function will completely overwrite the ztimer_t structure that then is already part of a linked list.

This causes corruption of the clock's timer list and leads to interesting results (e.g. if the timer was the first element in the list, it will now be added again causing ->next to point to itself which creates a circular list and an infinite loop when iterating the clock's timers.)

Testing procedure

Issues/PRs references

@github-actions github-actions bot added Area: timers Area: timer subsystems Area: sys Area: System labels Jul 6, 2023
@benpicco benpicco requested a review from fabian18 July 6, 2023 16:44
@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jul 6, 2023
@riot-ci
Copy link

riot-ci commented Jul 6, 2023

Murdock results

✔️ PASSED

eadb374 ztimer/periodic: remove timer on init if already running

Success Failures Total Runtime
6936 0 6936 11m:53s

Artifacts

@benpicco benpicco requested a review from kfessel July 10, 2023 10:50
@fabian18
Copy link
Contributor

I think it could be that, when the timer is passed uninitialized, like in the test, it could contain a garbage next pointer.
The if the list is emty, the _is_set() function could return a false positive.

@benpicco
Copy link
Contributor Author

benpicco commented Jul 11, 2023

Ugh, I somehow expected the ztimer_is_set() function to iterate all timers on the clock and compare their address with the passed timer, not rely on the values inside the timer struct at all - but that's indeed not the case.

Maybe we need something like

bool ztimer_is_set_safe(const ztimer_clock_t *clock, const ztimer_t *timer)
{
    const ztimer_base_t *list = &clock->list;
    const ztimer_base_t *entry = &timer->base;

    while (list->next) {
        ztimer_base_t *list_entry = list->next;
        if (list_entry == entry) {
            return true;
        }
        list = list->next;
    }

    return false;
}

@kfessel
Copy link
Contributor

kfessel commented Jul 12, 2023

maybe a timer should at least be stopped before being reinitialized e.g.: if the clock is changed in the reinitialize this would not work. even with this change and the ztimer_safe_is_set, if we would make that work for reinit (using the .clock) it might not work for init any more.

I am not sure why you want to reinit but probably to change the period -> maybe we need a set period function.

@kaspar030
Copy link
Contributor

The if the list is emty, the _is_set() function could return a false positive.

True. Maybe the documentation needs updating - that's more like a "is maybe set".

ztimer_remove() only iterates if there's something set for the ztimer_t's next, which is an optimization (as if there's no "next", the timer cannot be set due to the timer list being a circular list).

ztimer_remove() is safe w.r.t. having garbage in the ztimer_t struct, so is_set(t) -> ztimer_remove(t) is fine even for uninitialized timers.

@benpicco
Copy link
Contributor Author

ztimer_remove() is safe w.r.t. having garbage in the ztimer_t struct, so is_set(t) -> ztimer_remove(t) is fine even for uninitialized timers.

So I can drop the ztimer_is_set(clock, &timer->timer) check and just call ztimer_remove(clock, &timer->timer) unconditionally?

@kaspar030
Copy link
Contributor

So I can drop the ztimer_is_set(clock, &timer->timer) check and just call ztimer_remove(clock, &timer->timer) unconditionally?

yes, that will do the right thing, more efficiently.

@benpicco
Copy link
Contributor Author

Say no more!

@benpicco benpicco force-pushed the ztimer_periodic_init-safe branch from 54926d4 to eadb374 Compare July 12, 2023 13:04
Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@kaspar030
Copy link
Contributor

bors merge

1 similar comment
@benpicco
Copy link
Contributor Author

bors merge

@bors
Copy link
Contributor

bors bot commented Jul 12, 2023

Already running a review

@bors
Copy link
Contributor

bors bot commented Jul 12, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit 7dd7d1e into RIOT-OS:master Jul 12, 2023
@benpicco benpicco deleted the ztimer_periodic_init-safe branch July 13, 2023 09:34
@@ -57,6 +57,7 @@ void ztimer_periodic_init(ztimer_clock_t *clock, ztimer_periodic_t *timer,
bool (*callback)(
void *), void *arg, uint32_t interval)
{
ztimer_remove(clock, &timer->timer);
Copy link
Contributor

@kfessel kfessel Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line should be
ztimer_remove(timer->clock, &timer->timer)

since clock may not be the clock the timer is set on it can not be safely removed there (there is just a good chance that the same clock is used for the reinitialize)

but doing that would also call the remove function with a not initialized timer->clock, which may also not be safe

bors bot added a commit that referenced this pull request Jul 24, 2023
19817: compile_and_test_for_boards: Add no-compile flag r=benpicco a=MrKevinWeiss


### Contribution description

Since we have a no-test flag that prevents executing tests, I think a no-compile flag is a nice compliment. Why? Well if I want to use this script for running multiple boards at the same time, RIOT is not so great handling parallel compile steps with conflicts on lockfiles happening, mostly due to packages. With this I can compile a list of boards sequentially, then flash and run tests in parallel, skipping the compile step.


### Testing procedure

Run the following once to compile and clean:
```
./dist/tools/compile_and_test_for_board/compile_and_test_for_board.py . native --applications tests/sys/shell --clean-after
```

Then try to run without the compile step and it should fail due to lack of the binary
```
./dist/tools/compile_and_test_for_board/compile_and_test_for_board.py . native --applications tests/sys/shell --no-compile
```

### Issues/PRs references


19826: ztimer/periodic: reinit remove from right clock and handle aquired ztimer r=benpicco a=kfessel

### Contribution description

#19806 added some retinit handling for ztimer periodic removing the timer from the new clock 

This tries to detect if this is a reinit and remove the timer from the old clock
this also removes the ztimer_acquire/_release handling by removing now calls in favour of set return value and now values  that are allready in ztimer,
that also has the potential to reduce the jitter of the periodic calls and bus-usage (for cpus that take their time to get "now") 

### Testing procedure

read

run tests/sys/ztimer_periodic

### Issues/PRs references

Fixes #19806 

19841: boards/adafruit-itsybitsy-nrf52: Add configuration for DotStar LED r=benpicco a=jimporter



19842: cpu/stm32: fix ld script for SRAM4 r=benpicco a=gschorcht

### Contribution description

This PR fixes the LD script for STM32.

Since the CCM and SRAM4 length are defined as symbols with perifx `_`, the LD script didn't use them correctly. Instead of using `ccmram_length` and `sram4_length`, `_ccmram_length` and `_sram4_length` have to be used. Furthermore, the location counter for the SRAM has to be set to the beginning of SRAM4 to work.

BTW, I don't understand why the `ccmram` region is defined. There is no section definition that would use it to place code or data with attribute `.ccmram.*` there.

Without the fix in this PR, defined symbols in `tests/sys/malloc` for the `b-u585i-iot02a` board were:
```python
00000000 T _sheap2     <== wrong start position because of wrong location counter
28000000 T _eheap2     <== wrong end position because of `sram4_length` is 0.
```
Although the `tests/sys/malloc` crashes for `b-u585i-iot02a` at the end of the heap (known problem, see [here](#17410 (comment))), it uses only the backup RAM before it crashes:
```
Allocated 512 Bytes at 0x200bf600, total 756072
Allocated 512 Bytes at 0x200bf818, total 756592
Allocated 512 Bytes at 0x200bfa30, total 757112
Allocated 512 Bytes at 0x200bfc48, total 757632
Allocated 512 Bytes at 0x40036408, total 758152
Allocated 512 Bytes at 0x40036610, total 758672
Allocated 512 Bytes at 0x40036818, total 759192
```

With the fix in this PR, defined symbols in `tests/sys/malloc` for the `b-u585i-iot02a` board are:
```python
28000000 T _sheap2     <== correct start position
28004000 T _eheap2     <== correct end position
```
`tests/sys/malloc` also crashes for the `b-u585i-iot02a` at the end of the heap, but it uses also the SRAM4 before it crashes.
```
Allocated 512 Bytes at 0x200bf600, total 756072
Allocated 512 Bytes at 0x200bf818, total 756592
Allocated 512 Bytes at 0x200bfa30, total 757112
Allocated 512 Bytes at 0x200bfc48, total 757632
Allocated 512 Bytes at 0x40036408, total 758152
Allocated 512 Bytes at 0x40036610, total 758672
Allocated 512 Bytes at 0x40036818, total 759192
Allocated 512 Bytes at 0x28000008, total 759712
Allocated 512 Bytes at 0x28000210, total 760232
Allocated 512 Bytes at 0x28000418, total 760752
...
Allocated 512 Bytes at 0x280038e8, total 774272
Allocated 512 Bytes at 0x28003af0, total 774792
Allocated 512 Bytes at 0x28003cf8, total 775312
```

### Testing procedure

1. Flash `tests/sys/malloc` and use `MAX_MEM` limit to stop `malloc` before the crash:
   ```
   CFLAGS='-DMAX_MEM=774800 -DCHUNK_SIZE=512 -DDEBUG_ASSERT_VERBOSE' \
   BOARD=b-u585i-iot02a make -j8 -C tests/sys/malloc flash
   ```
   Without the PR it crashes at the end of the backup RAM. With the PR it works.

2. Check `_sheap2` and `_eheap2` with
   ```
   nm -s tests/sys/malloc/bin/b-u585i-iot02a/tests_malloc.elf | grep heap2 | sort
   ```
   Without the PR it will be:
   ```
   00000000 T _sheap2
   28000000 T _eheap2
   ```
   With the PR it should be:
   ```
   28000000 T _sheap2
   28004000 T _eheap2
   ```

### Issues/PRs references


Co-authored-by: MrKevinWeiss <[email protected]>
Co-authored-by: Karl Fessel <[email protected]>
Co-authored-by: Jim Porter <[email protected]>
Co-authored-by: Gunar Schorcht <[email protected]>
@benpicco benpicco added this to the Release 2023.07 milestone Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: sys Area: System Area: timers Area: timer subsystems CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants