Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iotlab-m3: thread tests failing #10800

Closed
cladmi opened this issue Jan 17, 2019 · 12 comments
Closed

iotlab-m3: thread tests failing #10800

cladmi opened this issue Jan 17, 2019 · 12 comments
Assignees
Labels
Area: boards Area: Board ports Area: tests Area: tests and testing framework Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@cladmi
Copy link
Contributor

cladmi commented Jan 17, 2019

Analysis

The threads are using printf to show threads activity, however, a thread could be de-scheduled in the middle of printf. It did not really happen before but always happens when using the DMA.

Description

I found out that the thread related tests are all failing on BOARD=iotlab-m3:

- [tests/cond_order](tests/cond_order/test.failed)
- [tests/isr_yield_higher](tests/isr_yield_higher/test.failed)
- [tests/msg_try_receive](tests/msg_try_receive/test.failed)
- [tests/mutex_order](tests/mutex_order/test.failed)
- [tests/pipe](tests/pipe/test.failed)
- [tests/posix_semaphore](tests/posix_semaphore/test.failed)
- [tests/ps_schedstatistics](tests/ps_schedstatistics/test.failed)
- [tests/pthread_barrier](tests/pthread_barrier/test.failed)
- [tests/pthread_cooperation](tests/pthread_cooperation/test.failed)
- [tests/pthread_rwlock](tests/pthread_rwlock/test.failed)
- [tests/rmutex](tests/rmutex/test.failed)
- [tests/thread_basic](tests/thread_basic/test.failed)
- [tests/thread_exit](tests/thread_exit/test.failed)
- [tests/thread_flags](tests/thread_flags/test.failed)
- [tests/thread_msg](tests/thread_msg/test.failed)
- [tests/thread_msg_seq](tests/thread_msg_seq/test.failed)

I bisected with

BOARD=iotlab-m3 make -C tests/thread_basic/ all flash test

to get to the commit commit 0b6bccb as breaking the thread tests for iotlab-m3.

Steps to reproduce the issue

Run the test

BOARD=iotlab-m3 make -C tests/thread_basic/ all flash test

Or one of the listed ones before. The test should succeed but fails.

Versions

I tested with the current master b6eb12c and bisected to 0b6bccb.

I was building using the docker container integration with:

riot/riotbuild      latest              2a18018b5a2b        8 weeks ago         3.36GB

And also with ubuntu bionic:

./dist/tools/ci/print_toolchain_versions.sh

Operating System Environment
-----------------------------
       Operating System: "Ubuntu" "18.04.1 LTS (Bionic Beaver)"
                 Kernel: Linux 4.18.0-13-generic x86_64 x86_64

Installed compiler toolchains
-----------------------------
             native gcc: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
      arm-none-eabi-gcc: arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2018-q2-update) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
                avr-gcc: avr-gcc (GCC) 5.4.0
       mips-mti-elf-gcc: missing
             msp430-gcc: msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)
   riscv-none-embed-gcc: missing
   xtensa-esp32-elf-gcc: missing
   xtensa-lx106-elf-gcc: missing
                  clang: clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)

Installed compiler libs
-----------------------
   arm-none-eabi-newlib: "3.0.0"
    mips-mti-elf-newlib: missing
riscv-none-embed-newlib: missing
xtensa-esp32-elf-newlib: missing
xtensa-lx106-elf-newlib: missing
               avr-libc: "2.0.0" ("20150208")

Installed development tools
---------------------------
                  cmake: cmake version 3.10.2
               cppcheck: Cppcheck 1.82
                doxygen: 1.8.13
                 flake8: 3.6.0 (mccabe: 0.6.1, pycodestyle: 2.4.0, pyflakes: 2.0.0) CPython 3.6.7 on Linux
                    git: git version 2.17.1
                   make: GNU Make 4.1
                openocd: Open On-Chip Debugger 0.10.0
                 python: Python 2.7.15rc1
                python2: Python 2.7.15rc1
                python3: Python 3.6.7
             coccinelle: missing
@cladmi cladmi added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: tests Area: tests and testing framework Area: boards Area: Board ports labels Jan 17, 2019
@cladmi cladmi added this to the Release 2019.01 milestone Jan 17, 2019
@miri64
Copy link
Member

miri64 commented Jan 17, 2019

Wasn't 0b6bccb reverted in #10775?

@miri64
Copy link
Member

miri64 commented Jan 17, 2019

Ah... it only reverted it for the SPI...

@aabadie
Copy link
Contributor

aabadie commented Jan 17, 2019

argh

@aabadie
Copy link
Contributor

aabadie commented Jan 17, 2019

I have to look at other concerned ST boards

@aabadie
Copy link
Contributor

aabadie commented Jan 17, 2019

Ok, the problem is not only affecting the iotlab-m3 board but also all STM32 based boards that I have tested until now (nucleo-091rc, b-l072zlrwan1, b-l475e-iot01a).

The problems comes from multiple threads trying to print nearly at the same time. A potential solution (discussed offline with @cladmi) would be to use some kind of atomic stdio functions: if a thread is already writing to UART, no other should be able to print until the first one is done. I'm not sure if this won't break the DMA feature itself but with minimal changes, it will fix the tests.

The proposed solution should be discussed further, but it may take some time, I propose to revert the use of DMA at board level.

@cladmi
Copy link
Contributor Author

cladmi commented Jan 17, 2019

So the thread tests are not thread safe :D as nothing says that when you do printf you will not have another printf in the middle.

I have two ideas to implement this with functionsatomic_puts and atomic_printf:

  • use a global mutex around the original puts/printf. Which is somehow the same as done for printk without the buffer they have: https://en.wikipedia.org/wiki/Printk#Description
  • Do it with irq_disable
    • however, the current code does not support it as a special case is only done for irq_is_in which does not take into account that interrupts can be disabled. May it also be a similar reason why it did not work for spi ?

@cladmi
Copy link
Contributor Author

cladmi commented Jan 17, 2019

I did not listed it in the first place, as I thought it was not thread related and that it was another issue, but tests/gnrc_netif is also not thread safe as it relies on printing in parallel.

https://ci.inria.fr/iotlab-os-ci/job/test-job/lastCompletedBuild/testReport/iotlab-m3.tests_gnrc_netif.test.tests/01-run/testfunc/

In that case, as there is another thread printing, I think that having a gnrc_logging_thread_mutex that could be taken by the tests too could ensure you can print blocks in parallel to the logging thread without being interrupted.

@vincent-d
Copy link
Member

I believe printf/puts thread-safety should be fixed by #8619

I don't have hardware at the moment, so I can't test. I guess disabling DMA is a first step to get the stability back.

@miri64
Copy link
Member

miri64 commented Jan 17, 2019

Maybe that was also the problem with #10761?

@aabadie
Copy link
Contributor

aabadie commented Jan 17, 2019

I believe printf/puts thread-safety should be fixed by #8619

This improves the situation but tests/cond_order is still failing: things are not printed in the expected order, but at least they are correctly printed.

I guess disabling DMA is a first step to get the stability back.

This kind of is what is done in #10803: the feature is not pulled automatically by the board. It's just that DMA with SPI is not tested automatically.

@cladmi
Copy link
Contributor Author

cladmi commented Jan 18, 2019

To be pedantic about words. I would say thatprintf/puts are "thread safe", it is just normal to have mixed outputs if two threads print "at the same time". Like the first tests you do with threads on linux where you have mixed outputs.
Before it was quite unlikely to be de-scheduled during a printf as you were often the thread with the biggest priority but was still possible and shown in some tests with print on asynchronous events.
When using the DMA you are always de-scheduled.
(Now that I read the link from #8619 (comment) maybe they are not thread safe but for me it is a different issue than the issue with the tests)

I would use more say not "atomic" as it is more the fact that even if you printed the first characters of the string, somebody else can print before you finish printing your string.
I think it is normal to assume that you can be de-scheduled during a printf and somebody else tries to print too.

What is not thread safe is our tests implementation as they do not take this possible de-scheduling into account. And even with a thread safe newlib, they would still they need to use a shared lock to ensure full string printing.

@cladmi
Copy link
Contributor Author

cladmi commented Jul 8, 2019

The issue was fixed during 2019.01 through #10803

The tests tread safety is another issue than this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: boards Area: Board ports Area: tests Area: tests and testing framework Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests

4 participants