Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mps2_an385 crashes if CONFIG_INIT_STACKS=y and CONFIG_COVERAGE=y #15794

Closed
andrewboie opened this issue May 1, 2019 · 4 comments · Fixed by #16099
Closed

mps2_an385 crashes if CONFIG_INIT_STACKS=y and CONFIG_COVERAGE=y #15794

andrewboie opened this issue May 1, 2019 · 4 comments · Fixed by #16099
Assignees
Labels
area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@andrewboie
Copy link
Contributor

andrewboie commented May 1, 2019

If both of these configs get enabled, we get a bus fault:

***** BUS FAULT *****
  Precise data bus error
  BFAR Address: 0xaaaaaaaa
***** Hardware exception *****
Current thread ID = 0x2000ca54
Faulting instruction address = 0x8656
Fatal fault in thread 0x2000ca54! Aborting.
***** USAGE FAULT *****
  Illegal load of EXC_RETURN into PC
***** Hardware exception *****
Current thread ID = 0x20009a14
Faulting instruction address = 0x5bcd
Fatal fault in thread 0x20009a14! Aborting.
***** MPU FAULT *****
  Data Access Violation
  MMFAR Address: 0x64
***** Hardware exception *****
Current thread ID = 0x00000000
Faulting instruction address = 0x5bf8
Fatal fault in ISR! Spinning...

Note that the thread ID information is bogus since we are in early boot context.

The crash happens in the call to memset() in z_new_thread_init() when it tries to write 0xAA to the stack buffer (which is what CONFIG_INIT_STACKS does) for the main thread.

#0  memset (buf=0x2000ab30 <_main_stack>, c=170, n=8192)
    at /home/apboie/projects/zephyr3/zephyr/lib/libc/minimal/source/string/string.c:290
#1  0x00006c2c in z_new_thread_init (thread=0x20009a14 <_main_thread_s>, 
    pStack=0x2000ab30 <_main_stack> "", stackSize=8192, prio=0, options=1)
    at /home/apboie/projects/zephyr3/zephyr/kernel/include/kernel_structs.h:222
#2  0x00006cea in z_new_thread (thread=0x20009a14 <_main_thread_s>, stack=0x2000ab30 <_main_stack>, 
    stackSize=8192, pEntry=0xc271 <bg_thread_main>, parameter1=0x0 <crc32_ieee>, 
    parameter2=0x0 <crc32_ieee>, parameter3=0x0 <crc32_ieee>, priority=0, options=1)
    at /home/apboie/projects/zephyr3/zephyr/arch/arm/core/thread.c:100
#3  0x0001003c in z_setup_new_thread (new_thread=0x20009a14 <_main_thread_s>, 
    stack=0x2000ab30 <_main_stack>, stack_size=8192, entry=0xc271 <bg_thread_main>, 
    p1=0x0 <crc32_ieee>, p2=0x0 <crc32_ieee>, p3=0x0 <crc32_ieee>, prio=0, options=1, 
    name=0x14354 "main") at /home/apboie/projects/zephyr3/zephyr/kernel/thread.c:362
#4  0x0000c4e0 in prepare_multithreading (dummy_thread=0x2000ca54 <_main_stack+7972>)
    at /home/apboie/projects/zephyr3/zephyr/kernel/init.c:368
#5  0x0000c6d0 in z_cstart () at /home/apboie/projects/zephyr3/zephyr/kernel/init.c:513
#6  0x0000776e in _PrepC () at /home/apboie/projects/zephyr3/zephyr/arch/arm/core/cortex_m/prep_c.c:187
#7  0x00007460 in __start () at /home/apboie/projects/zephyr3/zephyr/arch/arm/core/cortex_m/reset.S:75

To reproduce, add the lines:

CONFIG_COVERAGE=y
CONFIG_INIT_STACKS=y

to samples/hello_world/prj.conf and run on mps2_an385.
First noticed with tests/kernel/mem_protect/userspace, but not specific to that test

@andrewboie andrewboie added bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug labels May 1, 2019
@ioannisg ioannisg added the area: ARM ARM (32-bit) Architecture label May 2, 2019
@wentongwu
Copy link
Contributor

from log, it's the instruction located in 0x8656(in memset) caused the BUS fault. And it tries to access 0xaaaaaaaa, so there is bus fault. Haven't got the root cause, still checking it.

@wentongwu
Copy link
Contributor

wentongwu commented May 12, 2019

  1. when CONFIG_COVERAGE=y, it will enable CONFIG_NO_OPTIMIZATIONS, and only with CONFIG_NO_OPTIMIZATIONS=y and CONFIG_INIT_STACKS=y, the issue will also happen.

  2. -O0 optimized memset code is like below, and it will use the stack buffer to do the buffer assignment

    while (n >= sizeof(unsigned int)) {
    3862:       e007            b.n     3874 <memset+0x6a>
                *(d_word++) = c_word;
    3864:       69bb            ldr     r3, [r7, #24]
    3866:       1d1a            adds    r2, r3, #4
    3868:       61ba            str     r2, [r7, #24]
    386a:       693a            ldr     r2, [r7, #16]
    386c:       601a            str     r2, [r3, #0]
                n -= sizeof(unsigned int);
    386e:       687b            ldr     r3, [r7, #4]
    3870:       3b04            subs    r3, #4
    3872:       607b            str     r3, [r7, #4]
        while (n >= sizeof(unsigned int)) {
    3874:       687b            ldr     r3, [r7, #4]
    3876:       2b03            cmp     r3, #3
    3878:       d8f4            bhi.n   3864 <memset+0x5a>
        }

added some debug code and also from above log 0x0000c4e0 in prepare_multithreading (dummy_thread=0x2000ca54 <_main_stack+7972>), it shows the initialization code (before switch to main thread) is using the main_stack as the stack, and also z_new_thread_init is memset-ing main_stack buffer with 0xaa, so until the place [r7, #24] = 0xaaaaaaaa, str r2, [r3, #0] will trigger BUS FAULT as last comment mentioned. So the question is why initialization code use main_stack instead of _interrupt_stack as designed.

  1. when CONFIG_NO_OPTIMIZATIONS=y, _PrepC will do function call to set_and_switch_to_psp,
void _PrepC(void)
{
    34dc:       b580            push    {r7, lr}
    34de:       af00            add     r7, sp, #0
#ifdef CONFIG_INIT_STACKS
        init_stacks();
    34e0:       f7fe f9ba       bl      1858 <init_stacks>
#endif
        /*
         * Set PSP and use it to boot without using MSP, so that it
         * gets set to _interrupt_stack during initialization.
         */
        set_and_switch_to_psp();
    34e4:       f7fe f9a4       bl      1830 <set_and_switch_to_psp>

and from set_and_switch_to_psp code below, when return from set_and_switch_to_psp, there will be "mov sp, r7" , so it will still write the original stack(see vector table) address to sp (even it's PSP now) instead of _interrupt_stack as code designed.

static inline void set_and_switch_to_psp(void)
{
    1830:       b580            push    {r7, lr}
    1832:       b082            sub     sp, #8
    1834:       af00            add     r7, sp, #0
        u32_t process_sp;

        process_sp = (u32_t)&_interrupt_stack + CONFIG_ISR_STACK_SIZE;
    1836:       4b07            ldr     r3, [pc, #28]   ; (1854 <set_and_switch_to_psp+0x24>)
    1838:       f503 5380       add.w   r3, r3, #4096   ; 0x1000
    183c:       607b            str     r3, [r7, #4]
    183e:       687b            ldr     r3, [r7, #4]
    1840:       603b            str     r3, [r7, #0]
  \details Assigns the given value to the Process Stack Pointer (PSP).
  \param [in]    topOfProcStack  Process Stack Pointer value to set
 */
__STATIC_FORCEINLINE void __set_PSP(uint32_t topOfProcStack)
{
  __ASM volatile ("MSR psp, %0" : : "r" (topOfProcStack) : );
    1842:       683b            ldr     r3, [r7, #0]
    1844:       f383 8809       msr     PSP, r3
        __set_PSP(process_sp);
        switch_sp_to_psp();
    1848:       f001 fe21       bl      348e <switch_sp_to_psp>
}
    184c:       bf00            nop
    184e:       3708            adds    r7, #8
    1850:       46bd            mov     sp, r7
    1852:       bd80            pop     {r7, pc}
    1854:       200001a8        .word   0x200001a8
  1. so there should be no function call when changing buffer address to SP, and it's the -O0 that cause the "static inline" not working. And with below temp changes, the samples/hello_world case with (CONFIG_INIT_STACKS=y and CONFIG_COVERAGE=y) can run correctly.
--- a/arch/arm/core/cortex_m/prep_c.c
+++ b/arch/arm/core/cortex_m/prep_c.c
@@ -37,7 +37,7 @@

 #include <string.h>

-static inline void switch_sp_to_psp(void)
+static inline __attribute__((always_inline)) void switch_sp_to_psp(void)
 {
        __set_CONTROL(__get_CONTROL() | CONTROL_SPSEL_Msk);
        /*
@@ -48,7 +48,7 @@ static inline void switch_sp_to_psp(void)
        __ISB();
 }

-static inline void set_and_switch_to_psp(void)
+static inline __attribute__((always_inline)) void set_and_switch_to_psp(void)
 {
        u32_t process_sp; 
  1. after applied above changes, most of the cases in tests/kernel/mem_protect/userspace also can run correctly with CONFIG_INIT_STACKS=y and CONFIG_COVERAGE=y. I will consider more to make the PR appropriatly.

@wentongwu
Copy link
Contributor

with above temp changes, only the last sub test of tests/kernel/mem_protect/userspace fails as below, so I think the original crash problem has been fixed. I'm preparing PR. I will open another github issue track below problem. Thanks.

starting test - test_stack_buffer
Reserved space: 0
Provided stack size: 1024
Stack object 0x20023000[1024]

  • Testing supervisor mode

    • Thread reports buffer 0x20023020 size 992***** MPU FAULT *****
      Data Access Violation
      MMFAR Address: 0x20023000
      ***** Hardware exception *****
      Current thread ID = 0x2001163c
      Faulting instruction address = 0x10
      Caught system error -- reason 0

    Assertion failed at ./tests/kernel/mem_protect/userspace/src/main.c:107: z_SysFatalErrorHandler: (Reached unreachable code)
    Unexpected fault during test
    FAIL - test_stack_buffer

@ioannisg
Copy link
Member

and from set_and_switch_to_psp code below, when return from set_and_switch_to_psp, there will be "mov sp, r7" , so it will still write the original stack(see vector table) address to sp (even it's PSP now) instead of _interrupt_stack as code designed.

Right. It was really a bad idea to move this code to C from assembly.

wentongwu added a commit to wentongwu/zephyr that referenced this issue May 16, 2019
With -O0 optimizion, gcc compiler doesn't inline "static inline"
marked function. So when function call return from function
set_and_switch_to_psp which is to switch sp from MSP to PSP, the
ending "mov sp, r7" instruction will overwrite the just updated
sp value(PSP) with the beginning stack pointer(should be MSP)
stored in r7 register, so the switch doesn't happen. And it causes
unpredictable problems in the initialization process, the backward
analysis for this problem can be found on Github issue zephyrproject-rtos#15794.

Fixes: zephyrproject-rtos#15794.

Signed-off-by: Wentong Wu <[email protected]>
wentongwu added a commit to wentongwu/zephyr that referenced this issue May 16, 2019
increase stack buffer when code coverage enabled.

Fixes: zephyrproject-rtos#15794.

Signed-off-by: Wentong Wu <[email protected]>
backporting bot pushed a commit that referenced this issue May 16, 2019
With -O0 optimizion, gcc compiler doesn't inline "static inline"
marked function. So when function call return from function
set_and_switch_to_psp which is to switch sp from MSP to PSP, the
ending "mov sp, r7" instruction will overwrite the just updated
sp value(PSP) with the beginning stack pointer(should be MSP)
stored in r7 register, so the switch doesn't happen. And it causes
unpredictable problems in the initialization process, the backward
analysis for this problem can be found on Github issue #15794.

Fixes: #15794.

Signed-off-by: Wentong Wu <[email protected]>
backporting bot pushed a commit that referenced this issue May 16, 2019
increase stack buffer when code coverage enabled.

Fixes: #15794.

Signed-off-by: Wentong Wu <[email protected]>
andrewboie pushed a commit that referenced this issue May 16, 2019
With -O0 optimizion, gcc compiler doesn't inline "static inline"
marked function. So when function call return from function
set_and_switch_to_psp which is to switch sp from MSP to PSP, the
ending "mov sp, r7" instruction will overwrite the just updated
sp value(PSP) with the beginning stack pointer(should be MSP)
stored in r7 register, so the switch doesn't happen. And it causes
unpredictable problems in the initialization process, the backward
analysis for this problem can be found on Github issue #15794.

Fixes: #15794.

Signed-off-by: Wentong Wu <[email protected]>
andrewboie pushed a commit that referenced this issue May 16, 2019
increase stack buffer when code coverage enabled.

Fixes: #15794.

Signed-off-by: Wentong Wu <[email protected]>
nashif pushed a commit to nashif/zephyr that referenced this issue May 21, 2019
With -O0 optimizion, gcc compiler doesn't inline "static inline"
marked function. So when function call return from function
set_and_switch_to_psp which is to switch sp from MSP to PSP, the
ending "mov sp, r7" instruction will overwrite the just updated
sp value(PSP) with the beginning stack pointer(should be MSP)
stored in r7 register, so the switch doesn't happen. And it causes
unpredictable problems in the initialization process, the backward
analysis for this problem can be found on Github issue zephyrproject-rtos#15794.

Fixes: zephyrproject-rtos#15794.

Signed-off-by: Wentong Wu <[email protected]>
nashif pushed a commit to nashif/zephyr that referenced this issue May 21, 2019
increase stack buffer when code coverage enabled.

Fixes: zephyrproject-rtos#15794.

Signed-off-by: Wentong Wu <[email protected]>
nashif pushed a commit that referenced this issue May 21, 2019
With -O0 optimizion, gcc compiler doesn't inline "static inline"
marked function. So when function call return from function
set_and_switch_to_psp which is to switch sp from MSP to PSP, the
ending "mov sp, r7" instruction will overwrite the just updated
sp value(PSP) with the beginning stack pointer(should be MSP)
stored in r7 register, so the switch doesn't happen. And it causes
unpredictable problems in the initialization process, the backward
analysis for this problem can be found on Github issue #15794.

Fixes: #15794.

Signed-off-by: Wentong Wu <[email protected]>
nashif pushed a commit that referenced this issue May 21, 2019
increase stack buffer when code coverage enabled.

Fixes: #15794.

Signed-off-by: Wentong Wu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants