Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel.threads.tls.userspace fails with SDK 0.12.0-beta on ARM Cortex-M #30393

Closed
galak opened this issue Dec 2, 2020 · 9 comments · Fixed by #31312
Closed

kernel.threads.tls.userspace fails with SDK 0.12.0-beta on ARM Cortex-M #30393

galak opened this issue Dec 2, 2020 · 9 comments · Fixed by #31312
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@galak
Copy link
Collaborator

galak commented Dec 2, 2020

tests/kernel/threads/tls/kernel.threads.tls.userspace fails to run on qemu for mps2_an521 & mps2_an385.

./scripts/sanitycheck -p mps2_an521 -p mps2_an385 -s tests/kernel/threads/tls/kernel.threads.tls.userspace

Get the following log:

Running test suite thread_tls
===================================================================
START - test_tls
thread 0: result 0 (expecting 0)
thread 1: result 0 (expecting 0)
thread 2: result 0 (expecting 0)
 PASS - test_tls
===================================================================
Test suite thread_tls succeeded
Running test suite thread_tls_userspace
===================================================================
START - test_tls_userspace
E: 0x200 is not a valid z_thread_stack_element
E: address is not a known kernel object
E: syscall z_vrfy_k_thread_create failed check: bad stack object
E: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x00000000
E: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x00000000
E:  xpsr:  0x00000000
E: Faulting instruction address (r15/pc): 0x00000000
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
E: Current thread: 0x300002f8 (unknown)
E: Halting system
QEMU: Terminated
@galak galak added the bug The issue is a bug, or the PR is fixing a bug label Dec 2, 2020
@galak galak added the priority: medium Medium impact/importance bug label Dec 2, 2020
@galak
Copy link
Collaborator Author

galak commented Dec 2, 2020

@dcpleung can you please take a look at this, seeing failures when building with gcc-10.2 on ARM.

@dcpleung
Copy link
Member

dcpleung commented Dec 2, 2020

I've traced it to the calling site of k_thread_create() at tests/kernel/threads/tls/src/main.c line 102.

Under userspace, when creating the second thread (tls_thread[1]), the stack argument to k_thread_create() is 0x200 and not pointing to tls_stack[1]. This is due to register r1 being set to 0x200. Here are the register values just before calling k_thread_create() on second thread:

Test NOT under userspace:

r0             0x200000c0          536871104
r1             0x20003200          536883712
r2             0x200000a8          536871080
r3             0x0                 0
r4             0x1                 1
r5             0x0                 0
r6             0x0                 0
r7             0x200000a0          536871072
r8             0x1                 1
r9             0x200000c0          536871104
r10            0x20000188          536871304
r11            0x200000a4          536871076
r12            0xabcdef00          -1412567296
sp             0x20002f60          0x20002f60 <ztest_thread_stack+3936>
lr             0x4b37              19255
pc             0x53c               0x53c <start_tls_test+28>
xpsr           0x81000000          -2130706432

Test under userspace:

r0             0x200000c0          536871104
r1             0x200               512
r2             0x200000a8          536871080
r3             0x0                 0
r4             0x1                 1
r5             0x20002f90          536883088
r6             0x24c               588
r7             0x200000a0          536871072
r8             0x56e               1390
r9             0x200000c0          536871104
r10            0x20000188          536871304
r11            0x200000a4          536871076
r12            0x56f               1391
sp             0x20002f60          0x20002f60 <ztest_thread_stack+3936>
lr             0x54d               1357
pc             0x53c               0x53c <start_tls_test+28>
xpsr           0x81000000          -2130706432

Maybe register r1 is not restored correctly when doing syscall? The value 0x200 could be GCC adding to r1 to change from tls_stack[0] to tls_stack[1] (as one stack is of size 512 bytes == 0x200).

@dcpleung
Copy link
Member

dcpleung commented Dec 2, 2020

Here is the disassembly of the for loop:

0x53c <start_tls_test+28>	mov.w   r2, #5
0x540 <start_tls_test+32>       ldr     r3, [sp, #32]
0x542 <start_tls_test+34>       strb.w  r2, [r3], #1                                             	
0x546 <start_tls_test+38>	str     r3, [sp, #32]
0x548 <start_tls_test+40>      	bl     	0xa2bc <arch_is_user_context>
0x54c <start_tls_test+44>	ldr     r3, [pc, #244]  ; (0x644 <start_tls_test+292>)
0x54e <start_tls_test+46>	cmp     r0, #0
0x550 <start_tls_test+48>	beq.n   
0x552 <start_tls_test+50>	movs    r2, #0
0x554 <start_tls_test+52>	ldr     r0, [sp, #40]   ; 0x28
0x556 <start_tls_test+54>	strd    r2, r2, [sp, #48]	; 0x30
0x55a <start_tls_test+58>	strd    r0, r2, [sp, #60]      	; 0x3c
0x55e <start_tls_test+62>	str     r2, [sp, #56]   ; 0x38
0x560 <start_tls_test+64>	str     r2, [sp, #68]   ; 0x44
0x562 <start_tls_test+66>	mov     r0, r10
0x564 <start_tls_test+68>	mov.w   r2, #512        ; 0x200
0x568 <start_tls_test+72>	add     r5, sp, #48     ; 0x30
0x56a <start_tls_test+74>	movs    r6, #147        ; 0x93
0x56c <start_tls_test+76>	svc     3
0x56e <start_tls_test+78>	ldr     r2, [sp, #36]   ; 0x24 
0x570 <start_tls_test+80>	adds    r4, #1
0x572 <start_tls_test+82>	str.w   r0, [r2], #4
0x576 <start_tls_test+86>	cmp     r4, #3
0x578 <start_tls_test+88>      	str     r2, [sp, #36]   ; 0x24
0x57a <start_tls_test+90>      	add.w   r10, r10, #200  ; 0xc8
0x57e <start_tls_test+94>      	add.w   r1, r1, #512    ; 0x200
0x582 <start_tls_test+98>	bne.n   0x53c <start_tls_test+28>

start_tls_test+94 is where it adds 512 == 0x200 to r1, where under userspace, r1 is 0 after returning from k_thread_create().

@dcpleung
Copy link
Member

dcpleung commented Dec 2, 2020

I was able to reproduce it even with TLS disabled: west build -d /tmp/build -p -b mps2_an385 tests/kernel/threads/tls/ -t run -- -DCONFIG_TEST_USERSPACE=y -DCONFIG_QEMU_ICOUNT=n -DCONFIG_THREAD_LOCAL_STORAGE=n

and the following modified main.c:

/*
 * Copyright (c) 2020 Intel Corporation
 *
 * SPDX-License-Identifier: Apache-2.0
 */

#include <ztest.h>
#include <kernel.h>
#include <kernel_structs.h>
#include <app_memory/app_memdomain.h>
#include <sys/libc-hooks.h>
#include <sys/util.h>

#define NUM_THREADS	3
#define STACK_SIZE	(512 + CONFIG_TEST_EXTRA_STACKSIZE)

#define STATIC_DATA	0xABCDEF00U

#ifdef CONFIG_USERSPACE
K_APPMEM_PARTITION_DEFINE(part_common);
struct k_mem_domain dom_common;
#endif /* CONFIG_USERSPACE */

enum test_result {
	TEST_OK,

	/* When thread_data* != STATIC_DATA at thread entry */
	ERR_BAD_STATIC_DATA,

	/* When thread_bss* != 0 at thread entry */
	ERR_BSS_NOT_ZERO,

	/* If data/bss is changed by other threads */
	ERR_DATA_CHANGED_BY_OTHERS,
	ERR_BSS_CHANGED_BY_OTHERS,

	TEST_NOT_STARTED,
};

static K_THREAD_STACK_ARRAY_DEFINE(tls_stack, NUM_THREADS, STACK_SIZE);

static struct k_thread tls_thread[NUM_THREADS];

K_APP_BMEM(part_common) static k_tid_t tls_tid[NUM_THREADS];
K_APP_BMEM(part_common) static enum test_result tls_result[NUM_THREADS];

static void tls_thread_entry(void *p1, void *p2, void *p3)
{
	uint32_t idx;

	idx = (uint32_t)POINTER_TO_UINT(p1);

	/* Values are all expected. Test passed */
	tls_result[idx] = TEST_OK;

	return;
}

static void start_tls_test(uint32_t thread_options)
{
	unsigned int i;
	bool passed;

	/* Create threads */
	for (i = 0; i < NUM_THREADS; i++) {
		tls_result[i] = TEST_NOT_STARTED;
		tls_tid[i] = k_thread_create(&tls_thread[i], tls_stack[i],
					     STACK_SIZE, tls_thread_entry,
					     UINT_TO_POINTER(i), NULL, NULL,
					     0, thread_options, K_NO_WAIT);
	}

	/* Wait for all threads to run */
	k_sleep(K_MSEC(500));

	/* Stop all threads */
	for (i = 0; i < NUM_THREADS; i++) {
		k_thread_abort(tls_tid[i]);
		k_thread_join(&tls_thread[i], K_FOREVER);
	}

	/* Check test results */
	passed = true;
	for (i = 0; i < NUM_THREADS; i++) {
		TC_PRINT("thread %d: result %d (expecting %d)\n",
			 i, tls_result[i], TEST_OK);
		if (tls_result[i] != TEST_OK) {
			passed = false;
		}
	}

	zassert_true(passed, "Test failed");
}

void test_tls(void)
{
	/* TLS test in supervisor mode */
	start_tls_test(0);
}

#ifdef CONFIG_USERSPACE
void test_tls_userspace(void)
{
	/* TLS test in supervisor mode */
	start_tls_test(K_USER | K_INHERIT_PERMS);
}
#endif

void test_main(void)
{
#ifdef CONFIG_USERSPACE
	unsigned int i;

	struct k_mem_partition *parts[] = {
		&part_common,
#if Z_LIBC_PARTITION_EXISTS
		&z_libc_partition,
#endif
		&ztest_mem_partition,
	};

	parts[0] = &part_common;
	k_mem_domain_init(&dom_common, ARRAY_SIZE(parts), parts);
	k_mem_domain_add_thread(&dom_common, k_current_get());

	for (i = 0; i < NUM_THREADS; i++) {
		k_thread_access_grant(k_current_get(),
				      &tls_thread[i], &tls_stack[i]);
	}
#endif /* CONFIG_USERSPACE */

	ztest_test_suite(thread_tls,
			 ztest_unit_test(test_tls));
	ztest_run_test_suite(thread_tls);

#ifdef CONFIG_USERSPACE
	ztest_test_suite(thread_tls_userspace,
			 ztest_user_unit_test(test_tls_userspace));
	ztest_run_test_suite(thread_tls_userspace);
#endif /* CONFIG_USERSPACE */
}

@andrewboie andrewboie changed the title kernel.threads.tls.userspace fails with SDK 0.12.0-beta kernel.threads.tls.userspace fails with SDK 0.12.0-beta on ARM Cortex-M Dec 2, 2020
dcpleung added a commit to dcpleung/zephyr that referenced this issue Dec 2, 2020
According to GCC manual regarding input operands in extended asm:

    Warning: Do not modify the contents of input-only operands
    (except for inputs tied to outputs). The compiler assumes that
    on exit from the asm statement these operands contain the same
    values as they had before executing the statement. It is not
    possible to use clobbers to inform the compiler that the values
    in these inputs are changing.

This has been observed that after arch_syscall_invoke6(), r1 has
its value set to zero instead of the value before syscall. Given
the above warning, it seems that GCC does not automatically save
and restore r1-r3. Since arch_syscall_invoke*() can be inlined
without function prologue so r0-r3 may not be saved even though
they are caller-save registers. So save and restore them manually
on relevant arch_syscall_invoke*().

Fixes zephyrproject-rtos#30393

Signed-off-by: Daniel Leung <[email protected]>
@dcpleung
Copy link
Member

dcpleung commented Jan 7, 2021

I think there is something wrong with GCC's code generation here. If I add printk() at the beginning of the thread creation loop, it ran fine. Also setting CONFIG_DEBUG_OPTIMIZATIONS=y also works. It seems to be broken on when using -Os.

@galak
Copy link
Collaborator Author

galak commented Jan 13, 2021

@dcpleung what do you think the compiler bug is?

@dcpleung
Copy link
Member

@dcpleung what do you think the compiler bug is?

I am not totally sure this is a compiler bug, but this looks like clobbered register are not handled correctly with inline assembly inside inline functions. When tracing through the syscall invoke inline functions (inside include/arch/arm/aarch32/syscall.h), clobbered registers specified in those invoke functions are not saved/restored by generated code. When I add code to manually save/restore those registers (as in #30404), the issue disappeared.

@galak
Copy link
Collaborator Author

galak commented Jan 13, 2021

I am not totally sure this is a compiler bug, but this looks like clobbered register are not handled correctly with inline assembly inside inline functions. When tracing through the syscall invoke inline functions (inside include/arch/arm/aarch32/syscall.h), clobbered registers specified in those invoke functions are not saved/restored by generated code. When I add code to manually save/restore those registers (as in #30404), the issue disappeared.

was there some particular registers that didn't seem to get save/restored?

@dcpleung
Copy link
Member

was there some particular registers that didn't seem to get save/restored?

The registers declared as variables (i.e. register uint32_t r1 __asm__("r1") = arg2;).

They cannot be added to the clobber list as GCC complains (error: 'asm' specifier for variable 'r1' conflicts with 'asm' clobber list).

@galak galak added the dev-review To be discussed in dev-review meeting label Jan 14, 2021
galak added a commit to galak/zephyr that referenced this issue Jan 14, 2021
The inline asm code was not conveying in all cases that registers r1-r3
would get clobbered by the SVC handler code.  In the cases that we can't
list r1-r3 in the clobber list the registers need to show up as outputs
to know that they values are not preserved by the callee.

Fixes zephyrproject-rtos#30393

Signed-off-by: Kumar Gala <[email protected]>
@galak galak removed the dev-review To be discussed in dev-review meeting label Jan 14, 2021
ioannisg pushed a commit that referenced this issue Jan 20, 2021
The inline asm code was not conveying in all cases that registers r1-r3
would get clobbered by the SVC handler code.  In the cases that we can't
list r1-r3 in the clobber list the registers need to show up as outputs
to know that they values are not preserved by the callee.

Fixes #30393

Signed-off-by: Kumar Gala <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
3 participants