Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linking picolibc's srand to esp32c3 causes startup fault #69708

Closed
lgl88911 opened this issue Mar 2, 2024 · 14 comments · Fixed by #72642
Closed

Linking picolibc's srand to esp32c3 causes startup fault #69708

lgl88911 opened this issue Mar 2, 2024 · 14 comments · Fixed by #72642
Assignees
Labels
area: picolibc Picolibc C Standard Library bug The issue is a bug, or the PR is fixing a bug platform: ESP32 Espressif ESP32 priority: low Low impact/importance bug

Comments

@lgl88911
Copy link
Collaborator

lgl88911 commented Mar 2, 2024

Describe the bug
Linking picolibc's srand to esp32c3 causes startup fault

To Reproduce
Steps to reproduce the behavior:

  1. Modify hello_world sample : Call srand in the main function
int main(void)
{
	srand(3);
	printf("Hello World! %s\n", CONFIG_BOARD);
	return 0;
}
  1. west build -b esp32c3_devkitm sample/hello_world
  2. west flash
  3. Reset run, you will find startup hang-up:

I (189) boot: Loaded app from partition at offset 0x10000
I (189) boot: Disabling RNG early entropy source...

Expected behavior
Linking picolibc's srand works fine on startup。

Impact
I can‘t use picolibc's C library properly on esp32c3

Logs and console output
Via jtag/gdb I found that the fault happens in prepare_multithreading->z_setup_new_thread->setup_thread_stack, but I can‘t confirm the exact reason:

arch_system_halt (reason=reason@entry=0) at /home/frank/work/zpro/zephyr_zgp/zephyr/kernel/fatal.c:30
30 for (;;) {
(gdb) bt
#0 arch_system_halt (reason=reason@entry=0) at /home/frank/work/zpro/zephyr_zgp/zephyr/kernel/fatal.c:30
#1 0x403805cc in k_sys_fatal_error_handler (reason=reason@entry=0, esf=) at /home/frank/work/zpro/zephyr_zgp/zephyr/kernel/fatal.c:44
#2 0x403805e6 in z_fatal_error (reason=reason@entry=0, esf=) at /home/frank/work/zpro/zephyr_zgp/zephyr/kernel/fatal.c:118
#3 0x403804ca in z_riscv_fatal_error (esf=, reason=0) at /home/frank/work/zpro/zephyr_zgp/zephyr/arch/riscv/core/fatal.c:59
#4 _Fault (esf=) at /home/frank/work/zpro/zephyr_zgp/zephyr/arch/riscv/core/fatal.c:184
#5 0x403801ec in z_riscv_thread_start () at /home/frank/work/zpro/zephyr_zgp/zephyr/arch/riscv/core/isr.S:624
Backtrace stopped: frame did not save the PC

Additional context
But from the phenomenon, it seems that only linking picolibc's srand causes this phenomenon, if I switch to use newlibc it will work fine. I suspect it is related to -T${CMAKE_CURRENT_SOURCE_DIR}/../../components/esp_rom/esp32c3/ld/esp32c3.rom.newlib.ld in modules/hal/espressif/zephyr/esp32c3/CMakeLists.txt, but no further ideas.

@lgl88911 lgl88911 added the bug The issue is a bug, or the PR is fixing a bug label Mar 2, 2024
@henrikbrixandersen henrikbrixandersen added platform: ESP32 Espressif ESP32 area: picolibc Picolibc C Standard Library labels Mar 4, 2024
@aescolar aescolar added the priority: low Low impact/importance bug label Mar 5, 2024
Copy link

github-actions bot commented May 5, 2024

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label May 5, 2024
@lgl88911
Copy link
Collaborator Author

lgl88911 commented May 5, 2024

This problem remains unresolved. Could someone provide assistance?

@sylvioalves
Copy link
Collaborator

sylvioalves commented May 5, 2024

Hum I thought it was solved already. Let me re-check tomorrow.

@github-actions github-actions bot removed the Stale label May 6, 2024
@keith-packard
Copy link
Collaborator

Hrm. srand is a wrapper for srandom which stores to a thread local variable, _rand_next. I bet there's some thread local storage issue at play here.

@keith-packard
Copy link
Collaborator

If this is a TLS specific issue, then building using newlib with TLS enabled should also fail (as that causes the kernel to use TLS for some internal operations). If this is specific to initialized TLS variables, then this should fail even with newlib:

/*
 * Copyright (c) 2012-2014 Wind River Systems, Inc.
 *
 * SPDX-License-Identifier: Apache-2.0
 */

#include <stdio.h>

__thread int tls_data_variable = 14;

int main(void)
{
	printf("Hello World! %s %d\n", CONFIG_BOARD_TARGET, tls_data_variable);

	return 0;
}

@sylvioalves
Copy link
Collaborator

sylvioalves commented May 12, 2024

Ok, got it. Reason it fails is because __tdata_start is placed in wrong region:
0x00000000000091e8 PROVIDE (__tdata_start = LOADADDR (tdata)), where it should be something like
0x000000003c0007bc PROVIDE (__tdata_start = ADDR (tdata)).

Seems esp32c3 linker file needs to be updated, which can be confirmed using the patch below, which makes it work by fixing the above mentioned address:

diff --git a/include/zephyr/linker/thread-local-storage.ld b/include/zephyr/linker/thread-local-storage.ld
index 5a1313e1c8c..8444d1f082e 100644
--- a/include/zephyr/linker/thread-local-storage.ld
+++ b/include/zephyr/linker/thread-local-storage.ld
@@ -20,7 +20,7 @@
         */
 #ifdef CONFIG_XIP
        /* The "master copy" of tdata should be only in flash on XIP systems */
-       PROVIDE(__tdata_start = LOADADDR(tdata));
+       PROVIDE(__tdata_start = ADDR(tdata));
 #else
        PROVIDE(__tdata_start = ADDR(tdata));
 #endif

@keith-packard Not sure if the PR linked to this issue is the final solution yet. PTAL if you can. I need to check whether we have more cases like this and/or evaluate if this is the correct fix.

@keith-packard
Copy link
Collaborator

I commented in the PR -- I don't think this is right.

@lgl88911
Copy link
Collaborator Author

@sylvioalves @keith-packard I have performed a simple verification on #72642 , and i seems that this issue can be resolved. But I have some doubts. The error with __tdata_start should have caused a fault during z_tls_copy. Why does the gdb trace show it happening in z_riscv_thread_start?

@sylvioalves
Copy link
Collaborator

sylvioalves commented May 13, 2024

z_riscv_thread_start

z_tls_copy is inlined, so the gdb trace will not fault in there I think. What I can tell you right now is that memcpy and memset used in that function are provided in ROM. In that case, can you retry debugging it (making it crash) and comment lines 17 and 18 in here?
https://github.com/zephyrproject-rtos/hal_espressif/blob/zephyr/components/esp_rom/esp32c3/ld/esp32c3.rom.newlib.ld#L17

I'll be able to do so later in case you can´t.

@lgl88911
Copy link
Collaborator Author

lgl88911 commented May 13, 2024

In that case, can you retry debugging it (making it crash) and comment lines 17 and 18 in here?

In this case, where I am unable to see the "hello world" information, it is likely that a fault has occurred. However, since there is no JTAG connection available at the moment, it is not possible to further investigate and determine the exact location of the fault.

@lgl88911
Copy link
Collaborator Author

@sylvioalves
Today I finally found the Jtag connection and did further verification.When not using the memcpy and memset within the ESP32C3 ROM, the program will link to the symbols in Picolibc. However, when a fault occurs, the backtrace may not be complete. This could be due to the optimization of the stack frame under the RISC-V toolchain.

#0 _Fault (esf=0x3fcde480) at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/fatal.c:280
#1 0x4038020e in z_riscv_thread_start () at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/isr.S:660

However, I can confirm through single-step execution that the fault did occur when tls copying to the flash address within memcpy. Therefore, I can be certain that your proposed modification can solve this problem.

(gdb) si
Don't have the number of threads in FreeRTOS!
0x40381786 25 stack_ptr -= z_tls_data_size();
(gdb) si
Don't have the number of threads in FreeRTOS!
z_tls_copy (dest=0x3fc8c424 <z_main_stack+2036> "\nVD}") at /home/frank/work/zpro/zephyrproject/zephyr/kernel/include/kernel_tls.h:46
46 memcpy(dest, __tdata_start, (size_t)(uintptr_t)__tdata_size);
(gdb) p dest
$3 = 0x3fc8c424 <z_main_stack+2036> "\nVD}"
(gdb) p __tdata_start
$4 = 0x10ed0 ""
(gdb) si
Don't have the number of threads in FreeRTOS!
0x4038178c in arch_tls_stack_setup (new_thread=new_thread@entry=0x3fc8d558 <z_main_thread>, stack_ptr=0x3fc8c424 <z_main_stack+2036> "\nVD}", stack_ptr@entry=0x3fc8c430 <kheap.system_heap> "") at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/tls.c:15
15 {
(gdb) si
Don't have the number of threads in FreeRTOS!
0x4038178e in z_tls_copy (dest=0x3fc8c424 <z_main_stack+2036> "\nVD}") at /home/frank/work/zpro/zephyrproject/zephyr/kernel/include/kernel_tls.h:46
46 memcpy(dest, __tdata_start, (size_t)(uintptr_t)__tdata_size);
(gdb) si
Don't have the number of threads in FreeRTOS!
0x40381792 in arch_tls_stack_setup (new_thread=new_thread@entry=0x3fc8d558 <z_main_thread>, stack_ptr=0x3fc8c424 <z_main_stack+2036> "\nVD}", stack_ptr@entry=0x3fc8c430 <kheap.system_heap> "") at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/tls.c:15
15 {
(gdb) si
Don't have the number of threads in FreeRTOS!
0x40381794 in z_tls_copy (dest=0x3fc8c424 <z_main_stack+2036> "\nVD}") at /home/frank/work/zpro/zephyrproject/zephyr/kernel/include/kernel_tls.h:46
46 memcpy(dest, __tdata_start, (size_t)(uintptr_t)__tdata_size);
(gdb) si
Don't have the number of threads in FreeRTOS!
0x40381798 46 memcpy(dest, __tdata_start, (size_t)(uintptr_t)__tdata_size);
(gdb) si
Don't have the number of threads in FreeRTOS!
0x4038179a in arch_tls_stack_setup (new_thread=new_thread@entry=0x3fc8d558 <z_main_thread>, stack_ptr=0x3fc8c424 <z_main_stack+2036> "\nVD}", stack_ptr@entry=0x3fc8c430 <kheap.system_heap> "") at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/tls.c:15
15 {
(gdb) bt
#0 0x4038179a in arch_tls_stack_setup (new_thread=new_thread@entry=0x3fc8d558 <z_main_thread>, stack_ptr=0x3fc8c424 <z_main_stack+2036> "\nVD}",
stack_ptr@entry=0x3fc8c430 <kheap.system_heap> "") at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/tls.c:15
#1 0x40381ca4 in setup_thread_stack (stack_size=2048, stack=0x3fc8bc30 <z_main_stack>, new_thread=0x3fc8d558 <z_main_thread>)
at /home/frank/work/zpro/zephyrproject/zephyr/kernel/thread.c:480
#2 z_setup_new_thread (new_thread=new_thread@entry=0x3fc8d558 <z_main_thread>, stack=stack@entry=0x3fc8bc30 <z_main_stack>, stack_size=stack_size@entry=2048,
entry=entry@entry=0x4038189c <bg_thread_main>, p1=p1@entry=0x0, p2=p2@entry=0x0, p3=p3@entry=0x0, prio=prio@entry=0, options=options@entry=1, name=name@entry=0x3fc89760 "main")
at /home/frank/work/zpro/zephyrproject/zephyr/kernel/thread.c:555
#3 0x40381a90 in prepare_multithreading () at /home/frank/work/zpro/zephyrproject/zephyr/kernel/init.c:563
#4 z_cstart () at /home/frank/work/zpro/zephyrproject/zephyr/kernel/init.c:676
#5 0x403802e0 in __esp_platform_start () at /home/frank/work/zpro/zephyrproject/zephyr/soc/espressif/esp32c3/soc.c:102
#6 0x4004a296 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) si
Don't have the number of threads in FreeRTOS!
0x4038179c in z_tls_copy (dest=0x3fc8c424 <z_main_stack+2036> "\nVD}") at /home/frank/work/zpro/zephyrproject/zephyr/kernel/include/kernel_tls.h:46
46 memcpy(dest, __tdata_start, (size_t)(uintptr_t)__tdata_size);
(gdb) si
Don't have the number of threads in FreeRTOS!
0x40386540 in memcpy ()
(gdb) si
Don't have the number of threads in FreeRTOS!
0x40386544 in memcpy ()
(gdb) si
Don't have the number of threads in FreeRTOS!
0x40386548 in memcpy ()
(gdb) si
[esp32c3] Halt cause (5) - (PMP Load access fault)
Don't have the number of threads in FreeRTOS!
_isr_wrapper () at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/isr.S:183
183 addi sp, sp, -__z_arch_esf_t_SIZEOF
(gdb) bt
#0 _isr_wrapper () at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/isr.S:183
#1 0x403817a0 in z_tls_copy (dest=0x3fc8c424 <z_main_stack+2036> "\nVD}") at /home/frank/work/zpro/zephyrproject/zephyr/kernel/include/kernel_tls.h:46
#2 arch_tls_stack_setup (new_thread=new_thread@entry=0x3fc8d558 <z_main_thread>, stack_ptr=0x3fc8c424 <z_main_stack+2036> "\nVD}",
stack_ptr@entry=0x3fc8c430 <kheap.system_heap> "") at /home/frank/work/zpro/zephyrproject/zephyr/arch/riscv/core/tls.c:26
#3 0x40381ca4 in setup_thread_stack (stack_size=2048, stack=0x3fc8bc30 <z_main_stack>, new_thread=0x3fc8d558 <z_main_thread>)
at /home/frank/work/zpro/zephyrproject/zephyr/kernel/thread.c:480
#4 z_setup_new_thread (new_thread=new_thread@entry=0x3fc8d558 <z_main_thread>, stack=stack@entry=0x3fc8bc30 <z_main_stack>, stack_size=stack_size@entry=2048,
entry=entry@entry=0x4038189c <bg_thread_main>, p1=p1@entry=0x0, p2=p2@entry=0x0, p3=p3@entry=0x0, prio=prio@entry=0, options=options@entry=1, name=name@entry=0x3fc89760 "main")
at /home/frank/work/zpro/zephyrproject/zephyr/kernel/thread.c:555
#5 0x40381a90 in prepare_multithreading () at /home/frank/work/zpro/zephyrproject/zephyr/kernel/init.c:563
#6 z_cstart () at /home/frank/work/zpro/zephyrproject/zephyr/kernel/init.c:676
#7 0x403802e0 in __esp_platform_start () at /home/frank/work/zpro/zephyrproject/zephyr/soc/espressif/esp32c3/soc.c:102
#8 0x4004a296 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) si
[esp32c3] Halt cause (5) - (PMP Load access fault)
Don't have the number of threads in FreeRTOS!
184 DO_CALLER_SAVED(sr) ;

@keith-packard
Copy link
Collaborator

The TLS initialized data lives in FLASH at 0x10ed0 and needs to be copied into each thread's TLS block so that the initialized TLS variables start with the right values. It seems that the processor cannot read from FLASH though, and so the system generates a fault when memcpy is called. Using the fake memory address of the tdata segment (possibly 0x3c000ed0?), which is what you'd get if you replace __tdata_start = LOADADDR(tdata) with __tdata_start = ADDR(tdata) will resolve the access fault, but I don't understand how it can actually work -- how does the memory at that address ever receive the contents of FLASH at 0x10ed0? Who copies that information at boot time? It shouldn't be part of the regular .data segment initialization sequence as tdata is not part of .data.

@lgl88911
Copy link
Collaborator Author

how it can actually work -- how does the memory at that address ever receive the contents of FLASH at 0x10ed0? Who copies that information at boot time?

LOADADDR(tdata) is the offset address of tdata in the flash. The CPU can't directly access the tdata in the flash through this address. The esp32c3 contains an MMU and related flash drivers. During the bootloader phase, the addresses of the external SPI flash are mapped to virtual addresses through the MMU. After that, the CPU can directly access the flash through the virtual address ADDR(tdata).

@sylvioalves sylvioalves added DNM This PR should not be merged (Do Not Merge) and removed DNM This PR should not be merged (Do Not Merge) labels May 23, 2024
@sylvioalves
Copy link
Collaborator

@lgl88911, correct.
@keith-packard, we are evaluating removing CONFIG_XIP from the whole Espressif porting in Zephyr. The way ESP32 XiP works (with MMU and cache) does no fit the way Zephyr XiP is implemented, causing issues like this one. Flash code still resides in flash for execution, but MMU/Cache handles it in such way that XiP might not (or should not) be used with current approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: picolibc Picolibc C Standard Library bug The issue is a bug, or the PR is fixing a bug platform: ESP32 Espressif ESP32 priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants