Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subsys/logging: Add compiler barriers to msg stack allocation #46967

Merged

Conversation

keith-packard
Copy link
Collaborator

This ensures that the compiler will have moved the stack pointer below the
stack area where the message will be constructed. Otherwise, the message
can be smashed by an interrupt handler while it is being built.

This was found on arm64 when running the logging sample using picolibc and
building with sdk version 0.14.2

Signed-off-by: Keith Packard [email protected]

@nordicjm
Copy link
Collaborator

Fixes issue we have seen using nRF connect SDK 2.0.0 whereby too many messages causes a fault. Fault no longer appears with this fix

nordicjm
nordicjm previously approved these changes Jun 28, 2022
@andyross
Copy link
Contributor

This freaks me out a bit. Was log message management actually intended to be a lockless algorithm? That's... really, really hard to get right (sys_winstream was two weeks of work chasing "simple" mistakes, and that's a much smaller problem).

Why can't we just put a spinlock around this?

Also, note that ARM64 is OOO, so a compiler barrier is likely insufficient anyway and you probably want a DSB here (did we ever merge the memory barriers patch? I can't remember if there's an API for this yet or not).

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, reading carefully, this is just a variable length array, not a lockless modification of global log state.

Let me pull in @carlocaione @stephanosio @npitre , I think what we're looking at here may be an arch bug. I'm not the arm64 expert, but I'm pretty sure the ABI (just like x86_64) includes a "red zone" below the stack pointer which leaf functions are allowed to use. Is it possible that something in interrupt entry/exit is forgetting that and clobbering the area under SP incorrectly before/after the switch to the interrupt stack?

Alternatively it's just a compiler bug, in which case adding a barrier is a fine workaround, but we should probably call that out in code and predicate it on whatever platforms are broken.

Drop a -1 on this, not to prevent merge but just to make sure we either get to a root cause or commit this as a workaround with an issue to make sure we find it.

@nordic-krch
Copy link
Contributor

@lairdjm is this #46986 bug fixed? If yes, @keith-packard can you add Fixes #46986 to the PR note?

@nordic-krch
Copy link
Contributor

#46986 was found on cortex-m so it's not arm64 specific.

@npitre
Copy link
Collaborator

npitre commented Jun 29, 2022 via email

@keith-packard
Copy link
Collaborator Author

I'd like to see a disassembly listing of the affected code, for both with and without the barrier. If this is indeed a case of the stack area being stumped by a badly timed IRQ that the barrier does fix then that will be pretty obvious from the disassembly. Normally, any allocation on the stack should see the stack pointer being decremented before any access (possibly using another register) to that area. If sp is decremented after the access then you'll have the smoking gun. Such analysis should ideally be part of the commit log as well (e.g. commit 1db5c8b).

I didn't actually look at the assembly code -- what I did was check the stack pointer while the data were being written into the reserved area and discovered that it was above the temporary storage, and that with the barriers in place, it was below. This seems like a compiler bug to me. That was just after setting a memory watchpoint on the first byte of the area and catching the trap handler writing over the address of the format string.

@nordicjm
Copy link
Collaborator

@lairdjm is this #46986 bug fixed? If yes, @keith-packard can you add Fixes #46986 to the PR note?

I did originally think it did, but it did not fix it. It's a bit of a nuisance to reproduce, and there wasn't enough shaking when I tested with this, or it was harder to reproduce as I expanded the buffer sizes.

@keith-packard
Copy link
Collaborator Author

OK, reading carefully, this is just a variable length array, not a lockless modification of global log state.

Yes, I think it's just in how GCC adjust $sp when there is a variable length array in the code -- it doesn't move the stack pointer, it just computes the address of the storage and uses it. That's presumably because nothing in the section of code writing the package to the stack might use the stack (call another function, etc), so it reduces the code by a couple of instructions.

Let me pull in @carlocaione @stephanosio @npitre , I think what we're looking at here may be an arch bug. I'm not the arm64 expert, but I'm pretty sure the ABI (just like x86_64) includes a "red zone" below the stack pointer which leaf functions are allowed to use. Is it possible that something in interrupt entry/exit is forgetting that and clobbering the area under SP incorrectly before/after the switch to the interrupt stack?

Yeah, looks like the ABI defines a 16-byte red zone below $sp. In this case, the temporary storage was larger than that and so outside of that zone.

Alternatively it's just a compiler bug, in which case adding a barrier is a fine workaround, but we should probably call that out in code and predicate it on whatever platforms are broken.

Yeah, this does look like it's just a compiler bug (however, it seems to be a cross-architecture bug?). Adding suitable comments would be a good idea.

Drop a -1 on this, not to prevent merge but just to make sure we either get to a root cause or commit this as a workaround with an issue to make sure we find it.

Sounds good. I'm not likely to keep working this issue; I'm clearing a backlog of bugs found when running all of the Zephyr tests using picolibc, and this was one of the last to get "resolved". I'd appreciate it if someone else could take over and drive this to conclusion.

@npitre
Copy link
Collaborator

npitre commented Jun 29, 2022

I didn't actually look at the assembly code -- what I did was check
the stack pointer while the data were being written into the reserved
area and discovered that it was above the temporary storage, and that
with the barriers in place, it was below.

The build produces a zephyr.lst file alongside the executable.
You could diff those zephyr.lst with and without the barrier.
That should be enough to pinpoint generated code differences.

This seems like a compiler bug to me.

Agreed. Documenting the issue with actual compiler output is desirable
nevertheless.

@keith-packard
Copy link
Collaborator Author

The build produces a zephyr.lst file alongside the executable. You could diff those zephyr.lst with and without the barrier. That should be enough to pinpoint generated code differences.

The differences are extensive and I wasn't able to actually identify the problem, even after separately compiling the relevant file with -S -fverbose-asm. Beats me what's up here; I'd encourage those with more expert knowledge to go digging.

@stephanosio stephanosio requested a review from carlocaione June 30, 2022 04:53
@carlocaione carlocaione self-assigned this Jun 30, 2022
@carlocaione
Copy link
Collaborator

I'll try to get a diff of the assembly code but in general I think we should address the red-zone problem anyway. Currently the entry/exit exception handling code is not considering the red-zone at all, probably clobbering it. So we should alternatively:

  1. Rework the handling code to not step over the red-zone
  2. Compiling zephyr with -mno-red-zone

@carlocaione
Copy link
Collaborator

@keith-packard which logging sample were you using exactly?

@keith-packard
Copy link
Collaborator Author

@keith-packard which logging sample were you using exactly?

It was samples/subsys/logging/syst/sample.logger.syst.deferred

This was reliably reproducing a failure when built using picolibc. I'm attempting to reproduce it with the current tree; I'll push a tree when I get something that breaks and then is fixed with the patch.

@npitre
Copy link
Collaborator

npitre commented Jun 30, 2022

-mno-red-zone is an x86-64 thing. The ARM ABI as far as I remember
always requires that the stack pointer be adjusted down before the
corresponding area is touched exa ctly to prevent interruptions from
stumping on it.

My suspicions are more about Z_LOG_MSG2_ON_STACK_ALLOC triggering some
corner case in the compiler.

Can't it be simplified as follows so to tell the compiler our actual
intention instead of relying on indirect tricks?

diff --git a/include/zephyr/logging/log_msg.h b/include/zephyr/logging/log_msg.h
index 0a4b092158..542478985c 100644
--- a/include/zephyr/logging/log_msg.h
+++ b/include/zephyr/logging/log_msg.h
@@ -143,10 +143,8 @@ enum z_log_msg_mode {
 
 #ifdef CONFIG_LOG_USE_VLA
 #define Z_LOG_MSG2_ON_STACK_ALLOC(ptr, len) \
-	long long _ll_buf[ceiling_fraction(len, sizeof(long long))]; \
-	long double _ld_buf[ceiling_fraction(len, sizeof(long double))]; \
-	ptr = (sizeof(long double) == Z_LOG_MSG2_ALIGNMENT) ? \
-			(struct log_msg *)_ld_buf : (struct log_msg *)_ll_buf; \
+	char __aligned(Z_LOG_MSG2_ALIGNMENT) __buf[len]; \
+	ptr = (struct log_msg *)__buf;\
 	if (IS_ENABLED(CONFIG_LOG_TEST_CLEAR_MESSAGE_SPACE)) { \
 		/* During test fill with 0's to simplify message comparison */ \
 		memset(ptr, 0, len); \

@keith-packard
Copy link
Collaborator Author

ok, I've reproduced this using my arm64-stack-smash branch -- this builds the whole system with picolibc, which also happens to disable the -ffreestanding compiler option; that may have an effect here?

@andyross
Copy link
Contributor

My understanding is that -ffreestanding suppresses optimizations based on known libc functions. I guess it's possible that that's unmasking an issue where e.g. some alloca()-adjacent call (alloca itself can't be implemented as a function, obviously) emitted here is being forced to go into a libgcc or libc implementation where it wouldn't earlier (or vice versa), and one has a bug? Seems a little far-fetched, but not implausible. That's the kind of thing that would be obvious from the generated assembly though; I'll pull and see if I can find something. Do we have any idea which expression including this macro is thought to be at fault?

Also: yeah, I tracked down the ARM64 SysV ABI docs at https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst and indeed there is no red zone. The stuff I was remembering was about the microsoft ARM ABI, which reserves 16 bytes apparently. (Doubly hilarious, because on x86 MS has no red zone and SysV is 128 bytes!) So the interrupt code is (probably/maybe) off the hook, I think.

@andyross
Copy link
Contributor

Ooph, Z_LOG_MSG2_STACK_CREATE (the only macro that expands the VLA code in question) gets expanded 24 different times in that sample. This may take a while.

@andyross
Copy link
Contributor

Yeah, I'm equally stumped. FWIW: I don't see any deltas at all when I apply your barrier patch to the branch you posted. The generated code is the same. And reading it (it's not so bad, all the logging is in a single log_msgs() function), I'm not seeing the stack pointer being modified at all except to push the frame at entry. It's very conventional.

Which I guess makes sense: this is the syst test, which means that there's no string formatting being done, and I'm guessing the VLAs are actually all being optimized out?

Is there a toolchain or platform interaction maybe? I'm using SDK 0.14.2 and building samples/subsys/logging/syst for qemu_cortex_a53.

@keith-packard
Copy link
Collaborator Author

I've pushed two branches -- arm64-stack-smash-broken and arm64-stack-smash-working -- the only difference is the addition of the two compiler_barrier() "calls". The broken branch reliably fails and the working branch reliably passes. The generated code is dramatically different, and in particular I've found at least one place where it defers moving sp until after data have been stored in those locations. This is a fragment of a diff of the .lst files with the addresses removed (or the diff is harder to understand). - is broken, + is working:

 	LOG_DBG("Debug message example, %f", 3.14159265359);
+    	9100027f 	mov	sp, x19
     	f0000040 	adrp	x0, 4000c000 <mmu_zephyr_ranges>
     	912bdc00 	add	x0, x0, #0xaf7
     	f9000e60 	str	x0, [x19, #24]
     	d0000040 	adrp	x0, 4000b000 <__rodata_region_start>
     	fd475c00 	ldr	d0, [x0, #3768]
     	d28000c0 	mov	x0, #0x6                   	// #6
     	f9000a60 	str	x0, [x19, #16]
-    	9100027f 	mov	sp, x19
     	aa1603e2 	mov	x2, x22
     	aa1403e0 	mov	x0, x20
     	d2800003 	mov	x3, #0x0                   	// #0
     	52862001 	mov	w1, #0x3100                	// #12544
     	fd001260 	str	d0, [x19, #32]
-    	97fffed6 	bl	40001534 <z_log_msg_static_create>
+    	97fffeda 	bl	40001534 <z_log_msg_static_create>

And this is the log message which gets corrupted by the interrupt. It seems like a very narrow window to be reliably hitting this on every run. But it's pretty clearly broken. I suspect the arm64 gcc backend just doesn't reliably keep sp below the used area of the stack all the time; we're having to force the issue by using a barrier.

I'm also using zephyr sdk version 0.14.2 and building for qemu_cortex_a53. Here's the command I use to run the test:

$ ./scripts/twister -p qemu_cortex_a53 --test samples/subsys/logging/syst/sample.logger.syst.deferred

Not having -ffreestanding in effect is pretty clearly a good plan here -- calls to memset followed by sprintk are replaced with a single call to strcpy. But, I'm not sure that is causing this particular issue?

@andyross
Copy link
Contributor

Ah, OK. I'd missed that you were running the deferred subtest config and had just been building the default test. And confirmed, I see the same failures you do. In twister. Interestingly if I build and run it from west it seems fine, I got 14 successes in a row after 4 consecutive failures (look like timeouts -- partial output in the log) in twister. That also tends to point at interrupt/timing.

I'm looking at the assembly here too, but still not quite seeing it. Your example is almost a smoking gun, but that snippet doesn't actually mutate SP. Can you back up a bit to see where the x19 value comes from? If it's something like "sub x19, [sp, #16]" then we have it.

But as I look at this, I'm seeing lots and lots of places where frankly the generated code is just bad and tends to repeatedly assign SP in noop contexts, even though the optimizer does like to move the assignments around.

I haven't seen something provable yet (e.g. that could be submitted to the gcc folks as a bug), but I believe it's there. Can we add a comment to the patch fingering a presumptive compiler bug and wrap the barrier in something like "ARM64_VLA_PROTECT()"? I'll take off my -1, clearly something is up.

@andyross
Copy link
Contributor

Actually I can't remember if []! is a pre- or post-decrement. If it's pre- then that first store is a bug too.

This ensures that the compiler will have moved the stack pointer below the
stack area where the message will be constructed. Otherwise, the message
can be smashed by an interrupt handler while it is being built.

This bug was found on qemu_cortex_a53 using SDK 0.14.2 with gcc 10.3.0 when
building the samples/subsys/logging/syst/sample.logger.syst.deferred test
under twister using picolibc:

Signed-off-by: Keith Packard <[email protected]>
@andyross
Copy link
Contributor

Also I need to formally apologize to the arm64 folks for fingering you first instead of the toolchain.

@keith-packard keith-packard dismissed stale reviews from andyross and nordicjm via 5e2dc52 June 30, 2022 22:31
@keith-packard
Copy link
Collaborator Author

Sorry, pushed a new version with an inline comment in the code and a better commit message.

@andyross
Copy link
Contributor

Upstream gcc bug submitted at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106153

@npitre
Copy link
Collaborator

npitre commented Jun 30, 2022

40001838:  f8010ec0   str  x0, [x22, #16]!      ;<-- auto-decrement x22 to be 16 bytes under the stack

Unless there is a missing minus sign that didn't reach me, the above is
equivalent to:

	x22 = x22 + 16;
	*(long *)x22 = x0;
	x22 = x22 + 16;

So nothing wrong to me.

@npitre
Copy link
Collaborator

npitre commented Jul 1, 2022

Errr, what I meant is:

	x22 = x22 + 16;
	*(long *)x22 = x0;

@stephanosio
Copy link
Member

stephanosio commented Jul 1, 2022

FYI, there is an early build for Zephyr SDK with GCC 12.1 here:
https://github.com/zephyrproject-rtos/sdk-ng/actions/runs/2583182283

It would be interesting to see how it behaves, and if the issue has already been fixed there.

p.s. you may need to use the topic-sdk15 branch for testing.

@andyross
Copy link
Contributor

andyross commented Jul 1, 2022

Yeah, I was backwards. Spend too long looking for something you know is there and you'll end up seeing it even if it's not. Will keep digging, something's gotta be in there.

@andyross
Copy link
Contributor

andyross commented Jul 1, 2022

Also I'm going to be flying tomorrow for a week of family vacation (and actually in Ecuador and the Galapagos and surely incommunicado for the two after that). So it may be a bit before I'll have significant time to spend on this again. But anyone interested: you can check the "main.i" preprocessed file in the gcc bug as an easy exerciser if you want to scan assembly. My current thought is writing a hacky "stack interpreter" to track pointers into stack memory vs. the SP register.

* stack pointer soon enough
*/

#define Z_LOG_ARM64_VLA_PROTECT() compiler_barrier()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since its only for arm64 what about:
#define Z_LOG_ARM64_VLA_PROTECT() IF_ENABLED(CONFIG_ARM64, (copiler_barrier()))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought #46986 was on arm32? Even if not, I still wonder if this is a problem in general GCC code or is specific to arm64?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to #46967 (comment) it's not related and still reproduces after adding the barrier. If we suspect that it's affecting more platforms than im ok with keeping it for all platforms but then maybe remove ARM64 from the name and only mention in the comment that it has been seen on arm64 and other platforms may be affected as well.

@carlocaione
Copy link
Collaborator

carlocaione commented Jul 1, 2022

@stephanosio @keith-packard

Using zephyr-sdk-0.14.2-55-g8cbcc6d the sample.logger.syst.deferred sample is failing to compile on the arm64-stack-smash-broken branch because:

In file included from /home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-malloc-memalign.c:34:
/home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c: In function 'memalign':
/home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c:761:36: error: array subscript 0 is outside array bounds of 'char[]' [-Werror=array-bounds]
  761 |         new_chunk_p->size = chunk_p->size - offset;
      |                                    ^~
/home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c:743:17: note: at offset -8 into object of size [0, 9223372036854775807] allocated by '__malloc_malloc'
  743 |     allocated = __malloc_malloc(size_with_padding);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function 'make_free_chunk',
    inlined from 'memalign' at /home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c:763:2:
/home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c:208:13: error: array subscript 0 is outside array bounds of 'char[]' [-Werror=array-bounds]
  208 |     c->size = size;
      |     ~~~~~~~~^~~~~~
/home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c: In function 'memalign':
/home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c:743:17: note: at offset -8 into object of size [0, 9223372036854775807] allocated by '__malloc_malloc'
  743 |     allocated = __malloc_malloc(size_with_padding);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

@carlocaione
Copy link
Collaborator

Thanks to @nashif we have at least solved a part of the mistery.

The fact that this can be easily reproduced with twister but not with west is that twister is simply blocking on a non-printable character.

The correct output of the test should be (at a certain point):

[00:00:00.120,000] <dbg> syst: mixed str dynamic str --- dynamic str --- another dynamic str --- another dynamic str
[00:00:00.120,000] <dbg> syst: mixed c/s ! static str dynamic str static str !
[00:00:00.120,000] <dbg> syst: Debug message example, 3.141593
[00:00:00.120,000] <err> syst: frame
                               03 d2 04 00 00 08 01 02  03 04 05 06 07 08       |........ ......  

but on the broken branch the output is:

[00:00:00.120,000] <dbg> syst: mixed str dynamic str --- dynamic str --- another dynamic str --- another dynamic str                                                                                                
[00:00:00.120,000] <dbg> syst: mixed c/s ! static str dynamic str static str !
[00:00:00.120,000] <dbg> syst: 
[00:00:00.120,000] <err> syst: frame
                               03 d2 04 00 00 08 01 02  03 04 05 06 07 08       |........ ......  

As you can see instead of Debug message example, 3.141593 a non-printable character is emitted. This character is breaking twister while the west output is unaffected by it (but you still have the issue there).

@stephanosio
Copy link
Member

stephanosio commented Jul 1, 2022

array subscript 0 is outside array bounds of 'char[]' [-Werror=array-bounds]

Yea, that is because Warray-bounds does some static analysis on the pointer/array index dereferences.

Note that chunk_p is assigned ptr_to_chunk(allocated) in L746, where allocated is a memory block allocated by malloc:

chunk_p = ptr_to_chunk(allocated);

This resolves to (chunk_t *)((char *)allocated - MALLOC_HEAD), which is below index 0 in terms of the "array index":

static inline chunk_t *
ptr_to_chunk(void * ptr)
{
    return (chunk_t *) ((char *) ptr - MALLOC_HEAD);
}

So dereferencing it in L761 is flagged by the static analysis:

new_chunk_p->size = chunk_p->size - offset;

@keith-packard since this code explicitly does an "unsafe access," disabling Warray-bounds (for this entire file or only the affected lines) would make sense. The "static analysis" feature was added in GCC 11 (see gcc-mirror/gcc@6450f07).

@andyross
Copy link
Contributor

andyross commented Jul 1, 2022

Nice catch on the twister interaction. Now I'm worried this is all going to be down to some kind of undefined behavior interaction and not technically a compiler bug at all. I was so sure we had one.

@keith-packard
Copy link
Collaborator Author

The fact that this can be easily reproduced with twister but not with west is that twister is simply blocking on a non-printable character.

Ah, sorry -- I should have reported more completely what the symptoms of failure were. The interrupt handler smashes the format string address so that when vfprintf does get called, the format string points into something which is definitely not a string.

@keith-packard
Copy link
Collaborator Author

/home/carlo/work/proj/zephyr/zephyrproject/modules/lib/picolibc/newlib/libc/stdlib/nano-mallocr.c:743:17: note: at offset -8 into object of size [0, 9223372036854775807] allocated by '__malloc_malloc'

Sounds like the compiler is getting even deeper into code analysis which leading it astray in this case. Building a C library is getting pretty hard these days :-). Which compiler version is generating this warning?

@stephanosio
Copy link
Member

Which compiler version is generating this warning?

@keith-packard #46967 (comment)

It was in GCC 12.1, but the same warning should be displayed in GCC 11.x as well.

@keith-packard
Copy link
Collaborator Author

Which compiler version is generating this warning?

@keith-packard #46967 (comment)

It was in GCC 12.1, but the same warning should be displayed in GCC 11.x as well.

I run this code through gcc 11.3 for x86 testing without issue. I'll give 12 a try.

@carlocaione
Copy link
Collaborator

carlocaione commented Jul 2, 2022

Not sure if this is a smoking gun but I found some of the assembly code in the non-working case weird and it is against what @npitre said before:

Normally, any allocation on the stack should see the stack pointer
being decremented before any access (possibly using another register)
to that area. If sp is decremented after the access then you'll have
the smoking gun.

back to the code:

40001810:   910003f3        mov     x19, sp

; ...
; a lot of code where x19 is never touched
; ...

    LOG_DBG("Debug message example, %f", 3.14159265359);
;
; at this point x19 is pointing to a location in memory past the SP:
;
; (gdb) p/x $sp
; $1 = 0x40060fe0
; (gdb) p/x $x19
; $2 = 0x40060fb0
;
400019a8:   f0000040        adrp    x0, 4000c000 <mmu_zephyr_ranges>
400019ac:   912bdc00        add     x0, x0, #0xaf7
;
; with the next instruction we are storing the value onto the stack but using
; x19 instead of SP. The storage location is at lower addresses than SP:
;
; (gdb) p/x $x19+24
; $3 = 0x40060fc8
;
400019b0:   f9000e60        str     x0, [x19, #24]

400019b4:   d0000040        adrp    x0, 4000b000 <__rodata_region_start>
400019b8:   fd475c00        ldr     d0, [x0, #3768]
400019bc:   d28000c0        mov     x0, #0x6
;
; same as before. The next instruction is saving onto the stack using x19.
; SP is still behind.
;
400019c0:   f9000a60        str     x0, [x19, #16]
;
; we finally move SP to match X19
;
400019c4:   9100027f        mov     sp, x19
...

In the working case the store instruction using the x19 are never going to touch addresses past the SP, because SP and x19 are set to the same address at the very beginning.

@npitre
Copy link
Collaborator

npitre commented Jul 2, 2022

That's a good find indeed.

However could you quote where sp and x19 start to diverge?

Now you have:

40001810:   910003f3        mov     x19, sp

; ...
; a lot of code where x19 is never touched
; ...

    LOG_DBG("Debug message example, %f", 3.14159265359);
;
; at this point x19 is pointing to a location in memory past the SP:
;
; (gdb) p/x $sp
; $1 = 0x40060fe0
; (gdb) p/x $x19
; $2 = 0x40060fb0

Where is sp modified to not be equal to x19 anymore?

Copy link
Collaborator

@carlocaione carlocaione left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to deep dive into this ATM but the fix is actually working I'm fine with merging this as is.

@carlescufi carlescufi merged commit 4436fb1 into zephyrproject-rtos:main Jul 13, 2022
@keith-packard keith-packard deleted the arm64-stack-logging branch July 14, 2022 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants