Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] [IIR] IIR not working with SOF compiled with XCC RJ2024.3 #9350

Open
iuliana-prodan opened this issue Aug 7, 2024 · 9 comments
Open
Labels
bug Something isn't working as expected

Comments

@iuliana-prodan
Copy link
Contributor

iuliana-prodan commented Aug 7, 2024

Describe the bug
Compiled SOF successfully with XCC toolchain RJ2024.3 from Cadence.
When trying to run an IIR test I get a load/store error on DSP:

*** Booting Zephyr OS build v3.7.0-rc2-451-g740d7f735e23 ***
[00:00:00.000,000] <inf> main: SOF on imx8mp_evk
[00:00:00.000,000] <inf> main: SOF initialized
[00:00:06.788,000] <err> os:  ** FATAL EXCEPTION
[00:00:06.788,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:06.788,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:06.788,000] <err> os:  **  PS 0x60720
[00:00:06.788,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:06.788,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:06.788,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:06.788,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:06.788,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:06.788,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:06.788,000] <err> os:  ** SAR 0x1d
[00:00:06.788,000] <err> os:  **  THREADPTR 0x2
[00:00:06.788,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:06.788,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:06.886,000] <err> zephyr: Halting system

In Linux I get an Input/output error:

root@imx8mpevk:~# aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav &
[1] 784
root@imx8mpevk:~# sleep 2
Playing WAVE '10s_sample.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo

root@imx8mpevk:~# /unit_tests/sof/tools/ctl/sof-ctl -Dhw:0 -n 46 -s /unit_tests/sof/tools/ctl/eq_iir_bandpass.txt
Control size is 1024.
Applying configuration "/unit_tests/sof/tools/ctl/eq_iir_bandpass.txt" into device hw:0 control numid=46.
4607827,0,116,50331648,0,0,0,0,116,2,1,0,0,0,0,0,0,2,2,0,0,0,0,3316150158,2048164275,513807534,3267352229,513807534,0,16384,3867454526,1191025347,38870735,77741469,38870735,4294967292,24197
Success.
hdr: magic 0x00464f53
hdr: type 0
hdr: size 116 bytes
hdr: abi 3:0:0
4607827,0,116,50331648,0,0,0,0,116,2,1,0,0,0,0,0,0,2,2,0,0,0,0,3316150158,2048164275,513807534,3267352229,513807534,0,16384,3867454526,1191025347,38870735,77741469,38870735,4294967292,24197
root@imx8mpevk:~# 
root@imx8mpevk:~# aplay: pcm_write:2178: write error: Input/output error

[1]+  Done(1)                 aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav
root@imx8mpevk:~#

To Reproduce
Use an IIR topology and run the following commands:

$ aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav &
$ sleep 2
$ sof-ctl -Dhw:0 -n 46 -s eq_iir_loudness.txt

Reproduction Rate
With latest SOF - main branch, with XCC RJ2024.3, the error is met 100%.
This was reproduced on all i.MX platforms: i.MX8MP, i.MX8QM, i.MX8QXP.

With XCC RI2023.11 everything works fine.

Expected behavior
No error should appear.

Impact
Not working as expected.
Showstopper for IIR, if updating to latest XCC toolchain.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: Linux version 6.6.36
    • SOF: SOF/main
  2. Name of the topology file
    • Topology: sof-imx8mp-eq-iir-wm8960.tplg (or any IIR tplg for i.MX)
  3. Name of the platform(s) on which the bug is observed.
    • Platform: i.MX8MP, i.MX8QM, i.MX8QXP.

Screenshots or console output

[00:00:02.074,000] <err> os:  ** FATAL EXCEPTION
[00:00:02.074,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:02.074,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:02.074,000] <err> os:  **  PS 0x60720
[00:00:02.074,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:02.074,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:02.074,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:02.074,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:02.074,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:02.074,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:02.074,000] <err> os:  ** SAR 0x1d
[00:00:02.074,000] <err> os:  **  THREADPTR 0x2
[00:00:02.074,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.074,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:02.173,000] <err> zephyr: Halting system

I've attached here the disassembly for OK XCC 2023 SOF and NOT OK XCC 2024 SOF.
zephyr_2023_OK.dst.txt
zephyr_2024_NOK.dst.txt

@iuliana-prodan iuliana-prodan added the bug Something isn't working as expected label Aug 7, 2024
@iuliana-prodan
Copy link
Contributor Author

From the crash and after analyzing the disassembly I've realized the error is here:

			acc = AE_SLAI64S(acc, 1); /* Convert to Q17.47 */
92455be4:	297582ac4402b11345f23f 	{ ae_s32.l.i	aed3, a2, -4; l32i	a4, a5, 12; nop; ae_slai64s	aed1, aed1, 1 }

By comparing the two disassembles from RI2023 (which is ok) and RJ2024 (which is NOK):
image
we can see that for RJ2024, we have a load instruction on slot1, from the four operations in parallel.
By looking in the HiFi4 User guide:
image this seems to be correct.
Also, another difference is the addi and l32i instructions - these are in both cases but in different sets of 4 operations.

At first, I checked that all the variables are aligned based on the instructions restrictions and they seem to be (if we look at the VADDR or A2, A3, etc from crash dump).

I also wrote a simple function that does a store and load (trying to reproduce the order of instructions from where the crash happens), with the exact addresses:

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index d1aa05f55..40edcbee7 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -46,6 +46,22 @@
 
 /* 32 bit data, 32 bit coefficients and 32 bit state variables */
 
+uint32_t first_iteration(void)
+{
+       uint32_t *addr_r = (uint32_t *)0x92c0c220;
+       uint32_t value;
+
+       ae_int32 *addr_w = (ae_int32 *)0x92c0c200;
+       ae_int32x2 tmp = AE_ZERO32();
+
+       AE_S32_L_I (tmp, addr_w, -4);
+
+       value = *addr_r;
+
+       return value;
+}
+int cnt = 0x1;
+
 int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
 {
        ae_int64 acc;
@@ -67,6 +83,11 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
        int j;
        int nseries = iir->biquads_in_series;
 
+       if (cnt == 0x1) {
+               first_iteration();
+       }
+       cnt++;
+
        /* Bypass is set with number of biquads set to zero. */
        if (!iir->biquads)
                return x;

With this, the disassembly looks similar to the crash - meaning is doing a store (ae_s32.l.i), next a load (l32i.n), but it passes:

92446e24 <first_iteration>:
first_iteration():
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:50
{
92446e24:	004136                      	entry	a1, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:55
	ae_int32x2 tmp = AE_ZERO32();
92446e27:	fc48c400672e              	{ l32r	a2, 92429fc4 (92c0c200 <heapmem+0x7200>); ae_movi	aed0, 0 }
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:57
	AE_S32_L_I (tmp, addr_w, -4);
92446e2d:	e002f4                      	ae_s32.l.i	aed0, a2, -4
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:59
	value = *addr_r;
92446e30:	8228                          	l32i.n	a2, a2, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:61
	return value;
92446e32:	f01d                          	retw.n

@iuliana-prodan
Copy link
Contributor Author

@singalsu @andyross do you have any suggestions, comments, ideas?
What else can I try?
Thanks!

@lgirdwood
Copy link
Member

@singalsu @andyross do you have any suggestions, comments, ideas? What else can I try? Thanks!

@iuliana-prodan I would raise a ticket with Cadence here as code works with old compiler and not new compiler. This could mean we need a #ifdef or pragma or indeed if our code is wrong here.

@singalsu
Copy link
Collaborator

Yep, good debugging work @iuliana-prodan ! Help from Cadence would be really useful with this.

@lgirdwood
Copy link
Member

@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?

@iuliana-prodan
Copy link
Contributor Author

@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?

I've raised a ticket to cadence "Case# 46820991: Sound Open Firmware code not working with latest XCC toolchain RJ2024.3" but no useful answer.

TBH I have no idea how to less optimize this code.

The problem could be reading from the input buffers here: https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L90-L95.
All the intrinsics require aligned addresses.

Same here https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L106-L110.

@lgirdwood
Copy link
Member

@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.

Can you try something like

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index 39be8e3a6..ba97605be 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
                return x;
 
        /* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */
-       coefp = (ae_int32x2 *)&iir->coef[0];
-       delayp = (ae_int32x2 *)&iir->delay[0];
+       coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE);
+       delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE);
        for (j = 0; j < iir->biquads; j += nseries) {
                /* the first for loop is for parallel EQs, and they have the same input */
                in = x;
'''

@iuliana-prodan
Copy link
Contributor Author

iuliana-prodan commented Aug 28, 2024

@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.

Can you try something like

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index 39be8e3a6..ba97605be 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
                return x;
 
        /* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */
-       coefp = (ae_int32x2 *)&iir->coef[0];
-       delayp = (ae_int32x2 *)&iir->delay[0];
+       coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE);
+       delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE);
        for (j = 0; j < iir->biquads; j += nseries) {
                /* the first for loop is for parallel EQs, and they have the same input */
                in = x;
'''

@lgirdwood sorry for the late reply, I was OOO.

I've tried your suggestion and it still doesn't work.
I get the same error, at the exact address:

[00:00:02.076,000] <err> os:  ** FATAL EXCEPTION
[00:00:02.076,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:02.076,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:02.076,000] <err> os:  **  PS 0x60720
[00:00:02.076,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:02.076,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:02.076,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:02.076,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:02.076,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:02.076,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:02.076,000] <err> os:  ** SAR 0x1d
[00:00:02.076,000] <err> os:  **  THREADPTR 0x2
[00:00:02.076,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.076,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:02.175,000] <err> zephyr: Halting system

The generated disassembly code is also the same.

@iuliana-prodan iuliana-prodan changed the title [BUG] [IIR] IIR not working wiht SOC compiled with XCC RJ2024.3 [BUG] [IIR] IIR not working with SOC compiled with XCC RJ2024.3 Aug 28, 2024
@iuliana-prodan iuliana-prodan changed the title [BUG] [IIR] IIR not working with SOC compiled with XCC RJ2024.3 [BUG] [IIR] IIR not working with SOF compiled with XCC RJ2024.3 Aug 28, 2024
@lgirdwood
Copy link
Member

The generated disassembly code is also the same.

ok - I would expect this compiler directive to be honored, definitely worth getting input from cadence on their compiler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected
Projects
None yet
Development

No branches or pull requests

3 participants