Failed "assert(buffer->is_shared)" in comp_buffer_connect() #9343
If this happens due to some invalid IPC sequence, then asserts can be disabled in production, and they are disabled in production right now; example: #9308 (comment). Asserts are only a debugging aid for "impossible"/buggy conditions. Invalid IPC sequences are not considered "impossible": IPC fuzzing would be entirely pointless if they were.
Question: does IPC3 allow binding components located on separate cores? As far as I can see in the code, no.
If that's true (cross-core bind is not allowed), the assert is in the right place and protection against the illegal situation should be in
If it's false (cross-core bind is allowed), the assert should be replaced with an error exit code.
In both situations a fix is needed.
It's not allowed, and the assert is OK as there are no IPC3 multicore users today. If someone adds multicore for IPC3 in the future, they can address it then.
@marcinszkudlinski could you please add an error code in |
Well, it is a bit more complicated. As far as I can see, the check is already in place:
It's checked for incoherent architectures only, but the assert is there in every case. @marc-hb @lgirdwood Looks like fuzzing is performed with CONFIG_INCOHERENT = 0. Which coherent architectures do we support? Is fuzzing also performed with CONFIG_INCOHERENT = 1, especially for IPC4? The quick fix would be to put the assert in question under conditional compilation, but there may be more places where checks or even "is_shared" flags are not necessary; checking in progress.
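A minimal sketch of the quick fix mentioned above, assuming the assert is compiled in only when `CONFIG_INCOHERENT` is set. The names `cc_buffer` and `cc_check_shared` are hypothetical stand-ins, not SOF code, and the fallback `#define` exists only so the snippet builds on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: the real value comes from Kconfig; default to 0 for this sketch. */
#ifndef CONFIG_INCOHERENT
#define CONFIG_INCOHERENT 0
#endif

/* Hypothetical, simplified buffer holding only the flag discussed here. */
struct cc_buffer {
	bool is_shared;
};

/*
 * Returns true when the "buffer must be shared" precondition is enforced
 * in this build, false when the check is compiled out.
 */
bool cc_check_shared(const struct cc_buffer *buffer)
{
#if CONFIG_INCOHERENT
	/* Only incoherent builds require the flag to be set beforehand. */
	assert(buffer->is_shared);
	return true;
#else
	(void)buffer; /* coherent build: nothing to verify */
	return false;
#endif
}
```

With the default of 0 used here, a non-shared buffer passes through without tripping the assert, which matches the coherent fuzzing configuration described in the comment.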
The mystery deepens: I tried to reproduce but I couldn't.
But SOF never even reaches
int ipc_comp_connect(struct ipc *ipc, ipc_pipe_comp_connect *_connect)
{
	struct sof_ipc_pipe_comp_connect *connect = ipc_from_pipe_connect(_connect);
	struct ipc_comp_dev *icd_source;
	struct ipc_comp_dev *icd_sink;

	/* check whether the components already exist */
	icd_source = ipc_get_comp_dev(ipc, COMP_TYPE_ANY, connect->source_id);
	if (!icd_source) {
		tr_err(&ipc_tr, "ipc_comp_connect(): source component does not exist, source_id = %u sink_id = %u",
		       connect->source_id, connect->sink_id);
		return -EINVAL;
	}
What did I miss? I have a newer Clang version (18), could that |
Right, but the less proximate problem here is that the cross-core bind is accessible via the IPC3 protocol even if it's "not allowed" as a matter of policy and device configuration, and the case isn't being handled as a protocol error, only as an assertion failure further down the stack when it hits an unexpected situation.
As for determinism: I've had mixed success with the reproducer files generated by the fuzzer too. It seems like too many things can affect the flow. Presumably what's happened is that the previously fuzzed IPC commands left SOF in a weird state, but that may not be reachable in a reliable way from a developer system. Generally it's OK just to look at a failure, understand how we "could" have reached it, and then fix it via analysis instead. If we're really sure that this "shouldn't be able to happen" then I guess we have work to do.
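To illustrate what handling the case "as a protocol error" could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: `pe_comp`, `pe_buffer` and `pe_connect` are hypothetical names, and the real SOF connect path carries far more state.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical component: only the core it runs on. */
struct pe_comp {
	uint32_t core;
};

/* Hypothetical buffer: only the shared flag. */
struct pe_buffer {
	bool is_shared;
};

/*
 * Validate a connect request at the IPC boundary. A cross-core connect
 * through a buffer that was not created as shared is rejected with an
 * error code instead of being left for an assert further down.
 */
int pe_connect(const struct pe_comp *source, const struct pe_comp *sink,
	       const struct pe_buffer *buffer)
{
	bool cross_core = source->core != sink->core;

	if (cross_core && !buffer->is_shared)
		return -EINVAL; /* invalid IPC sequence, not an "impossible" state */

	return 0;
}
```

The design point is that the check runs on untrusted input before any state is modified, so a fuzzed sequence gets an error reply rather than a crash.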
Thanks! This means the
Thanks, I took a quick look at this and I'm afraid we have some serious Kconfig problems here (what's new?).
So, in all of these cases It would help if Kconfig+cpp could make a difference between "undefined" and zero but I'm afraid that ship has sailed a long time ago; sorry for the digression. Long story short I think the very first step should be "top-down": first making sense of and clarifying the |
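The "undefined" versus zero point can be shown with plain preprocessor rules: inside `#if`, an identifier that is not defined evaluates to 0, so it cannot be told apart from an explicit 0. The macro names below are hypothetical and chosen only for this sketch; it also assumes the build does not turn `-Wundef` into an error.

```c
/* DEMO_OPT_ZERO is explicitly 0; DEMO_OPT_UNSET is deliberately never defined. */
#define DEMO_OPT_ZERO 0

/* #if on an option defined as 0: takes the #else branch. */
int demo_if_zero(void)
{
#if DEMO_OPT_ZERO
	return 1;
#else
	return 0;
#endif
}

/* #if on an undefined option: the identifier is replaced by 0, same result. */
int demo_if_unset(void)
{
#if DEMO_OPT_UNSET
	return 1;
#else
	return 0;
#endif
}

/* #ifdef only asks "is it defined?", so an explicit 0 still counts as set. */
int demo_ifdef_zero(void)
{
#ifdef DEMO_OPT_ZERO
	return 1;
#else
	return 0;
#endif
}
```

So `#if` cannot distinguish the two cases, while `#ifdef` treats an explicit 0 as enabled; mixing the two styles for the same option is one way such configurations drift apart.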
To be clear: some of these files have worked for me in the past. When they do, they save an ENORMOUS amount of testing time. So don't just cast them aside. Just disappointing to learn they don't always work. |
I think the best solution would be:
For incoherent archs the buffer must be allocated as shared or not shared; for coherent ones it may be set as shared at any moment.
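A rough sketch of that rule, under the assumption that it is enforced in a single helper. The names `sp_buffer` and `sp_buffer_set_shared` are hypothetical, and the fallback `#define` is only there so the snippet compiles standalone; the actual change in #9356 may look quite different.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: the real value comes from Kconfig; default to 0 for this sketch. */
#ifndef CONFIG_INCOHERENT
#define CONFIG_INCOHERENT 0
#endif

/* Hypothetical buffer: only the shared flag. */
struct sp_buffer {
	bool is_shared;
};

/*
 * Try to mark an already allocated buffer as shared.
 * Coherent build: allowed at any moment.
 * Incoherent build: refused, because sharing has to be decided at allocation.
 */
int sp_buffer_set_shared(struct sp_buffer *buffer)
{
	if (buffer->is_shared)
		return 0; /* already shared, nothing to do */

#if CONFIG_INCOHERENT
	return -EINVAL; /* cannot change the memory type after allocation */
#else
	buffer->is_shared = true;
	return 0;
#endif
}
```

With the default coherent setting used here, a late request simply flips the flag; an incoherent build would instead report the error back to the caller.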
#9356 please comment |
Even if #9356 is perfect, it does not change anything about the apparently messy Kconfig situation... @andyross could you help there? Should CONFIG_INCOHERENT really be gated by CONFIG_XTENSA? Can we align |
Likely yes? The latter is a Zephyr kconfig indicating the architecture, which always has an incoherent L1 cache. The former is a SOF tunable I'm less familiar with. From the perspective of software[1], the incoherence is irrelevant when there's only one CPU. So my guess is that SOF allows this to be =n on single CPU builds? The equivalent (but logically converse) Zephyr abstraction is something called CONFIG_KERNEL_COHERENCE, which forces mutable .data/.bss access to be uncached and does some trickery to allow stacks to be cached, while leaving everything else (.rodata and instruction literals in .text) cached. But regardless, the only architecture in this whole problem space with a truly incoherent cache is Xtensa, and that's very unlikely to change. [1] And software only! Things like host-shared memory and DMA buffers still care about coherence even with a single CPU, but that's generally handled at the driver layer and not global firmware behavior, AFAIK. |
So does this mean the "INCOHERENT" code cannot be fuzzed? (This is why I asked in the first place) |
@marc-hb even if you enable the flag for a coherent arch, it won't change much (well, except some asserts like the above), but the main problem, the infamous cache incoherency, won't be tested with fuzzing. We can think of some backdoors (fuzzing is not one of the production builds anyway), but it would be better to run fuzzing using an emulator.
AFAIK fuzzing is single threaded so I was never expecting fuzzing to catch memory coherency issues, that's not the point I was trying to make. My main point is: we should always minimize
Maybe it's not very "ambitious" but that's more like my main point.
That does not sound mutually exclusive. You can never have too much test coverage. You can only have limited validation time and resources and bad validation priorities. |
You're right
Well, I believe a backdoor then is not a bad idea, |
@marc-hb @marcinszkudlinski I believe this can be closed now with #9356 merged?
I just filed: Fuzzing is just the messenger. |
This is failing when fuzzing, see below.
This code came with commit 4a03699, @marcinszkudlinski can you please comment?
https://github.com/thesofproject/sof/actions/runs/10144692547/job/28048711573?pr=9338
Originally posted by @tmleman in #9338 (comment)
I have another PR with the same update and there was no failure there. Locally, I initially couldn't hit it either. That's why I did a rerun of fuzzing for this PR.
After some longer runs, I now have the same failure and I must admit that I don't quite understand the point of this assert. The assumption seems to be that when creating components we know they will be on separate cores and that they will be connected. That's why we set the buffer as shared, but I don't know how to verify this at an earlier stage so that here we can be sure that the buffer is shared.
I will also add that the same case reproduces for me on the current main.