-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM: FPU: using Unshared FP Services mode can still result in corrupted floating point registers #29590
Comments
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time. |
Thanks @mniestroj
could you share some code example that can reproduce this issue? |
Well, this comes at some cost, however, so I'd better not apply this if there's a cleaner solution. |
#31472 is a good example how to reproduce that issue with upstream code. I've added
to
When I have done initial investigation it turned out that compiler (here GCC) should prevent using FPU registers unless there are FPU operations involved (e.g. float multiplication). GCC allows to prevent all FPU registers from being used with |
I've read the same thing here: https://gcc.gnu.org/pipermail/gcc-help/2020-July/139112.html, which aligns with your investigation. The problem is that we have made assumptions based on the K_FP_REGS thread-create flag, that is, the thread won't use the FP registers so the stack frame will never need to contain the FP registers, but if this is not the case, we might have to do more things than just enable FPU_SHARING together with FPU. |
I agree, all threads need to be treated like K_FP_REGS would be set for them. As a short term solution that flag could be set automatically in |
So in your understanding, this is just a subset of GCC versions that do this, or all GCC compilers (beyond some release) behave like this consistently? |
I would suggest to treat all GCC versions the same and assume that it might produce code that utilizes FP registers. This is because I haven't found any option for configuring such behavior in GCC codebase and I haven't found any statement in GCC's documentation about one behavior or the other. In summary it is kind of undefined whether FP (or any other than general purpose) registers are used. Issue described here was not reproducible with I was not able to reproduce this Issue with GCC 8, but it was reproducible with GCC 9 and 10. My suggestion with <compiler, arch> pair was to treat other compilers (like clang) differently from GCC if needed. |
@mniestroj thanks for your input here
this comes at a high cost, though. Does GCC behave as described only under a certain -O configuration? If so then at least we can limit the problematic cases to a subset of optimization configurations. |
Unfortunately. As I see for ARM K_FP_FLAGS "only" affects stack size/usage, but situation is worse for other archs. I am afraid we cannot do much without explicit compiler support for "do not use FP registers for non-FPU math operations".
|
Warning: it did not use FP register, but doesn't it means it will never do? |
I agree, we cannot trust that, it might change in new GCC versions or even when compiling slightly different code. Besides, optimizing FP usage in |
So, I am still thinking around this issue. I am involving @tejlmand (Toolchain responsible and build-system maintainer) to get his view. What I can tell about the workaround is:
|
There is already the following code in function prepare_multithreading() in kernel/init.c:
but I am not zephyr thread expert so I don't know whether it is enough ? Otherwise I am inline with your statement.
|
Thanks @ABOSTM , I am not worried on how to implement the workaround, if we decide, finally, to do it. Another option would be to consider the solution presented in https://gcc.gnu.org/pipermail/gcc-help/2020-July/139112.html, that is, to apply the gen-registers-only GCC directive on code that is not supposed to use the FP Registers. This falls on the user to do it, throughout the code base, though, so I am reluctant... |
It seems the only reasonable solution at this time is to disallow |
@stephanosio thanks so much for your input. Just to clarify, do you think this affects AARC32 other than the -M profile? |
@ioannisg As far as I can see, this applies to all ARM architecture variants. It is, however, not an immediate problem for the non-M profile variants at the moment because the relevant Zephyr arch ports do not support hardware FPU (and therefore do not allow enabling FP instructions). |
@stephanosio thanks. Have you been able to dig deep into this? I wonder if GCC allows only callee-saved FP registers to be accessed by non-floating point calculations (i.e. s16 and above), or is it caller-saved too? |
@katsuster could you check if this issue is relevant for Risc-V? For Cortex-M we have considered this to be a serious issue. |
@dcpleung @abrodkin FYI - might worth checking if this is affecting x86 and ARC, respectively. |
@ioannisg It seems that RISC-V GCC does not use FPU implicitly. Current GCC does not support SIMD of RISC-V instructions (packed-simd extension is not stable now). Of course, I'm not sure about future events:
|
@ioannisg x86 does not seem to be affected by this. FPU/SSE have to be explicitly enabled before those registers can be used on 32-bit. On 64-bit, FPU/SSE are always available so we are saving those registers anyway. |
Describe the bug
Unshared FP Services mode is supposed to work properly when a single thread uses the FPU. However, it looks like GCC (version 9 and above) generates code that temporarily stores integer data to FPU registers, even if no FPU math operations are carried out within a function. The values in FPU registers can be overwritten by other threads, then such modified values are loaded into original function and cause program corruption. This happens because under unshared FP registers mode, the Cortex-M (and perhaps other architectures) do not save/restore the FP context during thread context-switch or even during exception entry and return.
A straightforward workaround to this problem is to force Shared FP Registers mode for Cortex-M whenever CONFIG_FPU is enabled. This could be done with the following work-around:
CONFIG_FPU_SHARING=y
. This will be enabled FP context preservation.To Reproduce
An out of tree closed application code was used on nRF52840 with
CONFIG_FPU=y
andCONFIG_FPU_SHARING=n
. Issue reproduced with some toolchains only, look at "Environment" below.Additionally, the issue is reproduced by in-tree code, see #31472
Expected behavior
Using Unshared FP Services mode should be either safe to use (assuming only one thread operates on floats) or disallowed (not being possible to select by Kconfig) if not supported.
(As described above, this is not sufficient: we need to ensure FP Sharing mode is also working as expected.)
Impact
Using Unshared FP Services mode with ARM Cortex-M4F (and possibly others) results in undefined behavior.
Code listing
Here is the function that stores temporarily into FPU register (
vmov s16, r3
) and then loads from it later (vmov r1, s16
).Environment
Additional context
Similar issue was reported here: https://answers.launchpad.net/gcc-arm-embedded/+question/691604
The text was updated successfully, but these errors were encountered: