-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memcpy optimizations not working with Fortify after clang d437fba8ef626b6 #979
Comments
@gburgessiv alluded to making some changes on the kernel side. What would those changes be or look like? |
To clarify, a lot of the checks there are due to security-oriented compiler flags seeing a function call. A more direct comparison is available here: https://godbolt.org/z/fqAhUC . Specifically, none of the conditions in the asm in comment #1 are from the inlined memcpy.
In general terms, where possible, we should lean on "well-known" FORTIFY In this case, it means we need to tweak the lines of the above godbolt link like so: https://godbolt.org/z/h_2N_X (we'd swap fully to the These
Like said, both GCC and Clang know about these The kernel's FORTIFY impl seems small. Lemme see if I can throw together a rough version of what I'd imagine a patch to look like. |
Uploaded the patch in PR form so commenting and such is easier. :) |
So, I think something else might also be going on. AFAICT, the kernel's fortify functions are entirely invisible to Clang due to the "gnu_inline" attribute: If you replace the original memcpy declaration, the call to __write_overflow vanishes from the Clang output and the result is unchecked at runtime. :( :( |
Well, or if I change it from "extern inline" to "static inline", then it works again too... |
Right, clang will refuse to emit bodies for externally-available functions that trivially recurse, since the concept of an available-externally is "this function is implemented elsewhere, but here's its definition if it helps you inline." http://llvm.org/PR9614 is related, as is some of the CGM logic around I see your point, though. The more I look at this, this appears to be an artifact of the LLVM patch: clang will see that the kernel wants to provide a gnu-inline version of memcpy, so it notes that Looking at https://reviews.llvm.org/D71082, if I add So a minimal fix to get our performance back would be to set For those following along at home, the reason my patch worked is because |
Posted https://reviews.llvm.org/D78162 in hopes that we can fix this specific test-case. It may/may not help all of the kernel's FORTIFY builtins, but I think it should unregress functions which are "directly" recursive through Note that this is only intended to bring us back into our previous state (e.g., still no |
There are some inline builtin definitions that we can't emit (isTriviallyRecursive & callers go into why). Marking these nobuiltin is only useful if we actually emit the body, so don't mark these as such unless we _do_ plan on emitting that. This suboptimality was encountered in Linux (see some discussion on D71082, and ClangBuiltLinux/linux#979). Differential Revision: https://reviews.llvm.org/D78162
Okay, that cross-project mentioning support is cool :) The above should fix this immediate bug. Happy to keep working with Kees et al on figuring out a good FORTIFY story for clang if that's interesting. |
@gburgessiv it looks like clang is crashing after your patch. Linaro's CI tripped on an assertion:
extern inline __attribute__((gnu_inline)) void *memset() {}
typeof(memset) memset;
*a = memset; |
Thanks for the reduced test-case! I think I have a fix locally. It's pretty simple; I'll push it in if the tests pass :) |
Please add me to the review if possible (nathanchance), I saw one other crash in |
Since it's a breakage on ToT with a (IMO) pretty trivial fix, I was planning on doing post-commit review. The diff to non-tests is:
Would you like to test that before I land it? |
Yes, I will do it now. |
@gburgessiv that fixes both crashes that I noticed. Hopefully there are not any more, I won't know until I do a full kernel build run tonight. |
Awesome -- I'll land it then and see what happens. Thanks for the extra checking! :) |
In cases where we have multiple decls of an inline builtin, we may need to go hunting for the one with a definition when setting function attributes. An additional test-case was provided on ClangBuiltLinux/linux#979
Since i hear no yelling, closing per #1002 . Thanks! |
should |
Yes please! |
There are some inline builtin definitions that we can't emit (isTriviallyRecursive & callers go into why). Marking these nobuiltin is only useful if we actually emit the body, so don't mark these as such unless we _do_ plan on emitting that. This suboptimality was encountered in Linux (see some discussion on D71082, and ClangBuiltLinux/linux#979). Differential Revision: https://reviews.llvm.org/D78162 (cherry picked from commit 2dd17ff)
In cases where we have multiple decls of an inline builtin, we may need to go hunting for the one with a definition when setting function attributes. An additional test-case was provided on ClangBuiltLinux/linux#979 (cherry picked from commit 9490808)
There are some inline builtin definitions that we can't emit (isTriviallyRecursive & callers go into why). Marking these nobuiltin is only useful if we actually emit the body, so don't mark these as such unless we _do_ plan on emitting that. This suboptimality was encountered in Linux (see some discussion on D71082, and ClangBuiltLinux/linux#979). Differential Revision: https://reviews.llvm.org/D78162 (cherry picked from commit 2dd17ff)
In cases where we have multiple decls of an inline builtin, we may need to go hunting for the one with a definition when setting function attributes. An additional test-case was provided on ClangBuiltLinux/linux#979 (cherry picked from commit 9490808)
There are some inline builtin definitions that we can't emit (isTriviallyRecursive & callers go into why). Marking these nobuiltin is only useful if we actually emit the body, so don't mark these as such unless we _do_ plan on emitting that. This suboptimality was encountered in Linux (see some discussion on D71082, and ClangBuiltLinux/linux#979). Differential Revision: https://reviews.llvm.org/D78162
In cases where we have multiple decls of an inline builtin, we may need to go hunting for the one with a definition when setting function attributes. An additional test-case was provided on ClangBuiltLinux/linux#979
There are some inline builtin definitions that we can't emit (isTriviallyRecursive & callers go into why). Marking these nobuiltin is only useful if we actually emit the body, so don't mark these as such unless we _do_ plan on emitting that. This suboptimality was encountered in Linux (see some discussion on D71082, and ClangBuiltLinux/linux#979). Differential Revision: https://reviews.llvm.org/D78162 (cherry picked from commit 19fc98a)
In cases where we have multiple decls of an inline builtin, we may need to go hunting for the one with a definition when setting function attributes. An additional test-case was provided on ClangBuiltLinux/linux#979 (cherry picked from commit 505dbc0)
There are some inline builtin definitions that we can't emit (isTriviallyRecursive & callers go into why). Marking these nobuiltin is only useful if we actually emit the body, so don't mark these as such unless we _do_ plan on emitting that. This suboptimality was encountered in Linux (see some discussion on D71082, and ClangBuiltLinux/linux#979). Differential Revision: https://reviews.llvm.org/D78162
In cases where we have multiple decls of an inline builtin, we may need to go hunting for the one with a definition when setting function attributes. An additional test-case was provided on ClangBuiltLinux/linux#979
The change in clang (https://reviews.llvm.org/D71082) forces clang to not see resolve the name kernel's fortified version of memcpy as optimizable. After this change, clang is not able to fold some of the simple memcpy calls when FORTIFY is enabled.
Test case in https://reviews.llvm.org/D71082#1953975 :
Before (clang folded mempcy calls):
After (all sort of bound checks and call to memcpy):
The text was updated successfully, but these errors were encountered: