Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memcpy optimizations not working with Fortify after clang d437fba8ef626b6 #979

Closed
m-gupta opened this issue Apr 7, 2020 · 19 comments
Closed
Assignees

Comments

@m-gupta
Copy link

m-gupta commented Apr 7, 2020

The change in clang (https://reviews.llvm.org/D71082) forces clang to not see resolve the name kernel's fortified version of memcpy as optimizable. After this change, clang is not able to fold some of the simple memcpy calls when FORTIFY is enabled.

Test case in https://reviews.llvm.org/D71082#1953975 :

extern inline __attribute__((unused)) __attribute__((no_instrument_function)) __attribute__((always_inline)) __attribute__((gnu_inline)) void *memcpy(void *p, const void *q, size_t size)
{
 size_t p_size = __builtin_object_size(p, 0);
 size_t q_size = __builtin_object_size(q, 0);
 if (__builtin_constant_p(size)) {
  if (p_size < size)
   __write_overflow();
  if (q_size < size)
   __read_overflow2();
 }
 if (p_size < size || q_size < size)
  fortify_panic(__func__);
 return __builtin_memcpy(p, q, size);
}

static inline __attribute__((unused)) __attribute__((no_instrument_function)) void
memcpy_fromio(void *dst, const volatile void *src, size_t count)
{
 memcpy(dst, (const void *)src, count);
}

u64 sst_shim32_read64(void *addr, u32 offset)
{
 u64 val;
 memcpy_fromio(&val, addr + offset, sizeof(val));
 return val;
}

Before (clang folded mempcy calls):

0000000000000000 <sst_shim32_read64>:
   0:	e8 00 00 00 00       	callq  5 <sst_shim32_read64+0x5>
			1: R_X86_64_PLT32	__fentry__-0x4
   5:	55                   	push   %rbp
   6:	48 89 e5             	mov    %rsp,%rbp
   9:	89 f0                	mov    %esi,%eax
   b:	48 8b 04 07          	mov    (%rdi,%rax,1),%rax
   f:	5d                   	pop    %rbp
  10:	c3                   	retq

After (all sort of bound checks and call to memcpy):

0000000000000000 <sst_shim32_read64>:
   0:	e8 00 00 00 00       	callq  5 <sst_shim32_read64+0x5>
			1: R_X86_64_PLT32	__fentry__-0x4
   5:	55                   	push   %rbp
   6:	48 89 e5             	mov    %rsp,%rbp
   9:	53                   	push   %rbx
   a:	48 83 ec 10          	sub    $0x10,%rsp
   e:	65 48 8b 04 25 28 00 	mov    %gs:0x28,%rax
  15:	00 00 
  17:	48 89 45 f0          	mov    %rax,-0x10(%rbp)
  1b:	48 b8 aa aa aa aa aa 	movabs $0xaaaaaaaaaaaaaaaa,%rax
  22:	aa aa aa 
  25:	48 8d 5d e8          	lea    -0x18(%rbp),%rbx
  29:	48 89 03             	mov    %rax,(%rbx)
  2c:	89 f6                	mov    %esi,%esi
  2e:	48 01 fe             	add    %rdi,%rsi
  31:	ba 08 00 00 00       	mov    $0x8,%edx
  36:	48 89 df             	mov    %rbx,%rdi
  39:	e8 00 00 00 00       	callq  3e <sst_shim32_read64+0x3e>
			3a: R_X86_64_PLT32	memcpy-0x4
  3e:	48 8b 03             	mov    (%rbx),%rax
  41:	65 48 8b 0c 25 28 00 	mov    %gs:0x28,%rcx
  48:	00 00 
  4a:	48 3b 4d f0          	cmp    -0x10(%rbp),%rcx
  4e:	75 07                	jne    57 <sst_shim32_read64+0x57>
  50:	48 83 c4 10          	add    $0x10,%rsp
  54:	5b                   	pop    %rbx
  55:	5d                   	pop    %rbp
  56:	c3                   	retq   
  57:	e8 00 00 00 00       	callq  5c <sst_shim32_read64+0x5c>
			58: R_X86_64_PLT32	__stack_chk_fail-0x4
@nickdesaulniers
Copy link
Member

@gburgessiv alluded to making some changes on the kernel side. What would those changes be or look like?

@gburgessiv
Copy link
Member

all sort of bound checks and call to memcpy

To clarify, a lot of the checks there are due to security-oriented compiler flags seeing a function call. A more direct comparison is available here: https://godbolt.org/z/fqAhUC . Specifically, none of the conditions in the asm in comment #1 are from the inlined memcpy.

What would those changes be or look like?

In general terms, where possible, we should lean on "well-known" FORTIFY _chk functions instead of having our own checks inline.

In this case, it means we need to tweak the lines of the above godbolt link like so: https://godbolt.org/z/h_2N_X (we'd swap fully to the USE_FORTIFY_CHK case; the #ifdef is just for easy comparison). For each _chk function we add, we'll need a simple definition somewhere, since if clang can't optimize the call, we'll emit a call to it: https://godbolt.org/z/D8fjqx

These _chk functions generally boil down to

void *__memcpy_chk(void *p, const void *q, size_t size, size_t buf_size) {
  if (UNLIKELY(buf_size < size))
    fortify_fatal(":(");
  return __builtin_memcpy(p, q, size);
} 

Like said, both GCC and Clang know about these _chk functions and will try to optimize them into standard memcpy/etc calls in cases where it's statically obvious that the call won't fortify_fatal. As referenced on the CL, the _chk approach is strictly better for inline cost analysis in LLVM (and I'd assume in GCC, though haven't checked), and can sometimes result in simpler caller code, since we don't have the chance of emitting these checks at each callsite: https://godbolt.org/z/SRiGt-

The kernel's FORTIFY impl seems small. Lemme see if I can throw together a rough version of what I'd imagine a patch to look like.

@gburgessiv
Copy link
Member

Uploaded the patch in PR form so commenting and such is easier. :)

@kees
Copy link

kees commented Apr 8, 2020

So, I think something else might also be going on. AFAICT, the kernel's fortify functions are entirely invisible to Clang due to the "gnu_inline" attribute:

https://godbolt.org/z/fyt8wq

If you replace the original memcpy declaration, the call to __write_overflow vanishes from the Clang output and the result is unchecked at runtime. :( :(

@kees
Copy link

kees commented Apr 8, 2020

Well, or if I change it from "extern inline" to "static inline", then it works again too...

@gburgessiv
Copy link
Member

Right, clang will refuse to emit bodies for externally-available functions that trivially recurse, since the concept of an available-externally is "this function is implemented elsewhere, but here's its definition if it helps you inline." http://llvm.org/PR9614 is related, as is some of the CGM logic around isTriviallyRecursive: https://clang.llvm.org/doxygen/CodeGenModule_8cpp_source.html#2690 .

I see your point, though. The more I look at this, this appears to be an artifact of the LLVM patch: clang will see that the kernel wants to provide a gnu-inline version of memcpy, so it notes that memcpy is nobuiltin. Our isTriviallyRecursive logic above then kicks in, and prevents clang from actually emitting a definition for memcpy.

Looking at https://reviews.llvm.org/D71082, if I add false && to CodeGenModule.cpp:1843, I get the regular, entirely un-FORTIFY'ed memcpy back.

So a minimal fix to get our performance back would be to set nobuiltin only if we can actually emit the function that's meant to be intercepting e.g., memcpy. I think that should win us back cycles for these cases, but it still leaves the kernel's FORTIFY, in functions that appear recursive, effectively #ifdef'ed out. :/

For those following along at home, the reason my patch worked is because __memcpy_chk != memcpy, so clang was happy to emit those function definitions, etc etc.

@gburgessiv
Copy link
Member

Posted https://reviews.llvm.org/D78162 in hopes that we can fix this specific test-case.

It may/may not help all of the kernel's FORTIFY builtins, but I think it should unregress functions which are "directly" recursive through __asm__ labels or use of __builtin_${fn_name}.

Note that this is only intended to bring us back into our previous state (e.g., still no fortify_panics emitted, but it'll at least go faster).

gburgessiv added a commit to llvm/llvm-project that referenced this issue Apr 15, 2020
There are some inline builtin definitions that we can't emit
(isTriviallyRecursive & callers go into why). Marking these
nobuiltin is only useful if we actually emit the body, so don't mark
these as such unless we _do_ plan on emitting that.

This suboptimality was encountered in Linux (see some discussion on
D71082, and ClangBuiltLinux/linux#979).

Differential Revision: https://reviews.llvm.org/D78162
@gburgessiv
Copy link
Member

Okay, that cross-project mentioning support is cool :)

The above should fix this immediate bug. Happy to keep working with Kees et al on figuring out a good FORTIFY story for clang if that's interesting.

@nathanchance
Copy link
Member

@gburgessiv it looks like clang is crashing after your patch. Linaro's CI tripped on an assertion:

https://ci.linaro.org/job/tcwg_kernel-bisect-llvm-master-aarch64-lts-allyesconfig/30/artifact/artifacts/build-2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a/console.log

creduce spits out this if it helps:

extern inline __attribute__((gnu_inline)) void *memset() {}
typeof(memset) memset;
*a = memset;

@gburgessiv
Copy link
Member

Thanks for the reduced test-case! I think I have a fix locally. It's pretty simple; I'll push it in if the tests pass :)

@nathanchance
Copy link
Member

Please add me to the review if possible (nathanchance), I saw one other crash in arch/x86/net/bpf_jit_comp.o that I want to make sure is fixed as well.

@gburgessiv
Copy link
Member

Since it's a breakage on ToT with a (IMO) pretty trivial fix, I was planning on doing post-commit review. The diff to non-tests is:

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 1243ce50ec8..ce28d741225 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -1909,9 +1909,15 @@ void CodeGenModule::SetFunctionAttributes(GlobalDecl GD, llvm::Function *F,
      F->setSection(SA->getName());
 
   // If we plan on emitting this inline builtin, we can't treat it as a builtin.
-  if (FD->isInlineBuiltinDeclaration() && shouldEmitFunction(FD)) {
-    F->addAttribute(llvm::AttributeList::FunctionIndex,
-                    llvm::Attribute::NoBuiltin);
+  if (FD->isInlineBuiltinDeclaration()) {
+    const FunctionDecl *FDBody;
+    bool HasBody = FD->hasBody(FDBody);
+    (void)HasBody;
+    assert(HasBody && "Inline builtin declarations should always have an "
+                      "available body!");
+    if (shouldEmitFunction(FDBody))
+      F->addAttribute(llvm::AttributeList::FunctionIndex,
+                      llvm::Attribute::NoBuiltin);
   }
 
   if (FD->isReplaceableGlobalAllocationFunction()) {

Would you like to test that before I land it?

@nathanchance
Copy link
Member

Yes, I will do it now.

@nathanchance
Copy link
Member

@gburgessiv that fixes both crashes that I noticed. Hopefully there are not any more, I won't know until I do a full kernel build run tonight.

@gburgessiv
Copy link
Member

Awesome -- I'll land it then and see what happens. Thanks for the extra checking! :)

gburgessiv added a commit to llvm/llvm-project that referenced this issue Apr 16, 2020
In cases where we have multiple decls of an inline builtin, we may need
to go hunting for the one with a definition when setting function
attributes.

An additional test-case was provided on
ClangBuiltLinux/linux#979
@gburgessiv
Copy link
Member

Since i hear no yelling, closing per #1002 . Thanks!

@nickdesaulniers
Copy link
Member

@zx2c4
Copy link

zx2c4 commented May 5, 2020

@kees
Copy link

kees commented May 5, 2020

Yes please!

github-actions bot pushed a commit to tstellar/llvm-project that referenced this issue May 6, 2020
There are some inline builtin definitions that we can't emit
(isTriviallyRecursive & callers go into why). Marking these
nobuiltin is only useful if we actually emit the body, so don't mark
these as such unless we _do_ plan on emitting that.

This suboptimality was encountered in Linux (see some discussion on
D71082, and ClangBuiltLinux/linux#979).

Differential Revision: https://reviews.llvm.org/D78162

(cherry picked from commit 2dd17ff)
github-actions bot pushed a commit to tstellar/llvm-project that referenced this issue May 6, 2020
In cases where we have multiple decls of an inline builtin, we may need
to go hunting for the one with a definition when setting function
attributes.

An additional test-case was provided on
ClangBuiltLinux/linux#979

(cherry picked from commit 9490808)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue May 7, 2020
There are some inline builtin definitions that we can't emit
(isTriviallyRecursive & callers go into why). Marking these
nobuiltin is only useful if we actually emit the body, so don't mark
these as such unless we _do_ plan on emitting that.

This suboptimality was encountered in Linux (see some discussion on
D71082, and ClangBuiltLinux/linux#979).

Differential Revision: https://reviews.llvm.org/D78162

(cherry picked from commit 2dd17ff)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue May 7, 2020
In cases where we have multiple decls of an inline builtin, we may need
to go hunting for the one with a definition when setting function
attributes.

An additional test-case was provided on
ClangBuiltLinux/linux#979

(cherry picked from commit 9490808)
arichardson pushed a commit to arichardson/llvm-project that referenced this issue Jul 2, 2020
There are some inline builtin definitions that we can't emit
(isTriviallyRecursive & callers go into why). Marking these
nobuiltin is only useful if we actually emit the body, so don't mark
these as such unless we _do_ plan on emitting that.

This suboptimality was encountered in Linux (see some discussion on
D71082, and ClangBuiltLinux/linux#979).

Differential Revision: https://reviews.llvm.org/D78162
arichardson pushed a commit to arichardson/llvm-project that referenced this issue Jul 2, 2020
In cases where we have multiple decls of an inline builtin, we may need
to go hunting for the one with a definition when setting function
attributes.

An additional test-case was provided on
ClangBuiltLinux/linux#979
ajohnson-uoregon pushed a commit to ajohnson-uoregon/clang-rewrite-only that referenced this issue Jul 17, 2022
There are some inline builtin definitions that we can't emit
(isTriviallyRecursive & callers go into why). Marking these
nobuiltin is only useful if we actually emit the body, so don't mark
these as such unless we _do_ plan on emitting that.

This suboptimality was encountered in Linux (see some discussion on
D71082, and ClangBuiltLinux/linux#979).

Differential Revision: https://reviews.llvm.org/D78162

(cherry picked from commit 19fc98a)
ajohnson-uoregon pushed a commit to ajohnson-uoregon/clang-rewrite-only that referenced this issue Jul 17, 2022
In cases where we have multiple decls of an inline builtin, we may need
to go hunting for the one with a definition when setting function
attributes.

An additional test-case was provided on
ClangBuiltLinux/linux#979

(cherry picked from commit 505dbc0)
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
There are some inline builtin definitions that we can't emit
(isTriviallyRecursive & callers go into why). Marking these
nobuiltin is only useful if we actually emit the body, so don't mark
these as such unless we _do_ plan on emitting that.

This suboptimality was encountered in Linux (see some discussion on
D71082, and ClangBuiltLinux/linux#979).

Differential Revision: https://reviews.llvm.org/D78162
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
In cases where we have multiple decls of an inline builtin, we may need
to go hunting for the one with a definition when setting function
attributes.

An additional test-case was provided on
ClangBuiltLinux/linux#979
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants