WIP: fortify: swap to _chk-ish intrinsics #980

Open: wants to merge 1 commit into base: master
55 changes: 38 additions & 17 deletions include/linux/string.h
@@ -272,31 +272,45 @@ void __read_overflow3(void) __compiletime_error("detected read beyond size of ob
void __write_overflow(void) __compiletime_error("detected write beyond size of object passed as 1st parameter");

#if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && defined(CONFIG_FORTIFY_SOURCE)

+/*
+ * NOTE: Proof of __builtin___strncpy_chk => strncpy:
+ * https://godbolt.org/z/nA4wVC
+ */
__FORTIFY_INLINE char *strncpy(char *p, const char *q, __kernel_size_t size)
{
	size_t p_size = __builtin_object_size(p, 0);
	if (__builtin_constant_p(size) && p_size < size)
		__write_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __builtin_strncpy(p, q, size);
+	return __builtin___strncpy_chk(p, q, size, p_size);
}

+/* NOTE: Proof of __strcat_chk => strcat: https://godbolt.org/z/Hquxv9 */
+extern char *__strcat_chk(char *p, const char *q, size_t p_size);
__FORTIFY_INLINE char *strcat(char *p, const char *q)
{
	size_t p_size = __builtin_object_size(p, 0);
	if (p_size == (size_t)-1)
		return __builtin_strcat(p, q);
-	if (strlcat(p, q, p_size) >= p_size)
-		fortify_panic(__func__);
-	return p;
+	return __strcat_chk(p, q, p_size);
}

+/*
+ * NOTE: Clang recognizes __strlen_chk, but GCC doesn't. Doesn't seem likely
+ * that we'll get it backported into previous GCC releases, either.
+ * https://godbolt.org/z/Sapi3H
+ *
+ * Presented is the 'best' middle ground I think we can reach for both
+ * compilers WRT optimizability.
+ */
+#ifdef __clang__
Member commented:
CONFIG_CC_IS_CLANG is used more than __clang__ in the kernel (it actually looks like there is some clean up opportunity there...).

+extern __kernel_size_t __strlen_chk(const char *p, size_t p_size);
+#endif
__FORTIFY_INLINE __kernel_size_t strlen(const char *p)
{
-	__kernel_size_t ret;
	size_t p_size = __builtin_object_size(p, 0);

+#ifdef __clang__
Member commented:
This #ifdef block is an eyesore but I do not really have another solution to offer :/

In general, the kernel prefers IS_ENABLED(CONFIG_...) to #ifdef (https://elixir.bootlin.com/linux/v5.6.2/source/include/linux/kconfig.h#L71). I have no idea if that is possible here.

Another possible solution would be to hoist this into compiler-gcc.h and compiler-clang.h.

Reviewer commented:
Err, if __strlen_chk() is implemented below, can't this be unconditional?

Member Author replied:
I'll try to answer both of your questions together. :)

This CL's intent is to use code that's as optimizable as possible for clang + GCC. It does so by leaning heavily on Clang's and GCC's special knowledge of well-known FORTIFY functions spelled __*_chk.

For example, Clang has special knowledge of __strlen_chk, so it's capable of, e.g., folding it into a constant or deferring to a non-FORTIFY'ed strlen if doing so is obviously safe, whereas GCC isn't: https://godbolt.org/z/BBjEDH . The hope is to produce roughly equivalent code to having the safety checks inline, for cases where the safety checks can be trivially elided (importantly, including __builtin_object_size(x, n) == -1).

The issue this is all trying to work around is that the inherent "recursion" in many inline FORTIFY functions apparently forces LLVM to consider the FORTIFY'ed version not to be a builtin. I haven't deeply understood the implications of trying to fix that, but I know we've had many issues in the past with the optimizer being 'too smart' about FORTIFY'ed inline functions, so I'm not optimistic that it'll be a straightforward fix.

FWIW, there's a far cleaner approach we can take here (one that would also let clang emit nice compile-time diagnostics for trivial bugs; not as well as GCC, to be fair, but something is better than nothing), but it'd require a fair amount of code in an #ifdef clang block. I don't know how kernel people feel about those these days. Happy to whip up a minimal example of what that may look like, if you're interested.

+	return __strlen_chk(p, p_size);
+#else
+	__kernel_size_t ret;
	/* Work around gcc excess stack consumption issue */
	if (p_size == (size_t)-1 ||
	    (__builtin_constant_p(p[p_size - 1]) && p[p_size - 1] == '\0'))
@@ -305,6 +319,7 @@ __FORTIFY_INLINE __kernel_size_t strlen(const char *p)
	if (p_size <= ret)
		fortify_panic(__func__);
	return ret;
+#endif
}

extern __kernel_size_t __real_strnlen(const char *, __kernel_size_t) __RENAME(strnlen);
@@ -317,6 +332,14 @@ __FORTIFY_INLINE __kernel_size_t strnlen(const char *p, __kernel_size_t maxlen)
	return ret;
}

+/*
+ * NOTE: Clang recognizes __strlcpy_chk, but GCC doesn't. Doesn't seem likely
+ * that we'll get it backported into previous GCC releases, either.
+ * https://godbolt.org/z/RsTLtV
+ *
+ * I'm not sure how deeply we care about strlcpy optimizations. Maybe not
+ * enough to have compiler-specific code for this handrolled strlcpy?
+ */
/* defined after fortified strlen to reuse it */
extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy);
__FORTIFY_INLINE size_t strlcpy(char *p, const char *q, size_t size)
@@ -361,9 +384,7 @@ __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
	size_t p_size = __builtin_object_size(p, 0);
	if (__builtin_constant_p(size) && p_size < size)
		__write_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __builtin_memset(p, c, size);
+	return __builtin___memset_chk(p, c, size, p_size);
}

__FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
@@ -376,9 +397,9 @@ __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
		if (q_size < size)
			__read_overflow2();
	}
-	if (p_size < size || q_size < size)
+	if (q_size < size)
		fortify_panic(__func__);
-	return __builtin_memcpy(p, q, size);
+	return __builtin___memcpy_chk(p, q, size, p_size);
}

__FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
@@ -391,9 +412,9 @@ __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
		if (q_size < size)
			__read_overflow2();
	}
-	if (p_size < size || q_size < size)
+	if (q_size < size)
		fortify_panic(__func__);
-	return __builtin_memmove(p, q, size);
+	return __builtin___memmove_chk(p, q, size, p_size);
}

extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan);
51 changes: 51 additions & 0 deletions lib/string.c
@@ -126,6 +126,14 @@ char *strncpy(char *dest, const char *src, size_t count)
EXPORT_SYMBOL(strncpy);
#endif

+char *__strncpy_chk(char *dest, const char *src, size_t count, size_t dest_size)
+{
+	if (unlikely(dest_size < count))
+		fortify_panic(__func__);
+	return __builtin_strncpy(dest, src, count);
+}
+EXPORT_SYMBOL(__strncpy_chk);

#ifndef __HAVE_ARCH_STRLCPY
/**
* strlcpy - Copy a C-string into a sized buffer
@@ -292,6 +300,14 @@ char *strcat(char *dest, const char *src)
EXPORT_SYMBOL(strcat);
#endif

+char *__strcat_chk(char *dest, const char *src, size_t p_size)
+{
+	if (unlikely(strlcat(dest, src, p_size) >= p_size))
+		fortify_panic(__func__);
+	return dest;
+}
+EXPORT_SYMBOL(__strcat_chk);

#ifndef __HAVE_ARCH_STRNCAT
/**
* strncat - Append a length-limited, C-string to another
@@ -548,6 +564,17 @@ size_t strlen(const char *s)
EXPORT_SYMBOL(strlen);
#endif

+#ifdef __clang__
+size_t __strlen_chk(const char *p, size_t p_size)
+{
+	__kernel_size_t ret = __real_strnlen(p, p_size);
+	if (unlikely(p_size <= ret))
+		fortify_panic(__func__);
+	return ret;
+}
+EXPORT_SYMBOL(__strlen_chk);
+#endif

#ifndef __HAVE_ARCH_STRNLEN
/**
* strnlen - Find the length of a length-limited string
@@ -781,6 +808,14 @@ void *memset(void *s, int c, size_t count)
EXPORT_SYMBOL(memset);
#endif

+void *__memset_chk(void *s, int c, size_t count, size_t s_size)
+{
+	if (unlikely(s_size < count))
+		fortify_panic(__func__);
+	return __builtin_memset(s, c, count);
+}
+EXPORT_SYMBOL(__memset_chk);

#ifndef __HAVE_ARCH_MEMSET16
/**
* memset16() - Fill a memory area with a uint16_t
@@ -869,6 +904,14 @@ void *memcpy(void *dest, const void *src, size_t count)
EXPORT_SYMBOL(memcpy);
#endif

+void *__memcpy_chk(void *dest, const void *src, size_t count, size_t dest_size)
+{
+	if (unlikely(dest_size < count))
+		fortify_panic(__func__);
+	return __builtin_memcpy(dest, src, count);
+}
+EXPORT_SYMBOL(__memcpy_chk);

#ifndef __HAVE_ARCH_MEMMOVE
/**
* memmove - Copy one area of memory to another
@@ -901,6 +944,14 @@ void *memmove(void *dest, const void *src, size_t count)
EXPORT_SYMBOL(memmove);
#endif

+void *__memmove_chk(void *dest, const void *src, size_t count, size_t dest_size)
+{
+	if (unlikely(dest_size < count))
+		fortify_panic(__func__);
+	return __builtin_memmove(dest, src, count);
+}
+EXPORT_SYMBOL(__memmove_chk);

Reviewer commented:
The original intent was to have all of the checking inline. Pushing these checks out to external function calls seems not great. Why can't they all just be inline too?

Reviewer commented:
This seems to be the least invasive change that appears to work in both GCC and Clang, if my testing is to be believed:

diff --git a/include/linux/string.h b/include/linux/string.h
index 6dfbb2efa815..2388383921bd 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -366,6 +366,16 @@ __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
        return __builtin_memset(p, c, size);
 }
 
+#ifdef CONFIG_CC_IS_CLANG
+static inline void *__memcpy_chk(void *p, const void *q, size_t size, size_t p_size)
+{
+       return __builtin_memcpy(p, q, size);
+}
+#define __fortified_memcpy(p, q, size, p_size) __builtin___memcpy_chk(p, q, size, p_size)
+#else
+#define __fortified_memcpy(p, q, size, p_size) __builtin_memcpy(p, q, size)
+#endif
+
 __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
 {
        size_t p_size = __builtin_object_size(p, 0);
@@ -378,7 +388,7 @@ __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
        }
        if (p_size < size || q_size < size)
                fortify_panic(__func__);
-       return __builtin_memcpy(p, q, size);
+       return __fortified_memcpy(p, q, size, p_size);
 }
 
 __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)

Member Author replied:
I don't have a great mental model for how clang codegens __builtin_memcpy. Namely, whether there are any guarantees that that will always go to a memcpy defined outside of the current TU.

In the past, we've had issues with LLVM seeing things like the above as mutual recursion, and optimizing libfuncs like that to basically nop. Maybe clang has grown guarantees in the interim, but I haven't been paying close enough attention, so I can't say for sure offhand. :)

FWIW, if we don't care about &memcpy == &memcpy, or about __builtin_object_size(x, 1) being equivalent to __builtin_object_size(x, 0), ISTM we can do:

#define __FORTIFY_INLINE_BASE inline __attribute__((always_inline))
#ifdef CONFIG_CC_IS_CLANG
#define __FORTIFY_INLINE static __FORTIFY_INLINE_BASE __attribute__((overloadable)) __attribute__((enable_if(1, "")))
#else 
#define __FORTIFY_INLINE extern __FORTIFY_INLINE_BASE __attribute__((gnu_inline))
#endif 

And everything else should Just Work. This has clang emit our FORTIFY'ed memcpy as a C++-style overload for memcpy: https://godbolt.org/z/qqBP8P .

If we do care about either (or both) of the constraints above, we can declare victory by sprinkling __attribute__((pass_object_size(N)))s around on the params of these functions. It's super simple to do on its own, though admittedly noisy:

/* N.B. -- this also requires the overloadable change in the snippet above */

#ifdef CONFIG_CC_IS_CLANG
#define __pass_object_size const __attribute__((pass_object_size(0)))
#else 
#define __pass_object_size
#endif 

__FORTIFY_INLINE void *memcpy(void * __pass_object_size p, const void *__pass_object_size q, __kernel_size_t size)
{
  ...
}

Reviewer replied:
Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy() because the kernel somewhat regularly likes to use the address of a struct member to find where to start writing and then proceeds to intentionally copy across several members. :( The more common flaw is array bounds being overrun with direct access (not with memcpy()), and I'm also trying to get that addressed with UBSan's bounds checking (KSPP#25).

I do want to switch to __builtin_object_size(x, 1) for the str*() family, though, but that's also a separate effort (KSPP#6).

Another issue was the inline requirement, which was wanted to avoid making function calls. So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.

My first goal right now is to build testing for the fortify behaviors so any given compiler's output can be validated. I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).

Member Author replied:

> Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy

Yeah, Bionic and glibc need to do similar things. It's sad, but still I think the best trade-off.

> Another issue was the inline requirement, which was wanted to avoid making function calls.

Totally understandable. To clarify about the following though:

> So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.

FWIW, __pass_object_size doesn't affect inlinability at all when always_inline is present. It only lifts the call to __builtin_object_size() up to the caller: https://godbolt.org/z/LV5TJT . Because of how clang is architected, this helps improve results with __builtin_object_size(x, 1) in some cases. (And it can be used for cross-TU __builtin_object_size, but I know of 0 users of that in practice ;) ).

It comes with the side-benefit of making it impossible to take the address of the function that pass_object_size is on, since we're inserting "invisible" params with that attribute. So all &memcpys, for example, will grab the address of the not-FORTIFY'ed wrapper.

> I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).

Fair fair. If it helps spark inspiration, Bionic wraps diagnostic & run-time tests for FORTIFY all up in a single file: https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/clang_fortify_tests.cpp#600

The diagnostic testing in particular uses clang's internal -verify option, which clang uses for its own diagnostic tests.


#ifndef __HAVE_ARCH_MEMCMP
/**
* memcmp - Compare two areas of memory