Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang only uses TLS for stack protector on x86 #297

Closed
alexcohn opened this issue Feb 14, 2017 · 14 comments
Closed

Clang only uses TLS for stack protector on x86 #297

alexcohn opened this issue Feb 14, 2017 · 14 comments
Assignees
Labels
Milestone

Comments

@alexcohn
Copy link

alexcohn commented Feb 14, 2017

Description

http://stackoverflow.com/questions/42190295/stack-protection-triggered-by-eglmakecurrent/42233084

We have come across a weird device, named Motorla RAZR i, with x86 CPU and Android version 4.1.2 (circa 2013).

This device crashes sometimes on GL operations in native code. We finally reduced this error to just few lines of innocent function calls:

void stackCorrupt() {
  const EGLint configAttrs[] = {EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, EGL_NONE};
  const EGLint contextAttrs[] = {EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE};
  const EGLint const surfaceAttrs[] = {EGL_WIDTH, 16, EGL_HEIGHT, 16, EGL_NONE};

  EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
  eglInitialize(display, 0, 0);

  EGLint numConfigs;
  EGLConfig config;
  eglChooseConfig(display, configAttrs, &config, 1, &numConfigs);

  EGLContext context = eglCreateContext(display, config, NULL, contextAttrs);
  EGLSurface surface = eglCreatePbufferSurface(display, config, surfaceAttrs);

  // commenting out this call makes the stack clean again
  eglMakeCurrent(display, surface, surface, context);

  // only to know that the context is good
  __android_log_print(ANDROID_LOG_DEBUG, "native-lib", "GL_RENDERER %s", glGetString(GL_RENDERER));

  // Cleanup
  eglDestroySurface(display, surface);
  eglDestroyContext(display, context);
  eglTerminate(display);
}

When compiled with -fstack-protector-strong (which is default if I understand correctly), calling this function on the specific device invariably causes

 F /system/bin/app_process: stack corruption detected: aborted

But if I use -fno-stack-protector, execution continues as if nothing bad happened. Which may (if I understand the situation correctly) become a trap for any further execution. Even more dangerous (because not explicit), adding -flto to CFLAGS hides the crash.

In the code above, I removed error checking for brevity. Actually, all calls succeed. Also, the Cleanup section does not change the stack thrashing behavior of eglMakeCurrent(), but sometimes calling eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT) before Cleanup, does help.

TL;NR: this is a false alarm. The EGL library on the device (which was never updated since 2013) does not respect the TLS conventions that were introduced years later, and the new NDK (we use the stable r13) falsely blames our innocent function in stack smashing.

I found these (Nov 2015) https://gcc.gnu.org/ml/gcc/2015-11/msg00060.html and (Apr 2013): https://gcc.gnu.org/ml/gcc-patches/2013-04/msg00764.html

x86 android: change -fstack-protector guard default to TLS

Folks that are compiling
native code for pre-4.2 Android will most likely be using older
versions of the Android NDK in any case.

I would like to propose removing the Bionic exception, e.g. changing
the code to read:

 /* Handle stack protector */
 if (!opts_set->x_ix86_stack_protector_guard)
   opts->x_ix86_stack_protector_guard = SSP_TLS;

The assumption is wrong, our code runs same jar's and so's on all platforms from 4.0 to 7.0 and we don't want to use old versions of NDK for some of target devices. Actually, NDK r14 still provides support for android-9 (a.k.a. 2.3 GINGERBREAD).

Unfortunately, this means that at least on this device we must run the debug version of our C++ code with -fno-stack-protector. Luckily, our release build does not apply stack protector.

Environment Details

  • NDK Version: 13.1.3345770
  • Build sytem: ndk-build
  • Host OS: Mac
  • Compiler: both GCC and Clang.
  • ABI: x86
  • NDK API level: 9
  • Device API level: 16
@DanAlbert
Copy link
Member

Thanks for following up on this. Saw your SO post yesterday and was waiting to ask someone about it.

The assumption is wrong, our code runs same jar's and so's on all platforms from 4.0 to 7.0 and we don't want to use old versions of NDK for some of target devices.

Completely agreed. One NDK should work for all of its supported targets. Your link is actually to a different message, btw, you wanted https://gcc.gnu.org/ml/gcc/2015-11/msg00060.html. I'm glad you did link that one though, because it does actually give a clue as to how to fix this.

If you add LOCAL_CFLAGS += -mstack-protector-guard=global, does it work? If so, I should make this change in the NDK itself.

@DanAlbert
Copy link
Member

If it turns out that is what we need (sounds like yes, so I'll go ahead and prepare the patch regardless), I'll add this to r14 since it's low risk.

@DanAlbert DanAlbert self-assigned this Feb 14, 2017
@DanAlbert DanAlbert added this to the r14 milestone Feb 14, 2017
@DanAlbert
Copy link
Member

https://android-review.googlesource.com/338350

@alexcohn
Copy link
Author

Thanks, the -mstack-protector-guard=global workaround is verified for NDK r13. I am glad that we caught this one before it hit the end users.

Thanks for the correction re: gcc maillist link, I fixed my text.

@alexcohn alexcohn reopened this Feb 15, 2017
@DanAlbert
Copy link
Member

Merged into r14.

@alexcohn
Copy link
Author

alexcohn commented Feb 19, 2017

@DanAlbert is there a way to tune clang, too? I couldn't find the flag with r14β2

Android clang version 3.8.275480  (based on LLVM 3.8.275480)
Target: x86_64-apple-darwin16.4.0
Thread model: posix

@DanAlbert
Copy link
Member

I'm not sure what Clang does with this. @enh or @stephenhines should know.

@alexcohn
Copy link
Author

alexcohn commented Feb 20, 2017

I could not find a workaround on macOS. And if I understand correctly, this is an important question, because GCC is not supported anymore.

@enh
Copy link
Contributor

enh commented Feb 20, 2017

i filed an internal clang bug (http://b/27494085) a while back that clang always uses TLS on x86/x86-64 and always uses the global on other architectures. afaik there's been no movement on that (though the bug implies that we might now have a clang that uses TLS for aarch64). regardless, clang has never to my knowledge offered the user a choice.

@DanAlbert DanAlbert reopened this Feb 20, 2017
@DanAlbert DanAlbert changed the title FALSE ALARM: stack protection triggered by an innocent call to egl Clang only uses TLS for stack protector on x86 Feb 20, 2017
@DanAlbert
Copy link
Member

FYI, patch for Clang is up for review now: https://reviews.llvm.org/D30276

Assuming that's submitted soon I'm guessing @stephenhines will be okay with cherry-picking this for r15?

@DanAlbert DanAlbert modified the milestones: r15, r14 Mar 2, 2017
@DanAlbert
Copy link
Member

The change did get submitted, so we just need to make sure it gets cherry-picked into our next Clang build.

@alexcohn
Copy link
Author

alexcohn commented Mar 2, 2017 via email

a252539783 pushed a commit to a252539783/aosp-platform-ndk that referenced this issue May 3, 2017
https://gcc.gnu.org/ml/gcc/2015-11/msg00060.html changed the default
for this from using a global to using the TLS slot. As noted in
android/ndk#297 (and in that commit),
this is not compatible with pre-4.2 devices, so we need to guard
against that in the NDK.

Test: ./validate.py --filter mstack-protector-guard
Bug: android/ndk#297
Change-Id: I48152b452b210ed5d8967c3cb617e6fe97027b10
(cherry picked from commit 1f46cdf)
@DanAlbert
Copy link
Member

This patch was included in the update I just pushed to r15. Will be in beta 2.

@alexcohn
Copy link
Author

alexcohn commented Jun 6, 2017

Oh, sweet! I was under false assumption that we need to specify -mstack-protector-guard=global or its equivalent for clang in our make scripts. Now I understand that llc handles this transparently, based on the android API version! Also, -fstack-protector-strong is set for us.

miodragdinic pushed a commit to MIPS/ndk that referenced this issue Apr 17, 2018
https://gcc.gnu.org/ml/gcc/2015-11/msg00060.html changed the default
for this from using a global to using the TLS slot. As noted in
android/ndk#297 (and in that commit),
this is not compatible with pre-4.2 devices, so we need to guard
against that in the NDK.

Test: ./validate.py --filter mstack-protector-guard
Bug: android/ndk#297
Change-Id: I48152b452b210ed5d8967c3cb617e6fe97027b10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants