Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefined reference to syscalls in r15 #394

Closed
Mygod opened this issue May 15, 2017 · 24 comments
Closed

Undefined reference to syscalls in r15 #394

Mygod opened this issue May 15, 2017 · 24 comments
Labels
Milestone

Comments

@Mygod
Copy link

Mygod commented May 15, 2017

Description

clang++: error: linker command failed with exit code 1 (use -v to see invocation)
mobile/src/main/jni/../jni/libev/ev_epoll.c:249: error: undefined reference to 'epoll_create1'
make: *** [mobile/target/android/intermediates/ndk/obj/local/armeabi-v7a/libss-tunnel.so] Error 1
mobile/src/main/jni/../jni/libev/ev.c:4405: error: undefined reference to 'inotify_init1'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [mobile/target/android/intermediates/ndk/obj/local/x86/libss-tunnel.so] Error 1
mobile/src/main/jni/../jni/libev/ev_epoll.c:249: error: undefined reference to 'epoll_create1'
mobile/src/main/jni/../jni/libev/ev.c:4405: error: undefined reference to 'inotify_init1'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [mobile/target/android/intermediates/ndk/obj/local/x86/libss-local.so] Error 1

Travis using r14 works: https://travis-ci.org/shadowsocks/shadowsocks-android/builds/232310650

Source code: https://github.com/shadowsocks/shadowsocks-android

Environment Details

  • NDK Version: 15.0.3869609-beta1
  • Build sytem: ndk-build
  • Host OS: Linux Mint 18.1 x64
  • Compiler: clang
  • ABI: armeabi-v7a, x86 (error didn't occur to arm64-v8a)
  • STL: stlport_static
  • NDK API level: android-19
  • Device API level: Not applicable
@enh
Copy link
Contributor

enh commented May 15, 2017

looks like a bug in libev:

#ifdef EPOLL_CLOEXEC
backend_fd = epoll_create1 (EPOLL_CLOEXEC);

https://github.com/BeeswaxIO/libev/blob/a92e9ffa951d26a90212a2ff207dbfafd2790e1f/ev_epoll.c#L241

they assume that if the EPOLL_CLOEXEC constant is available, epoll_create1 is available. that's not true on Android. they need either a configure-style test for epoll_create1, or they can check ANDROID_API >= 21.

they make the same incorrect assumption with inotify_init1:

#if defined IN_CLOEXEC && defined IN_NONBLOCK
int fd = inotify_init1 (IN_CLOEXEC | IN_NONBLOCK);

@Mygod
Copy link
Author

Mygod commented May 16, 2017

@enh What? Why isn't that true on Android? And why did it work back in r14?

@DanAlbert
Copy link
Member

Why isn't that true on Android?

Because there wasn't a C library wrapper for epoll_create1 until L: https://android.googlesource.com/platform/bionic/+/master/libc/libc.map.txt#308

EPOLL_CLOEXEC comes from the Linux kernel headers (which are compatible with any kernel version), whereas epoll_create1 comes from the C library. EPOLL_CLOEXEC needs to be available regardless of the existence of the C library wrapper because syscall(__NR_epoll_create1, EPOLL_CLOEXEC) is still valid.

And why did it work back in r14?

r14 defaulted to using the deprecated, waaaaaaay out of date headers. r15 has made unified headers the default.

@Mygod
Copy link
Author

Mygod commented May 16, 2017

Thanks for the explanation! Btw is __ANDROID_API__ a stable API I can use so that I can write preprocessor macros as such:

#if !defined __ANDROID_API__ || __ANDROID_API__ >= 21
...
#endif

@DanAlbert
Copy link
Member

Yep, __ANDROID_API__ will always be defined when using the NDK.

@Mygod
Copy link
Author

Mygod commented May 16, 2017

Thanks. Closing this.

@Mygod Mygod closed this as completed May 16, 2017
Mygod added a commit to Mygod/libev that referenced this issue May 16, 2017
NDK r15 introduced unified headers which will cause libev failing to
link. This commit checks Android API version before attempting to invoke
new syscalls.

More information:
android/ndk#394 (comment)
Mygod added a commit to Mygod/libev that referenced this issue May 16, 2017
NDK r15 introduced unified headers which will cause libev failing to
link. This commit checks Android API version before attempting to invoke
new syscalls.

More information:
android/ndk#394 (comment)
@alexcohn
Copy link

FWIW, this is a duplicate of #302

@alexcohn
Copy link

@DanAlbert , @enh what solution would you recommend?

int fd;
#if defined EPOLL_CLOEXEC && (!defined __ANDROID__ || __ANDROID_API__ >= 21)
  fd = epoll_create1 (EPOLL_CLOEXEC);
  if (fd < 0 && (errno == EINVAL || errno == ENOSYS))
#endif
  fd = epoll_create(epoll_size);
  if (fd != -1)
    ::fcntl(fd, F_SETFD, FD_CLOEXEC);

For boost, we are currently using another ugly workaround (note there is still no activity on https://svn.boost.org/trac/boost/ticket/12863):

#if __ANDROID_API__ < 21
  #define epoll_create1(x) -1; errno=ENOSYS
#endif

@DanAlbert
Copy link
Member

Boost already has fallback code if epoll_create1 isn't present. I think we should just add the __ANDROID_API__ >= 21 to the check. I've opened a bug for us to send that patch, since at this point I don't think boost is going to do it themselves. I'll do the same for libev. If there are any other big open source projects that need some work for migrating here, open more bugs and we'll see what we can do. Obviously we can't fix the world, but if we can fix the big ones that should help a lot.

@alexcohn
Copy link

I was brought up on "detect features, not versions" so I am very unhappy with spreading __ANDROID_API__ >= 21 in the code that is not part of Android API, but I hear your objections to having EPOLL_CLOEXEC conditionally defined in Android headers. I hope there may be some better more robust approach to this problem.

@enh
Copy link
Contributor

enh commented May 17, 2017

well, we've all lived with the alternative in the NDK for years, so we know how well that works out :-)

but in this case detecting the feature is the "try epoll_create1, fall back to epoll_create, fall back to poll/select" code. Android doesn't mandate kernel versions, so even with the legacy headers seeing EPOLL_CLOEXEC didn't mean much if anything.

@alexcohn
Copy link

alexcohn commented May 18, 2017

What about adding to legacy_signal_inlines.h

static __inline int epoll_create1(int) { 
  errno = ENOSYS;
  return -1;
}

This will fix at least boost and libev in backwards-compatible way.

@DanAlbert
Copy link
Member

Actually, I suppose we can do even better and just do:

static __inline int epoll_create1(int flags) {
    return syscall(__NR_epoll_create1, flags);
}

right? We've talked about trying to get libc_syscalls.a into the NDK for this kind of thing already, and this is pretty much the same thing.

@DanAlbert DanAlbert reopened this May 18, 2017
@DanAlbert
Copy link
Member

The downside of that approach is that it turns a build failure into a run time failure if the calling code isn't tolerant of the fact that syscalls with functions might not actually be available in the kernel. Boost and libev both are, but I'm not sure how common that is.

@Mygod
Copy link
Author

Mygod commented May 19, 2017 via email

@alexcohn
Copy link

alexcohn commented May 19, 2017

The major problem with syscall approach for me is that it assumes that all kernes below API 21 will gracefully fail for this call with appropriate flags and return values. I do know that some of the ancient Linux kernels don't.

@jmgao
Copy link
Contributor

jmgao commented May 19, 2017

I do know that some of the ancient Linux kernels don't.

😱

Do you know what they actually do instead of ENOSYS? Is it something like a device vendor using the next available syscall number instead of picking a number way out of range?

@enh
Copy link
Contributor

enh commented May 19, 2017

a device vendor using the next available syscall number

we've certainly seen that happen numerous times, but there's not much we can do about it other than (a) always have CTS tests for any new syscall wrapper we add, and (b) longer term try to move to a world where we do mandate minimum kernel versions.

in this specific case, epoll_create1 and inotify_init1 are so old that there aren't any devices still running with kernels that old.

@alexcohn
Copy link

The question is which fallback is safer: to use syscall() and expect the kernel to fulfill its contract, or to force ENOSYS for API < 21 and fail to get the valid kernel answer when the kernel actually supports __NR_epoll_create1.

My opinion is that the latter is safer, because we do know that libc does not export epoll_create1() under API 21, and nobody expected it to be called on this device. E.g. a developer of NDK library could assume that epoll_create1() won't be called, because he explicitly set APP_PLATFORM=android-14.


Please note that epoll_create1() was used above for illustration only. Same arguments work for other syscall cases, too. And I believe that the approach should be uniform for all these. It would be really confusing to provide different workarounds for inotify_init1 and epoll_create1().

@aberaud
Copy link

aberaud commented May 19, 2017

Also an issue in Asio (might be the same issue as Boost since Asio is now part of Boost)
https://github.com/chriskohlhoff/asio/blob/master/asio/include/asio/detail/impl/epoll_reactor.ipp#L548

From a user (app developer) point of view if EPOLL_CLOEXEC exists then epoll_create1 should exist for consistency, or there should be a simple compile-time way to check for it (checking for specific Android versions in user code is horrible, we should check for features not versions).

@DanAlbert
Copy link
Member

I think we need to do something about this before r15 goes to stable, but we're still talking this one over (been busy with I/O this week). Quick summary of our options:

  1. Add the inline that just returns ENOSYS.

I really don't like turning compile-time errors into run-time errors. Especially because in this case you could be running on a device where epoll_create1(EPOLL_CLOEXEC) doesn't work but syscall(__NR_epoll_create1, EPOLL_CLOEXEC) does, even though both compile.

  1. Add the inline that performs the syscall for you.

The scary part about this is that, as @alexcohn pointed out, the device's kernel might have used that syscall number for an extension already, so this might end up calling some bogus syscall rather than the one you want (or returning ENOSYS).

@jmgao pointed out that we could just skip calling syscall(__NR_epoll_create1, EPOLL_CLOEXEC) and instead just go straight to epoll_create() plus fcntl.

  1. Keep things as they currently are and send patches upstream.

While technically correct (the best kind of correct!), this isn't great. The number of people that have landed on bugs like this from just the beta is a pretty clear indication that we probably can't stand on the ground of "but that code is wrong".

I'd really like to take this option, but I think it will be an uphill battle to convince so many third party projects that the availability of syscalls says nothing about the availability of C library wrappers and vice versa.

  1. Hide EPOLL_CLOEXEC pre-21.

This choice actively restricts functionality. The existence of epoll_create1 (the function in libc) and the availability of the syscall by the same name are in no way linked. sycall(__NR_epoll_create1, EPOLL_CLOEXEC) almost certainly will work on some pre-21 devices (epoll_create1 was added to the kernel in 2.6.27). I dug up some device stats for one of Google's apps, and they have less than 400 users on 2.6.x kernels (that's all the granularity I have for kernels that old, so even some of those will have this syscall).

This might be the least bad choice in the short term. If nothing else, this is the solution which can change our minds on with the least impact (this might be a decision best left until we work out any other kinks in unified headers). This also preserves the behavior of the old headers, so at the very least this doesn't make anything worse.


I was completely against 4 when I started writing this, but I think I've convinced myself that it's what we should do for r15 even if we don't stick with it for future releases.

Thoughts?

@DanAlbert DanAlbert added this to the r15 milestone May 20, 2017
@alexcohn
Copy link

alexcohn commented May 20, 2017

I still don't fully understand the risk of hiding EPOLL_CLOEXEC. People who are smart enough to use sycall(__NR_epoll_create1, EPOLL_CLOEXEC) can set EPOLL_CLOEXEC=O_CLOEXEC themselves, can't they? And make sure that they take full responsibility for bypassing libc.

@DanAlbert
Copy link
Member

I suppose that's true. We all are pretty settled on option 4 for the time being anyway. Odds are we'll do that and no one will ever complain about this issue, so I'm overthinking something that no one actually cares about :)

@jmgao
Copy link
Contributor

jmgao commented May 20, 2017

I doubt there's any actual risk, but from an puritanical standpoint, hiding EPOLL_CLOEXEC and not hiding __NR_epoll_create1 is the same sort of problem as showing EPOLL_CLOEXEC and not having epoll_create1, except even worse, since __NR_epoll_create1 and EPOLL_CLOEXEC actually are a logical unit that comes from the kernel.

hubot pushed a commit to aosp-mirror/platform_bionic that referenced this issue May 23, 2017
Some third-party code uses the existence of EPOLL_CLOEXEC to detect
the availability of epoll_create1. This is not correct, since having
up-to-date UAPI headers says nothing about the C library, but for the
time being we don't want to harm adoption to the unified headers.
We'll undef EPOLL_CLOEXEC if we don't have epoll_create1 for the time
being, and maybe revisit this later.

Test: make checkbuild
Bug: android/ndk#302
Bug: android/ndk#394
Change-Id: I8ee9ce62768fb174070ec51d114f477389befc4a
hubot pushed a commit to aosp-mirror/platform_bionic that referenced this issue May 23, 2017
Some third-party code uses the existence of IN_CLOEXEC/IN_NONBLOCK to
detect the availability of inotify_init1. This is not correct, since
`syscall(__NR_inotify_init1, IN_CLOEXEC)` is still valid even if the C
library doesn't have that function, but for the time being we don't
want to harm adoption to the unified headers. We'll avoid defining
IN_CLOEXEC and IN_NONBLOCK if we don't have inotify_init1 for the time
being, and maybe revisit this later.

Test: make checkbuild
Bug: android/ndk#394
Change-Id: Iefdc1662b21045de886c7ad1cbeba6241163d943
miodragdinic pushed a commit to MIPS/ndk that referenced this issue Apr 17, 2018
We did actually change the unified headers to not expose constants before
the corresponding functions are available (at least in the epoll/inotify
cases that caused trouble, plus sync_file_range which we spotted in
passing).

Bug: android/ndk#302
Bug: android/ndk#394
Test: N/A
Change-Id: Ia46b56862475f68a8ae3fa3677532c828eec390c
tbowley added a commit to LogosDesignAS/oatpp that referenced this issue Jul 1, 2021
…d instead use the older epoll_create

See android/ndk#394 for more information
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants