static binary crashing with NULL symbol #2054

Closed
clnperez opened this issue Feb 2, 2021 · 22 comments
Labels
C-bug Category: bug

Comments

@clnperez

clnperez commented Feb 2, 2021

target: ppc64le-unknown-linux-gnu & stable-x86_64-unknown-linux-gnu

A statically compiled Kata agent binary crashes in what looks like a dynamic-library-related function. The binaries were built with the latest libc so that we could pick up the fix for static compilation on non-x86 architectures (#2046).

However, we are seeing this: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.

The place @Amulyam24 has narrowed it down to is possibly in this openpty call: https://github.com/kata-containers/kata-containers/blob/main/src/agent/rustjail/src/container.rs#L881

I was once stuck on a Go issue involving PTYs, where the TCSETS and TCGETS values were wrong for ppc64le (see golang/go#19560), so I can't help but be suspicious. But the dl_ part of this makes me think it's probably unrelated.

Unfortunately, there's not a lot of info in the logs, and we haven't been able to find a small recreator. I'm opening this here in hopes that someone has some tips for getting more information.

kata-issue-1387-trace.out.txt

@clnperez clnperez added the C-bug Category: bug label Feb 2, 2021
@clnperez
Author

clnperez commented Feb 4, 2021

fyi @fidencio

@fidencio

fidencio commented Feb 4, 2021

/cc @Jakob-Naucke

@Jakob-Naucke
Contributor

Hmm @clnperez. Just dumping a few thoughts:
Interestingly enough, openpty is also the call that I tracked #2033 (now fixed) down to.
That function _dl_call_libc_early_init uses some static addresses (mirror here), but I don't see why that would be an issue on its own. However, it is relatively new (glibc 2.32 I believe), is this limited to more recent glibc? It's a bit surprising to me since the osbuilder distros are all older than that (unless you upgraded?) and the container libc shouldn't affect this (or should it? From your logs, it appears to be before the pivot_root/execvp into the container.)

@clnperez
Author

clnperez commented Feb 8, 2021

@tuliom FYI

Thanks @Jakob-Naucke. I am still using the older glibc in the default osbuilder distros. I think @Amulyam24 may have tried a newer distro. IIUC, the container libc wouldn't matter unless this was dynamically compiled, but I could be wrong!

However, it is relatively new (glibc 2.32 I believe), is this limited to more recent glibc?

Can you clarify what you're referring to here when you say "it is relatively new?"

@Jakob-Naucke
Contributor

@clnperez I meant: The file elf/dl-call-libc-early-init.c is in glibc 2.32, but not in glibc 2.31. Your error message refers to this file, but Kata's default osbuilder distros are older than that (CentOS 7 is on 2.17, Debian 9 is on 2.24, Fedora 30 is on 2.29, openSUSE 15 is on 2.26, Ubuntu 18.04 is on 2.27). I was wondering if this issue only occurred with osbuilder distros newer than that. Since you're saying you haven't upgraded: If you haven't already, would you mind trying a container image that uses glibc <2.32, such as ubuntu:latest (20.04)? Maybe it does have something to do with the container glibc. If the error message persists although both guest and container use glibc older than 2.32, I'd be really confused.

@Amulyam24

Since you're saying you haven't upgraded: If you haven't already, would you mind trying a container image that uses glibc <2.32, such as ubuntu:latest (20.04)? Maybe it does have something to do with the container glibc.

@Jakob-Naucke, to confirm, the error persists irrespective of the glibc version used by the guest or the container (tried with glibc < 2.32 in both: guest OS Fedora 30 with glibc 2.29, container Ubuntu 18.04 with glibc 2.27).

@Jakob-Naucke
Contributor

Ah, it probably stems from the build host glibc (specifically due to the static linkage). I don't have an idea about the underlying issue though.

@clnperez
Author

clnperez commented Mar 19, 2021

OK, so it turns out we did have a small recreator for this (thanks @Amulyam24). I just hadn't run it on a different host with a downlevel gcc. I can recreate this on my laptop (Fedora 33) and on a ppc64le system, so at least it isn't a Power-only problem.

So I compiled it locally (fc33 on x86), and copied it into a fc32 container to run.

> rustup show
Default host: x86_64-unknown-linux-gnu
rustup home:  /home/christy/.rustup

stable-x86_64-unknown-linux-gnu (default)
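rustc 1.50.0 (cb75ad5db 2021-02-10)

use libc;
use std::{mem, ptr};

fn main() {
    // File descriptors for both ends of the pty; openpty fills them in on success.
    let mut slave = mem::MaybeUninit::<libc::c_int>::uninit();
    let mut master = mem::MaybeUninit::<libc::c_int>::uninit();
    // No name buffer, termios, or window size: pass NULL for the last three arguments.
    let p = unsafe {
        libc::openpty(
            master.as_mut_ptr(),
            slave.as_mut_ptr(),
            ptr::null_mut(),
            ptr::null_mut(),
            ptr::null_mut(),
        )
    };
    // openpty returns 0 on success and -1 on error.
    println!("p:{}", p);
}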

RUSTFLAGS="-C target-feature=+crt-static" cargo build

FROM fedora:32

RUN dnf install -y glibc
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
#RUN source $HOME/.cargo/env
ENV PATH="/root/.cargo/bin:$PATH"
RUN echo $PATH
# compiled with RUSTFLAGS="-C target-feature=+crt-static"
COPY openpty/target/debug/openpty .
RUN  ./openpty

> docker build -t openpty:fc32 .
Sending build context to Docker daemon  23.53MB
Step 1/7 : FROM fedora:32
 ---> eb7f88a194d8
Step 2/7 : RUN dnf install -y glibc
 ---> Using cache
 ---> 5ea930089860
Step 3/7 : RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
 ---> Using cache
 ---> 151e71ed5a86
Step 4/7 : ENV PATH="/root/.cargo/bin:$PATH"
 ---> Using cache
 ---> 1a2d67e93c69
Step 5/7 : RUN echo $PATH
 ---> Using cache
 ---> bdd9afd9e135
Step 6/7 : COPY openpty/target/debug/openpty .
 ---> 91e228a9c6be
Step 7/7 : RUN  ./openpty
 ---> Running in 5506ca1edb92
openpty: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.

Just to make sure:

> ldd openpty/target/debug/openpty 
        not a dynamic executable

After talking with someone on our toolchain team, she suggested I look for anything calling _dl_open.

> nm -an openpty/target/debug/openpty  | grep dl_open$
0000000000493280 T _dl_open

I'm not sure how to track that down. Hopefully some of this is helpful for some more digging at least.

@Jakob-Naucke
Contributor

It really seems this only happens with more recent glibc.
While I didn't have the time for an exact bisection, I can e.g. reproduce this on Ubuntu 20.10 (2.32) and an Ubuntu 18.04 container (2.27), but I cannot reproduce it on RHEL 8.3 (2.28) and even the most archaic containers like CentOS 6 (2.12). (I used up-to-date 1.51 from Rustup and libc 0.2.92 in both cases.)

@clnperez
Author

@Jakob-Naucke -- I may be misunderstanding your last comment. I thought that you'd mostly root-caused this in this comment that it would only happen with glibc 2.32 and later if built using an earlier glibc. But I'm also surprised that you could reproduce it with 2.27 in the Ubuntu 18.04 container.

@Jakob-Naucke
Contributor

if built using an earlier glibc

@clnperez No, I meant building it on a more recent glibc than running it all along. But my comment was very confusing 🙂

Poking around a little more, I found that this happens when building with glibc >= 2.32 and running on glibc <= 2.31. I think it's also noteworthy that when you build dynamically instead, you will get the error

/lib64/libc.so.6: version `GLIBC_2.32' not found

The error _dl_call_libc_early_init: Assertion `sym != NULL' failed is only observed when building statically. However, you can, for example, build on 2.31 and run on 2.28, so not every version mismatch is fatal.

@Jakob-Naucke
Contributor

This is reproducible without Rust:

#include <stdio.h>
#include <stdlib.h>
#include <pty.h>

int main() {
	int* child = malloc(sizeof(int));
	int* parent = malloc(sizeof(int));
	int p = openpty(parent, child, 0, 0, 0);
	printf("p:%d\n", p);
	free(child);
	free(parent);
}

built with gcc openpty.c -lutil -static on a 2.32 system and run on 2.31:

a.out: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.
Aborted (core dumped)

@tuliom

tuliom commented Apr 29, 2021

Thank you @Jakob-Naucke !
I reported this issue to glibc here: https://sourceware.org/bugzilla/show_bug.cgi?id=27790

I think it's also noteworthy that when you build dynamically instead, you will get the error
/lib64/libc.so.6: version `GLIBC_2.32' not found

Running a program dynamically linked against a newer glibc on a system with an older glibc is unsupported because the older glibc can't support all the features of the new one. IMHO, the real question is the static case, which does make use of dynamic linking via dl_open().
I prefer to collect more details before giving you an answer.
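
To compare the build and run environments discussed in this thread, it can help to record the glibc version of the host a binary will run on. The following is a minimal Rust sketch, not something taken from the thread: it shells out to `ldd --version` and parses the first line of the output. The helper names `parse_ldd_version_line` and `host_glibc_version` are hypothetical, and the parsing assumes the version is the last token on the first line, which is how glibc's `ldd` prints it.

```rust
use std::process::Command;

/// Extracts (major, minor) from the first line of `ldd --version` output,
/// e.g. "ldd (GNU libc) 2.32". Returns None if the line does not end in a
/// "major.minor" token.
fn parse_ldd_version_line(line: &str) -> Option<(u32, u32)> {
    // Assumption: the version is the last whitespace-separated token.
    let token = line.split_whitespace().last()?;
    let mut parts = token.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Asks the host's `ldd` for its glibc version. Returns None when `ldd` is
/// missing or its output is not recognised.
fn host_glibc_version() -> Option<(u32, u32)> {
    let output = Command::new("ldd").arg("--version").output().ok()?;
    let stdout = String::from_utf8_lossy(&output.stdout);
    parse_ldd_version_line(stdout.lines().next()?)
}
```

Calling `host_glibc_version()` from `main` on both the build machine and the target gives two tuples that can be compared directly, since Rust orders tuples lexicographically. On a host without a glibc `ldd`, the sketch should simply return `None`.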

@Jakob-Naucke
Contributor

From @tuliom in Bugzilla:

Interestingly, I can't reproduce this issue when the binary is statically linked against glibc 2.33 and executed on 2.31.

@clnperez I see the same with Rust: building the Rust example against 2.33 and linking statically, the code runs just fine. So I think this is an issue with glibc 2.32 only, and with no version before or after.
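
The observations collected so far can be summarised as a small predicate. This Rust sketch only encodes what was reported in this thread (a static link against glibc 2.32, run on an older glibc, in a program that reaches `openpty`); it is not an authoritative statement of glibc's compatibility rules. The name `matches_reported_failure` is hypothetical.

```rust
/// A glibc version as (major, minor), e.g. (2, 32).
type GlibcVersion = (u32, u32);

/// Returns true for the combination reported in this thread to trigger the
/// `_dl_call_libc_early_init` assertion: a statically linked binary built
/// against glibc 2.32 and run on a host with an older glibc.
/// Assumption: other builds (2.31 and earlier, 2.33) are treated as
/// unaffected because that is what the commenters observed.
fn matches_reported_failure(build: GlibcVersion, run: GlibcVersion, static_link: bool) -> bool {
    // Tuples compare lexicographically, so (2, 31) < (2, 32).
    static_link && build == (2, 32) && run < (2, 32)
}
```

For example, `matches_reported_failure((2, 32), (2, 31), true)` covers the Fedora 33 build run in a Fedora 32 container, while a 2.33 build run on 2.31 falls outside the predicate.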

@clnperez
Author

Thanks guys. It's good to know it's not just rust.

And @Jakob-Naucke, I flipped the versions around in my comment, so that only added to the confusion here. :D It does seem to be related only to that one addition in glibc 2.32.

@fweimer-rh

Statically linked glibc binaries aren't very portable. In general, you need to run them on a system with exactly the same glibc version. (Dynamically linked binaries can run on newer glibc versions, too.)
In glibc 2.33, the behavior of openpty changed due to the commit, Linux: Require properly configured /dev/pts for PTYs, so openpty is safer to use in statically linked programs.

@clnperez
Author

@fweimer-rh -- That seems to indicate that there shouldn't be static agent binaries for kata at all then? /cc @fidencio

@fweimer-rh

fweimer-rh commented Apr 30, 2021

@clnperez I don't know your exact requirements. If you need static binaries for isolation from the host environment, you need to stick to a certain subset if using glibc. It so happens that openpty is in that subset only starting with glibc 2.33. There is no good way to check whether an application sticks to the subset, as every static link currently pulls in the dynamic loader (so its presence in the static binary does not tell you anything about compatibility). We're making slow progress towards improved static linking, but other things have higher priority for upstream work.

Using one of the smaller libcs instead might be a better alternative for you. The other alternative would be to inject the application along with its own dynamically-linked glibc.
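
One way to make the choice of libc visible inside a Rust program is to inspect the compile-time target configuration. The sketch below is an illustration rather than anything proposed in the thread: it uses the standard `cfg!` macro with `target_env` and `target_feature = "crt-static"`, and the function names are hypothetical. It cannot tell whether a program stays within the glibc subset that is safe for static linking; it only flags the static-glibc combination.

```rust
/// Names the C library family this binary was compiled against,
/// based on the target's `target_env`.
fn libc_flavor() -> &'static str {
    if cfg!(target_env = "musl") {
        "musl"
    } else if cfg!(target_env = "gnu") {
        "glibc"
    } else {
        "other"
    }
}

/// True when the C runtime is linked statically, for example when built
/// with RUSTFLAGS="-C target-feature=+crt-static".
fn is_static_crt() -> bool {
    cfg!(target_feature = "crt-static")
}

/// Hypothetical guard: a statically linked glibc build is the combination
/// this thread warns about.
fn is_static_glibc_build() -> bool {
    libc_flavor() == "glibc" && is_static_crt()
}
```

A program could log a warning at startup when `is_static_glibc_build()` returns true. Building for a musl target such as `x86_64-unknown-linux-musl` makes `libc_flavor()` report `"musl"` instead.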

@clnperez
Author

clnperez commented May 3, 2021

Thanks @fweimer-rh. Sounds like we should close this as a known limitation then? I'm also happy to keep it open and test as progress towards improved static linking is made.

@belloyang

Building it against musl libc should solve the issue.
ref: https://www.graalvm.org/22.0/reference-manual/native-image/StaticImages/#preparation

@polarathene

Can this issue be closed?

This was a helpful reference for me to better understand when static linking to glibc breaks beyond the more commonly cited examples.

@tgross35
Contributor

Thanks for the updates, I'll close based on the above. If there is more to figure out here, feel free to create a discussion, or reopen/create a new issue if our libc needs to do something.
