Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nightlies since 2017-07-20 break when linking some libraries targeting Musl #43486

Closed
golddranks opened this issue Jul 26, 2017 · 19 comments
Closed
Assignees
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-cargo Relevant to the cargo team, which will review and decide on the PR/issue.

Comments

@golddranks
Copy link
Contributor

golddranks commented Jul 26, 2017

In my experience, this breaks builds that try to link the crate libpq-sys when targeting Musl. I haven't confirmed any other libraries breaking.

This is a stable-to-nightly regression. The exact breakage happens between nightlies 2017-07-19 and 2017-07-20. Here's a comparison between the commits that the two nightlies were built from: 83c659e...582af6e I haven't yet looked at the commits.

Here's a test case that runs in a docker container. See the scripts build_musl_breaks.sh and build_musl_works.sh: https://github.com/golddranks/diesel_musl_testcase

The containers use this docker image: https://github.com/golddranks/rust_musl_docker

@Mark-Simulacrum
Copy link
Member

My guess is #43170, though I seem to recall something about that being disabled on musl... @kyrias -- is this possibly caused by that PR?

@Mark-Simulacrum Mark-Simulacrum added A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 26, 2017
@kyrias
Copy link
Contributor

kyrias commented Jul 26, 2017

Interesting. Do you have some build logs? Not currently at a computer.

Musl has had support for full RELRO for a few years, but it's certainly possible that it's related. Will look into it when I get to a computer.

@golddranks
Copy link
Contributor Author

https://gist.github.com/golddranks/45f156e611adbcea24574e0dddaccaef Here are the logs. On the previous nightly it links the same code without problems.

@golddranks
Copy link
Contributor Author

Hm, seems that the logs there don't show that the strlcpy that failed linking is used in libpq (and the strlcpy it attempts to use is likely to be the one declared here: https://github.com/postgres/postgres/blob/1d25779284fe1ba08ecd57e647292a9deb241376/src/port/strlcpy.c [strlcpy is an unofficial C extension that exists on some platfroms, and Postgres provides polyfills for its own code.]) some of the logs I had show that.

@Mark-Simulacrum
Copy link
Member

Nominating for a priority and assignee from the compiler team.

@golddranks
Copy link
Contributor Author

Btw. some of my comments may seem confused: here the problem seems to manifest when it's apparently trying to run/load the compiler plugin. I think I had earlier some code that refused to link and that showed a better error message than the testcase code. I'll see if I can find that example tomorrow.

@kyrias
Copy link
Contributor

kyrias commented Jul 28, 2017

I get the same when I set -Zrelro-level=off though, so it would be quite weird if it is the problem, hm. I can't easily build a musl environment at the moment though, so doing a git bisect would be problematic for me. Any chance you could try it?

@aidanhs
Copy link
Member

aidanhs commented Aug 4, 2017

(When target is musl, the compiler plugins are built for musl too, so we need dynamically-linked musl-libc for the plugins)

(from your dockerfile)

This is not true - compiler plugins need to be compatible with the compiler, which means they link against the system libc (which is glibc on debian). As a rule of thumb, you should basically never set LD_LIBRARY_PATH to a path filled with musl libraries on debian (it's also generally not wise to set CC=musl-gcc globally on a non-musl system, just selectively set it for the commands that you need it for).

So it's not really a surprise that diesel_codegen is failing to load - musl does provide strlcpy but glibc does not, so when the diesel_codegen is loaded by rustc, the statically linked postgres library within it is expecting to be linked against musl. But compiler plugins are dynamic libraries so the executable chooses the libc, and rustc uses glibc!

root@01ae1d1cce84:/workdir# nm /workdir/target/release/deps/libdiesel_codegen-5bbd71c5b82e9c69.so | grep strlcpy
                 U strlcpy
root@01ae1d1cce84:/workdir# nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep strlcpy
root@01ae1d1cce84:/workdir#

So wait, why did this 'work' previously? Well, this comment is helpful: "Full RELRO makes the dynamic linker resolve all symbols at startup". So, although I've not verified this, my strong suspicion is that strlcpy wasn't being used and so you got away with it because the symbol never needed resolving.

Why did I put 'work' in quotes above? Whether you can get away with this kind of thing is purely down to chance, and even if it compiles you can get pretty terrible subtle corruptions in data due to libc mismatches. It's good that you've hit this, because it means you now know there's a problem.

No perfect solution immediately springs to mind. Fundamentally you need two versions of libpq - one dynamically linked against glibc for the compiler plugin, one statically linked against musl for your actual binary. Unfortunately, I am not aware of any lever you can pull in pkg-config-rs that lets you set static library preferences per target (perhaps @alexcrichton has some thoughts). One thing I think you can do for the time being is to do something like LIB_PQ_DYNAMIC=true cargo build -p diesel_codegen && LIB_PQ_STATIC=true cargo build - the idea is to build up to and including diesel_codegen with a dynamic libpq, then do the last step static with the musl pkg-config (so you'll probably need to point to the different pkg-config directory for the second command).

@codyps
Copy link
Contributor

codyps commented Aug 4, 2017

Ah, the pkg-config-rs-can't cross issue. I've had a note about it in meta-rust for a while as it seemed likely we'd hit it there.

Likely solution is to allow 2 PKG_CONFIG_PATHs with the same override mechanism used for CC, etc. Probably a pkg-config-rs bug anyhow.

@arielb1 arielb1 added I-nominated T-cargo Relevant to the cargo team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 10, 2017
@michaelwoerister
Copy link
Member

Discussed in the compiler team meeting. @alexcrichton will kindly take a look at it

@arielb1 arielb1 added P-medium Medium priority P-high High priority and removed I-nominated P-medium Medium priority labels Aug 10, 2017
@alexcrichton
Copy link
Member

Ok looking into this, here's some thoughts.

  • Cargo handles cross-compiled projects by compiling requested crates for both the target and the host.
  • In the build script logs we can see, for example, the pq-sys crate is compiled twice. Once for musl, once not.
  • Consequently, the build script is also executed twice. The build script both times indicates that it's linking /musl/lib/libpq.a. This is invalid for the glibc case.
  • @aidanhs's analysis I think is a strong reason as to why this just happened to work before, and no longer works. This was, as far as I know, never intended to work.
  • I'm not sure where the discussion of pkg-config came into play here. The build script conditionally uses pkg-config and the build script was not compiled with these features. As a result I believe that pkg-config is disabled here.
  • Ok so if pkg-config isn't used where did these env vars come from? Taking a look at the build script and the docker container we see that PQ_LIB_DIR isn't defined, pkg-config is disabled, and vcpkg is disabled. When executing pg_config --libdir sure enough we get /musl/lib printed out.

So to me the bug can be boiled down to two things:

  • The build script for pq-sys isn't properly accounting for a cross-compiled scenario. The target that pg_config is compiled for and the TARGET are not taken into account, which can lead to a mismatch and problems at compile time.
  • This just-happened-to-work before because symbols weren't used at runtime, but as @aidanhs pointed out this is a ticking time bomb. This was going to stop working as soon as more functionality was used.

I would personally close this as "not a bug". The bug is in pq-sys's build script and changes in the compiler just happened to surface it.

For fixing pq-sys I'd recommend steps along the lines of:

  • If a cross-compiled scenario is detected scrape the output of pg_config. Verify that it matches TARGET. For example the docker container's pg_config contains:
CONFIGURE = '--prefix=/musl' '--host=x86_64-unknown-linux-musl' '--without-readline' '--without-zlib' 'host_alias=x86_64-unknown-linux-musl' 'CC=musl-gcc'
  • If a mismatch between TARGET and pg_config happens, search for env vars like PG_CONFIG_$target similarly to what gcc-rs does. If this is found use that.

  • If nothing else is found bail with an informative error message that the cross-compile environment is not configured correctly with a recommendation on how to configure it.

@golddranks
Copy link
Contributor Author

I think that @aidanhs' explanation makes a lot of sense. I think that I'll have some time during the weekend to tweak the image and test a bit.

If the compiler got just stricter about linking upfront, then it just surfaced a bug that was all along there.

@aidanhs
Copy link
Member

aidanhs commented Aug 11, 2017

I'm not sure where the discussion of pkg-config came into play here.

Oh, good catch. I think the note about pkg-config not supporting cross-compiles still applies, though isn't relevant to this issue.

@alexcrichton
Copy link
Member

Ok I'm going to close this in that case, but @golddranks if you need help updating please just let me know!

@golddranks
Copy link
Contributor Author

So, to fix the build environment, I have to install a glibc version of libpq too, and get the build script of the libpq-sys crate to use that when building the .so, right? @alexcrichton

If I enable the pkg-config support, is pkg-config automatically capable of choosing the right target to link? Or do I have to try and change the build script itself?

(Ping @sgrif, TL;DR: When cross-compiling diesel, libpq-sys is built twice, for both the host and the target. The build script of libpq-sys doesn't account for this, and selects always the libpq lib to link regardless of the current target, which breaks cross-compiling builds.)

@alexcrichton
Copy link
Member

@golddranks yes you'll have to have both, and the build script will need to be updated to ensure it finds the right one when configured for the appropriate target

@golddranks
Copy link
Contributor Author

golddranks commented Aug 22, 2017

@alexcrichton or @aidanhs do you happen to know about -fPIE, -fPIC and -pie flags? GCC on Debian Stretch produces position independent code by default, and while fixing the Docker image I use for building, I also updated the distro from Jessie to Stretch. After much of churn, and the the C libraries built without flags "-fPIE -pie" refusing to build or link because of relocations, I managed to build stuff by adding those flags, but the Rust binary produced against the libraries is dynamically loaded even though all the libraries are linked statically:

file target/x86_64-unknown-linux-musl/release/deps/serve-3fd8d9fb50953928
target/x86_64-unknown-linux-musl/release/deps/serve-3fd8d9fb50953928: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=0455b87dd1a33d0d91d05258adcfbe95e72c88df, not stripped

ldd target/x86_64-unknown-linux-musl/release/deps/serve-3fd8d9fb50953928
    statically linked

For now, I can just downgrade to Jessie if I want a fully static binary, but if you have any idea how to build static Rust binaries on Stretch, I'd be happy to hear that, since at some point, older versions of Debian will be old and unsupported.

@aidanhs
Copy link
Member

aidanhs commented Aug 22, 2017

Your binary is statically linked, but has a PT_INTERP (i.e. reference to the dynamic loader path). See #40113 (comment) and follow the rabbit hole of those links for more details.

As it happens I think #40113 will fix this (well, depending on what it looks like when it gets merged!), but in the meantime you should be able to create a file called mycc like

set -e
cc -static "$@"

and then use the -C linker=/path/to/mycc flag to rustc. If you're using cargo you could create a cargo config (https://users.rust-lang.org/t/how-to-pass-cargo-linker-args/3163) on the fly or use cargo rustc and pass it there.

@golddranks
Copy link
Contributor Author

golddranks commented Aug 24, 2017

#40113 landed and is in the newest nightly. I tried to build with the combination: nightly-2018-08-24 + Debian Stretch + native libraries compiled and linked with "-fPIE -pie" flags. This time, it doesn't link the binary: /usr/bin/ld: cannot find -lunwind.

Combination nightly-2018-08-24 + Debian Jessie without the "-fPIE -pie" similarly refuses to link, with the same error message. So the new nightly seems to be broken WRT static musl builds.

Like noted above, the combination nightly-2018-08-21 + Debian Stretch + "-fPIE -pie" flags did link, but produced a binary with PT_INTERP, while linking the libs statically. The only combination that succeeds to build and link fully statically is nightly-2018-08-21 + Debian Jessie without the "-fPIE -pie".

This thread is offtopic for this so I'll create a new issue. Edit. Here: #44069

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-cargo Relevant to the cargo team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants