Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate tiny compiled binary for wrapping executables #124556

Merged
merged 49 commits into from
Dec 9, 2021

Conversation

bergkvist
Copy link
Member

@bergkvist bergkvist commented May 26, 2021

Generate tiny compiled binary for wrapping executables

Try out the latest version
env NIX_PATH=nixpkgs=https://github.com/bergkvist/nixpkgs/archive/bergkvist/make-c-wrapper.tar.gz \
nix-shell -p makeBinaryWrapper --run "makeWrapper [EXECUTABLE] [OUT_PATH] [ARGS]"
Supported ARGS:
  • --argv0 NAME: set name of executed process to NAME (defaults to EXECUTABLE)
  • --inherit-argv0: the executable inherits argv0 from the wrapper. (use instead of --argv0 '$0')
  • --set VAR VAL: add VAR with value VAL to the executable’s environment
  • --set-default VAR VAL: like --set, but only adds VAR if not already set in the environment
  • --unset VAR: remove VAR from the environment
  • --chdir DIR: change working directory (use instead of --run "cd DIR")
  • --add-flags FLAGS: add FLAGS to invocation of executable
  • --prefix ENV SEP VAL: prefix ENV with VAL, separated by SEP
  • --suffix ENV SEP VAL: suffix ENV with VAL, separated by SEP
Explore generated code

If you are curious about the generated code, and not interested in actually creating a wrapper, you can use the following, which will simply print out the C-code:

env NIX_PATH=nixpkgs=https://github.com/bergkvist/nixpkgs/archive/bergkvist/make-c-wrapper.tar.gz \
nix-shell -p makeBinaryWrapper --run "makeCWrapper [EXECUTABLE] [ARGS]"
Motivation for this change

Nix uses bash-wrappers to inject custom environment variables into executables. In the cases where this executable is an interpreter (like Python or Perl), it can be placed in the shebang line of a script. On MacOS, you can't put a script (with its own shebang) in the shebang line of an executable due to a limitation with the execve-syscall.

If we had some way of generating tiny compiled binaries instead of bash-scripts as wrappers then this would work on both Linux and MacOS.

See #23018 for more discussion.

A focus in this PR has been to minimize the number of dependencies - and also keep the implementation itself as minimal as possible.

Dependencies
  • bash
  • awk
  • A C-compiler: gcc on Linux or clang on MacOS
  • unistd + stdlib + assert + stdio (C POSIX libraries)
  • asprintf (GNU extension for stdio which is available on macOS/OpenBSD 2.3+)

Consider the following wrapper shell script:

#! /nix/store/ra8yvijdfjcs5f66b99gdjn86gparrbz-bash-4.4-p23/bin/bash -e
export NIX_PYTHONPREFIX='/nix/store/i46k148mi830riq4wxh49ki8qmq0731k-python3-3.9.2-env'
export NIX_PYTHONEXECUTABLE='/nix/store/i46k148mi830riq4wxh49ki8qmq0731k-python3-3.9.2-env/bin/python3.9'
export NIX_PYTHONPATH='/nix/store/i46k148mi830riq4wxh49ki8qmq0731k-python3-3.9.2-env/lib/python3.9/site-packages'
export PYTHONNOUSERSITE='true'
exec "/nix/store/7pjbbmnrch7frgyp7gz19ay0z1173c7y-python3-3.9.2/bin/python"  "$@"

Putting this script in a shebang works fine on Linux, but doesn't work on MacOS. If we want to write C-code that replaces it (and works on MacOS+Linux), we can do it like this in C:

#include <unistd.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    putenv("NIX_PYTHONPREFIX=/nix/store/i46k148mi830riq4wxh49ki8qmq0731k-python3-3.9.2-env");
    putenv("NIX_PYTHONEXECUTABLE=/nix/store/i46k148mi830riq4wxh49ki8qmq0731k-python3-3.9.2-env/bin/python3.9");
    putenv("NIX_PYTHONPATH=/nix/store/i46k148mi830riq4wxh49ki8qmq0731k-python3-3.9.2-env/lib/python3.9/site-packages");
    putenv("PYTHONNOUSERSITE=true");
    argv[0] = "/nix/store/7pjbbmnrch7frgyp7gz19ay0z1173c7y-python3-3.9.2/bin/python";
    return execv(argv[0], argv);
}

This PR has creates a simple bash function that generates C-code for such a tiny compiled binary (and compiles it), with an interface similar to the existing makeWrapper. There are some features of the original makeWrapper which is not yet implemented here.

Debuggability

A big concern with using a binary wrapper is that people can't just open up the file to see what it is doing when debugging their own problems. This is fixed by embedding the arguments used to create the wrapper as a string variable into the source code itself. The result is that the binary file will contain the wrapper arguments in human readable format when opening the file in a plain text editor or using the strings command on MacOS or Linux.

Example of how it looks right now:

makeBinaryWrapper /usr/bin/python3 ./wrapper \
  --set HELLO WORLD --set-default X $'Y\n"' --unset Z --argv0 python3

cat ./wrapper
...binary-data...
# ------------------------------------------------------------------------------------
# The C-code for this binary wrapper has been generated using the following command:


makeCWrapper /usr/bin/python3 \
    --set 'HELLO' 'WORLD' \
    --set-default 'X' $'Y\n"' \
    --unset 'Z' \
    --argv0 'python3'


# (Use `nix-shell -p makeBinaryWrapper` to get access to makeCWrapper in your shell)
# ------------------------------------------------------------------------------------
...binary-data...

C String literals in the generated code (including the documentation) are properly escaped. I got some help with how to do this properly on StackOverflow: https://stackoverflow.com/questions/67710149/how-can-i-sanitize-user-input-into-valid-c-string-literals.

Generated source code example

makeDocumentedCWrapper /usr/bin/python3 --set HELLO WORLD --set-default X $'Y\n"' --unset Z --argv0 python3
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

#define assert_success(e) do { if ((e) < 0) { perror(#e); abort(); } } while (0)

int main(int argc, char **argv) {
    putenv("HELLO=WORLD");
    assert_success(setenv("X", "Y\n\"", 0));
    assert_success(unsetenv("Z"));
    argv[0] = "python3";
    return execv("/usr/bin/python3", argv);
}

const char * DOCSTRING = "\n\n\n# ------------------------------------------------------------------------------------\n# The C-code for this binary wrapper has been generated using the following command:\n\n\nmakeCWrapper /usr/bin/python3 \\\n    --set 'HELLO' 'WORLD' \\\n    --set-default 'X' $'Y\\n\"' \\\n    --unset 'Z' \\\n    --argv0 'python3'\n\n\n# (Use `nix-shell -p makeBinaryWrapper` to get access to makeCWrapper in your shell)\n# ------------------------------------------------------------------------------------\n\n\n";
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Added a release notes entry if the change is major or breaking
  • Fits CONTRIBUTING.md.

Issues that this PR would likely impact:

@bergkvist bergkvist requested a review from Ericson2314 as a code owner May 26, 2021 23:50
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels May 27, 2021
@xaverdh
Copy link
Contributor

xaverdh commented May 28, 2021

see #95569 for prior art

bergkvist added 2 commits June 1, 2021 01:02
… style of other setup-hooks. Add compilation step with gcc. Embed the entire generated source code into the binary for troubleshooting.
@hrhino
Copy link
Contributor

hrhino commented Aug 22, 2021

I love this! It took a moment of fiddling about but I used the default makeWrapper, this PR's makeBinaryWrapper, and makeBinaryWrapper from #95569, and got this:

-r-xr-xr-x 1 root root  80128 Dec 31  1969 binary-nim
-r-xr-xr-x 1 root root    291 Dec 31  1969 current
-rwxr-xr-x 1 haro users 14232 Dec 31  1969 binary-c

so while this is significantly heavier than the shell-script version (unsurprisingly — the debug string literal is going to be around the size of the script) it's significantly smaller than the one that depends on nim and its json libraries and such, so I think it's seriously worth considering this over that (the operation of this executable is also more straightforward IMO).

On the other hand, that one has much more features. Looking at the current uses of makeWrapper, I think you'll have to add --add-flags, --prefix, and --suffix at least to support most use cases.

@bergkvist
Copy link
Member Author

bergkvist commented Aug 23, 2021

@hrhino Thanks for the feedback!
I've added --add-flags, --prefix and --suffix now, without impacting the resulting binary size too much. They were slightly more involved than the previous ones - but hopefully the generated code should still be relatively easy enough to read/understand.

I turned on -Os (optimize for small size) to keep the binary size small as well.

Although the compiler will automatically throw away functions that are not being used - the debug string won't - so in order to keep the debug string clean - I only include the C-functions used by --prefix and --suffix if they are actually used.

…al to improve purity of makeCWrapper. Refactor
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising.

Could you add tests?

  • some examples to test the behavior
  • some means of ensuring memory safety
    • running the examples in valgrind; might not catch everything
    • running a static analysis on the generated code for the examples (don't know which tool to use. Any suggestions?)
    • adding some golden tests, comparing the generated code to known-correct generated code. This way memory safety of the generated code can be checked by hand, at least for some inputs

You can add a derivation that runs these tests in the same directory as the implementation and callPackage it from pkgs/test/default.nix.

Copy link
Member

@SuperSandro2000 SuperSandro2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider using echo instead of printf with a new line at the end.

pkgs/build-support/setup-hooks/make-binary-wrapper.sh Outdated Show resolved Hide resolved
pkgs/build-support/setup-hooks/make-binary-wrapper.sh Outdated Show resolved Hide resolved
@bergkvist
Copy link
Member Author

bergkvist commented Oct 1, 2021

In terms of memory leak risks

As long as the wrapper is executed as its own process, the OS will take care of freeing any remaining memory when the process exits. So if there are memory leaks in the wrapper, it will simply add an additional constant overhead to the process that has been wrapped while it is running.

It is not possible to free things like the argv that was passed to main, even if the child process does not use it - so some constant memory overhead from wrapping seems unavoidable.

We can only safely free memory that is not used by the wrapped program. That essentially means only "temporary" variables can be freed. From the code I've written so far, the only temporary variable that is not consumed by the wrapped process is the result of char *concat3(char *x, char *y, char *z);.

The reason val can be freed after calling setenv(key, val, 1); is that setenv creates its own internal copy of val. Since we are not allowed to free environment variables passed from the parent process, we must make sure to only free val if it is the result of char *concat3(char *x, char *y, char *z);

In contrast to setenv, putenv does not create its own internal copy - which might mean it has slightly less memory overhead. This is why I've used putenv("key=value") instead of setenv(key, value, 1) in the setEnv() {}-bash function.

@roberth
Copy link
Member

roberth commented Oct 1, 2021

I'm not concerned about memory leaks either, but I am concerned about undefined behavior now and in the future.

Seemingly innocent mistakes can lead to security problems, amplified by the fact that such a simple core component would be assumed to be trustworthy by many.

Even just the "golden test" idea goes a long way when it comes to this responsibility, because it will let maintainers check the generated programs manually and help us check them again when the code generation inevitably needs to change.

… Switch to strncpy in concat3. Use multiline strings to print C functions. Switch from if/elif to case.
@doronbehar
Copy link
Contributor

This looks so great :) A few comments, first of all:

diff --git i/pkgs/top-level/all-packages.nix w/pkgs/top-level/all-packages.nix
index c53485f5dea..30b38909952 100644
--- i/pkgs/top-level/all-packages.nix
+++ w/pkgs/top-level/all-packages.nix
@@ -642,6 +642,8 @@ with pkgs;
   makeWrapper = makeSetupHook { deps = [ dieHook ]; substitutions = { shell = targetPackages.runtimeShell; }; }
                               ../build-support/setup-hooks/make-wrapper.sh;
 
+  makeBinaryWrapper = makeSetupHook { } ../build-support/setup-hooks/make-binary-wrapper.sh;
+
   makeModulesClosure = { kernel, firmware, rootModules, allowMissing ? false }:
     callPackage ../build-support/kernel/modules-closure.nix {
       inherit kernel firmware rootModules allowMissing;

And then it's easy to test this with:

env NIX_PATH=nixpkgs=$PWD nix-shell -p makeBinaryWrapper --run makeBinaryWrapper <args>

Or even without the need to clone and checkout your branch:

env NIX_PATH=nixpkgs=https://github.com/bergkvist/nixpkgs/archive/bergkvist/make-c-wrapper.tar.gz nix-shell -p makeBinaryWrapper --run makeBinaryWrapper <args>

It'd be nice to put that info in the first comment of the thread :).

Other then that, When giving a try to --prefix, I forgot to give 3 arguments (forgot the separator) and it took me a while to realize it failed silently to wrap properly the executable - it'd be nice if the bash function would have checked it received a proper number of arguments for each option.

@ofborg ofborg bot added 8.has: package (new) This PR adds a new package 10.rebuild-linux: 1-10 10.rebuild-linux: 1 and removed 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Oct 1, 2021
@bergkvist
Copy link
Member Author

I've created a new PR to fix the Darwin issues we discussed now: #150079

makeSetupHook { deps = [ dieHook ]; } script;
in
lib.makeOverridable f {
cc = stdenv.cc.cc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails because stdenv.cc.cc is unwrapped and has no /bin/cc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is addressed by #150079, but note that it is also a problem on Linux

@roberth
Copy link
Member

roberth commented Dec 11, 2021

for the case of "i want fast wrappers" you can simply build a system with the secure ones, see that it works and then rebuild without sanitizers. disabling this security option should most likely be a conscious choice.

Alternatively, you could prove correctness using a sound static analyzer. Also, perhaps the functions could form a memory-safe interface, such that the generated code is as simple as possible. That would make it feasible to have a high assurance without performance impact; without even having to run the analyzer as a build dependency in practice (because the functions have already been checked and that check has been cached).

@bergkvist
Copy link
Member Author

Yeah, static analyzers are an attractive alternative to using sanitizers - since they move the checking from run-time to build time. This means problems are also detected earlier.

A static analyzer can never prove safety in the general case, since this is an undecidable/uncomputable problem. So I guess sanitizers might provide a somewhat stronger protection against undefined behaviour (although at a high cost to performance).

It seems like it is common to combine multiple static analyzers - since they tend to specialize in different areas/detect different kinds of problems. When I tested infer static analyzer and the clang static analyzer on the binary wrapper code - there was for the most part no overlap in the problems they would detect.

infer seemed a lot better than clang at detecting buffer overruns and potential NULL dereferencing for example:

@roberth
Copy link
Member

roberth commented Dec 11, 2021

A static analyzer can never prove safety in the general case

I don't think we need the general case.

sanitizers might provide a somewhat stronger protection against undefined behaviour

Only if the analysis is unsound. Undecidability does not exclude soundness and our programs are very simple here, so checking with an analyzer that is sound (but incomplete, as per undecidability) seems feasible.

CodeHawk-C is sound but I don't have experience with it and it hasn't been packaged yet. Maybe other analyzers exist that have the soundness property.

@bergkvist
Copy link
Member Author

bergkvist commented Dec 11, 2021

Yes, this is a good point. By having an analyzer that marks everything as potentially unsafe - it would be trivially sound (although not particularly useful). Rather than adding suspicious cases, a sound analyzer would remove things that are provably safe from the pool of potential problems (where everything starts out as potential problems). Undecidability just prevents it from removing everything that is safe - meaning it will tag some safe things as potentially unsafe.

@bergkvist
Copy link
Member Author

bergkvist commented Dec 11, 2021

It seems like the compilation/tests of makeBinaryWrapper fails on aarch64-darwin, due to some codesign issues. Anyone here with an M1 macbook/that know what this might be about? See the OfBorg logs and #150079 for more info.

To run the tests locally:

$ git clone https://github.com/bergkvist/nixpkgs bergkvist-nixpkgs
$ cd bergkvist-nixpkgs
$ git checkout darwin-binary-wrapper-fixes
$ nix-build -A makeBinaryWrapper.passthru.tests

@therealpxc
Copy link
Contributor

Very cool to see this make it into Nixpkgs in some form. Thanks for getting this first version in and for continuing to work to improve this feature! Also thanks to all the reviewers, since this seems like one of the harder changes to review in Nixpkgs and has gotten a lot of work from reviewers. You guys are awesome!

@tobiasBora
Copy link
Contributor

tobiasBora commented Dec 16, 2021

First, thanks for this PR. But since it's time to change the way we handle wrappers, I created a new issue to mention another problem that can be problematic with wrappers (namely that $0 points in scripts to the wrapping program instead of the original program, which causes troubles if $0 is used to restart the program later), together with a proposed solution to solve it #150841.

Note that as it, this PR does not solve it (nor does #150079). Would be great if both issues could be solved at once.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-macos-monthly/12330/14

@abathur
Copy link
Member

abathur commented Dec 17, 2021

I haven't had a chance to poke at these wrappers yet, but this seems like a fair place to register that I may need a programmatic way to associate them with their wrapped executables.

I noted this tentatively back in NixOS/rfcs#75, and at the time I was imagining a metadata dotfile alongside the wrapper. I didn't realize there was work under-way here, and the feature I needed this for has been in nixpkgs for a few months. Luckily I noticed toonn mention this on discourse. 😊

Basically:

  1. resholve uses binlore to collect some information about executables.
  2. part of this analysis maps Shell wrappers to wrapped executables by parsing the exec out of the Shell wrapper in order to associate information derived from the executable with the wrapper as well.

@Janik-Haag Janik-Haag added the 12. first-time contribution This PR is the author's first one; please be gentle! label Jun 12, 2023
boltzmannrain added a commit to boltzmannrain/nixpkgs that referenced this pull request Nov 20, 2023
https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2
Failure is a bit obscured but long story short, a script within
bazel gets custom nixpkgs shebang which in turn makes shell run
in POSIX-compatible mode. Bazel expects bash in non-POSIX mode
and osx-specific script starts to fail due to `set -e` and subshell
interaction differences in those modes (sub-shells and functions
suddently start inheriting `set -e` and fail to produce desired
output). More debug info is available in NixOS#267670

Shell scripts aren't guaranteed to work as interpreters in shebang.
In particular thin shell wrappers aren't shebang-ready on MacOS.
It may work sometimes depending on what exactly would try to execute
a script with such shebang, but generally it's not guaranteed to work.
See NixOS#124556

Bash wrapper was introduced in NixOS#266847 and so far seems like the
issue only affects darwin builds: hydra failure is in osx-specific
script, also shebang issue is usually darwin-specific.

Let's wrap it as a native binary to make it shebang-compatible.

The wrapper is only currently added to `bazel_6` so no need for
changes in other versions.

ZHF: NixOS#265948
github-actions bot pushed a commit that referenced this pull request Dec 7, 2023
https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2
Failure is a bit obscured but long story short, a script within
bazel gets custom nixpkgs shebang which in turn makes shell run
in POSIX-compatible mode. Bazel expects bash in non-POSIX mode
and osx-specific script starts to fail due to `set -e` and subshell
interaction differences in those modes (sub-shells and functions
suddently start inheriting `set -e` and fail to produce desired
output). More debug info is available in #267670

Shell scripts aren't guaranteed to work as interpreters in shebang.
In particular thin shell wrappers aren't shebang-ready on MacOS.
It may work sometimes depending on what exactly would try to execute
a script with such shebang, but generally it's not guaranteed to work.
See #124556

Bash wrapper was introduced in #266847 and so far seems like the
issue only affects darwin builds: hydra failure is in osx-specific
script, also shebang issue is usually darwin-specific.

Let's wrap it as a native binary to make it shebang-compatible.

The wrapper is only currently added to `bazel_6` so no need for
changes in other versions.

ZHF: #265948
(cherry picked from commit 7377bba)
bjornfor pushed a commit that referenced this pull request Dec 7, 2023
https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2
Failure is a bit obscured but long story short, a script within
bazel gets custom nixpkgs shebang which in turn makes shell run
in POSIX-compatible mode. Bazel expects bash in non-POSIX mode
and osx-specific script starts to fail due to `set -e` and subshell
interaction differences in those modes (sub-shells and functions
suddently start inheriting `set -e` and fail to produce desired
output). More debug info is available in #267670

Shell scripts aren't guaranteed to work as interpreters in shebang.
In particular thin shell wrappers aren't shebang-ready on MacOS.
It may work sometimes depending on what exactly would try to execute
a script with such shebang, but generally it's not guaranteed to work.
See #124556

Bash wrapper was introduced in #266847 and so far seems like the
issue only affects darwin builds: hydra failure is in osx-specific
script, also shebang issue is usually darwin-specific.

Let's wrap it as a native binary to make it shebang-compatible.

The wrapper is only currently added to `bazel_6` so no need for
changes in other versions.

ZHF: #265948
(cherry picked from commit 7377bba)
@Artturin Artturin mentioned this pull request Sep 14, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.has: documentation This PR adds or changes documentation 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 12. first-time contribution This PR is the author's first one; please be gentle!
Projects
None yet
Development

Successfully merging this pull request may close these issues.