Introduce a bazel-wrapper #226049

Open
aaronmondal opened this issue Apr 13, 2023 · 9 comments

Comments

@aaronmondal
Contributor

Issue description

We already have pkgs.buildBazelPackage to build bazel packages. This can often simplify building a package, but it's a rather high-level wrapper and doesn't account for situations where we would want to build toolchains around Bazel. This wrapper can also get tricky to use for packages with complex setups like Python/GPU packages where downstream projects tend to use custom rules_cc setups to get CUDA, ROCm etc working. buildBazelPackage also seems to be rather incompatible with remote execution.

The upcoming Bazel 7 will deprecate the WORKSPACE setups that most users are still used to. They will be superseded by bzlmod, a new package management system that is rather similar to Nix Flakes.

All of this makes me think that we are lacking a lower-level wrapper for Bazel, similar to the cc-wrapper, that provides lower-level control over the toolchains and environments that are passed to Bazel invocations.

Technical details

Some notes:

Here is what Bazel needs to run:

  • A C++ toolchain. A C toolchain is not enough, except if you statically link libc++ into its subtools.
  • A Java runtime.
  • findutils, gnutar and coreutils.

So essentially we should be more or less fine with a C++ stdenv and Java.

In multilayered container images Bazel also needs:

  • A user, e.g. pkgs.fakeNss
  • A temporary directory (mkdir -m 0777 tmp)
  • A certificate authority, e.g. pkgs.cacert
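
The requirements above could be collected into a layered image roughly as follows. This is a minimal, untested sketch: the image name, the tag and the exact package selection (e.g. `pkgs.jdk` as the Java runtime) are assumptions for illustration, while `dockerTools.buildLayeredImage`, `fakeNss` and `cacert` are existing nixpkgs attributes.

```nix
# Sketch only: image name, tag and package selection are illustrative assumptions.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildLayeredImage {
  name = "bazel-rbe-example"; # hypothetical image name
  tag = "latest";

  contents = [
    pkgs.bazel
    pkgs.stdenv.cc # C++ toolchain
    pkgs.jdk       # Java runtime (assumed choice)
    pkgs.findutils
    pkgs.gnutar
    pkgs.coreutils
    pkgs.fakeNss   # provides a user via /etc/passwd and /etc/group
    pkgs.cacert    # certificate authority bundle
  ];

  # Runs in the image root while the top layer is assembled:
  # creates the world-writable temporary directory.
  extraCommands = ''
    mkdir -m 0777 tmp
  '';

  # Point TLS clients at the CA bundle shipped by pkgs.cacert.
  config.Env = [ "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" ];
}
```

Because every entry in `contents` is a `/nix/store` path, the same store paths can also be made available in a local development shell, which is what the cache interoperability discussed in this issue relies on.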

Related projects

Our use case at @eomii and for users of rules_ll is cache interoperability between remote execution Docker images built with Nix and local Nix development environments. We can't use regular RBE images because they are too slow-moving to reliably build upstream LLVM. Standard RBE images are also irreproducible and unnecessarily bloated.

We are building the LLVM project and the ROCm/HIP stack many times per day and any build time improvements are worth it for us. So far, experiments have shown us something more along the lines of "infinity" improvements due to essentially completely skipping builds.

I believe this could also be interesting for tweag and users of rules_nixpkgs. Maybe @aherrmann can comment on this. I've seen that rules_nixpkgs_cc_toolchain already interoperates with the nix stdenv. Do you think a bazel-wrapper could yield any benefits? (Ah, and as a side note: does your cc toolchain work with RBE? In our experiments we've generated cc toolchain configs with the bazel-toolchains tools so far, but I suspect that your implementation might actually fit our use case better.)


If it turns out that a bazel-wrapper is actually a useful idea, I'd be happy to implement it. @jaroeichler might also be interested in helping out with an implementation.

cc @rrbutani @SomeoneSerge

@SomeoneSerge
Contributor

SomeoneSerge commented Apr 13, 2023

> buildBazelPackage also seems to be rather incompatible with remote execution

Are you referring to accommodating remote builds (if I understood #225074 (comment) correctly, "execution" is the Bazel term for a build) using Bazel, rather than Nix? Remote builds sounds like something you're better off accomplishing using Nix than Bazel, does it not?

> It will be superseded by bzlmod, a new package management system that is rather similar to Nix Flakes.

That sounds interesting, I haven't heard of it! Do you know what is going to be the situation with substituting Nix-managed dependencies in place of ones pinned by Bazel with the introduction of bzlmod? There probably are different opinions, but I think that ideally we'd want Bazel to respect a "dependency injection" workflow similar to that of CMake.

> bazel-wrapper

I think I understand from "Technical Details" what is needed to prepare a container for Bazel to run in.
What I fail to grasp is what the bazel-wrapper for Bazel to be run within Nix builds should do?

I may have misunderstood your entire post. If you begin to feel like I did, feel free to reply as if I were 5

EDIT: I'm now looking up what RBE is and looking back at tensorflow build times 🙈

@aaronmondal
Contributor Author

aaronmondal commented Apr 13, 2023

@SomeoneSerge Upon rereading my post I realized that I was rather imprecise, apologies 😅

> Are you referring to accommodating remote builds (if I understood #225074 (comment) correctly, "execution" is the Bazel term for a build) using Bazel, rather than Nix?

Yes, Bazel. Remote execution in Bazel is when you invoke Bazel locally but use a remote server to run the actual build. E.g. you have a laptop but want to use an 80-core machine in the cloud to speed up the build. Such setups look roughly like this:

image

Note that no one runs any build locally, since the remote executors need to be exactly the same to be able to reuse the shared build cache.

But Nix is already reproducible, so much so that it is possible to recreate the build environment of the remote executor locally. This makes setups like these possible:

image

This is an extreme example and has some security implications, but it essentially means that you could build your project locally once, push the artifacts to the remote cache and then have someone else reuse that same cache without the need for intermediary remote executors.

Bazel has a new, highly experimental feature called "Build without the Bytes", where you wouldn't even need to download the entire artifact cache anymore but just the leaves of the subgraph you want to rebuild.

All of this is a bit difficult to get working because every tool involved in the build process needs to be exactly reproducible. Not just the compiler, also things like archivers, the java version that Bazel runs on etc. EVERYTHING 😂
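
To make the moving parts a bit more concrete, here is a rough sketch of the Bazel side of such a setup, written as a Nix expression that generates a bazelrc file. The config names (`shared_cache`, `remote_exec`) and both endpoints are made-up placeholders; `--remote_cache`, `--remote_executor` and `--remote_download_minimal` are existing Bazel flags.

```nix
# Sketch only: config names and endpoints are hypothetical placeholders.
{ pkgs ? import <nixpkgs> { } }:

pkgs.writeText "remote.bazelrc" ''
  # Reuse artifacts that someone else already built and pushed to the cache.
  build:shared_cache --remote_cache=grpcs://cache.example.com

  # "Build without the Bytes": avoid downloading remote outputs that are
  # not needed on the local machine.
  build:shared_cache --remote_download_minimal

  # Optionally run the actions on a remote executor as well.
  build:remote_exec --config=shared_cache
  build:remote_exec --remote_executor=grpcs://executor.example.com
''
```

The resulting file could then be passed along the lines of `bazel --bazelrc=<path> build --config=shared_cache //...`. Cache hits across machines still depend on every tool resolving to identical store paths on all sides.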

> That sounds interesting, I haven't heard of it! Do you know what is going to be the situation with substituting Nix-managed dependencies in place of ones pinned by Bazel with the introduction of bzlmod? There probably are different opinions, but I think that ideally we'd want Bazel to respect a "dependency injection" workflow similar to that of CMake.

I'm not entirely sure, but I think this could make it easier to get at least some hashes into Nix. The Bazel Central Registry already stores repos and hashes in a way that is fairly similar to the JSON files currently used in nixpkgs. For instance, this: https://github.com/bazelbuild/bazel-central-registry/blob/main/modules/fmt/9.1.0/source.json

I could imagine that crawling bazel registries is easier to maintain than manually keeping up with those hashes.
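
As a rough illustration of how such a registry entry might be consumed, the sketch below reads a `source.json` and hands it to `fetchurl`. It assumes the entry was saved next to the expression as `./source.json` and that it carries at least the `url` and `integrity` fields; other fields such as patches or a strip prefix are ignored here.

```nix
# Sketch only: assumes a registry entry was saved locally as ./source.json
# and that it contains at least the `url` and `integrity` fields.
{ pkgs ? import <nixpkgs> { } }:

let
  source = builtins.fromJSON (builtins.readFile ./source.json);
in
pkgs.fetchurl {
  inherit (source) url;
  # `integrity` is an SRI string ("sha256-..."), which is the same format
  # that fetchurl accepts for its `hash` argument.
  hash = source.integrity;
}
```

Since the registry hash covers the downloaded archive rather than the unpacked tree, `fetchurl` is used here instead of `fetchzip`.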

> What I fail to grasp is what the bazel-wrapper for Bazel to be run within Nix builds should do?

I'm thinking something roughly like this:

bazel = wrapBazelWith {
  bazel = pkgs.bazel; # Or some custom-built Bazel
  ccToolchain = pkgs.stdenv; # Or cudaPackages.stdenv, or llvmPackages.stdenv
  javaToolchain = pkgs.somejavasetup;
  # ...
};

Then this Bazel could be passed to e.g. pkgs.dockerTools.buildLayeredImage.contents and a flake's devShells.default.packages. It also might simplify pkgs.buildBazelPackage by decoupling the environment and build process.
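
One way such a function might be implemented is sketched below. To be clear, `wrapBazelWith` and its arguments are hypothetical and do not exist in nixpkgs; the sketch only relies on the existing `symlinkJoin`, `makeWrapper` and `lib.makeBinPath` helpers, is untested, and uses `pkgs.jdk` as an assumed default for the Java setup.

```nix
# Sketch only: `wrapBazelWith` and its arguments are hypothetical,
# not an existing nixpkgs function.
{ pkgs ? import <nixpkgs> { } }:

let
  wrapBazelWith =
    { bazel ? pkgs.bazel
    , ccToolchain ? pkgs.stdenv
    , javaToolchain ? pkgs.jdk # assumed default
    , extraTools ? [ pkgs.findutils pkgs.gnutar pkgs.coreutils ]
    }:
    pkgs.symlinkJoin {
      name = "bazel-wrapped";
      paths = [ bazel ];
      nativeBuildInputs = [ pkgs.makeWrapper ];
      # Put the chosen toolchains in front of PATH for every Bazel invocation.
      postBuild = ''
        wrapProgram $out/bin/bazel \
          --prefix PATH : ${pkgs.lib.makeBinPath ([ ccToolchain.cc javaToolchain ] ++ extraTools)}
      '';
    };
in
wrapBazelWith { ccToolchain = pkgs.llvmPackages.stdenv; }
```

Wrapping via `symlinkJoin` leaves the `bazel` derivation itself untouched, so switching toolchains only rebuilds the small wrapper and not Bazel.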

Something like this is already somewhat possible by e.g. overriding the pkgs.bazel package. But then we can't reuse the already cached default pkgs.bazel. pkgs.buildBazelPackage already implements environment configuration for the Bazel build, but doesn't decouple the Bazel build environment from the build invocation. (I might just be using those tools wrong though, so I'm also not sure whether a bazel-wrapper is actually a good idea 😅).

EDIT: We already have the RBE setup without remote executors seemingly working, but it's very fragile and it'll take some time and further testing until I can push this to GitHub.

@aaronmondal
Contributor Author

aaronmondal commented Apr 14, 2023

WIP commit that implements the setup I described: eomii/rules_ll#83

Essentially this builds a container image and uses the rbe_configs_gen tool to generate remote-execution-compatible toolchains. It does this in such a way that every tool is mapped to /nix/store paths which are the same in both the container and the local Nix installation.

Switching back and forth between a regular build and --config=rbe_local, which uses the container as remote executor, appears to fully reuse the Bazel build cache.

This uses a currently very inelegant pseudo-bazel-wrapper to aggregate the toolchains and Bazel. This could be made more flexible to be compatible with arbitrary toolchain configs/stdenvs.

@uri-canva
Contributor

Is this related to this idea? Having something like bazelWith to configure bazel with specific toolchains? #185742 (comment)

@uri-canva
Contributor

Also relevant: tweag/rules_nixpkgs#180

@Artturin
Member

Artturin commented Sep 12, 2023

Tried to get tensorflow-lite to cross-compile for someone in the Matrix cross channel, but couldn't, and I'm not interested in Bazel, so I'm dumping this here.

diff --git a/pkgs/development/libraries/science/math/tensorflow-lite/default.nix b/pkgs/development/libraries/science/math/tensorflow-lite/default.nix
index 1ac08ce0cd2f..7f626827728f 100644
--- a/pkgs/development/libraries/science/math/tensorflow-lite/default.nix
+++ b/pkgs/development/libraries/science/math/tensorflow-lite/default.nix
@@ -12,10 +12,10 @@ let
   bazelDepsSha256ByBuildAndHost = {
     x86_64-linux = {
       x86_64-linux = "sha256-61qmnAB80syYhURWYJOiOnoGOtNa1pPkxfznrFScPAo=";
-      aarch64-linux = "sha256-sOIYpp98wJRz3RGvPasyNEJ05W29913Lsm+oi/aq/Ag=";
+      aarch64-linux = "sha256-WVOMYvwm6yHl3T4gS/7YWaN0CC9m1ayr3zIBQyaX6b8=";
     };
     aarch64-linux = {
-      aarch64-linux = "sha256-MJU4y9Dt9xJWKgw7iKW+9Ur856rMIHeFD5u05s+Q7rQ=";
+      aarch64-linux = "sha256-WVOMYvwm6yHl3T4gS/7YWaN0CC9m1ayr3zIBQyaX6b8=";
     };
   };
   bazelHostConfigName.aarch64-linux = "elinux_aarch64";
@@ -84,6 +84,11 @@ buildBazelPackage rec {
 
   postPatch = ''
     rm .bazelversion
+
+    substituteInPlace tensorflow/tools/toolchains/embedded/arm-linux/cc_config.bzl.tpl \
+      --replace '%{AARCH64_COMPILER_PATH}%/lib/gcc/aarch64-none-linux-gnu/11.3.1/include' "${lib.getDev stdenv.cc.libc}/include" \
+      --replace '%{AARCH64_COMPILER_PATH}%/aarch64-none-linux-gnu/include/c++/11.3.1/' "${lib.getDev stdenv.cc.libc}/include" \
+      --replace '%{AARCH64_COMPILER_PATH}%/bin/aarch64-none-linux-gnu-' "${stdenv.cc}/bin/${stdenv.cc.targetPrefix}"
   '';
 
   preConfigure = ''

pkgsCross.aarch64-multiplatform.tensorflow-lite

ERROR: /build/output/external/XNNPACK/BUILD.bazel:2686:19: Compiling src/tables/exp2minus-k-over-2048.c failed: undeclared inclusion(s) in rule '@XNNPACK//:tables':
this rule is missing dependency declarations for the following files included by 'src/tables/exp2minus-k-over-2048.c':
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/stdc-predef.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/stdint.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/stdint.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/libc-header-start.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/features.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/features-time64.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/wordsize.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/timesize.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/sys/cdefs.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/long-double.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/gnu/stubs.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/gnu/stubs-lp64.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/types.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/typesizes.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/time64.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/wchar.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/stdint-intn.h'
  '/nix/store/qmk2paipxchxncxnvr7mn12hvdfn59sg-aarch64-unknown-linux-gnu-stage-final-gcc-12.3.0/aarch64-unknown-linux-gnu/sys-include/bits/stdint-uintn.h'

@SomeoneSerge
Contributor

SomeoneSerge commented Sep 12, 2023

I skimmed through this again after the notification, and I realize I may have misunderstood what is being proposed here. Is this issue only about using the nix-packaged Bazel outside nix-build, or is this also about fixing the effectively unusable (at least when it comes to tf, tfp, xla in nixpkgs) buildBazelPackage?

Btw, @aaronmondal you mention bzlmod. Do you know if the lockfile it generates includes the transitive dependencies, i.e. can it be used the same way as Cargo.lock or poetry.lock? In the context of tf/tfp/xla, could we write a bzlmodHook, similar to cargoSetupHook, and just re-use the Bazel-generated lock file to fetchurl all the dependencies tf might want during the nix-build?

@SomeoneSerge
Contributor

My understanding now is that this issue is about being able to re-use a particular configuration of Bazel (which mostly means the choice of toolchains?) between the nix-build and the dev env. Is that understanding correct?

@aaronmondal
Contributor Author

> Is this issue only about using the nix-packaged Bazel outside nix-build, or is this also about fixing the effectively unusable (at least when it comes to tf, tfp, xla in nixpkgs) buildBazelPackage?

When I originally built the wrapper it was just about having a reproducible environment around Bazel. But now I'm quite interested in getting a better build experience for xla and jax to stick as close to upstream as possible. So I guess I'm saying that the current buildBazelPackage is indeed fairly useless and the current configurations are IMO too complicated and error-prone to reasonably work with it.

I'm still trying to understand the internals in nix a bit better, but so far I still think that something like a bazelStdEnv could potentially simplify the builds for XLA and JAX a lot.

> Btw, @aaronmondal you mention bzlmod. Do you know if the lockfile it generates includes the transitive dependencies, i.e. can it be used the same way as Cargo.lock or poetry.lock? In the context of tf/tfp/xla, could we write a bzlmodHook, similar to cargoSetupHook, and just re-use the Bazel-generated lock file to fetchurl all the dependencies tf might want during the nix-build?

AFAIK the lockfile contains the hashes of the sources for the bazel registry dependencies, i.e. the starlark sources of custom build rules. Note that this is still experimental and last time I checked (~3 months ago) it didn't work with nix-built bazel at all.

I might be wrong though. I can very well imagine that the lockfile has a similar functionality as what you describe. Even if it doesn't it can be changed in bazel. The larger hurdle I see is that neither xla nor jax nor tf nor anyone else has yet migrated to bzlmod lol 🤣 Bzlmod makes it a lot harder to use hacky workarounds, and there are ... a few of those in xla and friends 😆

> My understanding now is that this issue is about being able to re-use a particular configuration of Bazel (which mostly means the choice of toolchains?) between the nix-build and the dev env. Is that understanding correct?

Yes, I was originally talking about devenv interoperability. But I'm starting to think that a devenv-reusable bazel-wrapper and a bazelStdenv in nixpkgs might actually be the same thing.
