Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3Packages.jaxlib: fix darwin build #184395

Merged
merged 1 commit into from
Aug 18, 2022
Merged

Conversation

uri-canva
Copy link
Contributor

@uri-canva uri-canva commented Aug 1, 2022

Description of changes

Fix the darwin build for jaxlib.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@uri-canva uri-canva requested a review from ndl August 1, 2022 01:26
@ofborg ofborg bot added the 6.topic: darwin Running or building packages on Darwin label Aug 1, 2022
@uri-canva
Copy link
Contributor Author

@ofborg build python39Packages.jaxlib

@uri-canva
Copy link
Contributor Author

@ofborg build python3Packages.jaxlib

@uri-canva
Copy link
Contributor Author

@samuela
Copy link
Member

samuela commented Aug 3, 2022

I don't have x86_64-darwin to test, but I can try on aarch64-darwin

@uri-canva
Copy link
Contributor Author

On ofborg it failed to build aarch64-darwin, and not due to timeout, some table gen issue.

@uri-canva
Copy link
Contributor Author

Ah no I remembered it wrong, it was because it can't find a darwin_arm64 toolchain: https://logs.nix.ci/?attempt_id=4a181b56-e474-4456-8ac7-47daf859ad27&key=nixos%2Fnixpkgs.184395, probably something to do with the different ways M1 identifies itself, I think I've hit it before. Sorry I didn't want to look more into aarch64-darwin because mine is a 13", so the builds take forever.

Copy link
Member

@samuela samuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing this! Getting jaxlib support on Darwin would be a big win!

To some extent I'm flying blind here without access to x86_64-darwin hardware, but I have a few questions first anyhow...

Comment on lines 202 to 206
# Make sure Bazel knows about our configuration flags during fetching so that the
# relevant dependencies can be downloaded.
bazelFetchFlags = bazel-build.bazelBuildFlags;

bazelBuildFlags = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed and merged into a single bazelFlags? It looks like we're losing bazelFetchFlags entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bazelFlags are inherited by both the build (

inherit name bazelFlags bazelBuildFlags bazelFetchFlags bazelTarget;
) and the fetch (
inherit bazelFlags bazelBuildFlags bazelFetchFlags bazelTarget;
) derivations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks for explaining. In that case, could we keep that comment -- or something analogous -- explaining why we need those flags passed to both steps?

if stdenv.targetPlatform.isLinux then
"manylinux2010_${stdenv.targetPlatform.linuxArch}"
else if stdenv.targetPlatform.isDarwin then
"macosx_10_9_${stdenv.targetPlatform.linuxArch}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why macosx_10_9? It looks like the pypi wheels are using the macosx_11_0 tag already: https://pypi.org/project/jaxlib/#files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: we should also update to manylinux_2014 as it seems that's what upstream is doing nowadays as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comes from https://github.com/google/jax/blob/jaxlib-v0.3.0/build/build_wheel.py#L261-L262, looks like macosx_11_0 is only used for arm on 0.3.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see... very good catch. And I guess for the time being we will only support x86_64-darwin anyhow?

Comment on lines 4506 to 4565
buildBazelPackage = pkgs.buildBazelPackage.override {
stdenv = pkgs.darwin.apple_sdk_11_0.stdenv;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the kind of thing we should only do on Darwin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, https://nixos.org/manual/nixpkgs/unstable/#sec-darwin documents that this is fine:

On Linux, this will have the same effect as pkgs.callPackage, so you can use pkgs.darwin.apple_sdk_11_0.callPackage regardless of platform.

but it's unintuitive enough that I'll make this conditional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see. yeah I think sticking to a conditional given the unintuitive-ness is a good idea

@uri-canva
Copy link
Contributor Author

Addressed review.

@uri-canva
Copy link
Contributor Author

Fixed it on aarch64-darwin too, so you can test it locally now.

@samuela
Copy link
Member

samuela commented Aug 5, 2022

Sadly it seems that darwin builds may be re-broken until we can get macOS SDK 10.14 into nixpkgs (#183051) 😞

@uri-canva
Copy link
Contributor Author

That should be ok at least from an SDK point of view, we're using the macOS 11.0 SDK here, that's what the pkgs.darwin.apple_sdk_11_0.stdenv is about. Hopefully it will just work or require only small changes.

@uri-canva
Copy link
Contributor Author

Does anyone know why ofborg won't pick up python3Packages.jaxlib from the PR title automatically?

@uri-canva
Copy link
Contributor Author

@ofborg build python3Packages.jaxlib python39Packages.jaxlib

@uri-canva
Copy link
Contributor Author

Would also be good to figure out why the linux derivations fail to evaluate on ofborg, they build properly in hydra.

@uri-canva
Copy link
Contributor Author

@ofborg build python3Packages.jaxlib

@samuela
Copy link
Member

samuela commented Aug 6, 2022

Result of nixpkgs-review pr 184395 run on aarch64-darwin 1

34 packages marked as broken and skipped:
  • python310Packages.aeppl
  • python310Packages.aesara
  • python310Packages.arviz
  • python310Packages.augmax
  • python310Packages.chex
  • python310Packages.dalle-mini
  • python310Packages.distrax
  • python310Packages.dm-sonnet
  • python310Packages.elegy
  • python310Packages.flax
  • python310Packages.objax
  • python310Packages.optax
  • python310Packages.pymc
  • python310Packages.rlax
  • python310Packages.tensorflow-datasets
  • python310Packages.treex
  • python310Packages.vqgan-jax
  • python39Packages.aeppl
  • python39Packages.aesara
  • python39Packages.arviz
  • python39Packages.augmax
  • python39Packages.chex
  • python39Packages.dalle-mini
  • python39Packages.distrax
  • python39Packages.dm-sonnet
  • python39Packages.elegy
  • python39Packages.flax
  • python39Packages.objax
  • python39Packages.optax
  • python39Packages.pymc
  • python39Packages.rlax
  • python39Packages.tensorflow-datasets
  • python39Packages.treex
  • python39Packages.vqgan-jax
10 packages failed to build:
  • python310Packages.dm-haiku
  • python310Packages.jax
  • python310Packages.jmp
  • python310Packages.numpyro
  • python310Packages.treeo
  • python39Packages.dm-haiku
  • python39Packages.jax
  • python39Packages.jmp
  • python39Packages.numpyro
  • python39Packages.treeo

@samuela
Copy link
Member

samuela commented Aug 6, 2022

Fixed it on aarch64-darwin too, so you can test it locally now.

Hmm I'm having trouble building on aarch64-darwin: https://gist.github.com/samuela/471a67182e4c7dded61675fba72f18cf. Looks like it may be a 404

@uri-canva
Copy link
Contributor Author

That's a red herring, it's strange that it's not available on the bazel mirror, but it will get it from github: bazelbuild/rules_cc@081771d.

The error is:

ERROR: /private/tmp/nix-build-bazel-build-jaxlib-0.3.0-deps.tar.gz.drv-0/output/external/local_config_cc/BUILD:48:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'darwin_arm64'

Are you sure you checked out the latest commit? That's what the patch file is supposed to fix.

@uri-canva
Copy link
Contributor Author

I'm seeing the same error on ofborg, will try to repro, there might be something wrong with my own setup.

@uri-canva
Copy link
Contributor Author

But maybe let's merge the working x86_64-darwin build in the meantime.

Copy link
Member

@samuela samuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @uri-canva, apologies for the delay! I have a lingering comment about using fetchpatch and one place where I think a comment might be useful. But after this I think we should be good to merge!

@@ -232,17 +239,31 @@ let
# 4) Patch tensorflow sources to work with later versions of protobuf. See
# https://github.com/google/jax/issues/9534. Note that this should be
# removed on the next release after 0.3.0.
# 5) Patch in https://github.com/bazelbuild/rules_cc/pull/136 to fix the bazel compiler detection on aarch64-darwin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for filling this upstream! Since that's done, let's just pull it via fetchpatch here instead of copying the patch file into nixpkgs's tree.

Alternatively we can remove it for now since AFAIU the aarch64-darwin build is not yet functional anyhow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think fetchpatch will work: the commit is on a PR, so when the PR is merged / closed and the branch deleted it will eventually be garbage collected. I'll remove it then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a common pattern in other packages in nixpkgs. It's generally preferred to fetchpatch commits rather than PRs/branches, since PRs are mutable. That's an interesting question re commits being deleted... I'm not sure how that's handled. Perhaps they just link to privately controlled forks of repos? Idk...

Comment on lines +60 to +64
platforms = platforms.unix;
# aarch64-darwin is broken because of https://github.com/bazelbuild/rules_cc/pull/136
# however even with that fix applied, it doesn't work for everyone:
# https://github.com/NixOS/nixpkgs/pull/184395#issuecomment-1207287129
broken = stdenv.isAarch64;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the meta to reflect the new state of platform support. Given that I got it working on aarch64-darwin (albeit only on my machine) I assume that not building on aarch64-linux is because something is broken, not because it's not supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it even builds correctly, I haven't tried.

@uri-canva
Copy link
Contributor Author

@ofborg build python3Packages.jaxlib

@ofborg ofborg bot added the 8.has: package (new) This PR adds a new package label Aug 13, 2022
@uri-canva
Copy link
Contributor Author

The failure on ofborg is:

ERROR: /private/tmp/nix-build-bazel-build-jaxlib-0.3.0.drv-0/output/external/org_tensorflow/tensorflow/compiler/mlir/hlo/BUILD:400:18: TdGenerate external/org_tensorflow/tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/IR/mhlo_canonicalize.inc failed: (Illegal instruction): mlir-tblgen failed: error executing command bazel-out/darwin-opt-exec-50AE0418/bin/external/llvm-project/mlir/mlir-tblgen -gen-rewriters external/org_tensorflow/tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/IR/mhlo_canonicalize.td -I ... (remaining 17 arguments skipped)

I've had that locally when nix was confused on what the current system architecture was, because it was running on an M1 with rosetta, but I'm not sure why that would happen on an x86_64 mac. I've seen in it some of the past builds too, but not all, so it seems to only affect specific machines.

@samuela
Copy link
Member

samuela commented Aug 18, 2022

I've had that locally when nix was confused on what the current system architecture was, because it was running on an M1 with rosetta, but I'm not sure why that would happen on an x86_64 mac. I've seen in it some of the past builds too, but not all, so it seems to only affect specific machines.

Huh, very odd... doesn't look like your fault in any case. I would be curious to figure out the underlying issue, but doesn't need to be in scope for this particular PR.

@samuela
Copy link
Member

samuela commented Aug 18, 2022

Thanks for all your hard work on this @uri-canva! These changes LGTM and are a big leap forward in terms of our ability to support more platforms.

@samuela samuela merged commit 5b71d23 into NixOS:master Aug 18, 2022
@uri-canva uri-canva deleted the jaxlib-darwin branch August 18, 2022 02:37
Comment on lines +251 to +254
substituteInPlace ../output/external/rules_cc/cc/private/toolchain/osx_cc_wrapper.sh.tpl \
--replace "/usr/bin/install_name_tool" "${cctools}/bin/install_name_tool"
substituteInPlace ../output/external/rules_cc/cc/private/toolchain/unix_cc_configure.bzl \
--replace "/usr/bin/libtool" "${cctools}/bin/libtool"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broke jaxlib on non-Darwin because cctools is Darwin-specific.

broken = !stdenv.targetPlatform.isDarwin; # Only supports darwin targets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate, it wasn't evaluating on ofborg so I couldn't check, have you already fixed it? I'll put up a fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants