Add osx-arm64 + switch to rattler-build #33

Merged: 9 commits, Dec 17, 2024

hadim
Member

@hadim hadim commented Sep 11, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • No valid build backend found for Python recipe for package xformers using pip. Python recipes using pip need to explicitly specify a build backend in the host section. If your recipe has built with only pip in the host section in the past, you likely should add setuptools to the host section of your recipe.

@hadim
Member Author

hadim commented Sep 11, 2024

@conda-forge-admin, please rerender

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@hadim
Member Author

hadim commented Sep 11, 2024

@conda-forge-admin, please rerender

@hadim
Member Author

hadim commented Sep 11, 2024

The build passes locally on a MacBook Pro M3 but fails here. I guess it's due to cross-compilation.

@hadim
Member Author

hadim commented Sep 11, 2024

Error:

Processing $SRC_DIR
  Added file://$SRC_DIR to build tracker '/private/tmp/pip-build-tracker-mcfems6f'
  Running setup.py (path:$SRC_DIR/setup.py) egg_info for package from file://$SRC_DIR
  Created temporary directory: /private/tmp/pip-pip-egg-info-kx9vnmvx
  Preparing metadata (setup.py): started
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/Users/runner/miniforge3/conda-bld/xformers_1726074224611/work/setup.py", line 24, in <module>
      import torch
    File "/Users/runner/miniforge3/conda-bld/xformers_1726074224611/_build_env/venv/lib/python3.11/site-packages/torch/__init__.py", line 238, in <module>
      _load_global_deps()
    File "/Users/runner/miniforge3/conda-bld/xformers_1726074224611/_build_env/venv/lib/python3.11/site-packages/torch/__init__.py", line 197, in _load_global_deps
      raise err
    File "/Users/runner/miniforge3/conda-bld/xformers_1726074224611/_build_env/venv/lib/python3.11/site-packages/torch/__init__.py", line 178, in _load_global_deps
      ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    File "/Users/runner/miniforge3/conda-bld/xformers_1726074224611/_build_env/lib/python3.11/ctypes/__init__.py", line 376, in __init__
      self._handle = _dlopen(self._name, mode)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  OSError: dlopen(/Users/runner/miniforge3/conda-bld/xformers_1726074224611/_build_env/venv/lib/python3.11/site-packages/torch/lib/libtorch_global_deps.dylib, 0x000A): tried: '/Users/runner/miniforge3/conda-bld/xformers_1726074224611/_build_env/venv/lib/python3.11/site-packages/torch/lib/libtorch_global_deps.dylib' (no such file), '/Users/runner/miniforge3/conda-bld/xformers_1726074224611/_build_env/lib/python3.11/site-packages/torch/lib/libtorch_global_deps.dylib' (no such file)
  error: subprocess-exited-with-error
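
To compare the locations that dlopen actually probed, the "tried:" list in an error like the one above can be split out with a small helper. This is only a minimal sketch: tried_paths is a hypothetical name, and the message layout is assumed from this single log rather than from any documented format.

```python
import re


def tried_paths(message: str) -> list[tuple[str, str]]:
    """Return (path, reason) pairs from the 'tried:' part of a dlopen error.

    Assumes each entry looks like '<path>' (<reason>), as in the log above.
    """
    _, _, tail = message.partition("tried:")
    return re.findall(r"'([^']+)' \(([^)]+)\)", tail)


# Shortened, made-up message in the same shape as the real one.
example = (
    "dlopen(/x/venv/lib/libfoo.dylib, 0x000A): tried: "
    "'/x/venv/lib/libfoo.dylib' (no such file), "
    "'/x/lib/libfoo.dylib' (no such file)"
)

for path, reason in tried_paths(example):
    print(f"{reason}: {path}")
```

In the real log, both probed paths sit under _build_env, which is consistent with setup.py importing torch from the build environment rather than the host one.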

@hadim
Member Author

hadim commented Sep 11, 2024

@conda-forge/xformers I am happy to keep digging here. I searched for this "_build_env/lib/python3.11/site-packages/torch/lib/libtorch_global_deps.dylib (no such file)" error but could not find anything.

I feel like there is some confusion between the host and build envs. Also worth noting: the build passes locally on osx-arm64.

Any idea? Did I miss something obvious here?

@hadim hadim changed the title from "Add osx arm64" to "Add osx arm64 + try rattler-build" Sep 11, 2024
@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml, recipe/recipe.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • Failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your meta.yaml, though. To get a traceback to help figure out what's going on, install conda-smithy and run conda smithy recipe-lint . from the recipe directory.

For recipe/recipe.yaml:

This is a v1 recipe and not yet lintable. We are working on it!

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/recipe.yaml:

This is a v1 recipe and not yet lintable. We are working on it!

@hadim
Member Author

hadim commented Sep 11, 2024

A new bug that seems to be due to rattler this time:

 × error Error building package: Failed to resolve dependencies: Cannot solve the request because of: No candidates were found for __osx >=11.0.
Error:   × Failed to resolve dependencies: Cannot solve the request because of: No
  │ candidates were found for __osx >=11.0.
  │ 
  ╰─▶ Cannot solve the request because of: No candidates were found for __osx
      >=11.0.
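
The solver message says that no __osx virtual package satisfying ">=11.0" was available for the solve. The comparison involved is easy to illustrate. The sketch below is a conceptual stand-in only: meets_minimum and parse_version are hypothetical helpers, and this is not how rattler-build implements matching.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '11.0' into (11, 0)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(available: Optional[str], minimum: str) -> bool:
    """True if a virtual package version is present and >= the minimum."""
    if available is None:  # the virtual package was not detected at all
        return False
    return parse_version(available) >= parse_version(minimum)


print(meets_minimum("10.13", "11.0"))  # older than the requirement
print(meets_minimum("11.0", "11.0"))   # exactly the requirement
print(meets_minimum(None, "11.0"))     # no __osx package available
```

Either an older detected version or a missing virtual package would leave the request without a candidate.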

@hadim
Member Author

hadim commented Sep 12, 2024

Seems related to prefix-dev/rattler-build#1052. I wonder whether fixing this would also fix the above cross compilation issue using meta.yaml.

@hadim
Member Author

hadim commented Sep 12, 2024

(and yes, using rattler decreases the build duration from ~15 min to ~5 min on osx-64)

@hadim
Member Author

hadim commented Sep 12, 2024

Looking at https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=1025920&view=logs&jobId=1b8be447-c2bd-5772-b66a-a1146441bf88&j=1b8be447-c2bd-5772-b66a-a1146441bf88&t=0da7bde9-21e8-5d19-f418-738876121ca7, it seems that target_platform is set to osx-64 instead of osx-arm64 for an osx-arm64 build. Could that explain the virtual package error?

@hadim
Member Author

hadim commented Sep 12, 2024

+ rattler-build build --recipe ./recipe -m ./.ci_support/osx_arm64_python3.11.____cpython.yaml --output-dir /Users/runner/miniforge3/conda-bld --no-test

 ╭─ Finding outputs from recipe
 │ Found 1 variants
 │ Build variant: xformers-0.0.27-cpu_py311h1f6929c_1
 │ 
 │ ╭───────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ │ Variant               ┆ Version                                                                                                 │
 │ ╞═══════════════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════╡
 │ │ CONDA_BUILD_SYSROOT   ┆ /Applications/Xcode_14.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk │
 │ │ build_platform        ┆ osx-64                                                                                                  │
 │ │ c_stdlib              ┆ macosx_deployment_target                                                                                │
 │ │ c_stdlib_version      ┆ 11.0                                                                                                    │
 │ │ channel_targets       ┆ conda-forge main                                                                                        │
 │ │ cuda_compiler         ┆ None                                                                                                    │
 │ │ cuda_compiler_version ┆ None                                                                                                    │
 │ │ cxx_compiler          ┆ clangxx                                                                                                 │
 │ │ cxx_compiler_version  ┆ 17                                                                                                      │
 │ │ python                ┆ 3.11.* *_cpython                                                                                        │
 │ │ pytorch               ┆ 2.3                                                                                                     │
 │ │ target_platform       ┆ osx-64                                                                                                  │
 │ ╰───────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 │
 ╰─────────────────── (took 0 seconds)
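
The mismatch shows up when the variant config passed with -m (osx_arm64_...) is compared against the target_platform row in the table. A hedged sketch of that check follows. It assumes the config file name starts with the platform written with underscores, which is inferred only from the command line in this log, and expected_platform is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Common conda platform names; the list is illustrative, not exhaustive.
KNOWN_PLATFORMS = ("osx-arm64", "osx-64", "linux-aarch64", "linux-64", "win-64")


def expected_platform(config_path: str) -> str:
    """Guess the target platform from a .ci_support config file name."""
    name = PurePosixPath(config_path).name
    for platform in KNOWN_PLATFORMS:
        if name.startswith(platform.replace("-", "_") + "_"):
            return platform
    raise ValueError(f"cannot infer platform from {config_path!r}")


config = "./.ci_support/osx_arm64_python3.11.____cpython.yaml"
reported = "osx-64"  # target_platform row from the table above
expected = expected_platform(config)

if reported != expected:
    print(f"mismatch: expected {expected}, but the build reported {reported}")
```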

@hadim
Member Author

hadim commented Sep 12, 2024

@conda-forge-admin, please rerender

@hadim hadim changed the title from "Add osx arm64 + try rattler-build" to "Add osx arm64 + try rattler-build + 0.0.28.post1" Sep 26, 2024
Contributor

github-actions bot commented Sep 26, 2024

Hi! This is the friendly automated conda-forge-linting service.

I failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your meta.yaml, though. To get a traceback to help figure out what's going on, install conda-smithy and run conda smithy recipe-lint --conda-forge . from the recipe directory. You can also examine the workflow logs for more detail.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12381265860. Examine the logs at this URL for more detail.

@hadim
Member Author

hadim commented Sep 26, 2024

osx-64 works but osx-arm64 fails, and it seems related to prefix-dev/rattler-build-conda-compat#56

  File "/Users/runner/miniforge3/lib/python3.12/site-packages/conda_forge_ci_setup/feedstock_outputs.py", line 158, in main
    distributions = built_distributions_from_recipe_variant(recipe_dir=recipe_dir, variant=variant)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/miniforge3/lib/python3.12/site-packages/conda_forge_ci_setup/utils.py", line 96, in built_distributions_from_recipe_variant
    allowed_dist_names, allowed_subdirs = get_built_distribution_names_and_subdirs(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/miniforge3/lib/python3.12/site-packages/joblib/memory.py", line 577, in __call__
    return self._cached_call(args, kwargs, shelving=False)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/miniforge3/lib/python3.12/site-packages/joblib/memory.py", line 532, in _cached_call
    return self._call(call_id, args, kwargs, shelving)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/miniforge3/lib/python3.12/site-packages/joblib/memory.py", line 771, in _call
    output = self.func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/miniforge3/lib/python3.12/site-packages/conda_forge_ci_setup/utils.py", line 55, in get_built_distribution_names_and_subdirs
    metas = rattler_build_conda_compat.render.render(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/miniforge3/lib/python3.12/site-packages/rattler_build_conda_compat/render.py", line 314, in render
    m.config.variant = package_variants[0]
                       ~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
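
The traceback ends at package_variants[0] on an empty list. A defensive pattern for that situation looks roughly like the sketch below. It is not the code of rattler_build_conda_compat: first_variant and its error message are hypothetical, and the upstream discussion is in the issue referenced above.

```python
def first_variant(package_variants: list) -> dict:
    """Return the first rendered variant, or fail with a descriptive error."""
    if not package_variants:
        # An explicit error is easier to debug than a bare IndexError.
        raise ValueError(
            "rendering produced no variants for this recipe and variant config"
        )
    return package_variants[0]


try:
    first_variant([])
except ValueError as err:
    print(f"render failed: {err}")
```

This does not fix the cause of the empty list. It only makes the failure say what went wrong.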

@hadim hadim closed this Sep 26, 2024
@hadim hadim reopened this Sep 26, 2024
@hadim hadim force-pushed the osxarm64 branch 2 times, most recently from bf3c66e to bb23602 Compare September 27, 2024 21:22
@hadim
Member Author

hadim commented Sep 27, 2024

Blocked by prefix-dev/rattler-build-conda-compat#56

Member

@h-vetinari h-vetinari left a comment

Basically LGTM, thanks. You can of course request to be added to the open-gpu-server, if you'd like (I don't decide, but it makes sense IMO).

@h-vetinari
Member

Sorry for the pileup. I thought I'd fix up your commit so we get a CI run going.

@hadim
Member Author

hadim commented Dec 17, 2024

Ah OK, all good.

I'll let you take over from here!

@h-vetinari
Member

Actually, I think we can try switching this back to azure. The recent runs on the server only took ~30 min (not hours like it used to; for reasons I don't really understand BTW).

@hadim
Member Author

hadim commented Dec 17, 2024

> Actually, I think we can try switching this back to azure. The recent runs on the server only took ~30 min (not hours like it used to; for reasons I don't really understand BTW).

I am fine with that since last time I tried, it worked fine.

You do it or I do it?

@h-vetinari h-vetinari changed the title from "Add osx-arm64 + try rattler-build" to "Add osx-arm64 + switch to rattler-build" Dec 17, 2024
@h-vetinari
Member

> You do it or I do it?

Sorry, was already in the thick of it. ;-)

@hadim
Member Author

hadim commented Dec 17, 2024

I bet the osx-arm64 is a transient error. I think a simple restart should do the job.

@h-vetinari
Member

> I bet the osx-arm64 is a transient error. I think a simple restart should do the job.

I would have just restarted the build. Since it's one of the fast ones, that would have taken less time (<10min) than now restarting the linux+CUDA jobs that had already run for ~20min.

@hadim
Member Author

hadim commented Dec 17, 2024

Yup, sorry for that. The error happens again, so I'm not sure what is actually going on:

thread 'tokio-runtime-worker' panicked at /Users/runner/miniforge3/conda-bld/rattler-build_1734085054975/_build_env/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at /Users/runner/miniforge3/conda-bld/rattler-build_1734085054975/_build_env/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
thread 'tokio-runtime-worker' panicked at /Users/runner/miniforge3/conda-bld/rattler-build_1734085054975/_build_env/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
thread 'tokio-runtime-worker' panicked at /Users/runner/miniforge3/conda-bld/rattler-build_1734085054975/_build_env/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
Error: 
  × Could not collect run exports
  ├─▶ an io error occurred: failed to unpack `/Users/runner/Library/Caches/
  │   rattler/cache/pkgs/pytorch-2.5.1-cpu_mkl_py312h2233c75_106/lib/
  │   python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py`
  ├─▶ failed to unpack `/Users/runner/Library/Caches/rattler/cache/pkgs/
  │   pytorch-2.5.1-cpu_mkl_py312h2233c75_106/lib/python3.12/site-packages/
  │   torch/_inductor/runtime/triton_heuristics.py`
  ├─▶ failed to unpack `lib/python3.12/site-packages/torch/_inductor/runtime/
  │   triton_heuristics.py` into `/Users/runner/Library/Caches/rattler/cache/
  │   pkgs/pytorch-2.5.1-cpu_mkl_py312h2233c75_106/lib/python3.12/site-
  │   packages/torch/_inductor/runtime/triton_heuristics.py`
  ├─▶ error decoding response body
  ├─▶ request or response body error
  ├─▶ error reading a body from connection
  ╰─▶ stream error received: unexpected internal error encountered
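
If a failure like this really is transient, the usual mitigation is to retry the failing step with a growing delay. The following is a generic sketch of that idea, not something rattler-build or the feedstock CI is known to do. The names retry and flaky_download are hypothetical.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    Waits base_delay, then 2x, 4x, ... between tries and re-raises the
    last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:  # ConnectionError and similar I/O errors derive from it
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


def flaky_download():
    """Stand-in for a network step that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("stream error")
    return "ok"


result = retry(flaky_download, base_delay=0.01)
print(result, "after", calls["count"], "calls")
```

In this thread the error came back after a restart, so a retry alone may not be enough here.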

@h-vetinari
Member

And it didn't even help, because you still have a transient error now (they seem to be common for some reason, see also the logs from the run on the GPU server).

In any case, please rebase out the last commit

@hadim
Member Author

hadim commented Dec 17, 2024

@wolfv any idea what could be causing this? I thought it was transient, but it seems not.

@wolfv
Member

wolfv commented Dec 17, 2024

argh, no idea what's going on. Might be network related. It's already pretty late over here so I won't be able to look into it more tonight. We also pushed a new rattler-build release which should at least speed things up and use fewer file handles. It might help, idk! if someone wants to shepherd that release now - it's already on github.

@h-vetinari
Member

> if someone wants to shepherd that release now - it's already on github.

Happy to help, though there's only a commit, no tag. Perhaps someone forgot to push the tags?

@wolfv
Member

wolfv commented Dec 17, 2024

Just made it latest release and the tag should be there.

@h-vetinari h-vetinari merged commit 383b693 into conda-forge:main Dec 17, 2024
25 of 28 checks passed
@hadim
Member Author

hadim commented Dec 17, 2024

Thanks both of you for the help here!
