
Move libtorch bits to PREFIX & enable cuDSS, cuSPARSELt, CUPTI on win; fix duplicate .pyc versions #328

Merged
merged 41 commits into conda-forge:main from mgorny:more-win on Jan 25, 2025

Conversation

mgorny
Contributor

@mgorny mgorny commented Jan 19, 2025

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #319
Fixes #320
Fixes #321
Fixes #327
Fixes #329

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 19, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12939294964. Examine the logs at this URL for more detail.

@h-vetinari
Member

Would also be good to address #327 in this PR.

@hmaarrfk hmaarrfk marked this pull request as draft January 19, 2025 23:31
@hmaarrfk
Contributor

(draft until all builds are enabled again)

@h-vetinari
Member

So the extra CUDA deps on windows seem to work fine 🥳

@mgorny
Contributor Author

mgorny commented Jan 20, 2025

Okay, I'll try adding some logic for that here later today.

@mgorny
Contributor Author

mgorny commented Jan 20, 2025

Ok, I've decided to instead start by adding error handling to .bat, to see which of the existing commands fail.

By the way, any chance you could give me the power to run CI myself?

@h-vetinari
Member

By the way, any chance you could give me the power to run CI myself?

You already should have AFAIU, and I don't have the power to change it further. This should be a matter of Quansight/open-gpu-server#52 plus a sync of the setup to https://github.com/conda-forge/.cirun (which happened), so I'm really not sure where this goes wrong. Pinging @jaimergp & @aktech

One conceivable angle is that the lack of authorization for the windows agents prevents you from starting any job at all (though IIRC you already weren't able to start things before we merged windows support). For that, you could make a PR like conda-forge/.cirun#27

Member

@h-vetinari h-vetinari left a comment

I'd prefer to add

if %ERRORLEVEL% neq 0 exit 1

to calls that can fail, rather than adding || goto error everywhere. There's no functional difference AFAIK, and although both styles have their pros & cons, the goto-variant is quite rare in conda-forge (a quick search shows 170 feedstocks using goto, while ~1.6k feedstocks use an errorlevel variant).
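
For illustration only, a minimal sketch of the two styles side by side (the cmake call is just a placeholder for any command in bld.bat that can fail):

:: errorlevel style: check the exit code right after each call that can fail
cmake --build . --target install
if %ERRORLEVEL% neq 0 exit 1

:: goto style: jump to a shared error label on any failure
cmake --build . --target install || goto :error
exit /b 0

:error
echo Build step failed
exit 1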

recipe/bld.bat (Outdated)
@aktech

aktech commented Jan 20, 2025

@mgorny You need to add your name here: https://github.com/conda-forge/.cirun/blob/218150c90899c7dfeb771d0bd9b176ca191848f3/.access.yml#L89, like this PR: conda-forge/.cirun#27. This is subject to @wolfv's approval, I believe, as they (prefix.dev) are the ones sponsoring the windows machine.

and you need to be in any of these roles for this repo to be able to trigger workflows that involve GPU CI runs. cc @jaimergp

Member

@h-vetinari h-vetinari left a comment

We're going to need existence tests for the DLLs to be under %LIBRARY_BIN%.

The tests are already there, and just need to be modified, e.g.

- if not exist %SP_DIR%\torch\lib\{{ each_lib }}.dll exit 1 # [win]

(similar below)
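
A hedged sketch of what the adjusted line might look like once the DLLs live directly under %LIBRARY_BIN% (keeping the {{ each_lib }} loop already used in the recipe):

- if not exist %LIBRARY_BIN%\{{ each_lib }}.dll exit 1  # [win]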

@h-vetinari
Member

Thanks for the quick response @aktech!

and you need to be in any of these roles for this repo to be able to trigger workflows that involve GPU CI runs.

Ah, perhaps it's just a question of making @mgorny a maintainer here? That's a hypothesis we can test easily (given your involvement here, I'm assuming you have nothing against this @mgorny) 😉

@h-vetinari
Member

By the way, any chance you could give me the power to run CI myself?

You already should have AFAIU, and I don't have the power to change it further.

Well, perhaps I did have the power after all: 4ff92b3 ;-)

We'll find out on your next push here!

@mgorny
Contributor Author

mgorny commented Jan 20, 2025

Ok, I'll replace || goto :error with if errorlevel checks, then push and take a break.

@h-vetinari
Member

h-vetinari commented Jan 20, 2025

We'll find out on your next push here!

Sorry, I was talking about the linux builds. For the windows agents, we'll definitely still need to wait for conda-forge/.cirun#32

@h-vetinari
Member

Ok, I'll replace || goto :error with if errorlevel checks

You're probably aware, but just for completeness (and because batch is an insane language): if errorlevel is a very different construct from if %errorlevel%. In particular, the more "advanced" relational operators like NEQ are not supported; more details
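
For completeness, a minimal sketch of the difference (the command is a placeholder, not taken from the recipe):

some_build_step.exe

:: "if errorlevel 1" is true whenever the exit code is 1 or higher,
:: and does not accept comparison operators such as NEQ
if errorlevel 1 exit 1

:: "if %ERRORLEVEL% neq 0" expands the variable to its value first, so the
:: comparison operators (EQU, NEQ, LSS, GEQ, ...) are available
if %ERRORLEVEL% neq 0 exit 1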

@mgorny
Contributor Author

mgorny commented Jan 20, 2025

Ah, okay, sorry. Either case, I really need to take a break now, so maybe it'll be done when I get back.

@h-vetinari
Member

Ah, okay, sorry. Either case, I really need to take a break now, so maybe it'll be done when I get back.

No worries. Perhaps until then the PR is merged. ;-)

When you do get back, please squash f9fe7bf & ad726bc to avoid the back-and-forth.

@mgorny
Contributor Author

mgorny commented Jan 23, 2025

Don't you want to enable the remaining Windows builds? I wanted to add removal of TH_BINARY_BUILD too. If that's fine, I'll push it along with rerender to restore all build targets.

@h-vetinari
Member

Don't you want to enable the remaining Windows builds?

Of course, but I wanted to solve all the current issues first, and also ask @hmaarrfk and @isuruf for review, if they have time. CC also @danpetry if interested.

I wanted to add removal of TH_BINARY_BUILD too. If that's fine, I'll push it along with rerender to restore all build targets.

How high is your confidence that this will work first try? I think this PR is already doing a lot, and I'd like to get it out so we can start building dependent packages and test out whether things work as intended. I wouldn't be surprised if follow-ups turn out to be necessary anyway, so I'd prefer to put TH_BINARY_BUILD into the next round.

@h-vetinari h-vetinari changed the title Attempt reenabling cuDSS, cuSPARSELt and CUPTI on Windows Move libtorch bits to PREFIX & enable cuDSS, cuSPARSELt, CUPTI on win; fix duplicate .pyc versions Jan 23, 2025
@h-vetinari
Member

FWIW, #329 is now fixed as well, so I think this is ready (modulo a revert of 3ebf1a9 + a rerender, which I'm holding off on for now, in case we need some more rounds after feedback).

@h-vetinari h-vetinari marked this pull request as ready for review January 23, 2025 22:05
@h-vetinari
Member

Nevermind, I found an issue with the windows CUDA builds that needs fixing anyway, so I'm pushing a rerendered build, but will cancel everything but the windows+CUDA job. And it'll also be good to see if CI on osx passes with the couple of relevant changes here.

@h-vetinari
Member

the win+cuda build passed fine, and so did the osx builds. I've launched aarch+CUDA as a final smoketest, but I expect this PR to be ready. Please let me know if there are comments @conda-forge/pytorch-cpu, otherwise I'm looking to merge this soon.

@mgorny
Contributor Author

mgorny commented Jan 24, 2025

I'd still do the TH_BINARY_BUILD thing, but perhaps after all builds pass, so we don't bump the revision twice (or we can defer that to the anaconda sync PR) — i.e. basically: add it; if CI passes, merge with it; if CI fails, merge the previous commit.

@h-vetinari
Member

Can you raise a PR on your fork against this branch? If you enable actions on your repo, this should actually run on the open-gpu server (at least it does by default on my fork). That way we can keep the CI runs separate and don't have to roll back if it goes wrong.

@mgorny
Contributor Author

mgorny commented Jan 25, 2025

Can you raise a PR on your fork against this branch? If you enable actions on your repo, this should actually run on the open-gpu server (at least it does by default on my fork). That way we can keep the CI runs separate and don't have to roll back if it goes wrong.

Done that in mgorny#4. It seems that it created the 16 jobs, but I don't see the CIrun gating job, so I don't think they'll start. But I have another idea: I'll just open another PR to this repo, adding the extra commits, and you'll merge whichever works.

@mgorny mgorny mentioned this pull request Jan 25, 2025
@h-vetinari
Member

But I have another idea: I'll just open another PR to this repo, adding the extra commits, and you'll merge whichever works.

Sounds good to me!

@h-vetinari
Member

h-vetinari commented Jan 25, 2025

Any comments here @conda-forge/pytorch-cpu? CI in #331 (which adds one more change on top of this PR) is looking good and I'd like to merge this soon.

@h-vetinari
Member

Alrighty! Thanks a lot for the reviews! Post-merge comments are still very welcome; we'll have yet another round with #318 coming right up anyway, where we can take this up. My goal would be to wrap up that PR in a few days as well and then get 2.6 out the door. 🙃

@h-vetinari h-vetinari merged commit 1c3a925 into conda-forge:main Jan 25, 2025
21 of 27 checks passed
@h-vetinari
Member

h-vetinari commented Jan 25, 2025

Ah dammit, the workaround for conda-forge/conda-forge-pinning-feedstock#6910 causes windows to fail here 😑

Checking out the ref
  "C:\Program Files\Git\bin\git.exe" checkout --progress --force -B main refs/remotes/origin/main
  Error: error: unable to create file .ci_support/linux_64_blas_implgenericc_compiler_version13channel_targetsconda-forge_maincuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13github_actions_labelscirun-openstack-cpu-2xlargeis_rcFalse.yaml: Filename too long
  Error: error: unable to create file .ci_support/linux_64_blas_implgenericc_compiler_version13channel_targetsconda-forge_maincuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13github_actions_labelscirun-openstack-gpu-2xlargeis_rcFalse.yaml: Filename too long

(logs, originally for commit 1c645fa, now force-pushed away)

@mgorny mgorny deleted the more-win branch January 26, 2025 02:40