Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cudaPackages: .pc files should not contain /usr paths #224119

Open
SomeoneSerge opened this issue Mar 31, 2023 · 10 comments
Open

cudaPackages: .pc files should not contain /usr paths #224119

SomeoneSerge opened this issue Mar 31, 2023 · 10 comments
Labels
0.kind: bug Something is broken 3.skill: good-first-bug This seems like it's fixable by a non-expert 3.skill: trivial This is trivial to complete (typically find-and-replace) 6.topic: cuda Parallel computing platform and API

Comments

@SomeoneSerge
Copy link
Contributor

Describe the bug

We need to patch cuda's pkg-config files

Steps To Reproduce

Steps to reproduce the behavior:

  1. ❯ NIXPKGS_ALLOW_UNFREE=1 nix build --impure nixpkgs#cudaPackages.cuda_cudart
  2. ❯ cat result/pkg-config/cudart-11.7.pc
  3. observe cudaroot=/usr/local/cuda-11.7

Expected behavior

Paths relative to outputs

Notify maintainers

@NixOS/cuda-maintainers

@SomeoneSerge SomeoneSerge added 0.kind: bug Something is broken 6.topic: cuda Parallel computing platform and API labels Mar 31, 2023
@github-project-automation github-project-automation bot moved this to 🆕 New in CUDA Team Mar 31, 2023
@SomeoneSerge SomeoneSerge moved this from 🆕 New to 📋 Backlog in CUDA Team Apr 1, 2023
@SomeoneSerge SomeoneSerge changed the title cudaPackages: .pc files contain /usr paths cudaPackages: .pc files should not ciontain /usr paths Apr 1, 2023
@SomeoneSerge SomeoneSerge changed the title cudaPackages: .pc files should not ciontain /usr paths cudaPackages: .pc files should not contain /usr paths Apr 1, 2023
@SomeoneSerge SomeoneSerge moved this from 📋 Backlog to Roadmap in CUDA Team Apr 12, 2023
@SomeoneSerge SomeoneSerge added 3.skill: good-first-bug This seems like it's fixable by a non-expert 3.skill: trivial This is trivial to complete (typically find-and-replace) labels Apr 12, 2023
@ghost
Copy link

ghost commented Apr 20, 2023

Hellol! I was looking for good first bug issues and tried this.

Describe the bug:

The pkg-config files for the cuda_cudart package in the Nix package repository nixpkgs contain incorrect paths. The paths should be relative to the package outputs, but they currently point to fixed paths like /usr/local/cuda-11.7.

Steps to reproduce the bug:

Build and install the cuda_cudart package using the following command:

NIXPKGS_ALLOW_UNFREE=1 nix build --impure nixpkgs#cudaPackages.cuda_cudart

Check the contents of the pkg-config file for the package:

cat result/pkg-config/cudart-11.7.pc
cudaroot=/usr/local/cuda-11.7
libdir=${cudaroot}/targets/x86_64-linux/lib
includedir=${cudaroot}/targets/x86_64-linux/include

Name: cudart
Description: CUDA Runtime Library
Version: 11.7
Libs: -L${libdir} -lcudart
Cflags: -I${includedir}

Observe that the cudaroot path is set to /usr/local/cuda-11.7, which is incorrect.

Expected behavior

The paths in the pkg-config files should be relative to the package outputs, rather than fixed paths like /usr/local/cuda-11.7 . For instance, the cudaroot path should be replaced with the correct output path in the Nix store.

Let's begin:

Fork the nixpkgs repository:

git clone https://github.com/dooygoy/nixpkgs.git
Cloning into 'nixpkgs'...
remote: Enumerating objects: 3783643, done.
remote: Counting objects: 100% (458/458), done.
remote: Compressing objects: 100% (172/172), done.
remote: Total 3783643 (delta 332), reused 390 (delta 286), pack-reused 3783185
Receiving objects: 100% (3783643/3783643), 3.11 GiB | 6.33 MiB/s, done.
Resolving deltas: 100% (2576941/2576941), done.
Updating files: 100% (34603/34603), done.

cd nixpkgs/

Search for the package:

grep -r --include='*.nix' -l 'cuda_cudart' .
./pkgs/applications/science/math/mathematica/generic.nix
./pkgs/development/libraries/science/math/tiny-cuda-nn/default.nix
./pkgs/development/libraries/science/math/nccl/default.nix
./pkgs/development/libraries/science/math/faiss/default.nix
./pkgs/development/libraries/science/math/magma/generic.nix
./pkgs/development/libraries/opencv/4.x.nix
./pkgs/development/libraries/nvidia-thrust/default.nix
./pkgs/development/python-modules/torchvision/default.nix
./pkgs/development/python-modules/openai-triton/default.nix
./pkgs/development/python-modules/bitsandbytes/default.nix

I see there is the cudatoolkit but I am not sure if this is the package I have to fix:

https://github.com/NixOS/nixpkgs/blob/nixos-22.11/pkgs/development/compilers/cudatoolkit/

There is also the build-cuda-redist-package.nix with the:

# TODO: choose whether to install static/dynamic libs
  installPhase = ''
    runHook preInstall
    rm LICENSE
    mkdir -p $out
    mv * $out
    runHook postInstall
  '';

How should I go from here? I am not sure how to continue for now or whether I am looking at the correct file:

@SomeoneSerge
Copy link
Contributor Author

SomeoneSerge commented Apr 21, 2023

@dooygoy hi, and thank you for taking interest in this issue! In the hindsight, I see I didn't include all of the inputs in the description, sorry about that. I also should warn you in advance that although I added the good-first-bug label, merging a PR to close this issue might take a little time and a bit of experimenting. The individual steps should be very simple though!

I won't assume how familiar you are with nixpkgs, so I'll include details that may possibly be (probably are) redundant.

A note on impact: hypothetically, we could just delete the pkg-config files, because most of our downstream packages seem to be using cmake, meson, or bazel for builds, and they either rely on proper mechanisms like FindCUDAToolkit.cmake to discover CUDA, or they hard-code paths like /opt/something/cuda. However, the upstream keeps publishing the .pc files, so some people we don't see may be relying on them, so it's reasonable for us to keep them, as long as making them correct isn't too much of a burden. I think we should.

Also, do not worry if you decide to spend your time elsewhere: I think it's good to have these notes written up anyway

You guessed right, most of our CUDA-related expressions live in pkgs/development/compilers/cudatoolkit, and the actual cudaPackages.cuda_cudart package (and all other cudatoolkit components from recent releases) is built from build-cuda-redist-package.nix, which you also have already found1. The expression in build-cuda-redist-package.nix is re-used for every CUDA package, and the actual hashes (package names, URLs, hashes) come from JSON files published by nvidia along with cudatoolkit. It should be sufficient that we edit build-cuda-redist-package.nix appropriately

  • It might be sufficient that we simply patch the existing .pc files to replace /usr/local/cuda-XX.Y with $out. This can be done in the installPhase before runHook postInstall. I imagine we could start by just trying sed, e.g. sed -i 's|^cudaroot=.*$|cudaroot=${placeholder "out"}|' $out/pkg-config/*.pc.
  • While we're at it, we probably should start using the validatePkgConfig and testMetaPkgConfig hooks: in build-cuda-redist-package.nix we'd need to append validatePkgConfig to the nativeBuildInputs list. We'll take validatePkgConfig as an argument: you add it after autoAddOpenGLRunpathHook at the top of the file, and it should get auto-wired from there. The hook itself is defined in all-packages.nix.
    • The testMetaPkgConfig is more useful: we can expose "test derivations" that verify availability of specific pkg-config targets e.g. as cudaPackages.cuda_cudart.tests.pkg-config, and we can configure our CI to run these tests. However, testMetaPkgConfig is also a little more complicated. We'd need to populate the pkgConfigModules list in meta, cf. the example. E.g. cuda_cudart includes a pkg-config/cuda-11.7.pc file that defines a Name: cuda target, so it would go in the list as meta.pkgConfigModules = [ "cuda" ];. However, different cuda components define different pkg-config targets and under different file names: e.g. cuda_nvrtc includes a pkg-config/nvrtc-11.7.pc with Name: nvrtc, and cudaPackages.libcublas contains pkg-config/cublas-11.7.pc with Name: cublas. I think we'd have write these out by hand, and it would also cumbersome to do that in build-cuda-redist-package.nix (we want to keep that generic expression simple). For per-package modifications we have the overrides.nix file. But we can chat about this one in details later, and there's enough stuff to do before we touch testMetaPkgConfig
  • Some packages do not ship any .pc files, e.g. cuda_nvcc doesn't seem to. We don't want to break them with our hooks
  • You can also test the .pc files manually by entering a nix-shell such as mkShell { packages = [ pkg-config ]; buildInputs = with cudaPackages; [ cuda_cudart cuda_nvrtc cuda_nvtx libcublas libcufft ]; } and running pkg-config --list-all, pkg-config --cflags --libs cuda cudart nvrtc ...
  • We'd want to verify that the configured flags actually are correct in the sense that we can use them to build a cuda program. Ideally we'd find a package that uses pkg-config to discover cuda, although, as I mentioned, people mostly use CMake (amen!)
  • Eventually we'd run a nixpkgs-review on the PR to verify we haven't broken any builds

This should be enough to get started

Footnotes

  1. By the way, there's a quick way to locate the expression for a package: nix edit nixpkgs#cudaPackages.cuda_cudart opens build-cuda-redist-package.nix in your editor.

@ghost
Copy link

ghost commented Apr 21, 2023

Thank you very much for such a long description and thank you for including the details that may possibly be redundant (which are not!).

Since this is my very first package issue and I wish to blog about it too (and I am pretty excited about it too, yay!) I will document it in detail including my analysis which may also go into much detail. So let's begin:

A note on impact: hypothetically, we could just delete the pkg-config files, because most of our downstream packages seem to be using cmake, meson, or bazel for builds, and they either rely on proper mechanisms like FindCUDAToolkit.cmake to discover CUDA, or they hard-code paths like /opt/something/cuda. However, the upstream keeps publishing the .pc files, so some people we don't see may be relying on them, so it's reasonable for us to keep them, as long as making them correct isn't too much of a burden. I think we should.

Analysis:

I see downstream packages and upstream is mentioned and I wasn't sure about this terminology. Upstream refers to the original software projects, while downstream refers to the packages in the Nix package repository (nixpkgs) that depend on those upstream projects. When a bug is fixed or a new version is released upstream, it's usually the responsibility of the downstream package maintainers to update their packages to incorporate those changes.

I see this quote "hard-code paths like /opt/something/cuda," is referring to the practice of hardcoding absolute paths in the build process or configuration files which looks to me as impure with the regard to how Nix packages are made. Why is that? Hardcoding paths can cause issues because it makes assumptions about the file system layout and the location of dependencies, which may not hold true for every user or system. Nix reproducible builds actually solve this problem, in this case by making these paths relative instead of absolute.

You guessed right, most of our CUDA-related expressions live in pkgs/development/compilers/cudatoolkit, and the actual cudaPackages.cuda_cudart package (and all other cudatoolkit components from recent releases) is built from build-cuda-redist-package.nix, which you also have already found1. The expression in build-cuda-redist-package.nix is re-used for every CUDA package, and the actual hashes (package names, URLs, hashes) come from JSON files published by nvidia along with cudatoolkit. It should be sufficient that we edit build-cuda-redist-package.nix appropriately

I open the linked JSON file published by nvidia and I see the relative paths which seem to be important to make the package functional. In the Nix expression, the relative_path is used along with the fetchurl function to download the appropriate source file for building the package. This part caught my attention that shows the relative paths for each platform. For example for the Linux it shows:

"linux-x86_64": {
            "relative_path": "cuda_cudart/linux-x86_64/cuda_cudart-linux-x86_64-11.7.60-archive.tar.xz",
            "sha256": "1c079add60a107f6dd9e72a0cc9cde03eb9d833506f355c22b9177c47a977552",
            "md5": "1ef515eb31691f2c43fb0de1443893a3",
            "size": "854744"
        },

But let's read further instructions.

It might be sufficient that we simply patch the existing .pc files to replace /usr/local/cuda-XX.Y with $out. This can be done in the installPhase before runHook postInstall. I imagine we could start by just trying sed, e.g. sed -i 's|^cudaroot=.*$|cudaroot=${placeholder "out"}|' $out/pkg-config/*.pc.

I am familiar with sed but have never actually used it and I am wondering at the same time if there is a more Nix-way to solve this too. It seems the suggested solution is to patch the existing .pc files by replacing /usr/local/cuda-XX.Y with $out in the installPhase before runHook postInstall.

installPhase = ''
  runHook preInstall
  rm LICENSE
  mkdir -p $out
  mv * $out

  # Patch the .pc files to replace /usr/local/cuda-XX.Y with $out
  sed -i 's|^cudaroot=.*$|cudaroot=${placeholder "out"}|' $out/pkg-config/*.pc

  runHook postInstall
'';

What is sed actually doing here? The sed command is used to replace the cudaroot path in all .pc files under $out/pkg-config. The ${placeholder "out"} expression is used to reference the $out variable within the single quotes of the sed command.

Now is there a more Nix way to solve this? Is there a Nix function that would do this operation instead of sed? There is a Nix function called substituteInPlace that we could use in the postInstall phase of the installPhase. This function is designed to replace strings in files within Nix derivations. This function is like the substitute function. For example I am looking at the nixpkgs manual and searching for examples of this function and we see this:

postPatch = ''
        substituteInPlace lib/zoneinfo.rs \
          --replace "/usr/share/zoneinfo" "${tzdata}/share/zoneinfo"
      '';

How could we apply this function in our case instead of sed and would that be a better alternative? Something like this:

postInstall = ''
  substituteInPlace $out/pkg-config/cudart-11.7.pc \
    --replace "cudaroot=/usr/local/cuda-11.7" "cudaroot=${placeholder "out"}"
'';

But another issue is that unlike sed, substituteInPlace does not support regular expressions it seems, so I wonder what would be the best path to take here. Let's move on.

UPDATE: Is the sed imperative solution? We dont need sed with nix expressions in the sense that sed uses regular expressions to actually move those directories in imperative way and not functional by substituting those directories with a placeholder value, a variable. sed directly makes a destructive update, changing the content of the file. Instead of modifying the original file directly, substituteInPlace creates a new file with the desired changes.

While we're at it, we probably should start using the validatePkgConfig and testMetaPkgConfig hooks: in build-cuda-redist-package.nix we'd need to append validatePkgConfig to the nativeBuildInputs list. We'll take validatePkgConfig as an argument: you add it after autoAddOpenGLRunpathHook at the top of the file, and it should get auto-wired from there. The hook itself is defined in all-packages.nix.

OK, let's look at the build-cuda-redist-package.nix:

  1. Add validatePkgConfig as an argument at the top of the file, after autoAddOpenGLRunpathHook.
{ lib
, stdenv
, backendStdenv
, fetchurl
, autoPatchelfHook
, autoAddOpenGLRunpathHook
, validatePkgConfig
}:
  1. Append validatePkgConfig to the nativeBuildInputs list:
nativeBuildInputs = [
  autoPatchelfHook
    # This hook will make sure libcuda can be found
    # in typically /lib/opengl-driver by adding that
    # directory to the rpath of all ELF binaries.
    # Check e.g. with `patchelf --print-rpath path/to/my/binary
  autoAddOpenGLRunpathHook
  validatePkgConfig
];

How is the hook defined in all-packages.nix? Let's see that too:

validatePkgConfig = makeSetupHook
    { name = "validate-pkg-config"; propagatedBuildInputs = [ findutils pkg-config ]; }
    ../build-support/setup-hooks/validate-pkg-config.sh;

Now I read test derivations and CI are mentioned too which sounds cool and definetly as something useful and functional to have when building packages. Tests are super important, but I am not familiar with how these things work

The testMetaPkgConfig is more useful: we can expose "test derivations" that verify availability of specific pkg-config targets e.g. as cudaPackages.cuda_cudart.tests.pkg-config, and we can configure our CI to run these tests.

We must somehow create a kind of test-derivation.nix that would use the testMetaPkgConfig hook within which would then check the availability of pkg-config targets.

That sounds super interesting!

For per-package modifications we have the overrides.nix file. But we can chat about this one in details later, and there's enough stuff to do before we touch testMetaPkgConfig.

OK so we move on and leave this for later. I am not familiar with overrides.nix and how they work but I do remember seeing a nice 2019? Nixcon talk by beloved Graham. Maybe I should check that too? Excellent!

Some packages do not ship any .pc files, e.g. cuda_nvcc doesn't seem to. We don't want to break them with our hooks

In order to ensure that the hooks don't break packages that don't ship any .pc files, how can we conditionally apply the hooks based on the presence of .pc files in the package? Upon further investigation I see that one way to achieve this is by using the optional function provided by Nix, which conditionally adds an element to a list based on a condition. Maybe we could use this function in nativeBuildInputs to conditionally add the hooks based on the presence of .pc files in the package.

You can also test the .pc files manually by entering a nix-shell such as mkShell { packages = [ pkg-config ]; buildInputs = with cudaPackages; [ cuda_cudart cuda_nvrtc cuda_nvtx libcublas libcufft ]; } and running pkg-config --list-all, pkg-config --cflags --libs cuda cudart nvrtc ...

Oh, let's try this right now!

mkShell { packages = [ pkg-config ]; buildInputs = with cudaPackages; [ cuda_cudart cuda_nvrtc cuda_nvtx libcublas libcufft ]; }
bash: syntax error near unexpected token `}'

Oh, we got an error! Was there a typo? Is something missing? I am instructed to enter nix shell and it seems to me I didn't actually start the nix shell. Maybe I need to pull this command after that:

nix-shell -E 'with import <nixpkgs> {}; mkShell { buildInputs = with cudaPackages; [ pkg-config cuda_cudart cuda_nvrtc cuda_nvtx libcublas libcufft ]; }'
error: Packagecuda_cudart-11.7.60in /nix/store/79fr5k4kky1p59y865ivnaxxq23y0nxg-source/pkgs/development/compilers/cudatoolkit/redist/build-cuda-redist-package.nix:55 has an unfree license (unfree), refusing to evaluate.

       a) To temporarily allow unfree packages, you can use an environment variable
          for a single invocation of the nix tools.

            $ export NIXPKGS_ALLOW_UNFREE=1

        Note: For `nix shell`, `nix build`, `nix develop` or any other Nix 2.4+
        (Flake) command, `--impure` must be passed in order to read this
        environment variable.

       b) For `nixos-rebuild` you can set
         { nixpkgs.config.allowUnfree = true; }
       in configuration.nix to override this.

       Alternatively you can configure a predicate to allow specific packages:
         { nixpkgs.config.allowUnfreePredicate = pkg: builtins.elem (lib.getName pkg) [
             "cuda_cudart"
           ];
         }

       c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
         { allowUnfree = true; }
       to ~/.config/nixpkgs/config.nix.
(use '--show-trace' to show detailed location information)

Another error! This tells us we need to export NIXPKGS_ALLOW_UNFREE=1 into our shell. So this tells me that by default the nix shell provides an environment where unfree packages aren't allowed? Let's export it and then try again:

export NIXPKGS_ALLOW_UNFREE=1

~ via 🐍 v3.10.10nix-shell -E 'with import <nixpkgs> {}; mkShell { buildInputs = with cudaPackages; [ pkg-config cuda_cudart cuda_nvrtc cuda_nvtx libcublas libcufft ]; }'
these 8 derivations will be built:
  /nix/store/53qvvhwj1mqdssx6lnpgymdpvv15jbxh-cuda_nvrtc-linux-x86_64-11.7.50-archive.tar.xz.drv
  /nix/store/4zk729lbyn8381ijj1ff1vh8wg03mv41-cuda_nvrtc-11.7.50.drv
  /nix/store/6ciynn4hdamj6ihym3icyh37ndybancc-libcufft-linux-x86_64-10.7.2.50-archive.tar.xz.drv
  /nix/store/maz1aq725zxk8wdp38b7nlfcx5lq45qv-libcublas-linux-x86_64-11.10.1.25-archive.tar.xz.drv
  /nix/store/7lvyc15c2f3qqjhwpxfgq0hc5w2wfic8-libcublas-11.10.1.25.drv
  /nix/store/hfr7kk26wqfbqs89qdi61n1f0wlza0c6-libcufft-10.7.2.50.drv
  /nix/store/sfccgfg4f0ry6d44grwa5zshxzryqy6j-cuda_nvtx-linux-x86_64-11.7.50-archive.tar.xz.drv
  /nix/store/xwqbkra4rhg02nadk8v792f93w51pi73-cuda_nvtx-11.7.50.drv
building '/nix/store/53qvvhwj1mqdssx6lnpgymdpvv15jbxh-cuda_nvrtc-linux-x86_64-11.7.50-archive.tar.xz.drv'...
building '/nix/store/sfccgfg4f0ry6d44grwa5zshxzryqy6j-cuda_nvtx-linux-x86_64-11.7.50-archive.tar.xz.drv'...
building '/nix/store/maz1aq725zxk8wdp38b7nlfcx5lq45qv-libcublas-linux-x86_64-11.10.1.25-archive.tar.xz.drv'...
building '/nix/store/6ciynn4hdamj6ihym3icyh37ndybancc-libcufft-linux-x86_64-10.7.2.50-archive.tar.xz.drv'...
...

And now we are back in business! There is a huge output running so let's see.. I wonder if it's necessary to copy the whole output here? Now it is unpacking a source archive into the /nix/store. Let's wait because it is taking a little while.. The output ends with fixupPhase completed in 31 seconds. Now let's try some of the commands I was instructed to do:

pkg-config --list-all

~ via 🐍 v3.10.1 via impure (nix-shell) 

We get nothing which means it is somehow not finding it in nix store? I thought while in nix shell everything is magically available. Well, let's try to find it?

echo $cuda_cudart

Hmm, still nothing. It seems that the environment variables are not set when entering a nix-shell. Instead, let's find the location of the .pc files by using find command to search the Nix store for the .pc files associated with the CUDA packages:

find /nix/store -name "*.pc" | grep cuda

/nix/store/vqqdcl22a5cynh9m7jxnmmhi4ylyy47r-cuda_nvtx-11.7.50/pkg-config/nvToolsExt-11.7.pc
/nix/store/4qz7zqrxd1anyhhi0fyix8dn4hqbifps-cuda_cudart-11.7.60/pkg-config/cudart-11.7.pc
/nix/store/4qz7zqrxd1anyhhi0fyix8dn4hqbifps-cuda_cudart-11.7.60/pkg-config/cuda-11.7.pc
/nix/store/vz5jd0y7wlj94cj4k3abcb94njz3f52d-cuda_nvrtc-11.7.50/pkg-config/nvrtc-11.7.pc

Oh and what do we do now? We must find a way to extract these directory paths and add them to our environment.
Great! Now we can set the PKG_CONFIG_PATH environment variable by concatenating these paths with colons.

export PKG_CONFIG_PATH="/nix/store/vqqdcl22a5cynh9m7jxnmmhi4ylyy47r-cuda_nvtx-11.7.50/pkg-config:/nix/store/4qz7zqrxd1anyhhi0fyix8dn4hqbifps-cuda_cudart-11.7.60/pkg-config:/nix/store/vz5jd0y7wlj94cj4k3abcb94njz3f52d-cuda_nvrtc-11.7.50/pkg-config"

Now when we try to run the pkg-config --list-all it still shows nothing! What do we do now? Let's make more checks:

Let's verify that the PKG_CONFIG_PATH environment variable is correctly set:

echo $PKG_CONFIG_PATH
/nix/store/vqqdcl22a5cynh9m7jxnmmhi4ylyy47r-cuda_nvtx-11.7.50/pkg-config:/nix/store/4qz7zqrxd1anyhhi0fyix8dn4hqbifps-cuda_cudart-11.7.60/pkg-config:/nix/store/vz5jd0y7wlj94cj4k3abcb94njz3f52d-cuda_nvrtc-11.7.50/pkg-config

Let's make sure the .pc files exist in the specified directories:

ls -l /nix/store/vqqdcl22a5cynh9m7jxnmmhi4ylyy47r-cuda_nvtx-11.7.50/pkg-config
ls -l /nix/store/4qz7zqrxd1anyhhi0fyix8dn4hqbifps-cuda_cudart-11.7.60/pkg-config
ls -l /nix/store/vz5jd0y7wlj94cj4k3abcb94njz3f52d-cuda_nvrtc-11.7.50/pkg-config
total 4
-r--r--r-- 2 root root 249 sij   1  1970 nvToolsExt-11.7.pc
total 8
-r--r--r-- 2 root root 234 sij   1  1970 cuda-11.7.pc
-r--r--r-- 2 root root 239 sij   1  1970 cudart-11.7.pc
total 4
-r--r--r-- 2 root root 259 sij   1  1970 nvrtc-11.7.pc

But still, pkg-config --list-all is not showing anything. I seem to be running in circles now. I am not sure what to do next. Maybe I did something wrong with the nix shell and missed some environment setting? I will need to look into this too. Let's finish and see what else we will have to do.

We'd want to verify that the configured flags actually are correct in the sense that we can use them to build a cuda program. Ideally we'd find a package that uses pkg-config to discover cuda, although, as I mentioned, people mostly use CMake (amen!)

Oh, then can we make a small hello-world cuda program to test if this works?

Conclusion

Overall this is a wonderful experience and thank you very much for insightful advices and directions. I would very much like to continue helping with this. I will study the steps made and try to learn more about it. Among others the nix shell issue. Let me know how to continue from here too <3

@SomeoneSerge
Copy link
Contributor Author

pkg-config --list-all ... empty output

Ah, I guess we need to move $out/pkg-config to $out/lib/pkg-config for pkg-config setup hooks to pick these up automatically. E.g. once you enter mkShell { packages = [ pkg-config ]; buildInputs = [ zlib ]; }, PKG_CONFIG_PATH should already be extended with a path to zlib.pc's directory

You can do this with mv and in the same installPhase

But another issue is that unlike sed, substituteInPlace does not support regular expressions it seems

Yes, and these .pc files include the cuda version (e.g. 11.7) which is different e.g. for cudaPackages_11_7 and cudaPackages_11_8. It would be better to do without sed, but we also don't want to overcomplicate things without good reason

Is the sed imperative solution?

Yes, and in fact a lot of nixpkgs is (managed) "imperative". A way to think about this is that Bash is the effect system for the purely-functional Nix. Of course, we do prefer declarative solutions

with import <nixpkgs> {};

Note that <nixpkgs> is impure and it resolves into the channel your system is configured to follow (you can just enter nix repl and see what <nixpkgs> evaluates to). In particular, this way you won't see the effect of the changes you made to cuda packages. You'll have to import the particular nixpkgs revision you're working on, e.g. you can enter its directory and import ./. { ... }

$ export NIXPKGS_ALLOW_UNFREE=1

Yes, and note that this also only works because nix-shell (unlike the new nix shell, which may be confusing) is impure and reads environment variables. As the message suggests, you can also just import <nixpkgs> { config.allowUnfree = true; } if you like, instead of relying on environment variables

OK so we move on and leave this for later. I am not familiar with overrides.nix

The file itself is specific to our cuda packages, but the mechanisms are common in nixpkgs. One is overrideAttrs which lets you modify which arguments end up going into mkDerivation. For example, in build-cuda-redist-package.nix we already pass the attribute "meta" to mkDerivation, but that meta doesn't include pkgConfigModules. For testMetaPkgConfig to work we need to piggyback that attribute, and we need to piggyback different values for different cuda packages. This is what we can do in overrides.nix:

  cuda_cudart = prev.cuda_cudart.overrideAttrs (oldAttrs: {
    meta = oldAttrs.meta // {
        pkgConfigModules = [ "cuda" "cudart" ];
    };
  });

Here, oldAttrs will be the exact same dictionary we've defined in build-cuda-redist-packages.nix, but the dictionary that final mkDerivation will see will have meta replaced with one that additionally has pkgConfigModules.

Extra info that you can totally live without. There are, in fact, two mechanisms at play here, and the second has to do with where prev and final variables come from in that file. As you can guess prev.cuda_cudart refers to an intermediate version of the cudart derivation that hasn't been embellished with meta.pkgConfigModules yet and that will never in fact get fully evaluated or built. The prev itself is an intermediate version of the cudaPackages dictionary

Oh, then can we make a small hello-world cuda program to test if this works?

Yes, that would be the reasonable next step once we get pkg-config working "automatically"

I open the linked JSON file published by nvidia and I see the relative paths which seem to be important to make the package functional

Aha, maybe I should have pointed out that this is somewhat specific to how we handle cudatoolkit, which is unlike "normal" Nix packages: we don't have access to sources, so we download pre-built binaries from nvidia and ensure that they can be used within nixpkgs. We do so in a semi-automated manner by re-using these jsons in the nix expression

@SomeoneSerge
Copy link
Contributor Author

Thinking of it, maybe we could wrap this one up quickly and switch to some issue where you could see the practical benefit of your work 🙃 E.g. #224533 shouldn't be hard to do, but you would be able to measure the impact in gigabytes of saved storage when including something like pytorch in Nix-built docker images

I think you can already open a draft PR with the .pc patches and @ reference this issue. There we can link and discuss concrete pieces of code

@ghost
Copy link

ghost commented Apr 22, 2023

I hope it is OK I am writing all the steps which are obvious for you and thank you for the patience and guidance. As you can tell I am a beginner so I know if you see some weird steps that I am making you will let me know

I begin like this:

cd nixpkgs

Create and switch to a new branch for changes:

git checkout -b fix-cuda-pkg-config

nix edit nixpkgs#cudaPackages.cuda_cudart opens build-cuda-redist-package.nix in your editor:

Great tip!

nix edit nixpkgs#cudaPackages.cuda_cudart

Now the fix for the missing pkg-config --list-all output is:

Ah, I guess we need to move $out/pkg-config to $out/lib/pkg-config for pkg-config setup hooks to pick these up automatically. E.g. once you enter mkShell { packages = [ pkg-config ]; buildInputs = [ zlib ]; }, PKG_CONFIG_PATH should already be extended with a path to zlib.pc's directory

You can do this with mv and in the same installPhase

Here's the breakdown as I understand:

  1. Create a new directory for pkg-config: mkdir -p $out/lib/pkg-config. I guess this doesn't exist so we have to create it. Aha, the lib directory should already exist so with -p flag in mkdir -p we make sure it creates the directory path if any part of it is missing.

  2. Move the contents of pkg-config to the new directory: mv $out/pkg-config/* $out/lib/pkg-config.

  3. Update the path in the sed command to patch the .pc files: $out/lib/pkg-config/*.pc.

# Patch the .pc files to replace /usr/local/cuda-XX.Y with $out
    sed -i 's|^cudaroot=.*$|cudaroot=${placeholder "out"}|' $out/lib/pkg-config/*.pc

A way to think about this is that Bash is the effect system for the purely-functional Nix. Of course, we do prefer declarative solutions

Mhmm this makes sense. Does this also mean, and I'm just thinking out loud, that since bash is the actual effect system (I'm not aware how huge is this) that there are numerous opportunities for declarative programs to be built on top on Nix that would functionally encapsulate the effects? In a way Nix is a very fertile ground. OK

You'll have to import the particular nixpkgs revision you're working on, e.g. you can enter its directory and import ./. { ... }

What does this mean actually? Are you refering to a specific nixpkgs version and leaving me with a placeholder { ... } or can one put {} too which will by default load the available one? Am I supposed to make a shell.nix environment where this specific <nixpkgs> will be declared and then work from that? I understand that it is probably importing the available nixpkgs in my cloned directory. Something like:

with import ./. { ... };

mkShell {
  packages = [ pkg-config ];
  buildInputs = [ zlib ];
}

I do have some reading to do to better understand all this and the difference between pure and impure nix-shell, so thank you also for being patient with me. I wonder if nix shell is out of the question with these packages since it is a newer experimental command?

OK, this is all super interesting, and i am doing it in spare time so I will keep digging at this. Regarding the second issue you referenced I investigated it a bit and will work on it tonight after which I will write about it too.

@SomeoneSerge
Copy link
Contributor Author

mkdir -p $out/lib/pkg-config ... mv $out/pkg-config/* $out/lib/pkg-config

Or just

mkdir $out/lib
mv $out/pkg-config $out/lib/

What does this mean actually? Are you refering to a specific nixpkgs version

So, by "nixpkgs" I refer to any check out of the nixpkgs git repo. When you write import /path/to/nixpkgs/ what Nix does is it reads and evaluates /path/to/nixpkgs/default.nix. An example of that is import <nixpkgs>, where <nixpkgs> is being expanded depending on how your system is configured. E.g. I get:

❯ nix eval --impure --expr "<nixpkgs>"
/etc/nixpkgs/channels/nixpkgs
❯ ls /etc/nixpkgs/channels/nixpkgs -l
lrwxrwxrwx 1 root root 50 Mar 17 02:29 /etc/nixpkgs/channels/nixpkgs -> /nix/store/rmw0kq593g38kj9dwpg5ka81bbskh9dm-source
❯ ls /etc/nixpkgs/channels/nixpkgs
CONTRIBUTING.md  COPYING  default.nix  doc  flake.nix  lib  maintainers  nixos  pkgs  README.md

Your path will likely be different, but the structure will be the same and it'll match that of https://github.com/NixOS/nixpkgs/

and leaving me with a placeholder { ... } or can one put {} too which will by default load the available one

Now, after import <nixpkgs> comes the import <nixpkgs> { }.
If you look at <nixpkgs>/default.nix, you'll find that it is actually a function.
It accepts arguments such as config, overlays, system.
For example, you can call nixpkgs as import <nixpkgs> { config = { allowUnfree = true; } } and then you'll get an instance of nixpkgs where unfree errors are disabled.

with import ./. { ... };

A required argument of import <nixpkgs> is system, which is a string that looks like "x86_64-linux".
However, when you use nix-shell this argument is allowed to default builtins.currentSystem which evaluates "impurely" into the current system's platform. We call this "impure" because it depends on environment: on one machine you'll get x86_64-linux, and on another you'll get aarch64-darwin.

As a result, you can just use with import ./. { } to get an instance of the nixpkgs package set.
The result of import ./. { } (or import <nixpkgs> { }) is a dictionary that contains attributes like lib, and cudaPackages, and bash, etc.

I wonder if nix shell is out of the question with these packages since it is a newer experimental command?

We use both in practice

@SomeoneSerge
Copy link
Contributor Author

Am I supposed to make a shell.nix

Yes, what I meant is that you can test pkg-config like so:

$ cat << EOF > pkg-config-shell.nix
with import ./. { };

mkShell {
  packages = [ pkg-config ];
  buildInputs = [ zlib ];
}
EOF
$ nix-shell pkg-config-shell.nix
$ pkg-config --list-all
$ pkg-config --cflags --libs zlib

@SomeoneSerge
Copy link
Contributor Author

Update the path in the sed command to patch the .pc files: $out/lib/pkg-config/*.pc.

Github PR web interface is rather convenient to discuss such details!

@ghost
Copy link

ghost commented Apr 22, 2023

Update the path in the sed command to patch the .pc files: $out/lib/pkg-config/*.pc.

Github PR web interface is rather convenient to discuss such details!

Oh I think I understand now what you mean!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken 3.skill: good-first-bug This seems like it's fixable by a non-expert 3.skill: trivial This is trivial to complete (typically find-and-replace) 6.topic: cuda Parallel computing platform and API
Projects
Status: 🔮 Roadmap
Development

No branches or pull requests

1 participant