[CI] Use Spack OCI build cache for MPI packages #788

Merged: 12 commits merged into JuliaParallel:master from mg/ci-spack-oci on Nov 3, 2023

Conversation

giordano
Member

@giordano giordano commented Nov 3, 2023

Fix #744.

@giordano giordano added the CI label Nov 3, 2023
@giordano
Member Author

giordano commented Nov 3, 2023

@haampie libmpi can't be dlopened automatically, do we need to set LD_LIBRARY_PATH? To what?

@haampie

haampie commented Nov 3, 2023

dlopen'ed from where? Is julia using runpaths? That can cause rpaths to be ignored when locating dependencies of the library :) ldd looks OK:

$ docker run --rm ghcr.io/juliaparallel/github-actions-buildcache:mvapich2-2.3.7-1-hs7gkcclsnk55kqm52a4behdnt3dug6b.spack ldd /home/runner/work/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/mvapich2-2.3.7-1-hs7gkcclsnk55kqm52a4behdnt3dug6b/lib/libmpi.so.12.1.1
	linux-vdso.so.1 (0x00007ffe6e35d000)
	libpciaccess.so.0 => /home/runner/work/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/libpciaccess-0.17-xvngbzv47q7wrgngonckbjmovahta4qz/lib/libpciaccess.so.0 (0x00007f7114f43000)
	libxml2.so.2 => /home/runner/work/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/libxml2-2.10.3-yd6slxex5luxaq7kcprrwnsdgoc6ikem/lib/libxml2.so.2 (0x00007f7114dd7000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7114cee000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7114cce000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7114aa6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7115c2b000)
	libz.so.1 => /home/runner/work/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/zlib-ng-2.1.4-baeubzewdmp4e5aou76dhnq4ed3ttwcp/lib/libz.so.1 (0x00007f7114a7d000)
	liblzma.so.5 => /home/runner/work/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/xz-5.4.1-rd2bos6kq3id4homxzin5tppn3724jmh/lib/liblzma.so.5 (0x00007f7114a4c000)
	libiconv.so.2 => /home/runner/work/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/libiconv-1.17-t7f7kkjxsflwhs3fwyq5nsdfatyfm6io/lib/libiconv.so.2 (0x00007f711493e000)
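
To check the runpath question directly, one can look at the dynamic section of the library with `readelf -d`. The following is a minimal sketch that parses that output; the function names are hypothetical, and the rule encoded in `dependencies_inherit_paths` is the documented glibc behaviour (DT_RPATH applies to the whole dependency tree and is ignored when DT_RUNPATH is present, while DT_RUNPATH only covers direct dependencies).

```python
import re
from typing import Dict, List

# Matches lines of `readelf -d` output such as:
#  0x000000000000001d (RUNPATH)   Library runpath: [/a/lib:/b/lib]
_ENTRY = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def search_path_entries(readelf_output: str) -> Dict[str, List[str]]:
    """Return the DT_RPATH / DT_RUNPATH directories found in `readelf -d` text."""
    found: Dict[str, List[str]] = {}
    for line in readelf_output.splitlines():
        match = _ENTRY.search(line)
        if match:
            tag, dirs = match.groups()
            found[tag] = [d for d in dirs.split(":") if d]
    return found


def dependencies_inherit_paths(entries: Dict[str, List[str]]) -> bool:
    """True when the object's search paths also apply to transitive dependencies.

    That is only the case for DT_RPATH, and only when no DT_RUNPATH is present.
    """
    return "RPATH" in entries and "RUNPATH" not in entries
```

Feeding it the output of `readelf -d` on both the executable and `libmpi.so` would show which of the two mechanisms each object uses.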

@giordano
Member Author

giordano commented Nov 3, 2023

Just dlopen by name: libmpi 🙂

@giordano
Member Author

giordano commented Nov 3, 2023

Ok, giving a hint of where libmpi is did the trick: 211c743
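
For readers unfamiliar with the difference between opening a library by bare name and opening it with a location hint, here is a rough Python sketch of the idea. It is not the code from commit 211c743 (MPI.jl is written in Julia); `locate_library` and `open_library` are hypothetical helpers, and the fallback to `name + ".so"` is an assumption that only works where the loader can already find that file.

```python
import ctypes
import os
from typing import Iterable, Optional


def locate_library(name: str, hint_dirs: Iterable[str]) -> Optional[str]:
    """Return the full path of the first `name.so` or `name.so.*` file in hint_dirs."""
    for directory in hint_dirs:
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry == name + ".so" or entry.startswith(name + ".so."):
                return os.path.join(directory, entry)
    return None


def open_library(name: str, hint_dirs: Iterable[str] = ()) -> ctypes.CDLL:
    """dlopen the library from a hinted directory, else by bare name."""
    path = locate_library(name, hint_dirs)
    # Without a hint the dynamic loader only searches its default locations
    # (LD_LIBRARY_PATH, the ld.so cache, system directories).
    return ctypes.CDLL(path if path is not None else name + ".so")
```

With a Spack prefix that is outside the loader's default search path, `open_library("libmpi")` would fail while `open_library("libmpi", [prefix + "/lib"])` would succeed.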

@haampie

haampie commented Nov 3, 2023

Ah okay, yes, you can set LD_LIBRARY_PATH for all runtime packages statically with

spack:
  modules:
    prefix_inspections:
      lib: ["LD_LIBRARY_PATH"]
      lib64: ["LD_LIBRARY_PATH"]

in the env, and then run spack buildcache push -f (this shouldn't require a rebuild, just a one-time --force to create new manifest files that contain the env variables).
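
As an illustration of what the `prefix_inspections` mapping above expresses, the sketch below models it in Python: for every installed prefix, each listed subdirectory that exists is added to the named environment variable. This is only a model of the setting's meaning, not Spack's implementation; `inspect_prefixes` is a hypothetical name.

```python
import os
from typing import Dict, Iterable, List, Mapping


def inspect_prefixes(
    prefixes: Iterable[str], inspections: Mapping[str, List[str]]
) -> Dict[str, str]:
    """Collect existing `prefix/subdir` directories into the listed variables."""
    collected: Dict[str, List[str]] = {}
    for prefix in prefixes:
        for subdir, variables in inspections.items():
            candidate = os.path.join(prefix, subdir)
            if os.path.isdir(candidate):  # skip subdirectories a package lacks
                for variable in variables:
                    collected.setdefault(variable, []).append(candidate)
    # Join the directories the way PATH-like variables expect.
    return {var: os.pathsep.join(paths) for var, paths in collected.items()}
```

Called with `{"lib": ["LD_LIBRARY_PATH"], "lib64": ["LD_LIBRARY_PATH"]}`, it returns a single `LD_LIBRARY_PATH` value covering the `lib` and `lib64` directories of all prefixes.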

@giordano giordano force-pushed the mg/ci-spack-oci branch 8 times, most recently from ec26bf5 to 31a132c Compare November 3, 2023 15:22
@giordano
Member Author

giordano commented Nov 3, 2023

Ok, this is all working well... except GHA is mangling an environment variable:

- name: Set MPITRAMPOLINE_MPIEXEC
  run: echo "MPITRAMPOLINE_MPIEXEC=$(which mpiexec)" >> "${GITHUB_ENV}"
- name: Build MPIwrapper
  run: |
    echo ${MPITRAMPOLINE_MPIEXEC}
    echo $(realpath ${MPITRAMPOLINE_MPIEXEC})
    ${MPITRAMPOLINE_MPIEXEC} --version

I set MPITRAMPOLINE_MPIEXEC to the output of which mpiexec, and then try to print it. https://github.com/JuliaParallel/MPI.jl/actions/runs/6746922493/job/18341922748?pr=788#step:5:23 shows that MPITRAMPOLINE_MPIEXEC is set to /home/runner/work/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/intel-mpi-2019.9.304-ecfipz6mxgepmrkwp5dl5oohion5m54r/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiexec (which is the right value), but when I print its value in the next step I get /__w/github-actions-buildcache/github-actions-buildcache/spack/opt/spack/__spack_path_placeholder__/__spack_path_pl/linux-ubuntu22.04-x86_64_v2/gcc-12.3.0/intel-mpi-2019.9.304-ecfipz6mxgepmrkwp5dl5oohion5m54r/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiexec, which is a non-existent path: /home/runner/work has been changed into /__w. No clue why GHA is doing this nonsense.

Edit: the docker create command looks like

  /usr/bin/docker create --name 539be9ed0fc748a1bddbc6785b503709_ghcriojuliaparallelgithubactionsbuildcacheintelmpi20199304ecfipz6mxgepmrkwp5dl5oohion5m54rspack_5440bb --label 24df96 --workdir /__w/MPI.jl/MPI.jl --network github_network_a5dca70b3d414352aa1704e4ea8d5b99  -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work":"/__w" -v "/home/runner/runners/2.311.0/externals":"/__e":ro -v "/home/runner/work/_temp":"/__w/_temp" -v "/home/runner/work/_actions":"/__w/_actions" -v "/opt/hostedtoolcache":"/__t" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" ghcr.io/juliaparallel/github-actions-buildcache:intel-mpi-2019.9.304-ecfipz6mxgepmrkwp5dl5oohion5m54r.spack "-f" "/dev/null"

Note -v "/home/runner/work":"/__w", so it looks like GHA has an internal automatic mapping /home/runner/work -> /__w, which is failing badly in our case, sigh. Ok, manually creating a symlink does the trick. See actions/runner#2185
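
The observed behaviour can be reproduced with a small sketch. It assumes the rewrite is a plain prefix substitution based on the `-v "/home/runner/work":"/__w"` mount; that matches the values seen in the logs but is not taken from the runner's source. Both function names are hypothetical, and the direction of the symlink is a guess at what the manual workaround looked like.

```python
import os

HOST_WORKDIR = "/home/runner/work"  # host side of the bind mount
CONTAINER_WORKDIR = "/__w"          # container side of the bind mount


def translate_to_container(
    value: str,
    host_prefix: str = HOST_WORKDIR,
    container_prefix: str = CONTAINER_WORKDIR,
) -> str:
    """Rewrite a leading host_prefix into container_prefix; leave other values alone."""
    if value == host_prefix or value.startswith(host_prefix + "/"):
        return container_prefix + value[len(host_prefix):]
    return value


def add_compat_symlink(real_dir: str, rewritten_dir: str) -> None:
    """Make rewritten_dir point at real_dir so rewritten paths resolve again."""
    os.makedirs(os.path.dirname(rewritten_dir), exist_ok=True)
    os.symlink(real_dir, rewritten_dir)
```

Under this model a value such as `/opt/spack/bin/mpiexec` passes through unchanged, since it does not start with `/home/runner/work`; only paths under the host work directory are affected.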

@giordano giordano force-pushed the mg/ci-spack-oci branch 3 times, most recently from 2c2c1b8 to 662f606 Compare November 3, 2023 16:31
@haampie

haampie commented Nov 3, 2023

I've updated the paths in the buildcache to /opt/spack, which sounds like a more sensible location in general. Fortunately it's writable...

@giordano
Member Author

giordano commented Nov 3, 2023

Alright, this is working great now! I'll uncomment all other jobs and then merge. Thanks @haampie for all the help!

@giordano giordano marked this pull request as ready for review November 3, 2023 17:19
Intel MPI is currently broken, better to use oneAPI MPI.
Now we have the Spack build cache in `/opt/spack`, so we shouldn't hit the
GitHub Actions issue.
@giordano
Member Author

giordano commented Nov 3, 2023

Alright, this PR cuts CI time by about 1/3, mainly by bringing the mvapich2 job down from 45 minutes to less than 5. There are still some failures around (some of which are tracked by #749), but these are unrelated to this PR.

Thanks @haampie for implementing the OCI caching feature in spack and all the help here! 🚀

@giordano giordano merged commit 376bc11 into JuliaParallel:master Nov 3, 2023
36 of 44 checks passed
@giordano giordano deleted the mg/ci-spack-oci branch November 3, 2023 18:16
Successfully merging this pull request may close these issues.

[CI] Use Spack + OCI buildcache when widely available for system MPI
2 participants