Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify declaration of making Alpaka-based library to depend on the main library of the same package #45415

Closed
makortel opened this issue Jul 9, 2024 · 24 comments

Comments

@makortel
Copy link
Contributor

makortel commented Jul 9, 2024

Spinoff from #45285 (comment)

Question

would it be feasible to add the this dependency from the alpaka-backend libraries to the "main library" automatically for all Alpaka-enabled packages?

Answer

The problem is that this will add a lot of extra dependencies to the alpaka libraries, including some dependencies on CUDA that we should not add to the non-CUDA backends.

It may be possible to improve this later, once

When the two items have been completed, we could try to simplify how to make the Alpaka-based libraries of a package to depend on the main library of the same package.

@makortel
Copy link
Contributor Author

makortel commented Jul 9, 2024

assign core, heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2024

New categories assigned: core,heterogeneous

@Dr15Jones,@fwyzard,@makortel,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2024

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

On a further thought

The problem is that this will add a lot of extra dependencies to the alpaka libraries, including some dependencies on CUDA that we should not add to the non-CUDA backends.

isn't this happening already? I mean, for example in,

<use name="FWCore/Utilities"/>
<use name="CondFormats/Serialization"/>
<use name="DataFormats/EcalDetId"/>
<use name="DataFormats/EcalDigi"/>
<use name="DataFormats/Math"/>
<use name="DataFormats/Portable"/>
<use name="DataFormats/SoATemplate"/>
<use name="HeterogeneousCore/CUDACore"/>
<use name="HeterogeneousCore/CUDAUtilities"/>
<use name="boost"/>
<use name="boost_serialization"/>
<use name="rootmath"/>
<use name="clhep"/>
<use name="cuda"/>
<use name="HeterogeneousCore/AlpakaInterface"/>
<flags ALPAKA_BACKENDS="!serial"/>
<export>
<lib name="1"/>
</export>

aren't the per-Alpaka-backend libraries linked to all the dependencies declared above anyhow?

At least in 14_1_0_pre5 I see

$ ldd $CMSSW_RELEASE_BASE/lib/el8_amd64_gcc12/libCondFormatsEcalObjectsROCmAsync.so | grep -i cuda
        libHeterogeneousCoreCUDACore.so => /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre5/lib/el8_amd64_gcc12/libHeterogeneousCoreCUDACore.so (0x00007f8857af1000)
        libHeterogeneousCoreCUDAServices.so => /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre5/lib/el8_amd64_gcc12/libHeterogeneousCoreCUDAServices.so (0x00007f8857abd000)
        libCUDADataFormatsCommon.so => /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre5/lib/el8_amd64_gcc12/libCUDADataFormatsCommon.so (0x00007f8857ab4000)
        libHeterogeneousCoreCUDAUtilities.so => /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre5/lib/el8_amd64_gcc12/libHeterogeneousCoreCUDAUtilities.so (0x00007f88579a6000)
        libcudart.so.12 => /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre5/external/el8_amd64_gcc12/lib/libcudart.so.12 (0x00007f8855f9b000)
        libnvidia-ml.so.1 => /cvmfs/cms.cern.ch/el8_amd64_gcc12/external/cuda/12.4.1-fc5cb0e72dba64b6abbf00089f3a044c/lib64/stubs/libnvidia-ml.so.1 (0x00007f8855b82000)
        libcuda.so.1 => /cvmfs/cms.cern.ch/el8_amd64_gcc12/external/cuda/12.4.1-fc5cb0e72dba64b6abbf00089f3a044c/lib64/stubs/libcuda.so.1 (0x00007f885561b000)

@makortel
Copy link
Contributor Author

The problem is that this will add a lot of extra dependencies to the alpaka libraries, including some dependencies on CUDA that we should not add to the non-CUDA backends.

isn't this happening already?

@smuzaffar @fwyzard Any thoughts? As far as I can tell, making each per-Alpaka-backend library to depend on the main library of the same package would not make the dependencies of the per-Alpaka-backend libraries any worse they are already.

@fwyzard
Copy link
Contributor

fwyzard commented Aug 13, 2024

If adding the dependency by default does not actually add more dependencies to the alpaka-based libraries, I would be in favour of doing it.

We should start removing the CUDA-only support early in 14.2.x, which will remove some of the more problematic dependencies.

In addition -Wl,--as-needed could help removing the unused dependencies.
@smuzaffar what happened to cms-sw/cmsdist#9232 ?

@fwyzard
Copy link
Contributor

fwyzard commented Aug 13, 2024

As a test, I've checked out the full CMSSW_14_1_0_pre6, added

<use name="..." for="alpaka"/>

to all packages that build alpaka libraries, and rebuilt all packages.

Comparing the dependencies of the rebuilt alpaka libraries with the original ones, the only difference is the extra dependency on the non-alpaka library itself - all other dependencies were there already, as @makortel observed.

For example, for RecoLocalTracker/SiPixelRecHits, the change is

$ git diff RecoLocalTracker/SiPixelRecHits              
diff --git a/RecoLocalTracker/SiPixelRecHits/BuildFile.xml b/RecoLocalTracker/SiPixelRecHits/BuildFile.xml
index 62787f4c989c..0654e442fbee 100644
--- a/RecoLocalTracker/SiPixelRecHits/BuildFile.xml
+++ b/RecoLocalTracker/SiPixelRecHits/BuildFile.xml
@@ -15,6 +15,7 @@
 <use name="HeterogeneousCore/CUDAUtilities"/>
 <use name="RecoLocalTracker/ClusterParameterEstimator"/>
 <flags ALPAKA_BACKENDS="1"/>
+<use name="RecoLocalTracker/SiPixelRecHits" for="alpaka"/>
 <export>
   <lib name="1"/>
 </export>

The impact on the libraries is

$ ldd $CMSSW_RELEASE_BASE/lib/$SCRAM_ARCH/libRecoLocalTrackerSiPixelRecHitsROCmAsync.so | sed -e's/ (0x\w\+)//' | sort > original.dep
$ ldd $CMSSW_BASE/lib/$SCRAM_ARCH/libRecoLocalTrackerSiPixelRecHitsROCmAsync.so | sed -e's/ (0x\w\+)//' | sort > modified.dep
$ diff -u original.dep modified.dep 
--- original.dep        2024-08-13 10:54:58.524045103 +0200
+++ modified.dep        2024-08-13 10:55:03.804125494 +0200
@@ -95,6 +95,7 @@
        libPhysics.so => /data/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre6/external/el8_amd64_gcc12/lib/libPhysics.so
        libRIO.so => /data/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre6/external/el8_amd64_gcc12/lib/libRIO.so
        libRecoLocalTrackerClusterParameterEstimator.so => /data/user/fwyzard/test_alpaka_link/CMSSW_14_1_0_pre6/lib/el8_amd64_gcc12/libRecoLocalTrackerClusterParameterEstimator.so
+       libRecoLocalTrackerSiPixelRecHits.so => /data/user/fwyzard/test_alpaka_link/CMSSW_14_1_0_pre6/lib/el8_amd64_gcc12/libRecoLocalTrackerSiPixelRecHits.so
        libRint.so => /data/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre6/external/el8_amd64_gcc12/lib/libRint.so
        libSmatrix.so => /data/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre6/external/el8_amd64_gcc12/lib/libSmatrix.so
        libThread.so => /data/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_0_pre6/external/el8_amd64_gcc12/lib/libThread.so

The only difference is the explicit dependency on libRecoLocalTrackerSiPixelRecHits.so.

@smuzaffar
Copy link
Contributor

In addition -Wl,--as-needed could help removing the unused dependencies. @smuzaffar what happened to cms-sw/cmsdist#9232 ?

@fwyzard , this was closed in favor of CMSSW_14_1_ASNEEDED_X IBs which are build with -Wl,--as-needed. CMSSW_14_1_ASNEEDED_X IBs build without errors but we have runtime errors. Once we get ride of those then we can enable -Wl,--as-needed for all

@fwyzard
Copy link
Contributor

fwyzard commented Aug 13, 2024

Where can I find the CMSSW_14_1_ASNEEDED_X IBs ?
I didn't see any one the IB page.

@smuzaffar
Copy link
Contributor

We built those on demand, I just have restarted one (it should be available in few hours) and also changed the configuration to automatically build 3 times a week

@smuzaffar
Copy link
Contributor

@fwyzard , CMSSW_14_1_ASNEEDED_X_2024-08-12-2300 is available now

@fwyzard
Copy link
Contributor

fwyzard commented Aug 13, 2024

Thanks.

@makortel
Copy link
Contributor Author

makortel commented Nov 8, 2024

If adding the dependency by default does not actually add more dependencies to the alpaka-based libraries, I would be in favour of doing it.

Comparing the dependencies of the rebuilt alpaka libraries with the original ones, the only difference is the extra dependency on the non-alpaka library itself - all other dependencies were there already, as @makortel observed.

@smuzaffar Would it be feasible to make each per-Alpaka-backend library to depend on the main library? (to simplify the BuildFile.xml setup)

@smuzaffar
Copy link
Contributor

smuzaffar commented Nov 13, 2024

@smuzaffar Would it be feasible to make each per-Alpaka-backend library to depend on the main library?

@makortel , yes

By the way, with #46684 ( and corresponding cmsdist PR) we should be able to enable --as-needded flag for cmssw

@smuzaffar
Copy link
Contributor

ASNEEDED_X IBs which are built with -Wl,--as-needded flag are error free now. Any objection on activating -Wl,--as-needded for 15.0.X ?

@makortel
Copy link
Contributor Author

Any objection on activating -Wl,--as-needded for 15.0.X ?

Fine with me.

@smuzaffar
Copy link
Contributor

smuzaffar commented Nov 27, 2024

-Wl,--as-needded is enabled for 15.0.X now

@smuzaffar
Copy link
Contributor

smuzaffar commented Nov 27, 2024

cms-sw/cmsdist#9537 should enable each alpaka backend library to depend on the package's main library

@fwyzard
Copy link
Contributor

fwyzard commented Nov 29, 2024

#46829 removes the explicit declarations of dependency from alpaka libraries on the non-alpaka main library.

@fwyzard
Copy link
Contributor

fwyzard commented Nov 29, 2024

I think we can close this issue ?

@smuzaffar
Copy link
Contributor

+core

@fwyzard
Copy link
Contributor

fwyzard commented Nov 29, 2024

+heterogeneous

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants