Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Kokkos 4.5.00 #144

Open
jacobmerson opened this issue Sep 10, 2024 · 10 comments
Open

Upgrade Kokkos 4.5.00 #144

jacobmerson opened this issue Sep 10, 2024 · 10 comments
Assignees

Comments

@jacobmerson
Copy link
Collaborator

Kokkos 4.4.00 and greater have mdspan built in which conflicts with the version we compile inside PCMS.

@jacobmerson jacobmerson changed the title Upgrade Kokkos 4.4.00 Upgrade Kokkos 4.5.00 Dec 18, 2024
@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

@jacobmerson Do we care about maintaining support for Kokkos 4.4.00 and older that don't have the vendored mdspan enabled? I.e., can we require that Kokkos provides mdspan? If so, it will make the cmake and source code simpler. Specifically, the pcms vendored mdspan provides std::experimental::mdspanStuff while the kokkos vendored mdspan provides Kokkos::mdspanStuff (there may be a kokkos cmake flag to change this).

@jacobmerson
Copy link
Collaborator Author

I don’t think we need to support old Kokkos.

@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

Here is my in-progress branch attempting to move to kokkos 4.5.01: https://github.com/SCOREC/pcms/tree/cws/kokkos45

Using mdspan from kokkos was going fairly smoothly until I hit the mdarray usage. Specifically, adding #include "mdspan/mdarray.hpp" (after #include "mdspan/mdspan.hpp") resulted in the following compile error:

In file included from /space/cwsmith/pcms/builds/AMPERE86/kokkos/install/include/mdspan/mdarray.hpp:29,
                 from /space/cwsmith/pcms/pcms/src/pcms/arrays.h:4,
                 from /space/cwsmith/pcms/pcms/src/pcms/transfer_field.h:4,
                 from /space/cwsmith/pcms/pcms/src/pcms/common.h:4,
                 from /space/cwsmith/pcms/pcms/src/pcms/server.h:3,
                 from /space/cwsmith/pcms/pcms/src/pcms.h:5,
                 from /space/cwsmith/pcms/pcms/src/pcms.cpp:1:
/space/cwsmith/pcms/builds/AMPERE86/kokkos/install/include/mdspan/../experimental/__p1684_bits/mdarray.hpp:19:10: fatal error: ../mdspan: No such file or directory
   19 | #include "../mdspan"
      |          ^~~~~~~~~~~
compilation terminated.

I think this may be a bug in the way kokkos installs these headers. The following uninformed hack to Kokkos 4.5.01 gets me past the above error, but results in several errors about Kokkos::mdarray not existing.

(ins)smithc11@checkers: /space/cwsmith/pcms/kokkos (cws/mdarray)$ git diff 4.5.01
diff --git a/core/src/CMakeLists.txt b/core/src/CMakeLists.txt
index 72663739a..303b4e2c4 100644
--- a/core/src/CMakeLists.txt
+++ b/core/src/CMakeLists.txt
@@ -165,6 +165,7 @@ if(Kokkos_ENABLE_IMPL_MDSPAN)
     kokkos_lib_include_directories(kokkoscore ${KOKKOS_SOURCE_DIR}/tpls/mdspan/include)
 
     append_glob(KOKKOS_CORE_HEADERS ${KOKKOS_SOURCE_DIR}/tpls/mdspan/include/experimental/__p0009_bits/*.hpp)
+    append_glob(KOKKOS_CORE_HEADERS ${KOKKOS_SOURCE_DIR}/tpls/mdspan/include/experimental/__p1684_bits/*.hpp) # mdarray
     append_glob(KOKKOS_CORE_HEADERS ${KOKKOS_SOURCE_DIR}/tpls/mdspan/include/experimental/mdspan)
 
     install(
diff --git a/tpls/mdspan/include/experimental/__p1684_bits/mdarray.hpp b/tpls/mdspan/include/experimental/__p1684_bits/mdarray.hpp
index bdc5925f7..449c2869a 100644
--- a/tpls/mdspan/include/experimental/__p1684_bits/mdarray.hpp
+++ b/tpls/mdspan/include/experimental/__p1684_bits/mdarray.hpp
@@ -16,7 +16,6 @@
 
 #pragma once
 
-#include "../mdspan"
 #include <cassert>
 #include <vector>
 

The first errors after the above kokkos hack are:

cd /space/cwsmith/pcms/builds/AMPERE86/pcms/src && /space/cwsmith/pcms/builds/AMPERE86/kokkos/install/bin/kokkos_launch_compiler /space/cwsmith/pcms/builds/AMPERE86/kokkos/install/bin/nvcc_wrapper /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.0/mpich-4.1.1-xpoyz4tqgfxtrm6m7qq67q4ccp5pnlre/bin/mp
icxx /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.0/mpich-4.1.1-xpoyz4tqgfxtrm6m7qq67q4ccp5pnlre/bin/mpicxx -DADIOS2_USE_MPI -DFMT_SHARED -DKOKKOS_DEPENDENCE -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX -DPCMS_HAS_CLIENT -DPCMS_HAS_OMEGA_H -DPCMS_HAS_SERVER -DPCMS_PRINT_ENABLED -DPCMS_SPDLOG_ENABLED 
-DPERFSTUBS_USE_TIMERS -DSPDLOG_COMPILED_LIB -DSPDLOG_SHARED_LIB -D_MPICC_H -I/space/cwsmith/pcms/builds/AMPERE86/pcms/src -I/space/cwsmith/pcms/pcms/src -isystem /space/cwsmith/pcms/builds/AMPERE86/redev/install/include -isystem /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.0/adios2-2.9.0-nztsv
7oahftcta2dntesql2wkep6kdfp/include -isystem /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.0/perfstubs-master-mjccvyovutxdh2jlga4s26tqazp6h2b4/include -isystem /space/cwsmith/pcms/builds/AMPERE86/kokkos/install/include -isystem /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.
0/cuda-12.1.1-zxa4mskqvbkiefzkvnuatlq7skxjnzt6/include -isystem /space/cwsmith/pcms/builds/AMPERE86/omega_h/install/include -isystem /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.0/zlib-1.2.13-mjocrm2cwyth6kvpmj3yrg52ulwd64ow/include -isystem /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x
86_64/gcc-12.3.0/spdlog-1.11.0-tsatcehkgv4rcomhu4vhtmv4nh65hm65/include -O3 -DNDEBUG -std=c++17 -extended-lambda -Wext-lambda-captures-this -expt-relaxed-constexpr -arch=sm_86 --expt-extended-lambda -MD -MT src/CMakeFiles/pcms_core.dir/pcms/xgc_reverse_classification.cpp.o -MF CMakeFiles/pcms_core.dir/pcms/xgc_revers
e_classification.cpp.o.d -o CMakeFiles/pcms_core.dir/pcms/xgc_reverse_classification.cpp.o -c /space/cwsmith/pcms/pcms/src/pcms/xgc_reverse_classification.cpp 
/space/cwsmith/pcms/pcms/src/pcms/arrays.h(105): error: namespace "Kokkos" has no member "mdarray"                                                             
  using CoordinateMDArray = Kokkos::mdarray<                                                                                                                   
                                    ^                                                                                                                          
                                                                                                                                                               
/space/cwsmith/pcms/pcms/src/pcms/arrays.h(105): error: expected a ";"                                                                                         
  using CoordinateMDArray = Kokkos::mdarray<                                                                                                                   
                                           ^                                                                                                                   
                                                                                                                                                               
/space/cwsmith/pcms/pcms/src/pcms/arrays.h(113): error: namespace "Kokkos" has no member "mdarray"                                                             
  using MDArray = Kokkos::mdarray<                                                                                                                             
                          ^                                                                                                                                                                                                                                                            

@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

This commit compiles with my hacked kokkos: d449f68

Some tests are failing:

The following tests FAILED:
          2 - smoke_test (Failed)
         26 - test_ohOverlap_d3d_20p (Timeout)
         28 - test_twoClientOverlap_d3d_28p (Failed)
         32 - test_proxy_coupling_26p (Timeout)
         34 - test_proxy_coupling_28p (Timeout)
         36 - test_proxy_coupling_32p (Timeout)

Besides the smoke test, the failing tests require >=20 processes. Given that I'm running on a workstation, it seems somewhat reasonable that there are failures. @jacobmerson @Angelyr Is that correct?

I'll dig into the failing smoke test.

@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

Here is the kokkos PR: kokkos/kokkos#7671

@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

@jacobmerson Any thoughts on this kokkos/kokkos#7671 (review) ?

I can think of three options:

  • We copy mdarray from Kokkos (again), and vendor it, but not mdspan. This seems like it is guaranteed to break as the pair will inevitably go out of sync (IIUC, there is a dependency between mdarray and mdspan).
  • Require kokkos builds to not provide mdspan/mdarray. This seems like an immediate dead end as other downstream packages may depend on mdspan from kokkos and most toolchains are unlikely to provide it.
  • Drop the mdarray requirement. I assume this is possible, but I'm not sure how difficult it would be.

@jacobmerson
Copy link
Collaborator Author

I don’t remember using mdarray too many places. So I’d think it wouldn’t be too hard to remove. Please let me know if there are specific places that use it an we can come up with a plan to remove.

@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

OK. Here is the principle location:

pcms/src/pcms/arrays.h

Lines 104 to 115 in 502999e

using CoordinateMDArray = std::experimental::mdarray<
CoordinateElement<CoordinateSystem, ElementType>,
std::experimental::extents<int, std::experimental::dynamic_extent, N>,
LayoutPolicy, Container>;
template <typename ElementType,
typename LayoutPolicy = std::experimental::layout_right,
typename Container = std::vector<ElementType>>
using MDArray = std::experimental::mdarray<
ElementType,
std::experimental::extents<int, std::experimental::dynamic_extent, 1>,
LayoutPolicy, Container>;

and grep reports (on my branch):

src/pcms/arrays.h:4:#include "mdspan/mdarray.hpp" //from kokkos
src/pcms/arrays.h:76:// TODO convert instances of this to mdarray
src/pcms/arrays.h:105:using CoordinateMDArray = Kokkos::Experimental::mdarray<
src/pcms/arrays.h:113:using MDArray = Kokkos::Experimental::mdarray<
src/pcms/omega_h_field.h:269:    return MDArray<CoordinateElementType>{};
test/test_coordinate.cpp:47:  pcms::CoordinateMDArray<double,Cartesian> a(2);
test/test_coordinate.cpp:49:  std::experimental::mdarray<

@cwsmith
Copy link
Contributor

cwsmith commented Jan 15, 2025

mdspan PR: kokkos/mdspan#376

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants