-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is Tiledarray ready for SYCL conversion? #427
Comments
@victor-anisimov I think this is an issue due to a regression in gcc 10.2 ... do you have 10.3 available? That's the main compiler we use on our cluster, it should work. |
gcc/11.1.0 worked for me. Thank you very much for the suggestion! However, I see many tests failing.
1/26 Test #1: madness/test/world/build ................... Passed 53.98 sec 27% tests passed, 19 tests failed out of 26 |
I recommend you only run TA-specific tests. Build 'check-tiledarray' target.
It looks like the tests that are failing are madness tests. This is likely
that MPI programs on your system cannot be launched directly ... Need srun?
…On Thu, Oct 19, 2023, 1:48 PM Victor Anisimov ***@***.***> wrote:
gcc/11.1.0 worked for me. Thank you very much for the suggestion! However,
I see many tests failing.
Start 1: madness/test/world/build
1/26 Test #1 <#1>:
madness/test/world/build ................... Passed 53.98 sec
Start 2: madness/test/world/test_prof/run
2/26 Test #2 <#2>:
madness/test/world/test_prof/run ...........***Failed 1.19 sec
Start 3: madness/test/world/test_ar/run
3/26 Test #3 <#3>:
madness/test/world/test_ar/run .............***Failed 0.64 sec
Start 4: madness/test/world/test_hashdc/run
4/26 Test #4 <#4>:
madness/test/world/test_hashdc/run .........***Failed 0.64 sec
Start 5: madness/test/world/test_hello/run
5/26 Test #5 <#5>:
madness/test/world/test_hello/run ..........***Failed 0.64 sec
Start 6: madness/test/world/test_atomicint/run
6/26 Test #6 <#6>:
madness/test/world/test_atomicint/run ......***Failed 0.64 sec
Start 7: madness/test/world/test_future/run
7/26 Test #7 <#7>:
madness/test/world/test_future/run .........***Failed 0.64 sec
Start 8: madness/test/world/test_future2/run
8/26 Test #8 <#8>:
madness/test/world/test_future2/run ........***Failed 0.64 sec
Start 9: madness/test/world/test_future3/run
9/26 Test #9 <#9>:
madness/test/world/test_future3/run ........***Failed 0.64 sec
Start 10: madness/test/world/test_dc/run
10/26 Test #10 <#10>:
madness/test/world/test_dc/run .............***Failed 0.64 sec
Start 11: madness/test/world/test_hashthreaded/run
11/26 Test #11 <#11>:
madness/test/world/test_hashthreaded/run ...***Failed 0.64 sec
Start 12: madness/test/world/test_queue/run
12/26 Test #12 <#12>:
madness/test/world/test_queue/run ..........***Failed 0.63 sec
Start 13: madness/test/world/test_world/run
13/26 Test #13 <#13>:
madness/test/world/test_world/run ..........***Failed 0.66 sec
Start 14: madness/test/world/test_worldprofile/run
14/26 Test #14 <#14>:
madness/test/world/test_worldprofile/run ...***Failed 0.68 sec
Start 15: madness/test/world/test_binsorter/run
15/26 Test #15 <#15>:
madness/test/world/test_binsorter/run ......***Failed 0.64 sec
Start 16: madness/test/world/test_vector/run
16/26 Test #16 <#16>:
madness/test/world/test_vector/run ......... Passed 0.35 sec
Start 17: madness/test/world/test_worldptr/run
17/26 Test #17 <#17>:
madness/test/world/test_worldptr/run .......***Failed 0.92 sec
Start 18: madness/test/world/test_worldref/run
18/26 Test #18 <#18>:
madness/test/world/test_worldref/run .......***Failed 0.87 sec
Start 19: madness/test/world/test_stack/run
19/26 Test #19 <#19>:
madness/test/world/test_stack/run .......... Passed 0.12 sec
Start 20: madness/test/world/test_googletest/run
20/26 Test #20 <#20>:
madness/test/world/test_googletest/run ..... Passed 0.66 sec
Start 21: madness/test/world/test_tree/run
21/26 Test #21 <#21>:
madness/test/world/test_tree/run ...........***Failed 0.64 sec
Start 22: btas/unit/build
22/26 Test #22 <#22>:
btas/unit/build ............................ Passed 54.24 sec
Start 23: btas/unit/run
23/26 Test #23 <#23>:
btas/unit/run .............................. Passed 215.30 sec
Start 24: tiledarray/unit/build
24/26 Test #24 <#24>:
tiledarray/unit/build ...................... Passed 1082.45 sec
Start 25: tiledarray/unit/run-np-1
25/26 Test #25 <#25>:
tiledarray/unit/run-np-1 ...................***Failed 1.43 sec
Start 26: tiledarray/unit/run-np-2
26/26 Test #26 <#26>:
tiledarray/unit/run-np-2 ...................***Failed 0.75 sec
27% tests passed, 19 tests failed out of 26
—
Reply to this email directly, view it on GitHub
<#427 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAQXIZ4BSIXMNJANNPQ4B5DYAFYWPAVCNFSM6AAAAAA6HR7QM2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZRGUZTENRXHE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Yes, you are right! My MPI environment is broken. It crashes even on hello world. I'll rerun the tests on Polaris at some later time. I have one more question, though. I saw the change in directory structure from cpu/cuda to host/device. Also a number of cuda suffixes in file names has been changed to suffix device, which is great. However, I still can see cpu_cuda_vector in src/TiledArray/device. Are you going to rename it? For HIP/SYCL, you would also need to rename .cu files to .cpp and modify the CMake scripts accordingly Do you plan to make that change or keep the file extensions as they are? Thanks! |
the problem with In any case, since
See above re: The rest are CUDA-specific implementation files ... they are actually portable, but CMake requires a separate file for each device-specific implementation so notice I just include For SYCL you would need to implement analogs of |
It is up to you if you prefer keeping a separate file for device-specific implementation instead of keeping the alike kernels in one file next to each other separated by compiler preprocessor directives as it is done in LibreTT. Seeing all three platform-specific kernels in one place makes it easier to propagate the change applied to one kernel to the other two making the support easier. The directory thrust basically holds some vector operations. Intel provides similar functionality to thrust in dpl library. I have quite similarly looking SYCL implementation for these kernels. Compare to If it is not an insentive to make the kernel API more general (more vendor-independent) then so it be. |
status quo:
Because both CUDA and HIP provide same Thrust API then there are no platform-dependent code blocks there. I agree with your arguments, and I'm fine adding DPC++-specific extensions to these headers. I believe that the "kernel" functions will be platform independent (e.g. |
I'm trying to compile the latest commit d0c0ded of tiledarray for CUDA platform (V100) and get a compiler error:
[ 95%] Building CXX object src/CMakeFiles/tiledarray.dir/TiledArray/sparse_shape.cpp.o
cd /home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/src && /soft/packaging/spack-builds/linux-opensuse_leap15-x86_64/gcc-10.2.0/gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/bin/g++ -DBOOST_ALL_NO_LIB -DBOOST_SERIALIZATION_DYN_LINK -DBTAS_ASSERT_THROWS=1 -DBTAS_DEFAULT_TARGET_MAX_INDEX_RANK=6 -DBTAS_HAS_BLAS_LAPACK=1 -DBTAS_HAS_BOOST_CONTAINER=1 -DBTAS_HAS_BOOST_ITERATOR=1 -DBTAS_HAS_BOOST_SERIALIZATION=1 -DBTAS_HAS_CBLAS=1 -DBTAS_HAS_INTEL_MKL=1 -DBTAS_HAS_LAPACKE=1 -DLAPACK_COMPLEX_CPP=1 -DLIBRETT_USES_CUDA=1 -DMADNESS_DISABLE_WORLD_GET_DEFAULT=1 -DMADNESS_LINALG_USE_LAPACKE=1 -DMADNESS_MPI_HEADER="/soft/restricted/CNDA/mpich/drop51.2/mpich-ofi-sockets-icc-default-gpu-drop51/include/mpi.h" -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX -D_MPICC_H -I/home/vanisimov/tiledarray/master/src -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/src -I/soft/compilers/cuda/cuda-12.0.0/targets/x86_64-linux/include -I/soft/restricted/CNDA/updates/2023.05.15.001/oneapi/mkl/2023u2_20230512/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/madness-src/src -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/madness-build/src -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/madness-build/src/madness/world -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/madness-src/src/madness/world -I/soft/restricted/CNDA/mpich/drop51.2/mpich-ofi-sockets-icc-default-gpu-drop51/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/eigen-src -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/btas-src -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/blaspp-build/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/blaspp-src/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/lapackpp-build/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/lapackpp-src/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/umpire-src/src -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/umpire-src/src/tpl -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/umpire-src/src/tpl/umpire/camp/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/_deps/umpire-build/src/tpl/umpire/camp/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/deps/umpire-build/include -I/home/vanisimov/tiledarray/master/build.V100.gcc10.mpich/deps/librett-src/src -isystem /soft/compilers/cuda/cuda-12.0.0/include -I/home/vanisimov/tiledarray/install/include -std=c++17 -D__TBB_LEGACY_MODE=1 -DTBB_SUPPRESS_DEPRECATED_MESSAGES=1 -I/home/vanisimov/tiledarray/install/include -O0 -g -Wall -ftemplate-backtrace-limit=0 -std=c++17 -fPIC -I/home/vanisimov/tiledarray/install/include -std=c++17 -D__TBB_LEGACY_MODE=1 -DTBB_SUPPRESS_DEPRECATED_MESSAGES=1 -fopenmp -MD -MT src/CMakeFiles/tiledarray.dir/TiledArray/sparse_shape.cpp.o -MF CMakeFiles/tiledarray.dir/TiledArray/sparse_shape.cpp.o.d -o CMakeFiles/tiledarray.dir/TiledArray/sparse_shape.cpp.o -c /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.cpp
In file included from /home/vanisimov/tiledarray/master/src/TiledArray/tensor/utility.h:33,
from /home/vanisimov/tiledarray/master/src/TiledArray/tensor/kernels.h:30,
from /home/vanisimov/tiledarray/master/src/TiledArray/tensor/tensor.h:30,
from /home/vanisimov/tiledarray/master/src/TiledArray/tensor.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.cpp:26:
/home/vanisimov/tiledarray/master/src/TiledArray/tiled_range1.h: In static member function 'static TiledArray::TiledRange1 TiledArray::TiledRange1::make_uniform(std::size_t, std::size_t)':
/home/vanisimov/tiledarray/master/src/TiledArray/tiled_range1.h:268:41: warning: comparison of integer expressions of different signedness: 'long int' and 'std::size_t' {aka 'long unsigned int'} [-Wsign-compare]
268 | for (auto i = num_avg_plus_one; i < ntiles;
| ~~^~~~~~~~
In file included from /home/vanisimov/tiledarray/master/src/TiledArray/tensor.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.cpp:26:
/home/vanisimov/tiledarray/master/src/TiledArray/tensor/tensor.h: In instantiation of 'TiledArray::Tensor<T, A>::Tensor(const T1&, const Perm&) [with T1 = TiledArray::Tensor; Perm = TiledArray::Permutation; typename std::enable_if<(is_nested_tensor_v && is_permutation_v)>::type* = 0; T = float; Allocator = Eigen::aligned_allocator]':
/home/vanisimov/tiledarray/master/src/TiledArray/tensor/tensor.h:1279:16: required from 'TiledArray::Tensor<T, A> TiledArray::Tensor<T, A>::permute(const Perm&) const [with Perm = TiledArray::Permutation; = void; T = float; Allocator = Eigen::aligned_allocator]'
/home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.h:1201:44: required from 'TiledArray::SparseShape::SparseShape TiledArray::SparseShape::perm(const TiledArray::Permutation&) const [with T = float; TiledArray::SparseShape::SparseShape = TiledArray::SparseShape]'
/home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.cpp:30:16: required from here
/home/vanisimov/tiledarray/master/src/TiledArray/tensor/tensor.h:404:24: error: no matching function for call to 'tensor_init(, TiledArray::Permutation, TiledArray::Tensor&, const TiledArray::Tensor&)'
404 | detail::tensor_init(value_converter, outer(perm),
| ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405 | this, other);
| ~~~~~~~~~~~~~
In file included from /home/vanisimov/tiledarray/master/src/TiledArray/tensor/tensor.h:30,
from /home/vanisimov/tiledarray/master/src/TiledArray/tensor.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.cpp:26:
/home/vanisimov/tiledarray/master/src/TiledArray/tensor/kernels.h:449:13: note: candidate: 'template<class Op, class TR, class ... Ts, typename std::enable_if<(TiledArray::detail::is_tensor<T2, Ts ...>::value && TiledArray::detail::is_contiguous_tensor<T2, Ts ...>::value)>::type > void TiledArray::detail::tensor_init(Op&&, TR&, const Ts& ...)'
449 | inline void tensor_init(Op&& op, TR& result, const Ts&... tensors) {
| ^~~~~~~~~~~
/home/vanisimov/tiledarray/master/src/TiledArray/tensor/kernels.h:449:13: note: template argument deduction/substitution failed:
In file included from /home/vanisimov/tiledarray/master/src/TiledArray/tensor.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.h:31,
from /home/vanisimov/tiledarray/master/src/TiledArray/sparse_shape.cpp:26:
/home/vanisimov/tiledarray/master/src/TiledArray/tensor/tensor.h:404:24: note: couldn't deduce template parameter 'Op'
404 | detail::tensor_init(value_converter, outer(perm),
| ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405 | *this, other);
| ~~~~~~~~~~~~~
The text was updated successfully, but these errors were encountered: