Fix AMDGPU synchronize in tests & update doc (#628)
* Add CUDA-aware MPI all-to-all tests and fix typo.

* Update ROCQueue init to latest syntax
luraess authored Sep 17, 2022
1 parent 6ef9d6b commit 4cd7118
Showing 2 changed files with 7 additions and 2 deletions.
7 changes: 6 additions & 1 deletion docs/src/usage.md
@@ -74,7 +74,12 @@ If your MPI implementation has been compiled with CUDA support, then `CUDA.CuArr
[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) package) can be passed directly as
send and receive buffers for point-to-point and collective operations (they may also work with one-sided operations, but these are not often supported).

-If using Open MPI, the status of CUDA support can be checked via the
+Successfully running the [alltoall\_test\_cuda.jl](https://gist.github.com/luraess/0063e90cb08eb2208b7fe204bbd90ed2)
+test should confirm that your MPI implementation has CUDA support enabled. Similarly, successfully running the
+[alltoall\_test\_cuda\_multigpu.jl](https://gist.github.com/luraess/ed93cc09ba04fe16f63b4219c1811566) test should confirm
+that your CUDA-aware MPI implementation can use multiple Nvidia GPUs (one GPU per rank).
+
+If using OpenMPI, the status of CUDA support can be checked via the
[`MPI.has_cuda()`](@ref) function.

## ROCm-aware MPI support
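For context on the documentation change above: the linked gists are the canonical tests, and the following is only a minimal sketch of what such a CUDA-aware all-to-all exercise does. The one-GPU-per-rank round-robin device selection and the buffer sizes are illustrative assumptions, not taken from the gists.

```julia
# Hedged sketch of a CUDA-aware all-to-all exchange (illustrative only):
# each rank fills a device buffer and passes it directly to MPI.
using MPI
using CUDA

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Assumed one-GPU-per-rank setup: pick a device round-robin.
CUDA.device!(rank % length(CUDA.devices()))

MPI.has_cuda() || @warn "MPI reports no CUDA support; device buffers may not work"

send = CUDA.fill(Float64(rank), nranks)   # one element destined for each rank
recv = CUDA.zeros(Float64, nranks)

CUDA.synchronize()                        # ensure the fill kernel has completed
MPI.Alltoall!(UBuffer(send, 1), UBuffer(recv, 1), comm)

println("rank $rank received ", Array(recv))
MPI.Finalize()
```

If the exchange completes without crashing and each rank receives `0.0, 1.0, …, nranks-1`, the MPI build is handling device pointers directly.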
2 changes: 1 addition & 1 deletion test/common.jl
@@ -10,7 +10,7 @@ elseif get(ENV,"JULIA_MPI_TEST_ARRAYTYPE","") == "ROCArray"
ArrayType = AMDGPU.ROCArray
function synchronize()
# TODO: AMDGPU synchronization story is complicated. HSA does not provide a consistent notion of global queues. We need a mechanism for all GPUArrays.jl provided kernels to be synchronized.
-queue = AMDGPU.get_default_queue()
+queue = AMDGPU.ROCQueue()
barrier = AMDGPU.barrier_and!(queue, AMDGPU.active_kernels(queue))
AMDGPU.HIP.hipDeviceSynchronize() # Sync all HIP kernels e.g. BLAS. N.B. this is blocking Julia progress
wait(barrier)
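The change to `test/common.jl` only swaps how the AMDGPU queue is obtained; the role of the `synchronize()` helper stays the same. As a hedged illustration (the buffer length and the `include` path are assumptions), the pattern the test suite relies on looks like this:

```julia
# Illustrative sketch, not part of this commit: write into a device buffer,
# synchronize, then hand the buffer directly to a GPU-aware MPI collective.
using MPI
include("common.jl")              # assumed to define ArrayType and synchronize()

MPI.Init()
comm = MPI.COMM_WORLD

x = ArrayType(zeros(Float64, 4))
x .= MPI.Comm_rank(comm)          # broadcast launches a device kernel
synchronize()                     # make sure the kernel finished before MPI reads x
y = MPI.Allreduce(x, +, comm)     # device buffer passed straight to the collective

MPI.Finalize()
```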
