This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

[Performance] Remove unnecessary synchronization using thrust::cuda::par_nosync policy #148

Open
chang-l opened this issue Mar 15, 2024 · 1 comment

Comments

@chang-l
Contributor

chang-l commented Mar 15, 2024

We always launch Thrust algorithms on a CUDA stream with the thrust::cuda::par policy, which triggers an extra cudaStreamSynchronize inside each Thrust call, e.g.,

thrust::cuda::par(allocator).on(stream), seq_indices, seq_indices + indices_desc.size, 0);
thrust::exclusive_scan(thrust::cuda::par(thrust_allocator).on(stream),

It would be better to switch to thrust::cuda::par_nosync, which lets Thrust skip any stream synchronization that is not required for correctness and makes it easier to overlap with other operations.

@linhu-nv
Contributor

linhu-nv commented Apr 3, 2024

Sorry for the late reply. The WholeGraph 24.04 release is closing; is it OK if we fix this in 24.06?

rapids-bot bot referenced this issue Apr 30, 2024
Fix for issue 148 (https://github.com/rapidsai/wholegraph/issues/148): remove unnecessary sync between thrust ops and host CPU threads

Authors:
  - https://github.com/linhu-nv

Approvers:
  - Chuang Zhu (https://github.com/chuangz0)

URL: #160