Use the CUDA events API to make the profiler better on GPU #6419
Labels: enhancement (new user-visible features or improvements to existing features); gsoc (potential Google Summer of Code projects)
Currently, the sampling profiler doesn't measure anything meaningful for GPU schedules. Using the CUDA events API, we could do better and at least get an accurate per-kernel runtime, as nvprof does.
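Here is a minimal sketch of the CUDA events approach, assuming a single kernel launched on one stream. It records one event before the launch and one after, waits for the second, and then reads the GPU-side elapsed time with `cudaEventElapsedTime`. The kernel `scale_kernel` and the helper `time_scale_kernel_ms` are hypothetical stand-ins, not part of Halide's profiler or runtime. How the profiler would hook each kernel launch is left open.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s failed: %s\n", #call,                \
                         cudaGetErrorString(err_));                       \
            std::exit(1);                                                 \
        }                                                                 \
    } while (0)

// Hypothetical kernel standing in for a generated GPU kernel.
__global__ void scale_kernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Time one kernel launch on `stream` with a pair of CUDA events.
// Returns the elapsed GPU time in milliseconds.
float time_scale_kernel_ms(float *d_data, int n, cudaStream_t stream) {
    cudaEvent_t start, stop;
    CHECK_CUDA(cudaEventCreate(&start));
    CHECK_CUDA(cudaEventCreate(&stop));

    // Both events are enqueued on the same stream as the kernel,
    // so they bracket exactly that launch on the GPU timeline.
    CHECK_CUDA(cudaEventRecord(start, stream));
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads, 0, stream>>>(d_data, n, 2.0f);
    CHECK_CUDA(cudaGetLastError());
    CHECK_CUDA(cudaEventRecord(stop, stream));

    // Block the host until the stop event has completed on the GPU.
    CHECK_CUDA(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK_CUDA(cudaEventElapsedTime(&ms, start, stop));

    CHECK_CUDA(cudaEventDestroy(start));
    CHECK_CUDA(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    CHECK_CUDA(cudaMalloc(&d_data, n * sizeof(float)));
    CHECK_CUDA(cudaMemset(d_data, 0, n * sizeof(float)));

    float ms = time_scale_kernel_ms(d_data, n, /*stream=*/0);
    assert(ms >= 0.0f);
    std::printf("scale_kernel: %.3f ms\n", ms);

    CHECK_CUDA(cudaFree(d_data));
    return 0;
}
```

Calling `cudaEventSynchronize` after every kernel serializes the host with the GPU. A profiler would probably do better to record events around every launch and query them only once, for example at the end of the pipeline. `cudaEventElapsedTime` reports milliseconds with a resolution of around half a microsecond.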