[Profiler] Use the Kineto to profile Triton XPU Kernel's accuracy execution time. #1066
Comments
Collecting accurate Triton performance data is the highest priority for the upcoming Triton demo on Jun 25. |
I have added post review comments to the PR that closed this issue, see #1136 (comment). I am concerned the benchmarks use a different way than the |
Because the legacy profiler in public PyTorch doesn't support XPU, we need to use Kineto to profile Triton kernels. I have changed the title of this issue to be more precise. |
PyTorch Kineto for XPU requires a separate PTI package, intel-pti-dev_p_0.9.0.32, which is not included in the PTDB package so far. I will try it with a Triton kernel to see whether it works properly. |
Speaking of PTI, can we use it for the |
@chengjunlu could you please provide detailed instructions on how to use Kineto for PyTorch profiling in general and for Triton kernel profiling in particular? The use cases in scope are:
|
The |
I have combined all the performance profiling information here. There are two ways to measure GPU kernel performance in Torch + Triton:
Here are the cases using 1st way:
Here are the cases using 2nd way:
|
Note: https://github.com/intel/intel-xpu-backend-for-triton/blob/llvm-target/scripts/patch-pytorch.sh can be applied to allow using |
Kineto is blocked by an issue in Intel PTI, which is not able to trace Triton kernels launched through the SYCL API. We can use the first way as a workaround to get approximate performance profiling with the patch https://github.com/intel/intel-xpu-backend-for-triton/blob/llvm-target/scripts/patch-pytorch.sh. For PyTorch 2.5 OOB support, we have to use wall time as a workaround for now. The changes have been pushed to PR #1905 |
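For reference, the wall-time workaround mentioned above can be sketched roughly as follows. This is only a minimal illustration, not the code from the PR: the helper name `measure_wall_time_ms` and its `sync` parameter are hypothetical, and passing something like `torch.xpu.synchronize` as `sync` on an XPU device is an assumption. Wall time includes host-side launch overhead, so the result is only an approximation of the kernel execution time.

```python
import statistics
import time


def measure_wall_time_ms(fn, sync=lambda: None, warmup=5, rep=20):
    """Return the median wall-clock time of fn() in milliseconds.

    `sync` is a hypothetical hook that blocks until the device is idle.
    On XPU one would presumably pass torch.xpu.synchronize (assumption);
    the default is a no-op so the sketch also runs on plain CPU code.
    """
    # Warm-up runs absorb one-time costs such as JIT compilation and caching.
    for _ in range(warmup):
        fn()
    sync()

    samples = []
    for _ in range(rep):
        sync()  # drain pending work so it is not charged to this sample
        start = time.perf_counter()
        fn()
        sync()  # wait until the launched work has actually finished
        samples.append((time.perf_counter() - start) * 1e3)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with a call that launches a Triton kernel.
    print(measure_wall_time_ms(lambda: sum(range(10_000))))
```

Synchronizing both before and after each timed call keeps previously queued work out of the sample and makes sure the kernel has completed before the clock is read.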
There are no standalone profiler tools for Triton XPU now.
We used to use:
Triton has a new component for profiling the performance of Triton kernels. It is worth supporting it for Triton XPU.