RFC: Support for Intel XPU Devices in torchvision
Summary
This RFC proposes modifications to torchvision to support Intel XPU devices. The primary focus is on updating the test suite to accommodate XPU. Since torch-xpu-ops already implements the torchvision ops, some XPU support exists in torchvision today. This proposal expands torchvision so that XPU devices reach feature parity with CUDA.
Detailed Proposal
0. Implement torchvision ops for XPU
As mentioned above, this has already been done in torch-xpu-ops (see #1290), so XPU builds currently support nms, deform_conv2d, roi_align, roi_pool, ps_roi_align, ps_roi_pool and corresponding backward ops.
Autocasting and autograd support require minor changes in torchvision.
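For illustration, a minimal sketch of calling one of these ops on an XPU device. It assumes a PyTorch build with XPU support and, for the autocast part, that the autocast wrappers mentioned above are registered for XPU:

```python
# Minimal sketch, assuming a PyTorch build with XPU support and a torchvision
# build that dispatches to the torch-xpu-ops kernels.
import torch
from torchvision.ops import nms

device = "xpu" if torch.xpu.is_available() else "cpu"
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                      [1.0, 1.0, 11.0, 11.0],
                      [50.0, 50.0, 60.0, 60.0]], device=device)
scores = torch.tensor([0.9, 0.8, 0.7], device=device)

keep = nms(boxes, scores, iou_threshold=0.5)  # indices of boxes kept after NMS

# Autocast via the device-generic API; this is the part that needs the minor
# torchvision changes mentioned above (XPU autocast wrappers are assumed).
if device == "xpu":
    with torch.autocast(device_type="xpu", dtype=torch.float16):
        keep_amp = nms(boxes, scores, iou_threshold=0.5)
```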
1. Modify Test Suite
Wherever existing code tests CUDA devices, we would add equivalent XPU tests, to the extent the functionality is supported on XPU.
The primary change involves updating the test suite to support XPU devices. This includes tests for operators (test/test_ops.py), transforms (test/test_transforms_tensor.py, test/test_transforms_v2.py), and models.
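As a hedged sketch of what the updated tests might look like: cpu_and_gpus() below is a hypothetical helper in the spirit of the existing cpu_and_cuda() in test/common_utils.py, and the tolerances are illustrative.

```python
import pytest
import torch
from torchvision.ops import roi_align

def cpu_and_gpus():
    # Hypothetical helper: enumerate the devices available on this runner.
    devices = ["cpu"]
    if torch.cuda.is_available():
        devices.append("cuda")
    if torch.xpu.is_available():
        devices.append("xpu")
    return devices

@pytest.mark.parametrize("device", cpu_and_gpus())
def test_roi_align_matches_cpu(device):
    x = torch.rand(1, 3, 16, 16, device=device)
    # One RoI per row: (batch_index, x1, y1, x2, y2)
    rois = torch.tensor([[0.0, 0.0, 0.0, 8.0, 8.0]], device=device)
    out = roi_align(x, rois, output_size=(4, 4))
    ref = roi_align(x.cpu(), rois.cpu(), output_size=(4, 4))
    torch.testing.assert_close(out.cpu(), ref, rtol=1e-4, atol=1e-4)
```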
2. Video/Image Features
Based on the H1 2025 TorchCodec Roadmap, we do not plan to support image/video encoding/decoding within torchvision, and would instead add XPU functionality to TorchCodec.
3. Update Documentation, Scripts, and Benchmarks
Efforts will focus on supporting XPU in the benchmarking scripts, with a few minor documentation updates to note XPU availability. In most cases we will not update the gallery and reference scripts.
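For the benchmarking scripts, timings need a device-appropriate synchronization point before reading the clock, since GPU work is asynchronous. A minimal sketch of a device-agnostic helper (illustrative names, not an existing torchvision utility):

```python
import time
import torch

def sync(device: str) -> None:
    # GPU work is asynchronous; synchronize before reading the clock.
    if device == "cuda":
        torch.cuda.synchronize()
    elif device == "xpu":
        torch.xpu.synchronize()

def benchmark(fn, device: str, iters: int = 100) -> float:
    fn()  # warm-up (triggers any lazy initialization / compilation)
    sync(device)
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync(device)
    return (time.perf_counter() - start) / iters  # seconds per iteration
```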
4. Continuous Integration
Update the CI configuration to include tests for XPU devices, ensuring that changes are validated across both CUDA and XPU environments. Based on feedback, XPU tests could be gated behind a dedicated workflow label (like ciflow/xpu in pytorch/pytorch).
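To keep such a workflow green on runners without the hardware, XPU tests can be skipped conditionally. Below is an illustrative needs_xpu marker mirroring the needs_cuda pattern already used in torchvision's test suite:

```python
import pytest
import torch
from torchvision.ops import deform_conv2d

# Illustrative analogue of torchvision's needs_cuda skip marker.
needs_xpu = pytest.mark.skipif(
    not torch.xpu.is_available(), reason="XPU device not available"
)

@needs_xpu
def test_deform_conv2d_xpu():
    x = torch.rand(1, 3, 8, 8, device="xpu")
    weight = torch.rand(5, 3, 3, 3, device="xpu")
    # Offset channels = 2 * kernel_h * kernel_w for a single offset group.
    offset = torch.rand(1, 18, 6, 6, device="xpu")
    out = deform_conv2d(x, offset, weight)
    assert out.shape == (1, 5, 6, 6)
```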
Alternatives and Open Questions
Separate Test Files - Instead of modifying existing tests, create separate test files specifically for XPU. This approach, however, may lead to code duplication and maintenance challenges.
Should operators be written in Triton? - As proposed in #8746, ops could be implemented in Triton, providing a single common source for GPUs from different vendors and reducing code duplication. This would require additional engineering effort, but the scope is limited: torchvision currently has only 6 CUDA/SYCL ops. (A minimal Triton sketch follows this list.)
Should basic XPU image/video encoding/decoding be provided in torchvision? - For API consistency, encoding/decoding could be supported in torchvision. This may depend on if and when that functionality is deprecated in torchvision and moved to TorchCodec.
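For scale, the canonical Triton elementwise-add kernel (not one of the torchvision ops) illustrates the single-source idea: the same Python-level kernel is compiled by Triton for whichever GPU backend is present, assuming a Triton build with that backend:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the tensor
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```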