FP16 huggingface accuracy: 6 models failed #195
Comments
The following reproducer hits a segmentation fault with fp16 but passes with fp32:
The crash happens in Triton. I've submitted an issue to the team: intel/intel-xpu-backend-for-triton#1073
This issue has been moved to the IGC team. JIRA: https://jira.devtools.intel.com/browse/GSD-9082
The fix for the segmentation fault will be in the next rolling driver and the next LTS driver.
PR pytorch/pytorch#126261 will make Inductor generate a hint for Triton that the input tensor is
@mengfei25 PR pytorch/pytorch#126261 has landed, please verify.
🐛 Describe the bug
Please refer to https://github.com/intel/torch-xpu-ops/actions/runs/8995661712/job/24710982088; 6 models failed.
Error info
Run failed with return code: -11
Output: None
Error: None
============ Summary for huggingface float16 inference accuracy ============
num_total: 40 (should be 46)
num_passed: 39
num_failed: 1
pass_rate: 97.50%
============ Summary for huggingface float16 training accuracy ============
num_total: 40 (should be 46)
num_passed: 40
num_failed: 0
pass_rate: 100.00%
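
For anyone triaging similar logs, here is a minimal sketch of how the numbers above can be read. It assumes the runner reports return codes the way Python's `subprocess` does, where a negative value is the negated number of the signal that killed the process; under that assumption `-11` corresponds to SIGSEGV on Linux, which is consistent with the segmentation fault discussed in the comments. It also computes how many models are missing from a summary line of the form `num_total: 40 (should be 46)`. The helper names `describe_return_code` and `missing_models` are hypothetical and not part of the benchmark scripts.

```python
import re
import signal


def describe_return_code(code: int) -> str:
    """Turn a subprocess-style return code into a readable description."""
    if code < 0:
        # Assumption: a negative code is the negated signal number,
        # as Python's subprocess module reports it on POSIX systems.
        try:
            return f"killed by {signal.Signals(-code).name}"
        except ValueError:
            return f"killed by unknown signal {-code}"
    return "success" if code == 0 else f"exited with status {code}"


# Matches summary lines such as "num_total: 40 (should be 46)".
SUMMARY_RE = re.compile(r"num_total:\s*(\d+)\s*\(should be (\d+)\)")


def missing_models(summary_text: str) -> int:
    """Return how many expected models did not report a result."""
    match = SUMMARY_RE.search(summary_text)
    if match is None:
        raise ValueError("no 'num_total: N (should be M)' line found")
    ran, expected = map(int, match.groups())
    return expected - ran


if __name__ == "__main__":
    print(describe_return_code(-11))
    print(missing_models("num_total: 40 (should be 46)"))
```

Read this way, each summary is short by 6 models (46 expected, 40 reported), which appears to match the 6 failed models in the title, since a process killed by a signal does not get to report a result.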
Versions
#188
Driver: 803.29 LTS
Bundle: 0.5.0
PyTorch: 2024-05-07 nightly release
XPU OPS: d110623