[ROCm] [BUGFIX] Re-enable rocm-specific tuning parameters v2 (#133852) #136139

Merged (1 commit) on Sep 25, 2024

@jataylo (Collaborator) commented Sep 16, 2024

Small bug fix: #124592 replaced the `torch.version.hip` check with `device_props` but made a mistake when porting the original logic.

The original code was:
`if torch.version.hip is not None:`

Which was incorrectly replaced by:
`if self.device_props.type != "hip":`
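To make the inversion concrete, here is a minimal sketch of the three conditions side by side. `DeviceProps` is a hypothetical stand-in that models only the `type` field, and the helper names are illustrative rather than taken from the PyTorch source; the corrected condition is the `== "hip"` comparison that the fix uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceProps:
    """Hypothetical stand-in for the device properties object; only `type` is modeled."""

    type: str  # e.g. "cuda" or "hip"


def original_check(hip_version: Optional[str]) -> bool:
    # Mirrors `if torch.version.hip is not None:`, which is true only on ROCm builds.
    return hip_version is not None


def buggy_check(props: DeviceProps) -> bool:
    # The incorrect port: true on every device except HIP.
    return props.type != "hip"


def fixed_check(props: DeviceProps) -> bool:
    # The corrected port: true only on HIP devices, matching the original intent.
    return props.type == "hip"
```

With the buggy condition, the ROCm-specific branch is skipped on HIP devices and entered on all others, which is the opposite of what the original check did.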

Another occurrence of #130617

Pull Request resolved: #133852
Approved by: https://github.com/masnesral, https://github.com/malfet

(cherry picked from commit da587de)

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @hongxiayang @naromero77amd @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@jataylo jataylo requested review from malfet and atalman September 16, 2024 12:32

pytorch-bot bot commented Sep 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136139

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 683c494 with merge base b7eb725:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kit1980 (Member) commented Sep 20, 2024

@jataylo I see rocm failures on this PR, related?

@atalman (Contributor) left a comment

Hi @jataylo, these errors don't look related, but could you please confirm?

[rocm / linux-focal-rocm6.1-py3.8 / test (default, 1, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199911968) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199911968)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886643882))
inductor/test_flex_decoding.py::TestFlexDecoding::test_builtin_score_mods_bfloat16_score_mod0_head_dims1
[rocm / linux-focal-rocm6.1-py3.8 / test (default, 3, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199912800) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199912800)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886644164))
inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t
[rocm / linux-focal-rocm6.1-py3.8 / test (default, 4, 6, linux.rocm.gpu.2)](https://hud.pytorch.org/pr/pytorch/pytorch/136139#30199913164) ([gh](https://github.com/pytorch/pytorch/actions/runs/10883861799/job/30199913164)) ([trunk failure](https://hud.pytorch.org/pytorch/pytorch/commit/b7eb7256fb9a48d1fc452608986b64688b6469fa#29886645016))
inductor/test_flex_decoding.py::TestFlexDecoding::test_builtin_score_mods_bfloat16_score_mod0_head_dims0

@atalman atalman self-requested a review September 24, 2024 17:54
@pruthvistony pruthvistony added the rocm This tag is for PRs from ROCm team label Sep 24, 2024
@pruthvistony pruthvistony added this to the 2.5.0 milestone Sep 24, 2024
@jataylo (Collaborator, Author) commented Sep 24, 2024

I can confirm these failures are unrelated.

The fp8 failures will be resolved by this cherry-pick: https://github.com/pytorch/pytorch/pull/136557/files/16ebb15a8d8de4200fddd5c7b7cb8143a834994c..8f94eaaf3da2977e90aef4df9816d0c88fc74da8

For the flex-decode failures, I'm not sure of the root cause or what is needed to resolve them (cc: @amdfaa @jithunnair-amd @jerrymannil), but they are not related to this change.

@kit1980 kit1980 merged commit dd73223 into pytorch:release/2.5 Sep 25, 2024
214 of 218 checks passed
@jithunnair-amd (Collaborator) commented

Verified in the torch 2.5 final RC wheel (`pip3 install torch==2.5.0 torchvision --index-url https://download.pytorch.org/whl/test/rocm6.2`) that the `_inductor/runtime/triton_heuristics.py` file in the wheel contains the fix:

            if self.device_props.type == "hip":
                if "waves_per_eu" in compile_meta:
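
A check like this could be scripted rather than done by eye. The following is only a rough sketch: the helper names are hypothetical, and it assumes the wheel installs the file at `_inductor/runtime/triton_heuristics.py` under the `torch` package directory, as the path above suggests. It locates the file without importing `torch` and looks for the corrected condition.

```python
import importlib.util
from pathlib import Path
from typing import Optional

FIXED = 'if self.device_props.type == "hip":'
BUGGY = 'if self.device_props.type != "hip":'


def contains_fix(source: str) -> bool:
    # The fix is present when the corrected condition appears and the inverted one does not.
    return FIXED in source and BUGGY not in source


def installed_heuristics_path() -> Optional[Path]:
    # Locate the installed torch package without importing it.
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    # Assumed location of the file inside the installed package.
    path = Path(spec.origin).parent / "_inductor" / "runtime" / "triton_heuristics.py"
    return path if path.exists() else None


if __name__ == "__main__":
    path = installed_heuristics_path()
    if path is None:
        print("torch (or the heuristics file) was not found in this environment")
    else:
        print("fix present:", contains_fix(path.read_text(encoding="utf-8")))
```

A plain string match is brittle against reformatting of the source, so this is best treated as a quick sanity check on a specific wheel rather than a general test.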

Labels: ciflow/inductor, ciflow/rocm, module: inductor, module: rocm (AMD GPU support for Pytorch), open source, rocm (This tag is for PRs from ROCm team)