[AMD] release/3.2.x AMD perf cherry picks #5191
Merged
antiagainst
merged 15 commits into
triton-lang:rc/3.2.x
from
jataylo:32_amd_cherrypicks_pr
Dec 4, 2024
For 16-bit float operands of tt::AtomicRMWOp, construct a single LLVM::AtomicRMWOp that operates on a vector of elements. This allows packed intrinsics to be generated, processing 2 elements at once. Added a lit test for the f16 vectorized case. (cherry picked from commit 78c8054)
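To illustrate why two f16 elements can be handled at once, here is a minimal sketch showing that a pair of half-precision values occupies exactly one 32-bit word. The helper names `pack_f16x2` and `unpack_f16x2` are hypothetical and exist only for this illustration; this is not the LLVM lowering itself.

```python
import struct


def pack_f16x2(lo: float, hi: float) -> int:
    """Pack two half-precision floats into one little-endian 32-bit word."""
    # "<2e" encodes two IEEE 754 binary16 values; "<I" reinterprets the
    # same 4 bytes as a single unsigned 32-bit integer.
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_f16x2(word: int) -> tuple:
    """Split a 32-bit word back into its two half-precision floats."""
    return struct.unpack("<2e", struct.pack("<I", word))


word = pack_f16x2(1.0, 2.0)
print(hex(word))           # one 32-bit word holding both elements
print(unpack_f16x2(word))  # round-trips to the original pair
```

A single operation on such a word touches both elements, which is the idea behind using a vector operand instead of two scalar atomics.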
(cherry picked from commit 86a2ac7)
…4935) This PR adds more restrictions on when the sched-load optimizations are applied and un-reverts triton-lang#4823. The optimization is applied only when all of the following are satisfied:
1. pureMatmulProblem, i.e. 1 `tt.dot` in the main loop
2. two `tt.load`s in the main loop
3. 2nd `tt.load` is ahead of the `tt.dot`
4. 1st user of 2nd `tt.load` is after the `tt.dot`
5. tile size is large enough, i.e. nonKDim >= 128 and kDim >= 64

(cherry picked from commit 4f6f768)
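The five conditions above can be sketched as a single predicate. This is only an illustration under stated assumptions: `LoopInfo`, its fields, and `should_apply_sched_load` are hypothetical names, the loop body is modeled as a flat list of op names in program order, and `non_k_dim` stands in for whatever the pass actually checks against 128.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoopInfo:
    """Hypothetical summary of a main loop (not a Triton data structure)."""
    ops: List[str]                  # op names in program order
    first_user_of_second_load: int  # index in `ops` of the 2nd load's first user
    non_k_dim: int                  # tile size along the non-K dimension
    k_dim: int                      # tile size along K


def should_apply_sched_load(info: LoopInfo) -> bool:
    dots = [i for i, op in enumerate(info.ops) if op == "tt.dot"]
    loads = [i for i, op in enumerate(info.ops) if op == "tt.load"]
    if len(dots) != 1:    # condition 1: exactly one tt.dot
        return False
    if len(loads) != 2:   # condition 2: exactly two tt.loads
        return False
    dot, second_load = dots[0], loads[1]
    if second_load >= dot:                      # condition 3
        return False
    if info.first_user_of_second_load <= dot:   # condition 4
        return False
    # condition 5: tile is large enough
    return info.non_k_dim >= 128 and info.k_dim >= 64


loop = LoopInfo(["tt.load", "tt.load", "tt.dot", "user_of_load_2"], 3, 128, 64)
print(should_apply_sched_load(loop))
```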
…n-lang#4991) Specifically, it fixes problems when `srcLayout` and `dstLayout` have a different number of registers but the same number of non-free registers. We solved the problem by padding free registers onto either `srcLayout` or `dstLayout`, but this can be improved by fixing the `invertAndCompose` function. (cherry picked from commit 15c5e55)
…triton-lang#4951) This PR removes the legacy `isMmaToDotShortcut` and its associated shortcut conversion. (cherry picked from commit 1d5fdfe)
This commit removes special cases for MFMA -> Dot Operand LDS shortcuts. These are now supported by the common linear layout infrastructure. No tests are added; mfma-shortcut.mlir already tests this. (cherry picked from commit 69f656c)
This commit adds support for warp-level reduction with DPP instructions, which can improve performance. See https://gpuopen.com/learn/amd-gcn-assembly-cross-lane-operations/ (cherry picked from commit 21119e3)
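As a rough picture of what a warp-level reduction does, the sketch below simulates lanes as list entries and models each cross-lane exchange as reading from a lane at a fixed offset. It is a generic tree reduction written for illustration: `warp_reduce` is a hypothetical name, and the actual DPP modifiers and instruction sequence emitted by the backend are not reproduced here.

```python
import operator


def warp_reduce(values, op):
    """Tree-reduce one value per lane; lane 0 ends up holding the result.

    Assumes a power-of-two lane count and an associative, commutative `op`.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    vals = list(values)
    offset = n // 2
    while offset >= 1:
        # Every lane combines its value with the one held by lane + offset,
        # all lanes stepping together (lanes with no partner keep their value).
        vals = [
            op(vals[i], vals[i + offset]) if i + offset < n else vals[i]
            for i in range(n)
        ]
        offset //= 2
    return vals[0]


print(warp_reduce(list(range(64)), operator.add))  # sum over a 64-lane wavefront
print(warp_reduce([3, 9, 1, 7], max))
```

The reduction finishes in log2(n) exchange steps without going through shared memory, which is where cross-lane instructions can help.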
TritonAMDGPUTransforms now depends on it. (cherry picked from commit 0b443ce)
In the case of unpaired f16 elements, utilize DPP instructions to accelerate atomics. Here is the algorithm for lowering `tt::atomicRmwOp(%ptr, %val, %mask)`:
0. Group threads by pairs. The master thread is (tid % 2 == 0);
1. All the threads send `%val` to the `(tid - 1)` thread via `dppUpdateOp shl`, so all the masters receive the value from their secondary threads;
2. Take into account parity in the `%mask` value, and build CF structures according to it;
3. Generate `llvm::atomicRmwOp` in the threads enabled by the `%mask` value;
4. All the threads send the result of the generated operation to the `(tid + 1)` thread via `dppUpdateOp shl`, so all secondary threads also receive their result.

The DPP approach has a ~5% perf improvement, so use it when the target arch supports DPP. Signed-off-by: Ilya Veselov <[email protected]> (cherry picked from commit bab3470)
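The steps above can be sketched as a small simulation. This is one plausible reading, not the actual lowering: `paired_atomic_add` is a hypothetical name, thread `tid` is assumed to target `mem[tid]`, the operation is assumed to be an add, and the handling of pairs where only one thread is enabled (a scalar atomic by that thread) is an assumption about what the control flow built in step 2 does.

```python
def paired_atomic_add(mem, vals, mask):
    """Simulate pairwise atomic adds; returns (old values per thread, atomics issued)."""
    n = len(vals)
    assert n % 2 == 0 and len(mask) == n and len(mem) >= n
    old = [None] * n          # None marks threads disabled by the mask
    atomics_issued = 0
    for master in range(0, n, 2):   # step 0: master is the even tid of each pair
        secondary = master + 1
        received = vals[secondary]  # step 1: master receives the secondary's value
        if mask[master] and mask[secondary]:
            # steps 2-3: both enabled, so the master issues one packed atomic
            packed_old = (mem[master], mem[secondary])
            mem[master] += vals[master]
            mem[secondary] += received
            atomics_issued += 1
            old[master] = packed_old[0]
            old[secondary] = packed_old[1]  # step 4: result sent to tid + 1
        else:
            # assumed fallback: each enabled thread issues its own scalar atomic
            for tid in (master, secondary):
                if mask[tid]:
                    old[tid] = mem[tid]
                    mem[tid] += vals[tid]
                    atomics_issued += 1
    return old, atomics_issued


mem = [0.0, 0.0, 0.0, 0.0]
old, issued = paired_atomic_add(mem, [1.0, 2.0, 3.0, 4.0], [True, True, False, True])
print(mem, old, issued)
```

In this model a fully enabled pair costs one atomic instead of two, which is the saving the pairing is after.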
jataylo
requested review from
antiagainst,
zhanglx13,
Jokeren and
ptillet
as code owners
November 19, 2024 13:55
Enable new arch target since backend support has been added. (cherry picked from commit ed39410)
Fixes triton-lang#4769 (cherry picked from commit f484cb8)
triton-lang#5064) Bumping llvm to include a loop unroller fix: llvm/llvm-project#114573. This is needed for subsequent loop unroller upstreaming work. (cherry picked from commit 3c296ab)
This pulls in llvm/llvm-project@bd9145c8c213 to enable ASan on AMD backend. (cherry picked from commit 0bd30a2)
This includes llvm/llvm-project#115627 (cherry picked from commit 6404fbb)
This pulls in the AMDGPU backend support for the gfx950 target. We need to fix the rewrites in `Combine.td` given that llvm/llvm-project#112700 adds a new attribute for denorm mode for `arith.addf`. --------- Co-authored-by: Lei Zhang <[email protected]> (cherry picked from commit 1d5e9a2)
jataylo
force-pushed
the
32_amd_cherrypicks_pr
branch
from
December 4, 2024 11:55
324fb09
to
7c6da39
Compare
cc: @bertmaher
=================================== 8995 passed, 2285 skipped, 153 warnings in 1246.46s (0:20:46) ===================================
@antiagainst, @antiagainst mind taking a quick look to sanity check, and hopefully @bertmaher can help us merge into rc/3.2.x
zhanglx13
approved these changes
Dec 4, 2024
jataylo
added a commit
to jataylo/triton
that referenced
this pull request
Dec 5, 2024
This reverts commit 2d8093c.
antiagainst
added a commit
that referenced
this pull request
Dec 5, 2024
Reverts #5191 due to some mlir errors in pytorch unit tests Smaller set of cherry picks: - #5308 (and previous LLVM upgrades) - #5281 - #4925 - #5053 - #5019 - #4998 --------- Co-authored-by: Jungwook Park <[email protected]> Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Ilya V <[email protected]> Co-authored-by: Kyle Wang <[email protected]>
jataylo
added a commit
to jataylo/triton
that referenced
this pull request
Dec 11, 2024
This reverts commit 2d8093c.
jataylo
added a commit
to jataylo/triton
that referenced
this pull request
Dec 12, 2024
This reverts commit 2d8093c.
antiagainst
added a commit
that referenced
this pull request
Dec 13, 2024
This PR brings in required LLVM bumps and additional targets for gfx950 support. - #5040 - #5064 - #5180 - #5242 - #5392 Note this PR reverts the last two PRs to only focus on the LLVM upgrade - #5347 - #5191 --------- Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Jungwook Park <[email protected]>
jataylo
added a commit
to jataylo/triton
that referenced
this pull request
Dec 18, 2024
This PR brings in required LLVM bumps and additional targets for gfx950 support. - triton-lang#5040 - triton-lang#5064 - triton-lang#5180 - triton-lang#5242 - triton-lang#5392 Note this PR reverts the last two PRs to only focus on the LLVM upgrade - triton-lang#5347 - triton-lang#5191 --------- Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Jungwook Park <[email protected]> (cherry picked from commit f11c5ba)
jataylo
added a commit
to jataylo/triton
that referenced
this pull request
Dec 18, 2024
Cherry pick list: - triton-lang#4925 - triton-lang#5053 - triton-lang#5019 - triton-lang#5002 - triton-lang#4935 - required additional cherry picks triton-lang#4991 and triton-lang#4951 - triton-lang#4998 - triton-lang#4925 - triton-lang#5281 - triton-lang#5308 - All previous LLVM hash PRs before triton-lang#5308 --------- Co-authored-by: Ilya V <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Lixun Zhang <[email protected]> Co-authored-by: Keren Zhou <[email protected]> Co-authored-by: Alexander Efimov <[email protected]> Co-authored-by: Kyle Wang <[email protected]> Co-authored-by: Jungwook Park <[email protected]> Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> (cherry picked from commit 2d8093c)
jataylo
added a commit
to jataylo/triton
that referenced
this pull request
Dec 18, 2024
Reverts triton-lang#5191 due to some mlir errors in pytorch unit tests Smaller set of cherry picks: - triton-lang#5308 (and previous LLVM upgrades) - triton-lang#5281 - triton-lang#4925 - triton-lang#5053 - triton-lang#5019 - triton-lang#4998 --------- Co-authored-by: Jungwook Park <[email protected]> Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Ilya V <[email protected]> Co-authored-by: Kyle Wang <[email protected]> (cherry picked from commit 7e401df)
jataylo
added a commit
to jataylo/triton
that referenced
this pull request
Dec 18, 2024
This PR brings in required LLVM bumps and additional targets for gfx950 support. - triton-lang#5040 - triton-lang#5064 - triton-lang#5180 - triton-lang#5242 - triton-lang#5392 Note this PR reverts the last two PRs to only focus on the LLVM upgrade - triton-lang#5347 - triton-lang#5191 --------- Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Jungwook Park <[email protected]> (cherry picked from commit f11c5ba)
bertmaher
pushed a commit
that referenced
this pull request
Dec 19, 2024
Cherry pick list: - #4925 - #5053 - #5019 - #5002 - #4935 - required additional cherry picks #4991 and #4951 - #4998 - #4925 - #5281 - #5308 - All previous LLVM hash PRs before #5308 --------- Co-authored-by: Ilya V <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Lixun Zhang <[email protected]> Co-authored-by: Keren Zhou <[email protected]> Co-authored-by: Alexander Efimov <[email protected]> Co-authored-by: Kyle Wang <[email protected]> Co-authored-by: Jungwook Park <[email protected]> Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]>
bertmaher
pushed a commit
that referenced
this pull request
Dec 19, 2024
Reverts #5191 due to some mlir errors in pytorch unit tests Smaller set of cherry picks: - #5308 (and previous LLVM upgrades) - #5281 - #4925 - #5053 - #5019 - #4998 --------- Co-authored-by: Jungwook Park <[email protected]> Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Ilya V <[email protected]> Co-authored-by: Kyle Wang <[email protected]>
bertmaher
pushed a commit
that referenced
this pull request
Dec 19, 2024
This PR brings in required LLVM bumps and additional targets for gfx950 support. - #5040 - #5064 - #5180 - #5242 - #5392 Note this PR reverts the last two PRs to only focus on the LLVM upgrade - #5347 - #5191 --------- Co-authored-by: peterbell10 <[email protected]> Co-authored-by: Hongtao Yu <[email protected]> Co-authored-by: Lei Zhang <[email protected]> Co-authored-by: Jungwook Park <[email protected]>
Cherry pick list:
isMmaToDotShortcut
with linear layout based logic #4951