Support gfx950 layouts #692
API change:
- For blocked layout, use -tensorShape, which takes only two dims as dim0,dim1
- For dot layout, use -dotShape, which takes three dims as M,N,K
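A minimal sketch of how the two shape flags could be parsed, to make the dim0,dim1 versus M,N,K distinction concrete. Only the flag names `-tensorShape` and `-dotShape` and their dimension counts come from the commit message; the helper names `parse_dims` and `build_parser`, the positivity check, and the error messages are hypothetical and not taken from the actual script.

```python
import argparse


def parse_dims(text, expected):
    """Parse a comma-separated list of positive ints with exactly `expected` entries."""
    parts = text.split(",")
    if len(parts) != expected:
        raise argparse.ArgumentTypeError(
            f"expected {expected} comma-separated dims, got {text!r}")
    try:
        dims = tuple(int(p) for p in parts)
    except ValueError:
        raise argparse.ArgumentTypeError(f"non-integer dim in {text!r}")
    if any(d <= 0 for d in dims):
        raise argparse.ArgumentTypeError(f"dims must be positive, got {text!r}")
    return dims


def build_parser():
    # Hypothetical parser: only the two flag names are taken from the PR.
    parser = argparse.ArgumentParser(description="layout plot options (sketch)")
    parser.add_argument("-tensorShape", metavar="dim0,dim1",
                        type=lambda s: parse_dims(s, 2),
                        help="blocked layout: two dims")
    parser.add_argument("-dotShape", metavar="M,N,K",
                        type=lambda s: parse_dims(s, 3),
                        help="dot layout: three dims")
    return parser
```

With this sketch, `build_parser().parse_args(["-dotShape", "32,128,64"]).dotShape` yields a three-element tuple, while passing three dims to `-tensorShape` is rejected at parse time.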
Separate each layout's code into its own file
- When kWidth is large, use a smaller elemSize horizontally to save space
- Improve the labels, such as
  - change vec to kWidth for operands
  - change opA/opB to inA/inB and include operand dims
  - remove group dims in the operands so that they don't overlap with operand block dims
- Better alignment: dot op and mfma zoomed-in pics are bottom aligned
kGroup is defined as total elements per thread / kWidth for one mfma instruction. We need kGroup = 2 only for the newly added mfma_f32_16x16x128_f8f6f4 and mfma_f32_32x32x64_f8f6f4 with f8 input type on MI350.
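The kGroup definition above can be sketched as a small calculation. This assumes that "total elements per thread" means one operand tile of M x K elements spread evenly over a 64-lane wavefront, and the kWidth values used in the comments (16 for f8, 32 for a 4-bit type) are illustrative assumptions, not values stated in the commit. The function name `k_group` is hypothetical.

```python
def k_group(m, k, k_width, wave_size=64):
    """kGroup = (operand elements per thread) / kWidth for one mfma instruction.

    Assumes the M x K operand tile is split evenly across `wave_size` lanes.
    """
    total = m * k
    if total % wave_size != 0:
        raise ValueError("operand tile does not divide evenly across the wavefront")
    elems_per_thread = total // wave_size
    if elems_per_thread % k_width != 0:
        raise ValueError("kWidth does not divide the per-thread element count")
    return elems_per_thread // k_width


# 16x16x128: 16 * 128 / 64 = 32 elements per thread.
# With an assumed kWidth of 16, the per-thread elements form two groups.
f8_groups = k_group(16, 128, k_width=16)
# With an assumed kWidth of 32, they form a single group.
f4_groups = k_group(16, 128, k_width=32)
```

Under these assumptions the 32x32x64 shape gives the same per-thread count (32 * 64 / 64 = 32), which is consistent with both new instructions needing kGroup = 2 for f8 inputs.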
Also print the mfma instruction name accordingly. Mixed-precision mfma between 8-bit and 4- or 6-bit inputs is not supported yet.
- Support data types
- Support both 32 and 64 banks
- Still working on LDS accesses
- Fixed the issue with maxPhase computation. Need to submit a PR to fix it in the triton compiler
- For ds_read_b64 with 64 banks, there are bank conflicts. We need to figure out a different swizzling pattern to avoid bank conflicts.
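To make the 32-bank versus 64-bank difference concrete, here is a deliberately simplified bank model. It assumes 4-byte-wide banks and treats accesses to the same 4-byte word as a broadcast rather than a conflict. It does not model how the hardware splits a ds_read_b64 across lanes or cycles, and it is not the maxPhase or swizzling logic from the triton compiler. The function names are hypothetical.

```python
from collections import Counter

BANK_WIDTH_BYTES = 4  # assumption: each LDS bank is one 4-byte word wide


def bank_of(byte_addr, num_banks):
    """Bank index of a byte address under the simplified model."""
    return (byte_addr // BANK_WIDTH_BYTES) % num_banks


def max_bank_multiplicity(byte_addrs, num_banks):
    """Largest number of distinct 4-byte words that map to a single bank.

    A result of 1 means no conflict in this model.
    """
    words = {addr // BANK_WIDTH_BYTES for addr in byte_addrs}
    counts = Counter(word % num_banks for word in words)
    return max(counts.values())


# Four accesses with a 128-byte stride: words 0, 32, 64, 96.
strided = [i * 128 for i in range(4)]
conflicts_32 = max_bank_multiplicity(strided, 32)  # all four words share bank 0
conflicts_64 = max_bank_multiplicity(strided, 64)  # words alternate between banks 0 and 32
```

The same address pattern produces different conflict counts for the two bank configurations, which is why a swizzling pattern tuned for one does not necessarily carry over to the other.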
Assumed a basic global access pattern
mfma_transpose_load instructions
- Elements along the M/N dim are contiguous in both global memory and LDS. Note that this is not the in-thread transpose case.
- Swizzling is disabled
I have only used it a bit to get a feel for whether anything is hard to use, so the only comment I have is that it might be worthwhile to break the python file into several pieces to make it more modular. The tex files are already split into several files, so it makes sense for the python code to follow the same structure, for readability and future-proofing.
LGTM. We can address the modularity and any feedback from usage in the future.
No description provided.