[TIR][LLVM] Expose TIR api for llvm sext/zext and trunc native type converter intrinsics #15960
This PR exposes new TIR API operators bound to their native LLVM intrinsic counterparts.
It adds the ability to emit native CPU intrinsics for atomic type conversions of vectors, for use by tensorizers.
Changes
- `zextend`, `sextend`, `truncate` for type conversions.
- `atomic_add` mapping to the proper LLVM intrinsic, guaranteed (best-effort) to lower to a single instruction.
Rationale
Some highly efficient CPU instructions for data type manipulation of whole vectors are not exposed by LLVM as intrinsics.
As a substitute, LLVM offers "higher-level functions" with guarantees that they will emit the exact, correct instruction on the CPU.
Example
On x86 we want to expand a vector from `uint8x16` -> `uint16x16`, or perhaps sign-extend it to `int16x16`.
To do this, the pmovzxbw and pmovsxbw instructions are needed, which are not exposed by LLVM directly.
The new `zextend` (unsigned, zero-extending) and `sextend` (sign-extending) functions can now do this.
Notes
A more complete example showing real usage of these new TIR operators in a tensorization process can be seen here.
This also enables more TOPI/MS data type conversions that leverage precise control over the atomic CPU instructions involved.
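For readers less familiar with the three conversions, their lane-wise semantics can be sketched in plain Python. This is only an illustrative model: the helper names `zero_extend`, `sign_extend` and `truncate_lanes` are hypothetical, and nothing below uses the TVM or LLVM API added by this PR.

```python
# Illustrative model only: plain-Python lane arithmetic, not the TVM or LLVM API.
# All function names here are hypothetical and exist just for this sketch.


def zero_extend(lanes, src_bits=8):
    """Widen each lane, filling the new high bits with zeros."""
    mask = (1 << src_bits) - 1
    return [lane & mask for lane in lanes]


def sign_extend(lanes, src_bits=8):
    """Widen each lane, replicating the source sign bit into the new high bits."""
    mask = (1 << src_bits) - 1
    sign = 1 << (src_bits - 1)
    # (x ^ sign) - sign maps an unsigned bit pattern to its two's-complement value.
    return [((lane & mask) ^ sign) - sign for lane in lanes]


def truncate_lanes(lanes, dst_bits=8):
    """Narrow each lane by keeping only the low dst_bits bits."""
    mask = (1 << dst_bits) - 1
    return [lane & mask for lane in lanes]


raw = [0x00, 0x7F, 0x80, 0xFF]  # four 8-bit lane bit patterns
print(zero_extend(raw))  # [0, 127, 128, 255]  (read as uint16 lanes)
print(sign_extend(raw))  # [0, 127, -128, -1]  (read as int16 lanes)
print(truncate_lanes([0x0100, 0x01FF, 0xFFFF]))  # [0, 255, 255]
```

The same 8-bit pattern `0x80` widens to 128 under zero extension and to -128 under sign extension, which is why the two operators are exposed separately.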
This PR is an indispensable part of #15918, an effort towards int8 tensorization coverage on x86.
Cc: @Lunderberg, @junrushao, @masahi, @vinx13, @ekalda, @lhutton1, @quic-sanirudh, @kparzysz-quic