
[TIR][LLVM] Expose TIR api for llvm sext/zext and trunc native type converter intrinsics #15960

Closed

Conversation

cbalint13
Contributor

@cbalint13 cbalint13 commented Oct 20, 2023

This PR exposes new TIR API operators bound to their native LLVM intrinsic counterparts.
It adds the ability to emit native CPU intrinsics for atomic type conversions of whole vectors, for use by tensorizers.


Changes

  • Adds new TIR ops mapped to LLVM intrinsics: zextend, sextend and truncate for type conversions.
  • Enables the TIR op atomic_add to map to the proper LLVM intrinsic, guaranteed (best-effort) to lower to a single instruction.

Rationale

Some highly efficient CPU instructions for data type manipulation of whole vectors are not exposed directly by LLVM.
As a substitute, LLVM offers higher-level IR operations with the guarantee that they will emit exactly the right instruction on the CPU.

Example

  • On x86 we want to expand a vector from uint8x16 -> uint16x16, or perhaps sign-expand it to int16x16.
    To do this the pmovzxbw and pmovsxbw instructions are needed, which are not exposed by LLVM directly.
    The new zextend (zero-extending, sign-unaware) and sextend (sign-aware) functions can now do this:

    • Here is the tir invocation:
      vec_a_words = tvm.tir.zextend("uint16x8", vec_a_uint8x8)
      vec_b_words = tvm.tir.sextend("int16x8", vec_b_int8x8)
    
    • In the lowered x86 assembly there are now exactly the two instructions mentioned:
     1ea3:	c4 e2 79 30 e2       	vpmovzxbw %xmm2,%xmm4
     1ea8:	c4 e2 79 20 12       	vpmovsxbw (%rdx),%xmm2
    

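For completeness, here is a minimal sketch of all three proposed conversions side by side (truncate is not exercised in the snippet above). The names and argument order follow the invocations shown here as proposed in this PR; they are not part of an existing TVM release:

    import tvm

    vec_u8 = tvm.tir.Var("vec_u8", "uint8x8")
    vec_i8 = tvm.tir.Var("vec_i8", "int8x8")
    vec_i32 = tvm.tir.Var("vec_i32", "int32x8")

    u16 = tvm.tir.zextend("uint16x8", vec_u8)     # zero-extend lanes, 8 -> 16 bit
    i16 = tvm.tir.sextend("int16x8", vec_i8)      # sign-extend lanes, 8 -> 16 bit
    i16n = tvm.tir.truncate("int16x8", vec_i32)   # narrow lanes, 32 -> 16 bit
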
Notes

A more complete example with real usage of these new TIR operators in a tensorization process can be seen here.
This also enables more TOPI/MS data type conversions, leveraging precise control over the involved atomic CPU instructions.
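As a rough illustration of that usage (not the linked example itself), here is a minimal sketch of what the body of such a tensorize intrinsic could look like with tvm.tir.ir_builder, assuming the zextend/sextend ops proposed in this PR; the buffer shapes, dtypes and names are hypothetical:

    import tvm

    def widen_intrin_body(ins, outs):
        # ins[0]: uint8[16] buffer, ins[1]: int8[16] buffer,
        # outs[0]: uint16[16] buffer, outs[1]: int16[16] buffer (hypothetical layout).
        aa, bb = ins
        cu, ci = outs
        ib = tvm.tir.ir_builder.create()
        vec_a = aa.vload([0], "uint8x16")
        vec_b = bb.vload([0], "int8x16")
        # With this PR these lower to LLVM zext/sext and, on x86, to single
        # vpmovzxbw / vpmovsxbw instructions instead of whatever a plain cast picks.
        a16 = tvm.tir.zextend("uint16x16", vec_a)
        b16 = tvm.tir.sextend("int16x16", vec_b)
        ib.emit(cu.vstore([0], a16))
        ib.emit(ci.vstore([0], b16))
        return ib.get()
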

This PR is an indispensable part of #15918, an effort towards int8 tensorization coverage on x86.


Cc: @Lunderberg , @junrushao , @masahi , @vinx13, @ekalda , @lhutton1 , @quic-sanirudh , @kparzysz-quic

@cbalint13 cbalint13 marked this pull request as ready for review October 21, 2023 08:49
@Lunderberg
Contributor

Other than lack of hardware support, are there any cases where we wouldn't want to apply these intrinsics? If there aren't, I'm wondering if llvm::Instruction::ZExt and llvm::Instruction::SExt should be the default codegen for T.cast(value_uint8x8, 'uint16x8') and T.cast(value_int8x8, 'int16x8'), respectively.

@cbalint13
Contributor Author

cbalint13 commented Oct 23, 2023

@Lunderberg,

Other than lack of hardware support, are there any cases where we wouldn't want to apply these intrinsics?

  • Not aware of any; in the worst case LLVM's sext/zext will map to multiple instructions that mimic the single one.

If there aren't, I'm wondering if llvm::Instruction::ZExt and llvm::Instruction::SExt should be the default codegen for T.cast(value_uint8x8, 'uint16x8') and T.cast(value_int8x8, 'int16x8'), respectively.

  • Not sure exactly what you are pointing at, but yes, ZExt/SExt could replace those casts as you mentioned.

To sum up the needs here:

  • The immediate need here is the tensorization proposed in PR #15918 (now split into multiple PRs).
  • This would bring a precise (non-overflowing!) set of int8 SIMD tensorizers on x86, backward-compatible down to SSSE3.
  • The current int8 SIMD set on x86 is overflowing and imprecise (except VNNI), and TOPI also lacks proper lane-size awareness.

A bit longer:

The ultimate goal, if #15918 is done, would be solid int8 support (again, non-overflowing on x86!), working even on older hardware (not everyone has VNNI or AVX-512), to make LLMs and other quantized models happier.

@cbalint13
Contributor Author

cbalint13 commented Oct 24, 2023

@Lunderberg,

Other than lack of hardware support, are there any cases where we wouldn't want to apply these intrinsics?

  • Not aware of any; in the worst case LLVM's sext/zext will map to multiple instructions that mimic the single one.

If there aren't, I'm wondering if llvm::Instruction::ZExt and llvm::Instruction::SExt should be the default codegen for T.cast(value_uint8x8, 'uint16x8') and T.cast(value_int8x8, 'int16x8'), respectively.

  • Not sure exactly what you are pointing at, but yes, ZExt/SExt could replace those casts as you mentioned.

You mean CreateIntCast() for the generic integer casts, and also CreateIntCast() for the truncation, right?

  • If so, sext/zext and truncate would be a finer-grained approach (using the more specific LLVM IR instructions).
  • The need here for exposing tir sext/zext/truncate is fine-grained control over the emitted intrinsics during tensorization.

@Lunderberg
Contributor

Not sure exactly what you are pointing at, but yes, ZExt/SExt could replace those casts as you mentioned.

The main thing I'm wondering is whether the support for these conversions should be done through an LLVM-specific intrinsic, or through a change in CodegenLLVM to recognize the T.cast and call builder_->CreateZExt instead of builder_->CreateIntCast. If this is always beneficial to do, then we wouldn't need to introduce an explicit intrinsic for it, as we would already have a way to express the same effect within TIR.

Though, looking at LLVM's implementation of IRBuilderBase::CreateIntCast, it looks like it already produces an Instruction::ZExt when applicable.

The need here for exposing tir sext/zext/truncate is fine-grained control over the emitted intrinsics during tensorization.

Ah, that makes sense. So if I understand correctly, we are currently relying on LLVM's choice of intrinsic for performing integer to integer casts, but we want to be able to override that choice of intrinsic by explicitly specifying it.
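
To make the distinction concrete, a minimal sketch (tvm.tir.zextend is the op proposed in this PR, not an existing TVM API; tvm.tir.Cast is the existing generic cast node):

    import tvm

    vec = tvm.tir.Var("vec", "uint8x8")

    # Default path: a generic cast; CodegenLLVM lowers it via CreateIntCast and
    # LLVM chooses the IR instruction (ZExt here) and the machine instruction.
    widened_default = tvm.tir.Cast("uint16x8", vec)

    # Proposed path: the extension is spelled out in TIR, pinning the lowering
    # to llvm zext and, on x86, to a single vpmovzxbw.
    widened_explicit = tvm.tir.zextend("uint16x8", vec)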

@cbalint13 cbalint13 changed the title [TIR][LLVM] Add TIR native type converters [TIR][LLVM] Expose TIR api for llvm sext/zext and trunc native type converter intrinsics Oct 24, 2023
@cbalint13 cbalint13 closed this Dec 19, 2024