
[TIR][LLVM] Expose TIR api for llvm sext/zext and trunc native type converter intrinsics #15960

Closed

Conversation

cbalint13
Contributor

@cbalint13 cbalint13 commented Oct 20, 2023

This PR exposes new TIR API operators bound to their native LLVM intrinsic counterparts.
It adds the ability to emit native CPU intrinsics for atomic type conversions of whole vectors, for use by tensorizers.


Changes

  • Adds new TIR ops mapped to LLVM intrinsics: zextend, sextend and truncate for type conversions.
  • Enables the TIR op atomic_add to map to the proper LLVM intrinsic, guaranteed (best-effort) to lower to a single instruction.

Rationale

Some highly efficient CPU instructions for data type manipulation of whole vectors are not exposed directly by LLVM.
As a substitute, LLVM offers higher-level IR operations with the guarantee that they will emit exactly the right instruction on the CPU.

Example

  • On x86 we want to expand a vector from uint8x16 -> uint16x16, or perhaps sign-expand it to int16x16.
    To do this the pmovzxbw and pmovsxbw instructions are needed, which are not exposed by LLVM directly.
    The new zextend (zero-extending, sign-unaware) and sextend (sign-aware) functions can now do this:

    • Here is the tir invocation:
      vec_a_words = tvm.tir.zextend("uint16x8", vec_a_uint8x8)
      vec_b_words = tvm.tir.sextend("int16x8", vec_b_int8x8)
    
    • In the lowered x86 assembly there are now exactly the two instructions mentioned:
     1ea3:	c4 e2 79 30 e2       	vpmovzxbw %xmm2,%xmm4
     1ea8:	c4 e2 79 20 12       	vpmovsxbw (%rdx),%xmm2
    

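For completeness, here is a minimal sketch of all three proposed conversions side by side (truncate is not exercised in the snippet above). The names and argument order follow the invocations shown here as proposed in this PR; they are not part of an existing TVM release:

    import tvm

    vec_u8 = tvm.tir.Var("vec_u8", "uint8x8")
    vec_i8 = tvm.tir.Var("vec_i8", "int8x8")
    vec_i32 = tvm.tir.Var("vec_i32", "int32x8")

    u16 = tvm.tir.zextend("uint16x8", vec_u8)     # zero-extend lanes, 8 -> 16 bit
    i16 = tvm.tir.sextend("int16x8", vec_i8)      # sign-extend lanes, 8 -> 16 bit
    i16n = tvm.tir.truncate("int16x8", vec_i32)   # narrow lanes, 32 -> 16 bit
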
Notes

A more complete example with real usage of these new TIR operators in a tensorization process can be seen here.
This also enables more TOPI/MS data type conversions, leveraging precise control over the involved atomic CPU instructions.
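As a rough illustration of that usage (not the linked example itself), here is a minimal sketch of what the body of such a tensorize intrinsic could look like with tvm.tir.ir_builder, assuming the zextend/sextend ops proposed in this PR; the buffer shapes, dtypes and names are hypothetical:

    import tvm

    def widen_intrin_body(ins, outs):
        # ins[0]: uint8[16] buffer, ins[1]: int8[16] buffer,
        # outs[0]: uint16[16] buffer, outs[1]: int16[16] buffer (hypothetical layout).
        aa, bb = ins
        cu, ci = outs
        ib = tvm.tir.ir_builder.create()
        vec_a = aa.vload([0], "uint8x16")
        vec_b = bb.vload([0], "int8x16")
        # With this PR these lower to LLVM zext/sext and, on x86, to single
        # vpmovzxbw / vpmovsxbw instructions instead of whatever a plain cast picks.
        a16 = tvm.tir.zextend("uint16x16", vec_a)
        b16 = tvm.tir.sextend("int16x16", vec_b)
        ib.emit(cu.vstore([0], a16))
        ib.emit(ci.vstore([0], b16))
        return ib.get()
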

This PR is an indispensable part of #15918, an effort towards int8 tensorization coverage on x86.


Cc: @Lunderberg , @junrushao , @masahi , @vinx13, @ekalda , @lhutton1 , @quic-sanirudh , @kparzysz-quic

@cbalint13 cbalint13 marked this pull request as ready for review October 21, 2023 08:49
@Lunderberg
Contributor

Other than lack of hardware support, are there any cases where we wouldn't want to apply these intrinsics? If there aren't, I'm wondering if llvm::Instruction::ZExt and llvm::Instruction::SExt should be the default codegen for T.cast(value_uint8x8, 'uint16x8') and T.cast(value_int8x8, 'int16x8'), respectively.

@cbalint13
Contributor Author

cbalint13 commented Oct 23, 2023

@Lunderberg,

Other than lack of hardware support, are there any cases where we wouldn't want to apply these intrinsics?

  • Not aware of any; in the worst case LLVM's sext/zext will map to multiple instructions that mimic the single one.

If there aren't, I'm wondering if llvm::Instruction::ZExt and llvm::Instruction::SExt should be the default codegen for T.cast(value_uint8x8, 'uint16x8') and T.cast(value_int8x8, 'int16x8'), respectively.

  • Not sure exactly what you are pointing at, but yes, ZExt/SExt could replace those casts as you mentioned.

To sum up the needs here:

  • The immediate need here is the tensorization proposed in PR #15918 (now split into multiple PRs).
  • This would bring a precise (non-overflowing!) set of int8 SIMD tensorizers on x86, backward-compatible down to SSSE3.
  • The current int8 SIMD set on x86 is overflowing and imprecise (except VNNI), and TOPI also lacks proper lane-size awareness.

A bit longer:

The ultimate goal, if #15918 is done, would be solid int8 support (again, non-overflowing on x86!), working even on older hardware (not everyone has VNNI or AVX-512), to make LLMs and other quantized models happier.

@cbalint13
Contributor Author

cbalint13 commented Oct 24, 2023

@Lunderberg,

Other than lack of hardware support, are there any cases where we wouldn't want to apply these intrinsics?

  • Not aware of any; in the worst case LLVM's sext/zext will map to multiple instructions that mimic the single one.

If there aren't, I'm wondering if llvm::Instruction::ZExt and llvm::Instruction::SExt should be the default codegen for T.cast(value_uint8x8, 'uint16x8') and T.cast(value_int8x8, 'int16x8'), respectively.

  • Not sure exactly what you are pointing at, but yes, ZExt/SExt could replace those casts as you mentioned.

You mean CreateIntCast() for the generic integer casts, and also CreateIntCast() for the truncation, right?

  • If so, sext/zext and truncate would be a finer-grained approach (using the more specific LLVM IR instructions).
  • The need here for exposing tir sext/zext/truncate is fine-grained control over the emitted intrinsics during tensorization.

@Lunderberg
Contributor

Not sure exactly what you are pointing at, but yes, ZExt/SExt could replace those casts as you mentioned.

The main thing I'm wondering is whether the support for these conversions should be done through an LLVM-specific intrinsic, or through a change in CodegenLLVM to recognize the T.cast and call builder_->CreateZExt instead of builder_->CreateIntCast. If this is always beneficial to do, then we wouldn't need to introduce an explicit intrinsic for it, as we would already have a way to express the same effect within TIR.

Though, looking at LLVM's implementation of IRBuilderBase::CreateIntCast, it looks like it already produces an Instruction::ZExt when applicable.

The need here for exposing tir sext/zext/truncate is fine-grained control over the emitted intrinsics during tensorization.

Ah, that makes sense. So if I understand correctly, we are currently relying on LLVM's choice of intrinsic for performing integer to integer casts, but we want to be able to override that choice of intrinsic by explicitly specifying it.
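
To make the distinction concrete, a minimal sketch (tvm.tir.zextend is the op proposed in this PR, not an existing TVM API; tvm.tir.Cast is the existing generic cast node):

    import tvm

    vec = tvm.tir.Var("vec", "uint8x8")

    # Default path: a generic cast; CodegenLLVM lowers it via CreateIntCast and
    # LLVM chooses the IR instruction (ZExt here) and the machine instruction.
    widened_default = tvm.tir.Cast("uint16x8", vec)

    # Proposed path: the extension is spelled out in TIR, pinning the lowering
    # to llvm zext and, on x86, to a single vpmovzxbw.
    widened_explicit = tvm.tir.zextend("uint16x8", vec)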

@cbalint13 cbalint13 changed the title [TIR][LLVM] Add TIR native type converters [TIR][LLVM] Expose TIR api for llvm sext/zext and trunc native type converter intrinsics Oct 24, 2023
@cbalint13 cbalint13 closed this Dec 19, 2024