[TIR] VNNI and ARM dot product intrinsic for tensorization #10925

masahi · 2022-04-07T02:00:12Z

Introduces a new directory, python/tvm/tir/tensor_intrin, for intrinsic descriptions written in TVMScript for various HW targets. They can be used by manually tensorized TIR schedules or auto-tensorized ones. More intrinsics, such as the tensorcore ones and DP4A, will be added later.

@junrushao1994 @vinx13 @shingjan @Hzfengsy @spectrometerHBH

masahi · 2022-04-07T02:03:10Z

python/tvm/tir/tensor_intrin/arm_cpu.py

+
+
+@T.prim_func
+def dot_product_4x4_i8i8i32_neon(


This is equivalent to the TE one in

tvm/python/tvm/topi/arm_cpu/tensor_intrin.py

Line 536 in 912993f

def dot_int8_int8_int32_neon():

cc @tkonolige

masahi · 2022-04-07T02:03:44Z

python/tvm/tir/tensor_intrin/arm_cpu.py

+
+
+@T.prim_func
+def dot_product_4x4_i8i8i32_sdot(


This is equivalent to the TE one in

tvm/python/tvm/topi/arm_cpu/tensor_intrin.py

Line 431 in 912993f

def dot_int8_int8_int32_neon_82(int32_lanes, dtype="uint"):
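For readers unfamiliar with the naming, here is a minimal plain-Python sketch of what a 4x4 i8i8i32 dot product computes, assuming the name encodes four int32 output lanes that each reduce over four int8 elements. The function name `dot_product_4x4_i8i8i32_ref` and the list-based layout are illustrative only and are not part of the PR; Python integers also do not wrap at 32 bits the way hardware accumulators do.

```python
def dot_product_4x4_i8i8i32_ref(a, b, c):
    """Accumulate four 4-element int8 dot products into four int32 lanes.

    a: 4 int8 values, b: 4 rows of 4 int8 values, c: 4 int32 accumulators.
    Hypothetical reference helper, not code from the PR.
    """
    assert len(a) == 4 and len(b) == 4 and len(c) == 4
    assert all(-128 <= x <= 127 for x in a)
    assert all(len(row) == 4 and all(-128 <= x <= 127 for x in row) for row in b)
    # Lane i accumulates the dot product of a with row i of b.
    return [c[i] + sum(a[k] * b[i][k] for k in range(4)) for i in range(4)]


a = [1, -2, 3, -4]
b = [[1, 1, 1, 1], [2, 0, 0, 0], [0, 0, 0, -128], [127, 127, 127, 127]]
print(dot_product_4x4_i8i8i32_ref(a, b, [0, 10, 0, 0]))  # prints [-2, 12, 512, -254]
```

Both ARM variants in this PR (`..._neon` and `..._sdot`) carry the same 4x4 i8i8i32 shape in their names, so under that assumption a reference like this could serve as the expected result for either.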

masahi · 2022-04-07T02:04:51Z

python/tvm/tir/tensor_intrin/x86.py

+
+
+@T.prim_func
+def dot_product_16x4_u8i8i32_vnni(


Equivalent to the TE one in

tvm/python/tvm/topi/x86/tensor_intrin.py

Line 244 in 912993f

def dot_16x1x16_uint8_int8_int32_cascadelake():
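As a rough illustration of the 16x4 u8i8i32 shape, the sketch below computes sixteen int32 lanes, each reducing four uint8 x int8 products. It assumes the first operand is the unsigned one and is shared across all lanes, which is how the name reads; the helper name and list layout are hypothetical and not taken from the PR.

```python
def dot_product_16x4_u8i8i32_ref(a, b, c):
    """Reference for a 16-lane, 4-element uint8 x int8 -> int32 dot product.

    a: 4 uint8 values shared by all lanes, b: 16 rows of 4 int8 values,
    c: 16 int32 accumulators. Hypothetical helper, not code from the PR.
    """
    assert len(a) == 4 and all(0 <= x <= 255 for x in a)
    assert len(b) == 16 and all(len(row) == 4 for row in b)
    assert all(-128 <= x <= 127 for row in b for x in row)
    assert len(c) == 16
    # Lane i accumulates the dot product of a with row i of b.
    return [c[i] + sum(a[k] * b[i][k] for k in range(4)) for i in range(16)]


a = [255, 0, 1, 2]
b = [[i - 8, 1, 1, 1] for i in range(16)]
out = dot_product_16x4_u8i8i32_ref(a, b, [0] * 16)
print(out[0], out[15])  # prints -2037 1788
```

The mixed signedness is the main difference from the ARM variants: the same loop nest applies, only the value ranges and the lane count change.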

masahi · 2022-04-07T02:18:16Z

python/tvm/tir/tensor_intrin/arm_cpu.py

+
+
+# TODO(masahi): Parametrize the TVMScript description of dot product by
+# shape and dtype, and share the common description with x86.


cc @junrushao1994 @yelite, this is one of the common needs for metaprogramming support in TVMScript. I think shape parameterization is possible via specialize, but I'm not sure whether I can use that with the T.Buffer syntax sugar.

A similar need arises for tensorcore (different MMA shapes x data types).
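To make the request concrete, here is a sketch in plain Python (not TVMScript) of a single description parametrized by shape and dtype and shared between the ARM and x86 variants. The factory `make_dot_product_ref` and the range tuples are hypothetical; this only illustrates the kind of sharing the TODO asks for and does not show how TVMScript would express it.

```python
def make_dot_product_ref(lanes, reduce_len, a_range, b_range):
    """Build a reference dot product for a (lanes x reduce_len) intrinsic.

    a_range / b_range are inclusive (min, max) bounds standing in for dtypes.
    Hypothetical helper for illustration only.
    """
    a_lo, a_hi = a_range
    b_lo, b_hi = b_range

    def dot_product(a, b, c):
        assert len(a) == reduce_len and all(a_lo <= x <= a_hi for x in a)
        assert len(b) == lanes and len(c) == lanes
        assert all(len(row) == reduce_len for row in b)
        assert all(b_lo <= x <= b_hi for row in b for x in row)
        return [
            c[i] + sum(a[k] * b[i][k] for k in range(reduce_len))
            for i in range(lanes)
        ]

    return dot_product


INT8 = (-128, 127)
UINT8 = (0, 255)

# One shared description, two instantiations.
arm_4x4_i8i8i32 = make_dot_product_ref(4, 4, INT8, INT8)
x86_16x4_u8i8i32 = make_dot_product_ref(16, 4, UINT8, INT8)

print(arm_4x4_i8i8i32([1, 1, 1, 1], [[1, 2, 3, 4]] * 4, [0] * 4))  # prints [10, 10, 10, 10]
```

A closure is used here only because it is the simplest way to bind the shape and dtype once in ordinary Python; the equivalent in TVMScript would depend on the metaprogramming support discussed in this comment.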

masahi · 2022-04-07T02:21:53Z

python/tvm/tir/tensor_intrin/arm_cpu.py

+
+        vec_b = B.vload([0, 0], dtype="int8x16")
+
+        # TODO(masahi): Remove duplication when inlined function call is supported


cc @junrushao1994 @yelite, I want to define and call a convenience function like

tvm/python/tvm/topi/arm_cpu/tensor_intrin.py

Line 625 in 912993f

def pairwise_add_mul(extract_half):

to remove duplication in IR generation. I think this is low-hanging fruit that can be supported without more foundational work on metaprogramming? (Maybe it just needs a tweak to the Python parser?)

Yes, it is likely doable, and @Hzfengsy probably already has something ready.
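The duplication in question can be sketched in plain Python by emulating the NEON path on lists. This assumes the lowering multiplies 8 lanes with a widening multiply, adds adjacent pairs, and then combines the low and high halves with a final pairwise add; the instruction names `smull`, `saddlp` and `addp` are an assumption about that lowering, since the thread itself only mentions vectorlow/high and the helper name `pairwise_add_mul`. All function bodies below are illustrative emulations, not TVM code.

```python
def smull(x, y):
    """Lane-wise widening multiply of two 8-lane int8 vectors (emulated)."""
    return [p * q for p, q in zip(x, y)]


def saddlp(v):
    """Add adjacent lane pairs, halving the lane count (emulated)."""
    return [v[i] + v[i + 1] for i in range(0, len(v), 2)]


def addp(x, y):
    """Pairwise add over the concatenation of x and y (emulated)."""
    return saddlp(x + y)


def pairwise_add_mul(vec_a, vec_b_half):
    """The shared step: multiply one half of B by A, then reduce pairs."""
    return saddlp(smull(vec_a, vec_b_half))


def dot_product_4x4_neon_emulated(a, b, c):
    vec_a = a + a                          # 4 int8 values repeated to fill 8 lanes
    vec_b = [x for row in b for x in row]  # 16 lanes, row-major
    low = pairwise_add_mul(vec_a, vec_b[:8])   # rows 0 and 1 of B
    high = pairwise_add_mul(vec_a, vec_b[8:])  # rows 2 and 3 of B
    return [ci + s for ci, s in zip(c, addp(low, high))]


a = [1, -2, 3, -4]
b = [[1, 1, 1, 1], [2, 0, 0, 0], [0, 0, 0, -128], [127, 127, 127, 127]]
print(dot_product_4x4_neon_emulated(a, b, [0, 10, 0, 0]))  # prints [-2, 12, 512, -254]
```

With the helper factored out, the low and high halves differ only in which slice of B they pass in, which is the duplication the TODO wants to remove once inlined function calls are supported.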

junrushao · 2022-04-08T01:29:53Z

CC @vinx13 would you like to review? Thanks a lot!


masahi added 12 commits April 7, 2022 11:43

[TIR] Add VNNI dot product intrinsic for TIR

711a007

refactored existing test using VNNI intrin

88b763e

add VNNI unittest

38a5aca

rename vnni.py to x86.py

0ced85f

use buffer syntax sugar

1351fde

Add ARM intrin

69e72b6

fixed offset factor

625cd27

use vectorlow/high in arm intrin

d8e43ec

simplify import

9a3e508

pylint

7a757fe

black

15e60b4

more lint fix

07bbb38

masahi force-pushed the tir-tensor-intrin branch from 82e152a to 07bbb38 Compare April 7, 2022 02:43

function doc string not supported by tvmscript

cfc2294

vinx13 approved these changes Apr 8, 2022


vinx13 merged commit fc04738 into apache:main Apr 8, 2022

pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022

[TIR] VNNI and ARM dot product intrinsic for tensorization (apache#10925)

edff0ac

mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 11, 2022

[TIR] VNNI and ARM dot product intrinsic for tensorization (apache#10925)

429d426

Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022

[TIR] VNNI and ARM dot product intrinsic for tensorization (apache#10925)

c89a3a2

altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022

[TIR] VNNI and ARM dot product intrinsic for tensorization (apache#10925)

31c151b

driazati mentioned this pull request Jul 14, 2022

TVM v0.9.0.rc0 Release Candidate Notes #12102

Closed
