[TIR][Schedule] DecomposePadding #12174

wrongtest-intellif · 2022-07-25T13:39:49Z

Hi there, this PR introduces a new TIR schedule primitive, schd.decompose_padding(block, loop).

For padded conv or pooling ops, there is a typical padding pattern:

Pad[_] = T.if_then_else(pad_predicate, X[_], pad_value)

which can be decomposed into two parts:

One that fills the pad values with a memset routine
One that fills the in-bound values with a memcpy routine

The primitive's signature is similar to that of decompose_reduction: it takes a target block and a loop position at which to insert the newly created "init" block. It is helpful for infrastructures with high-performance memset/memcpy routines, and it relieves the main compute block of the complexity of handling padding conditions.

Example

Before

@T.prim_func
def before_decompose(x: T.Buffer[128, "int32"], y: T.Buffer[140, "int32"]):
    for i in range(140):
        with T.block("block"):
            vi = T.axis.remap("S", [i])
            y[vi] = T.if_then_else(vi >= 6 and vi < 134, x[vi - 6], 0, dtype="int32")

After decompose_padding(block, i)

@T.prim_func
def after_decompose(x: T.Buffer[128, "int32"], y: T.Buffer[140, "int32"]):
    for i in T.serial(140):
        with T.block("block_pad_const"):
            vi = T.axis.spatial(140, i)
            y[vi] = 0
    for i in T.serial(128):
        with T.block("block"):
            vi = T.axis.spatial(128, i)
            y[vi + 6] = x[vi]
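
As an illustration only, the equivalence of the two forms above can be checked in plain Python without TVM. This is a minimal sketch: the function names `padded_single_pass` and `padded_decomposed` are hypothetical and not part of the PR, and the lists stand in for the buffers `x` and `y` from the example.

```python
def padded_single_pass(x, pad_before, out_len, pad_value=0):
    """Mirror of before_decompose: one loop with a per-element predicate."""
    n = len(x)
    y = [None] * out_len
    for vi in range(out_len):
        in_bounds = pad_before <= vi < pad_before + n
        y[vi] = x[vi - pad_before] if in_bounds else pad_value
    return y


def padded_decomposed(x, pad_before, out_len, pad_value=0):
    """Mirror of after_decompose: constant fill, then a predicate-free copy."""
    # Counterpart of "block_pad_const": memset-like fill of the whole output.
    y = [pad_value] * out_len
    # Counterpart of "block": memcpy-like copy of the in-bound region.
    for vi in range(len(x)):
        y[vi + pad_before] = x[vi]
    return y


# Same shapes as the example: 128 inputs, 140 outputs, 6 pad elements per side.
x = list(range(1, 129))
single = padded_single_pass(x, pad_before=6, out_len=140)
decomposed = padded_decomposed(x, pad_before=6, out_len=140)
assert single == decomposed
```

Note that the decomposed form writes the in-bound region twice (once with the pad value, once with the data), trading redundant stores for loops that contain no condition.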

Alternatives and drawbacks

From the graph perspective, one may be able to fold away the block that pads the input buffer entirely. The primitive is more useful when one wants to perform padding on intra-primfunc buffers.
One could also compute-inline the block that performs padding. This introduces conditions in the main computation block, which may or may not get optimized well, depending on the concrete target.
Currently there are scheduling limitations on the blocks created by decomposition. They cannot subsequently be compute-at-ed or compute-inlined, because multiple blocks writing to the same buffer breaks the stage pipeline property.
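
To make the compute-inline alternative concrete, here is a rough plain-Python sketch, not TVM code. It compares a consumer that reads a materialized padded buffer with one where the padding predicate has been inlined into the main loop. The sliding-window sum and the names `window_sum_with_pad_buffer` and `window_sum_inlined` are assumptions chosen for illustration; they do not appear in the PR.

```python
def window_sum_with_pad_buffer(x, pad, k):
    """Consumer reads a materialized padded buffer; the main loop has no predicate."""
    padded = [0] * (len(x) + 2 * pad)
    for i, v in enumerate(x):
        padded[i + pad] = v
    out_len = len(padded) - k + 1
    return [sum(padded[o + r] for r in range(k)) for o in range(out_len)]


def window_sum_inlined(x, pad, k):
    """Padding inlined: every read in the main loop carries a bounds check."""
    n = len(x)
    out_len = n + 2 * pad - k + 1
    out = []
    for o in range(out_len):
        acc = 0
        for r in range(k):
            src = o + r - pad
            # The padding condition now sits inside the compute loop.
            acc += x[src] if 0 <= src < n else 0
        out.append(acc)
    return out


x = [1, 2, 3, 4, 5]
a = window_sum_with_pad_buffer(x, pad=1, k=3)
b = window_sum_inlined(x, pad=1, k=3)
assert a == b
```

Both versions compute the same result; the difference is only where the condition lives, which is what determines how well a given target can optimize the main block.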

vinx13 · 2022-07-25T17:37:45Z

cc @Lunderberg

Hzfengsy

LGTM. cc @junrushao1994

Co-authored-by: baoxinqi <[email protected]>

TIR Schedule primitive - decompose_padding

ae7a62e

vinx13 self-assigned this Jul 25, 2022

Hzfengsy approved these changes Jul 26, 2022

vinx13 approved these changes Jul 27, 2022

vinx13 merged commit ca30e5e into apache:main Jul 27, 2022

wrongtest-intellif added a commit that referenced this pull request Aug 30, 2022

TIR Schedule primitive - decompose_padding (#12174)

2d6a91f

Co-authored-by: baoxinqi <[email protected]>

AndrewZhaoLuo mentioned this pull request Oct 4, 2022

TVM v0.10.0.rc0 Release Candidate Notes #12979

Closed

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022

TIR Schedule primitive - decompose_padding (apache#12174)

6435367

Co-authored-by: baoxinqi <[email protected]>
