Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BYOC] Two helper passes for external codegen using RelayToTIR custom pass machinery #11474

Merged
merged 2 commits into from
Jun 3, 2022

Conversation

mbs-octoml
Copy link
Contributor

@mbs-octoml mbs-octoml commented May 26, 2022

(See https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6 for
context, which in turn is part of Collage (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md).

For reasons explained in the above thread I'm moving CUTLASS to be IRModule-at-a-time external codegen
using a custom RelayToTIR pass instead of the traditional function-at-a-time external codegen using
a relay.ext.cutlass registered function. This means some of the rewriting done on-the-fly by LowerTEPass now
needs to be done by the custom pass directly. This PR supplies two passes which ease that burden:

  • Before starting the extern-codegen specific processing, make sure all "Compiler" attributed functions have
    unique global definitions (ie are outlined). Though functions start in this form after BYOC partitioning,
    under Graph and AOT compilation flows those functions are then inlined to pass through the 'codegen' keyhole
    which assumes the whole model is just one self-contained main function. This pass will undo that. (I gave up
    trying to just remove the inlining in the first place.)
  • After the extern-codegen specific processing the now compiled "Compiler" attributed functions need to marked as
    'extern'. The te_compiler.cc uses the "ExternalSymbol" attribute for that. But since a) the symbol name
    on that attribute is never needed, only the presence of the attribute is significant downstream and b) "ExternalSymbol"
    is easy to confuse with "global_symbol", I also replace "ExternalSymbol" with "Extern" with an Integer(1) value
    (cf "Primitive").

The outlining pass is a little more general than necessary because it (will also) be used by Collage to
rewrite the IRModule into optimally partitioned form while making maximal reuse of structurally equal partitioned
functions. Hence the abstract GlobalSymbolCache.

@mbs-octoml mbs-octoml force-pushed the mbs-collage-compiler-function branch 2 times, most recently from f56e498 to 3bb37aa Compare May 26, 2022 19:05
@mbs-octoml
Copy link
Contributor Author

ci failures are unrelated

@mbs-octoml mbs-octoml force-pushed the mbs-collage-compiler-function branch from 52dcc90 to 5ece9dd Compare May 27, 2022 18:01
Copy link
Contributor

@sunggg sunggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great PR, @mbs-octoml!
It looks good to me. Especially, I like to change towards Extern.
Just a few clarification question.

}

def @tvmgen_default_cutlass_main_0(%y_0_i0: Tensor[(1600, 768), float16], %y_0_i1: Tensor[(2304, 768), float16], %y_0_i2: Tensor[(2304), float16],
Inline=1, Compiler="cutlass", global_symbol="tvmgen_default_cutlass_main_0", Primitive=1) -> Tensor[(1600, 2304), float16] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this Inline=1 mean? Just want to double check if the outlining should unset this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIKT only PartitionGraph sets it, and it triggers Inline to actually do the inline. I guess at the time folks were thinking of supporting a kind of inline profit analysis separate from the inline pass itself?

The 'outlining' pass will preserve the attribute, along with all the other attributes. So, nothing stopping someone running Inline to push them back in again if that's what they really want!

The 'make extern' pass however intentionally drops all attributes since these functions are now just stubs and the body is no longer meaningful.

In an ideal world I think:

  • "Compiler" only makes sense on global functions
  • The graph and aot flows would respect the entire IRModule and not be restricted to seeing just 'main', hence there'd never be a need to inline anything other than for perf.
  • IRModule would support the notion of extern functions (along with extern/defined constants, and runtime modules)

But I'm trying to work with what we have :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thank you for the detailed comment! I think it is very good to know and your approach makes sense to me.

One concern is the confusion from the name.
Especially from the people whom are not familiar with these context (like myself) might wonder why outlining does not matter with Inline attribute.
I'd like to suggest changing either one of the names or clearly document somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sung, sorry I only just noticed your responses after the merge.

I think a comment saying 'will outline functions even if they have Inline=1' would address your concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! It's just a nit. that would be nice. Thank you!

@mbs-octoml
Copy link
Contributor Author

Thanks for taking a look Sung, hope those answers help. I've had to reverse engineer this stuff but think I have it right.

Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mbs-octoml added some comments

include/tvm/relay/function.h Outdated Show resolved Hide resolved
include/tvm/relay/function.h Outdated Show resolved Hide resolved
python/tvm/relay/transform/transform.py Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.cc Outdated Show resolved Hide resolved
src/relay/transforms/compiler_function_utils.cc Outdated Show resolved Hide resolved
… pass machinery

(See https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6 for
context, which in turn is part of Collage (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md).

For reasons explained in the above thread I'm moving CUTLASS to be IRModule-at-a-time external codegen
using a custom RelayToTIR pass instead of the traditional function-at-a-time external codegen using
a relay.ext.cutlass registered function. This means some of the rewriing done on-the-fly by LowerTEPass now
needs to be done by the custom pass directly. This PR supplies two passes which ease that burden:
 - Before starting the CUTLASS-specific processing, make sure all "Compiler" attributed functions have
   unique global definitions (ie are outlined). Though functions start in this form after BYOC partitioning,
   under Graph and AOT compilation flows those functions are then inlined to pass through the 'codegen' keyhole
   which assumes the whole model is just one self-contained main function. This pass will undo that. (I gave up
   trying to just remove the inlining in the first place.)
 - After the CUTLASS-specific processing the now compiled "Compiler" attributed functions need to marked as
   'extern'. The te_compiler.cc uses the "ExternalSymbol" attribute for that, but since a) the symbol name
   is never needed, on the presense of the attribute is significant downstream and b) "ExternalSymbol" is
   easy to confuse with "global_symbol", I just replaced "ExternalSymbol" with "Extern" with an Integer(1)
   (cf "Primitive").

 The outlining pass is a little more general than necessary because it (will also) be used by Collage to
 rewrite the IRModule into optimally partitioned form while making maximal reuse of partition functions.
 Hence the abstract GlobalSymbolCache.
@mbs-octoml mbs-octoml force-pushed the mbs-collage-compiler-function branch from 5ece9dd to f88b5a8 Compare June 3, 2022 00:17
@mbs-octoml
Copy link
Contributor Author

Thanks @areusch, PTAL.

Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mbs-octoml !

@mbs-octoml
Copy link
Contributor Author

@tvm-bot merge

@github-actions
Copy link
Contributor

github-actions bot commented Jun 3, 2022

Cannot merge, these CI jobs are not successful on f88b5a8:

@areusch areusch merged commit 9dceb4e into apache:main Jun 3, 2022
@mbs-octoml mbs-octoml deleted the mbs-collage-compiler-function branch June 3, 2022 20:21
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Jun 7, 2022
…hape functions

In apache#11474 I got ready to switch CUTLASS from function-at-a-time to IRModule-at-a-time compilation.
However my approach didn't handle dynamic shape functions, so I adjust it here.

The idea is still that such passes will leave behind
calls to 'extern' functions. However, converting those
calls to 'call_lowered' form in
MarkCompilerFunctionsAsExtern is too soon since only
the TECompiler knows how to capture all the attributes
necessary to support dynamic shape functions.

So stop doing that in MarkCompilerFunctionsAsExtern and
instead support this case properly in the TECompiler.

While there try to chip away at the chronic lack of structure in te_compiler.cc. Every little bit helps.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Jun 8, 2022
…hape functions

In apache#11474 I got ready to switch CUTLASS from function-at-a-time to IRModule-at-a-time compilation.
However my approach didn't handle dynamic shape functions, so I adjust it here.

The idea is still that such passes will leave behind
calls to 'extern' functions. However, converting those
calls to 'call_lowered' form in
MarkCompilerFunctionsAsExtern is too soon since only
the TECompiler knows how to capture all the attributes
necessary to support dynamic shape functions.

So stop doing that in MarkCompilerFunctionsAsExtern and
instead support this case properly in the TECompiler.

While there try to chip away at the chronic lack of structure in te_compiler.cc. Every little bit helps.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Jun 9, 2022
…hape functions

In apache#11474 I got ready to switch CUTLASS from function-at-a-time to IRModule-at-a-time compilation.
However my approach didn't handle dynamic shape functions, so I adjust it here.

The idea is still that such passes will leave behind
calls to 'extern' functions. However, converting those
calls to 'call_lowered' form in
MarkCompilerFunctionsAsExtern is too soon since only
the TECompiler knows how to capture all the attributes
necessary to support dynamic shape functions.

So stop doing that in MarkCompilerFunctionsAsExtern and
instead support this case properly in the TECompiler.

While there try to chip away at the chronic lack of structure in te_compiler.cc. Every little bit helps.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Jun 9, 2022
…hape functions

In apache#11474 I got ready to switch CUTLASS from function-at-a-time to IRModule-at-a-time compilation.
However my approach didn't handle dynamic shape functions, so I adjust it here.

The idea is still that such passes will leave behind
calls to 'extern' functions. However, converting those
calls to 'call_lowered' form in
MarkCompilerFunctionsAsExtern is too soon since only
the TECompiler knows how to capture all the attributes
necessary to support dynamic shape functions.

So stop doing that in MarkCompilerFunctionsAsExtern and
instead support this case properly in the TECompiler.

While there try to chip away at the chronic lack of structure in te_compiler.cc. Every little bit helps.

Add a basic unit test.
masahi pushed a commit that referenced this pull request Jun 10, 2022
…hape functions (#11619)

In #11474 I got ready to switch CUTLASS from function-at-a-time to IRModule-at-a-time compilation.
However my approach didn't handle dynamic shape functions, so I adjust it here.

The idea is still that such passes will leave behind
calls to 'extern' functions. However, converting those
calls to 'call_lowered' form in
MarkCompilerFunctionsAsExtern is too soon since only
the TECompiler knows how to capture all the attributes
necessary to support dynamic shape functions.

So stop doing that in MarkCompilerFunctionsAsExtern and
instead support this case properly in the TECompiler.

While there try to chip away at the chronic lack of structure in te_compiler.cc. Every little bit helps.

Add a basic unit test.
Kathryn-cat pushed a commit to Kathryn-cat/tvm that referenced this pull request Jun 10, 2022
…hape functions (apache#11619)

In apache#11474 I got ready to switch CUTLASS from function-at-a-time to IRModule-at-a-time compilation.
However my approach didn't handle dynamic shape functions, so I adjust it here.

The idea is still that such passes will leave behind
calls to 'extern' functions. However, converting those
calls to 'call_lowered' form in
MarkCompilerFunctionsAsExtern is too soon since only
the TECompiler knows how to capture all the attributes
necessary to support dynamic shape functions.

So stop doing that in MarkCompilerFunctionsAsExtern and
instead support this case properly in the TECompiler.

While there try to chip away at the chronic lack of structure in te_compiler.cc. Every little bit helps.

Add a basic unit test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants