
[CUTLASS] Initial conv2d support #9595

Merged: 24 commits merged into apache:main on Dec 2, 2021

Conversation

masahi (Member) commented on Nov 26, 2021

Adds boilerplate for generating conv2d kernels. Dynamic shape is supported.

To keep the diff small, this first PR adds only the minimum code needed to demonstrate basic functionality. In particular, activation fusion is not implemented yet, and profiling and kernel selection are done by piggybacking on the existing GEMM profiler (see cutlass/gen_conv2d.py). The latter choice simplified the implementation, but as discussed in NVIDIA/cutlass#358, we probably want a dedicated profiler and kernel-selection logic for conv2d. These missing features will be added after this PR.

cc @comaniac @Laurawly @zhiics
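
For context, a minimal sketch of how the offload flow might be exercised is below. The Relay part uses standard APIs; the CUTLASS helper names in the commented-out lines (partition_for_cutlass, tune_cutlass_kernels, build_cutlass_kernels) and the NHWC/OHWI layouts are assumptions on my part, not quoted from this PR.

import tvm
from tvm import relay

# A conv2d with a dynamic batch dimension, roughly the case this PR targets.
# The NHWC/OHWI layouts are assumed here to match the CUTLASS conv2d pattern.
data = relay.var("data", shape=(relay.Any(), 32, 32, 16), dtype="float16")
weight = relay.var("weight", shape=(32, 3, 3, 16), dtype="float16")
conv = relay.nn.conv2d(
    data,
    weight,
    kernel_size=(3, 3),
    padding=(1, 1),
    data_layout="NHWC",
    kernel_layout="OHWI",
    out_dtype="float16",
)
mod = tvm.IRModule.from_expr(conv)

# Hypothetical offload flow (helper names are assumptions, not verbatim from this PR):
# from tvm.relay.op.contrib.cutlass import partition_for_cutlass
# from tvm.contrib.cutlass import tune_cutlass_kernels, build_cutlass_kernels
# mod = partition_for_cutlass(mod)
# mod, num_cutlass_partitions = tune_cutlass_kernels(mod, sm=80)
# with tvm.transform.PassContext(opt_level=3):
#     lib = relay.build(mod, target="cuda")
# lib = build_cutlass_kernels(lib, sm=80)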

CutlassPrint(conv2d_decl, "size_t workspace_size = conv2d_op.get_workspace_size(arguments);\n");
// Allocate workspace memory
CutlassPrint(conv2d_decl,
             "cutlass::device_memory::allocation<uint8_t> workspace(workspace_size);\n");
Contributor commented:
There's a memory leak from allocating the workspace this way. @ZihengJiang

@comaniac (Contributor) left a comment:

Overall LGTM

Comment on lines 85 to 88
if epilogue == EpilogueFunctor.LinearCombination:
    op_entry["op"] = op
    op_entry["name"] = op.procedural_name()
    op_entry["runtime"] = 9999999
Contributor commented:

What's this for? Maybe also add a comment for clarification.

masahi (Member, Author) replied:

This corresponds to the GEMM generator counterpart in:

op_entry["op"] = op
op_entry["name"] = op.procedural_name()
op_entry["opdef"] = kernel_emitter.emit(op, batched=batched)
op_entry["opdef_bias"] = kernel_emitter.emit(
op_bias, no_beta_scaling=True, batched=batched
)
op_entry["opdef_bias_relu"] = kernel_emitter.emit(
op_bias_relu, no_beta_scaling=True, batched=batched
)
op_entry["opdef_bias_gelu"] = kernel_emitter.emit(op_bias_gelu, batched=batched)
op_entry["src"] = profiler_emitter.emit(
op.procedural_name(),
kernel_emitter.emit(op, batched=False),
DataTypeTag[element_a],
DataTypeTag[element_b],
DataTypeTag[element_c],
op.leading_dim(),
)
op_entry["runtime"] = 9999999

In addition to creating opdef, opdef_bias, etc., we also need to set op, name, runtime, etc. I tried to simplify that code, and this is what I came up with.

I'll rewrite this code to make it easier to understand (by pulling the non-activation case, EpilogueFunctor.LinearCombination, out of the loop).
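
As a rough sketch of that restructuring (the names here are hypothetical, not the exact code in gen_conv2d.py):

def create_conv2d_entry(op, kernel_emitter, fused_epilogue_ops):
    # Base case (EpilogueFunctor.LinearCombination, no activation) handled once,
    # outside the per-epilogue loop.
    op_entry = {
        "op": op,
        "name": op.procedural_name(),
        "opdef": kernel_emitter.emit(op),
        "runtime": 9999999,  # placeholder until the profiler records a measured time
    }
    # Only the fused-activation variants are emitted inside the loop.
    for key, fused_op in fused_epilogue_ops.items():
        op_entry[key] = kernel_emitter.emit(fused_op)
    return op_entry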

masahi (Member, Author):

It should be much clearer now.

masahi (Member, Author) commented on Dec 1, 2021

@comaniac good to go?

@comaniac (Contributor) left a comment:

LGTM. @Laurawly @ZihengJiang I guess we could merge this first and fix the memory leak in follow-up PRs, if it really is an issue?

comaniac merged commit dc988b2 into apache:main on Dec 2, 2021
comaniac (Contributor) commented on Dec 2, 2021

Thanks @masahi. @Laurawly @ZihengJiang, please feel free to continue the discussion here or on the forum about the potential issue.

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* Add initial conv generator

* added conv2d pattern

* profile by gemm profiler

* remove conv2d profiler for now

* remove unused code

* add default

* minor fix, profiling working

* start codegen

* generated code compiled

* fixed layout initialization

* matched with autotvm tensorcore result

* test refactor

* minor cleanup

* remove iteration algo "Analytic"

* add test for dynamic batch conv2d

* pass dl tensor as output too

* support conv2d dynamic shape in codegen

* test working

* lint

* simplify codegen

* fix weird formatting

* typo fix

* check if cutlass is enabled in the test

* simplify gen_conv2d.py
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
@KangHe000 commented:

Hi @Laurawly, in your Bolt paper I noticed that you made a special design for the conv2d layout transformation (NCHW to NHWC), which is very useful for networks with multiple convolutions. However, I can't find a way to enable this feature. Has it not been merged into TVM? cc @masahi
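
Not the Bolt-specific transform asked about above, but as a generic approach, Relay's ConvertLayout pass can rewrite conv2d layouts from NCHW to NHWC before CUTLASS partitioning; a minimal sketch:

import tvm
from tvm import relay

# `mod` is assumed to be an existing Relay module whose conv2d ops use NCHW.
desired_layouts = {"nn.conv2d": ["NHWC", "default"]}  # NHWC data, default kernel layout
seq = tvm.transform.Sequential(
    [
        relay.transform.RemoveUnusedFunctions(),
        relay.transform.ConvertLayout(desired_layouts),
    ]
)
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)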
