[CUTLASS] Conv2d activation fusion, part 2: Sigmoid fp16, SiLU and HardSwish #9795
Conversation
```python
if mode == "constant":
    if not non_zero_found:
        return data
```
This is a minor optimization but it non-trivially helped performance on the DETR model. @comaniac
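For context, a minimal standalone sketch of the optimization (the `maybe_pad` helper and its arguments are hypothetical, not the actual PyTorch frontend code): when every pad width is zero, the pad is a no-op, so the input tensor is returned untouched instead of emitting a pad operator that only copies the data.

```python
import numpy as np

def maybe_pad(data, pad_widths, value=0.0):
    # Hypothetical helper mirroring the frontend logic: pad_widths is a list
    # of (before, after) pairs, one per axis.
    non_zero_found = any(w != 0 for before_after in pad_widths for w in before_after)
    if not non_zero_found:
        return data  # all-zero pad widths: skip the redundant pad entirely
    return np.pad(data, pad_widths, mode="constant", constant_values=value)

x = np.ones((1, 3, 8, 8), dtype="float32")
assert maybe_pad(x, [(0, 0)] * 4) is x  # no pad op, no copy
assert maybe_pad(x, [(0, 0), (0, 0), (1, 1), (1, 1)]).shape == (1, 3, 10, 10)
```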
Hmm interesting. I didn't notice that we may have pad ops that actually pad nothing.
```
@@ -467,6 +467,9 @@ bool StridedSliceRel(const Array<Type>& types, int num_inputs, const Attrs& attr
  int64_t num_axis = dshape.size();

  const auto* begin = types[1].as<TensorTypeNode>();
  if (begin == nullptr) {
    return false;
  }
```
This and the change below in `src/relay/op/tensor/transform.cc` are the fix for the type inference issue mentioned in the "Known issues" section of #9746: when `types[1]` is not yet a concrete `TensorType`, `as<TensorTypeNode>()` returns `nullptr`, so the relation now returns `false` instead of dereferencing a null pointer.
No test is added because the issue is hard to reproduce in a simple test case and the change is trivial.
Force-pushed from 56e0e95 to 18e0736
LGTM
…rdSwish (apache#9795)
* [Torch] do not pad if pad widths are all zero
* silu fusion supported
* adding hardswish support
* support fast_math sigmoid op
* fixed type inference for yolov5 + silu fusion
* use include_non_call_ops=False in AnnotateTarget
* update cutlass
* revert change in build.py
* simplify codegen
* lint
Now that the dependent PRs in the cutlass repo have been merged, we can enable more fusions. They were used in the benchmark in #9746.
@comaniac @Laurawly
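For reference, the new epilogues can be expressed with existing Relay ops; below is a rough sketch of the conv2d + activation graphs these fusions target (the shapes and variable names are illustrative, not the actual CUTLASS BYOC pattern definitions):

```python
import tvm
from tvm import relay

# Illustrative conv2d + epilogue graphs; SiLU and HardSwish are composed from
# existing element-wise ops, so they can be matched as fused patterns.
data = relay.var("data", shape=(1, 32, 56, 56), dtype="float16")
weight = relay.var("weight", shape=(32, 32, 3, 3), dtype="float16")
conv = relay.nn.conv2d(data, weight, padding=(1, 1))

sigmoid_out = relay.sigmoid(conv)                     # conv2d + sigmoid (fp16)
silu_out = relay.multiply(conv, relay.sigmoid(conv))  # conv2d + SiLU
hardswish_out = relay.divide(                         # conv2d + HardSwish
    relay.multiply(
        conv, relay.clip(relay.add(conv, relay.const(3.0, "float16")), 0.0, 6.0)
    ),
    relay.const(6.0, "float16"),
)

mod = tvm.IRModule.from_expr(relay.Function([data, weight], silu_out))
print(mod)
```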