[DISCUSS] Pattern Language #160
Thank you for the great proposal, @ganler! A couple questions:
---
@sunggg Thanks for your feedback! Regarding the
but won't match:
---
@ganler Thank you for the clarification. I misunderstood its definition and I agree that yours can differentiate these two. Then, how about this case?
---
@sunggg Correct! I am not sure if we want to fuse
But it might not be beneficial, as the theoretical FLOPs are increased. That's why I previously pointed out:
If we want to allow this pattern, we write:

I am also curious whether such patterns are realistic, because most patterns I have seen before are either vertical ones (A -> B -> C) or horizontal ones (A -> {B, B, B}).

---
@sunggg Good question! To me, it is undefined behavior in Relay, which may or may not distinguish diamond from parallel expressions:

```python
import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import *

is_conv2d_0 = is_op('nn.conv2d')(is_var(), is_var())
is_conv2d_1 = is_op('nn.conv2d')(is_var(), is_var())

"""
 conv
 /  \
 \  /
 add
"""
diamond_pattern = is_conv2d_0 + is_conv2d_0

"""
conv   conv
   \   /
    add
"""
parallel_pattern = is_conv2d_0 + is_conv2d_1

conv2d_0 = relay.op.nn.conv2d(relay.var('input0'), relay.var('weight0'))
conv2d_1 = relay.op.nn.conv2d(relay.var('input1'), relay.var('weight1'))
out_parallel = conv2d_0 + conv2d_1
out_diamond = conv2d_0 + conv2d_0

print(out_diamond)
print(f'{diamond_pattern.match(out_diamond) = }, which should be True')
print(f'{diamond_pattern.match(out_parallel) = }, which should be False')
"""[Output]
%0 = nn.conv2d(%input0, %weight0, padding=[0, 0, 0, 0]);
add(%0, %0)
diamond_pattern.match(out_diamond) = 1, which should be True
diamond_pattern.match(out_parallel) = 0, which should be False
"""

print('==============================================================')
print(out_parallel)
print(f'{parallel_pattern.match(out_parallel) = }, which should be True')
print(f'{parallel_pattern.match(out_diamond) = }, which should be False')
"""[Output]
%0 = nn.conv2d(%input0, %weight0, padding=[0, 0, 0, 0]);
%1 = nn.conv2d(%input1, %weight1, padding=[0, 0, 0, 0]);
add(%0, %1)
parallel_pattern.match(out_parallel) = 1, which should be True
parallel_pattern.match(out_diamond) = 1, which should be False
"""
```

That's why I think it is important to make the semantics clear. And the semantics should be simple and natural: each pattern variable is unique.

---
@ganler I see. It seems we are operating under different assumptions about whether a fused kernel allows multiple outputs (potentially intermediate outputs) or not. If I remember correctly, such multiple outputs are possible, so we won't need such duplication. Maybe it is worth verifying this with the AutoTIR folks - cc @junrushao1994 @LeshengJin @Hzfengsy @jinhongyii. Another context for pattern matching is to use it for BYOC graph partitioning, which allows multiple outputs for each region (see the top green box in Step 3; its region has two outputs) - link. And thanks for verifying my old memory! :)

---
Thanks for the proposal. I'm wondering whether this pattern language supports the following cases. These are cases that the Relay pattern language doesn't support well and that have been bothering me for a while:

---
@sunggg Thanks for letting me know that one fused kernel may produce multiple outputs. I think something like

---
@comaniac Thanks for the great suggestion!
I think the fuzzy matching proposal is great! More generally, we should give users options to "skip" certain operators, e.g.,

```python
gemm_add = is_op("gemm") >> is_op("add")
gemm_add.match(..., fuzzy_ops=[op.cast, op.cast_like, op.broadcast_to, op.broadcast_to_like])
```
Yes, I think the current pattern matching should allow matching expressions with one or more outputs. To fuse:

```
   Conv
    |
   Relu
   /  \
Softmax  Add
```

---
Thanks for the feedback from the Relax Open Meeting! After some further discussions with @YuchenJin, the following improvements are expected to be included:
```python
# [GRAPH A]
#   conv
#    |
#   bias
#   /  \
# relu  ...
#
# [GRAPH B]
# ... -> conv -> bias -> conv -> ...

# Graph A is allowed, as by default `bias` allows external use.
is_op("conv") >> is_op("bias") > is_op("relu")

# Graph A is NOT allowed, as `bias` >> `relu` means "only used by".
is_op("conv") >> is_op("bias") >> is_op("relu")

# Graph A is allowed and Graph B is rejected, as `bias` must be
# used by an external node.
is_op("conv") >> is_op("bias").extern_use(True) > is_op("relu")

# Graph A is rejected and Graph B is allowed, as `bias` must NOT be
# used by an external node.
is_op("conv") >> is_op("bias").extern_use(False) > is_op("relu")

# Or, if we simply want to assume that non-leaf nodes in the
# graph have no external use:
pattern = ...  # any pattern in the pattern graph.
pattern.add_graph_constraint(NonLeafExternUse(True))
# or
pattern.add_graph_constraint(NonLeafExternUse(False))
```
```python
# ORIGINAL: relu -> reshape -> reshape -> reshape -> relu
func = ...
merged = MergeReshape()(func)  # one step by default
# ONE-STEP: relu -> reshape -> reshape -> relu
merged = MergeReshape()(func, iterative=True)
# ITERATIVE: relu -> reshape -> relu
```
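The one-step vs. iterative behavior can be sketched library-free, using a list of op names to stand in for a linear dataflow graph (`MergeReshape` above is the proposed API; everything below is an illustrative toy):

```python
def merge_reshape_once(ops):
    """Collapse the first adjacent reshape-reshape pair found, if any."""
    for i in range(len(ops) - 1):
        if ops[i] == "reshape" and ops[i + 1] == "reshape":
            return ops[:i] + ["reshape"] + ops[i + 2:]
    return ops

def merge_reshape(ops, iterative=False):
    merged = merge_reshape_once(ops)
    if not iterative:
        return merged  # one rewriting step by default
    while merged != ops:  # iterate to a fixed point
        ops, merged = merged, merge_reshape_once(merged)
    return merged

original = ["relu", "reshape", "reshape", "reshape", "relu"]
merge_reshape(original)                  # one step: one pair collapsed
merge_reshape(original, iterative=True)  # fixed point: single reshape left
```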
```python
class FuseCBR(DFPatternCallback):
    def __init__(self):
        super().__init__()
        self.c = is_op("conv2d").with_attr("kernel_size", (1, 1))
        self.b = is_op("bias")
        self.r = is_op("relu")
        self.pattern = self.c >> self.b >> self.r  # CBR pattern

    def callback(self, node_map: Dict[DFPattern, VarBinding]) -> List[Tuple[DFPattern, Expr]]:
        # Access everything through `node_map` (pattern -> corresponding VarBinding).
        # The matched CBR subgraph has exactly one output here.
        input_args = node_map[self.c].args
        output_type = node_map[self.r].type_annotation
        return [
            # All external uses of `relu` will be redirected to `cbr1x1`.
            (node_map[self.r], call_tir("cbr1x1", input_args, output_type)),
        ]
```
---
Most are implemented, thanks to @ganler.

---
This thread discusses the design of Relax Pattern Language.
🎯 Progress and Questions

This section is pinned to show current progress. Please skip this section if you are new here.

Design Updates/Patches

Implementation Progress

`>>` (only used by), `^` (used by), `dup`, `fork_to`

Open Questions

`is_var` means `VarPattern`, but it could be confusing to beginners, as `is<T>` usually means a function with the signature `bool(...)`. Can we have a better naming practice to make pattern definitions short yet easy to understand? For example, how about using `dp.var()` instead of `is_var`, where `dp` means `dataflow_pattern`?

Motivation
Many existing graph-level optimizations require looking up a specific (sub-)graph pattern (e.g., `Conv->Bias->ReLU`) and then rewriting the graph into a more efficient one. Potential usages include:

Such usages can be prevalent:

Relay's pattern language matches `relay.Expr` with a series of patterns. As we are now developing a pattern language for Relax, we want to adapt to features in Relax (e.g., `RuntimeDepShape`) and make defining topological dataflow relations easier (not (only) the `Dominator` pattern), towards high scalability in writing graph-level passes.

Goals: (e.g., `torch.fx`)

Design through Examples
To better understand the examples below, it is highly recommended to read a few examples in Relay’s pattern language.
Use Dependency
Compared with Dominator analysis, use-def chain information can be more powerful, fine-grained, and lightweight in describing graph patterns.
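As a concrete illustration of what these patterns consult (a library-free toy, not TVM code; `build_users` and the `(op, inputs)` encoding are illustrative), the use-def information is simply a map from each value to its users:

```python
def build_users(nodes):
    """Build a def-use map from a list of (name, inputs) pairs."""
    users = {name: [] for name, _ in nodes}
    for name, inputs in nodes:
        for i in inputs:
            users.setdefault(i, []).append(name)
    return users

# conv -> relu, where relu's output is consumed by both add and mul.
nodes = [("conv", []), ("relu", ["conv"]), ("add", ["relu"]), ("mul", ["relu"])]
users = build_users(nodes)
users["relu"]  # both add and mul consume relu
```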
To fuse or dispatch `Conv-ReLU-Add`, one may write:

Despite the syntax, it is actually incorrect: what if the output of `relu` is consumed by operators other than `add`? To not violate the graph semantics, we must either give up the fusion chance or duplicate `Conv-ReLU`. This is a concern in XLA (see Page 20) but not for TVM, as TVM usually disallows replication (`fuse_ops`) by requiring a domination relation, i.e., `add` must post-dominate `conv` and `relu` (see also the 2nd code block of Relay's Pattern Matcher).

Using a dominator, however, can be unnecessary and too limited to represent such patterns correctly. Here is a counter-example suggested by @sunggg:
Things can be much easier if we describe patterns with use dependency. `oub` stands for "Only Used By": `A.oub(B)` means B is the only consumer of A. The `oub` syntax sugar also makes things easier, as often we don't care about the argument position but just the dependency relation.

One-One Match

One `Expr` can only be matched by one pattern object, and one pattern object only matches one `Expr`.
Unfortunately, this part is not very explicit in Relay (can be bugs or undefined behavior). See #160 (comment).
TVM OpFusion

The three fusible patterns in TVM (figure source).

If we implemented this in Relay, we would need to handle argument positions for injective operators and add dominance relations.
TensorRT CBR
TensorRT Parallel CBR
Parallel conv2d ops with the same attributes can be merged into one big kernel.
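The attribute-equality precondition for this merge can be sketched library-free (a toy, not TVM code; `mergeable_groups` and the dict-based attributes are illustrative): group the parallel convs by an attribute signature and merge any group with more than one member.

```python
from collections import defaultdict

def mergeable_groups(convs):
    """Group parallel convs by attribute signature; groups of size > 1 can merge."""
    groups = defaultdict(list)
    for name, attrs in convs:
        groups[tuple(sorted(attrs.items()))].append(name)
    return [g for g in groups.values() if len(g) > 1]

convs = [
    ("conv_a", {"kernel": (1, 1), "stride": 1}),
    ("conv_b", {"kernel": (1, 1), "stride": 1}),
    ("conv_c", {"kernel": (3, 3), "stride": 1}),  # different kernel: not merged
]
mergeable_groups(convs)  # only conv_a and conv_b share attributes
```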
New syntax sugar `[self].fork_to([vars, ...])`:

- `vars` share the same producer `self`.
- `vars` are all (immediate) users of `self`.
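The `fork_to` relation can be sketched as a library-free check (a toy model, not the proposed API; `fork_to` on a plain dict is illustrative): the source's immediate users must be exactly the given set of parallel heads.

```python
def fork_to(users, source, heads):
    """True iff `source`'s immediate users are exactly `heads` (order-free)."""
    return sorted(users.get(source, [])) == sorted(heads)

users = {"input": ["conv_a", "conv_b", "conv_c"]}
fork_to(users, "input", ["conv_a", "conv_b", "conv_c"])  # all heads present
fork_to(users, "input", ["conv_a", "conv_b"])            # missing conv_c: fails
```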
Allowing General Parallel Patterns
The example above is actually impractical:
Thus we add one more line:
Self-attention Fusion
Reference: Fig 5a in NVIDIA Blog
New syntax sugar `[self].uniq()`:

- `[self]` is a value only used by one `Expr` / pattern object.
- If `[self]` is used by a second `Expr` / pattern object, an exception will be raised.

Fuzzy Matching
Suggested by @comaniac (#160 (comment)): sometimes we want to match patterns like `gemm -- (cast --) add`, where `cast` may or may not be present (other examples can be `broadcast_to`).

To generally support this feature, we should allow users to "skip" certain operators when doing pattern matching.
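For a linear chain, the skipping idea reduces to a small library-free check (a toy, not the proposed matcher; `fuzzy_chain_match` is illustrative): drop the skippable operators, then require an exact match.

```python
def fuzzy_chain_match(ops, pattern, fuzzy_ops=()):
    """Match a linear op chain against `pattern`, skipping ops in `fuzzy_ops`."""
    core = [op for op in ops if op not in fuzzy_ops]
    return core == list(pattern)

fuzzy_chain_match(["gemm", "cast", "add"], ["gemm", "add"], fuzzy_ops={"cast"})  # cast skipped
fuzzy_chain_match(["gemm", "add"], ["gemm", "add"], fuzzy_ops={"cast"})          # cast absent: still fine
fuzzy_chain_match(["gemm", "relu", "add"], ["gemm", "add"], fuzzy_ops={"cast"})  # relu not skippable: fails
```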
CBR Rewriting
The rewriting style is similar to Relay's, but allows multiple outputs to support parallel patterns (in Relay, the previous design and implementation make it hard to support parallel outputs; see this thread).
Syntax Sugar
The discussed ones include:
- `[producer].oub([consumer]) -> [producer]`
- `[source].fork_to([parallel_heads, ...]) -> [source]`
- `[self].attr_eq([others, ...]) -> [self]`
- `attr_eq([exprs, ...]) -> [exprs, ...]`
- `[self].uniq() -> [self]`
Pipe-like syntax sugar to make tree-like pattern definition easier. Inspired by Pipes (CPPCON’20). The examples are both C++- and Python-compatible.
Sequential API: a special class to handle sequence patterns, co-working with the Pipes API.
- `a > b`: a is used by b; returns `Seq(a, b)`;
- `a >> b`: a is only used by b; returns `PdomSeq(a, b)`;
- `Seq([vars, ...])` defines sequential dependency along `vars`;
- `PdomSeq([vars, ...])` (huh... better naming?) further requires the successor to idom the predecessor.

Other syntax sugar follows Relay's style (e.g., `*` means `is_op("relax.multiply")`).
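The pipe-like sugar is straightforward to model with Python operator overloading (a hypothetical, library-free sketch of the design, not the Relax implementation; `Pat`, `Seq`, and `PdomSeq` here are toys):

```python
class Pat:
    """A named pattern node supporting the pipe-like sugar."""
    def __init__(self, name):
        self.name = name
    def __gt__(self, other):      # a > b: "a is used by b"
        return Seq(self, other)
    def __rshift__(self, other):  # a >> b: "a is only used by b"
        return PdomSeq(self, other)

class Seq:
    """Sequential dependency along the wrapped patterns."""
    def __init__(self, *pats):
        self.pats = pats

class PdomSeq(Seq):
    """Like Seq, but additionally requires the successor to idom the predecessor."""

chain = Pat("conv") >> Pat("bias")  # builds a PdomSeq of the two patterns
```

Because `>>` binds tighter than `>`, mixed chains like `a >> b > c` group as `(a >> b) > c`, which is the reading the examples above rely on.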