
[Relay] Remove DynamicToStatic pass from graph runtime build #10691

Merged
8 commits merged into apache:main on Mar 23, 2022

Conversation

@masahi (Member) commented Mar 20, 2022

Closes #10692

To solve this problem, we can either remove this pass from the relay.build(...) pipeline or run DynamicToStatic in both the VM and non-VM paths. I propose to remove it because (1) DynamicToStatic is usually supposed to be applied right after model import, and (2) the only case where running DynamicToStatic during relay.build(...) helps is when the input is entirely static but a frontend fails to produce a static mod AND the user forgets to run DynamicToStatic after model import.

I hope the latter case is rare, but if not, that's something we should fix on the frontend side. We should avoid relying on the DynamicToStatic that runs during relay.build(...), since not all use cases of TVM go through relay.build(...) (BYOC, for example).
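For illustration, here is a minimal sketch (not taken from the PR; the tiny reshape module is a made-up stand-in) of the workflow this change expects: if a frontend leaves a dynamic op in an otherwise static model, run `DynamicToStatic` explicitly right after import, before handing the module to `relay.build(...)`.

```python
import tvm
from tvm import relay

# Stand-in for a freshly imported module: the input is fully static, but the
# graph carries a dynamic reshape because the new shape is computed via shape_of.
x = relay.var("x", shape=(1, 4), dtype="float32")
y = relay.reshape(x, relay.shape_of(x))  # lowers to a dynamic (dyn.reshape) op
mod = tvm.IRModule.from_expr(relay.Function([x], y))

# With this PR, relay.build(...) no longer runs DynamicToStatic for you, so
# convert dynamic ops to their static forms explicitly after model import.
mod = relay.transform.DynamicToStatic()(mod)

# The module is now static and builds fine on the graph runtime path.
lib = relay.build(mod, target="llvm")
```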

@masahi (Member Author) commented Mar 22, 2022

I realized that this is not specific to the meta scheduler (the one I tested); the same error could happen with the auto scheduler as well, since it also relies on the structure of the input Relay subgraph for workload lookup.
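To illustrate why that matters, here is a hedged sketch of a typical auto-scheduler workflow (names follow the public TVM API; the tiny conv2d module and the log file path are made-up placeholders). Tasks are extracted from the module as imported, so if `relay.build(...)` later rewrote the graph, e.g. via `DynamicToStatic`, the compiled workloads would no longer match the tuned ones and the `ApplyHistoryBest` lookup would miss.

```python
import tvm
from tvm import auto_scheduler, relay

# Made-up stand-in model; in practice mod/params come from a frontend import.
data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
weight = relay.var("weight", shape=(8, 3, 3, 3), dtype="float32")
mod = tvm.IRModule.from_expr(
    relay.Function([data, weight], relay.nn.conv2d(data, weight, padding=(1, 1)))
)
params = {}
target = tvm.target.Target("llvm")
log_file = "tuning_records.json"  # hypothetical log from a previous tuning run

# Workloads are extracted from the module in its imported form.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

# At build time, the lookup only hits if relay.build(...) compiles the same
# subgraph structure that the tasks above were extracted from.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
```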

All tests have passed without DynamicToStatic in relay.build(...). As the diff shows, this investigation uncovered a few cases where frontends carelessly introduce dynamic ops, and some TF frontend tests that use the graph runtime for a model containing dynamic inputs. The latter only happen to work thanks to DynamicToStatic, because the dynamic inputs are bound to a constant tensor after the frontend conversion and before relay.build(...). That is a very strange usage, and those tests should be using the VM in the first place.

So I believe it is better to remove DynamicToStatic from relay.build(...) to prevent that kind of sloppy coding in the frontends. If people instead prefer running DynamicToStatic in the VM path as well, I'm fine doing that, but I'd like to be convinced that such a use of DynamicToStatic really solves non-trivial problems. Let me know your thoughts @mbrookhart @tkonolige @comaniac @zxybazh

@@ -846,7 +846,7 @@ def convert_shape(self, op):
input_tensors = self.get_input_tensors(op)
assert len(input_tensors) == 1, "input tensors length should be 1"

-out = _op.shape_of(self.get_tensor_expr(input_tensors[0]))
+out = shape_of(self.get_tensor_expr(input_tensors[0]))
masahi (Member Author) commented:
Before this PR, tflite mobilenet was returning a dynamic shape output 🤦‍♂️
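For context, a hedged sketch of the distinction (the real helper lives in the shared frontend utilities; details may differ): `_op.shape_of` always emits a runtime shape-of node, whereas the frontend `shape_of` helper can fold a fully static shape into a constant, keeping the output static.

```python
import tvm
from tvm import relay
from tvm.relay import op as _op


def shape_of_sketch(x: relay.Expr, dtype: str = "int64") -> relay.Expr:
    # Assumes x has already been type-inferred so checked_type is populated
    # (the real helper performs its own type inference).
    ttype = x.checked_type
    if not any(isinstance(dim, tvm.tir.Any) for dim in ttype.shape):
        # Fully static shape: return it as a constant so the graph stays static.
        return relay.const([int(dim) for dim in ttype.shape], dtype)
    # A dynamic dimension is present: a runtime shape_of is unavoidable.
    return _op.shape_of(x, dtype)
```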

pass_seqs.push_back(transform::CombineParallelConv2D(3));
pass_seqs.push_back(transform::CombineParallelDense(3));
pass_seqs.push_back(transform::CombineParallelBatchMatmul(3));
pass_seqs.push_back(transform::FoldConstant());
pass_seqs.push_back(transform::FoldScaleAxis());
pass_seqs.push_back(transform::SimplifyExpr());
masahi (Member Author) commented:
I realized that SimplifyExpr practically depends on FoldConstant being applied beforehand, so I swapped the order.
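As a hedged illustration of that ordering constraint, here is a Python sketch of the same idea (the actual change is in the C++ pass list above): simplification patterns that expect already-folded constants only fire if `FoldConstant` runs first.

```python
import tvm
from tvm import relay


def optimize(mod: tvm.IRModule) -> tvm.IRModule:
    seq = tvm.transform.Sequential(
        [
            relay.transform.FoldConstant(),  # fold constants first ...
            relay.transform.SimplifyExpr(),  # ... then pattern-based simplification
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```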

if default_value == None:
output = tf.sparse_to_dense(indices, oshape, values)
compare_tf_with_tvm(
-[sparse_indices, sparse_values], ["indices:0", "values:0"], output.name
+[sparse_indices, sparse_values], ["indices:0", "values:0"], output.name, mode="vm"
masahi (Member Author) commented:
This test was previously passing on the graph runtime only because the dynamic input is bound to a constant tensor before relay.build(...) in the TF tests. Such usage of dynamic inputs makes no sense, so I just changed the test to use the VM.

np_data,
"",
["UniqueWithCounts:0", "UniqueWithCounts:1", "UniqueWithCounts:2"],
mode="vm",
masahi (Member Author) commented:
Unique is naturally a dynamic op, but for some reason this test was running on the graph runtime, and it happened to work only because the dynamic input is bound to a constant tensor before relay.build(...). So the test was effectively running unique(const_tensor), which is not really useful.

@tkonolige (Contributor) commented:
I am in favor of removing DynamicToStatic from the default set of passes.

@junrushao merged commit 4c608be into apache:main Mar 23, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022

Successfully merging this pull request may close these issues.

[Bug] DynamicToStatic() in the non-VM build path causes ApplyHistoryBest to fail