-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Unity][MSC] Avoid depending on trivial bindings in Relax intermediate #16349
[Unity][MSC] Avoid depending on trivial bindings in Relax intermediate #16349
Conversation
The conversion from tensorflow to MSC is done by first converting from tensorflow to relay, then converting from relay to executable python code, executing that python code to generate relax, and finally converting from relax to MSC. During the relax phase of this conversion, some relax `IRModule` are applied, including `FuseOpsByPattern`. The test cases in `test_msc/test_translate_tensorflow.py` rely on `FuseOpsByPattern` preserving trivial bindings (e.g. `var_1 = var_2`) in the relax IRModule. If these trivial bindings are removed by `CanonicalizeBindings`, then the test cases in this file fail. The presence or absence of trivial bindings `FuseOpsByPattern` should be considered an implementation detail, and relax passes should not be required to preserve trivial bindings. This PR updates the relay to executable python step of the tensorflow to MSC conversion, to remove trivial bindings and output a variable name that matches the expected value in the test case. While not an ideal resolution, as other variable name changes could still reintroduce the same test failures, it ensures that `FuseOpsByPattern` may canonicalize bindings as an internal pre- or post-processing step without breaking these unit tests.
@@ -105,6 +105,7 @@ void RelaxCodeGen::CodeGenGraph() { | |||
} | |||
} | |||
stack_.func_call("block_builder.emit_output", idx_exit).call_arg(idx_exit); | |||
stack_.call_arg(DocUtils::ToStrDoc(e->name), "name_hint"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A PR is just merged (hours ago), which change DocUtils::ToStrDoc -> DocUtils::ToStr. I expand some DocNodes for plugin building, and the DocUtils apis are simplified. That happened to conflict with your changes, please rename this ToStrDoc ->ToStr, and I'll wait this PR merged before adding plugin builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, and conflicts are now resolved.
In looking at the three failing unit tests in CI, I ended up preserving the existing names, and instead avoiding the existence of the trivial bindings by using non-dataflow bindings. From a cursory reading of the MSC conversion steps, the dataflow block doesn't seem required, and removing it simplifies the Relax variable naming considerably. Is my understanding correct, and is there anywhere that relies on the presence of a dataflow block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the unit tests have quite a bit of dependency on the exact functional form of the relax functions generated by the MSC to relax conversion.
- If
R.shape
orR.const
appears as its own binding, or as part of an expression. - If trivial bindings are preserved.
- If a
R.const
appears in areturn
statement, or in a binding.
None of these impact the semantic meaning of the relax compute graph, but they are causing failures for assert_structural_equal
, because, well, they aren't structurally equal. Normally, because it isn't important to preserve these details in a round-trip, I'd adjust the test cases to avoid irrelevant details. However, because the test cases are generated using tvm.relax.frontend.torch.from_fx
utility, this can't be done either.
So, the design of the test cases requires that tvm.relax.frontend.torch.from_fx
and tvm.contrib.msc.framework.tvm.codegen
have the exact same behavior when outputting trivial variable bindings. The use of FuseOpsByPattern
by tvm.contrib.msc.framework.tvm.codegen
requires that FuseOpsByPattern
preserve those trivial bindings. This is far tighter coupling that these components should have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- About codegen the dataflow: I haven't test the codegen without dataflow() scope. I build the codegen of relax like: https://github.com/apache/tvm/blob/unity/python/tvm/relax/frontend/torch/fx_translator.py#L1471-L1473. The from_fx function use a dataflow as the scope for the model, so I build the codegen likely. I think removing the dataflow() only have influence on relax codegen, thus test_translate_relax.py, test_translate_relay.py, test_runner.py and test_manager.py may be influenced.
2.Yes, the assert_structural_equal may be too strict for testing. These failures can be fix by adding a build_target, like: https://github.com/apache/tvm/blob/unity/tests/python/contrib/test_msc/test_translate_relay.py#L164, then the test will only compare the results without using assert_structural_equal. The use of assert_structural_equal is mainly for saving test time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was my worry, and what I was trying to avoid doing. The CI in the main branch gets really slow from time to time as too many unit tests require running the full compilation stack, rather than just running the specific functionality being tested. Probably something to circle back to at some point.
For now, the tests in test_translate_relax.py
are updated to validate that the round trip of relax -> MSC -> relax has no impact on the output of the compiled module.
…gs_in_test_pr_16349
The potential for duplicate variable names was introduced by having the `block_builder.emit_output` call, which is only required to export values from a dataflow block. The dataflow block is not used in any later MSC conversion, and its removal avoids this re-export of variables. If the dataflow block is required in the future, it can be generated using `tvm.relax.transform.ConvertToDataflowBlock`.
@Archermmt With CI now passing, can you review? |
Seems ok to me. Please ask @Hzfengsy to merge. |
The conversion from tensorflow to MSC is done by first converting from tensorflow to relay, then converting from relay to executable python code, executing that python code to generate relax, and finally converting from relax to MSC. During the relax phase of this conversion, some relax
IRModule
are applied, includingFuseOpsByPattern
.The test cases in
test_msc/test_translate_tensorflow.py
rely onFuseOpsByPattern
preserving trivial bindings (e.g.var_1 = var_2
) in the relax IRModule. If these trivial bindings are removed byCanonicalizeBindings
, then the test cases in this file fail. The presence or absence of trivial bindingsFuseOpsByPattern
should be considered an implementation detail, and relax passes should not be required to preserve trivial bindings.This PR updates the relay to executable python step of the tensorflow to MSC conversion, to remove trivial bindings and output a variable name that matches the expected value in the test case. While not an ideal resolution, as other variable name changes could still reintroduce the same test failures, it ensures that
FuseOpsByPattern
may canonicalize bindings as an internal pre- or post-processing step without breaking these unit tests.