[BYOC] Handle constants in IRModule-at-a-time external codegen #11770
Conversation
tests/python/frontend/pytorch/test_forward.py::test_argsort appears to be a flake.
- A minor suggestion to use `value_or`, rather than supplying a default value and having to do `.value()`: `lowered_mod->GetAttr<Map<String, runtime::NDArray>>(tvm::attr::kConstNameToNDArray)->value_or({})` (not sure if the `{}` thing works).
- Split the inline stuff into another PR.
just did a quick review before i ran out the door. main question here: is there a test that actually populates kConstNameToNDArray attr and watches it come out the end of the compiler?
Thanks @masahi & @areusch, PTAL. Re value_or: done, thanks, that's much nicer.
Force-pushed from 19b33a4 to cd1318b.
i hate to be a stickler, but i feel like something should test this in CI. would such a fake module become useful in other compiler testing?
Force-pushed from 818493d to ea69f71.
It turns out switching the 'ccompiler' demo external codegen to use target hooks is enough to exercise all this, albeit the PR has grown quite a bit. Let's see what CI makes of it, since I've probably broken something else.
PTAL @areusch
I tried to do to the TensorRT integration what #11631 did to the CUTLASS integration, viz:

- Make sure all compilation options are passed in Target instances. This helps Collage.
- Use a custom pass invoked via RelayToTIRTargetHooks instead of the relay.ext.$toolchain mechanism. This helps us decouple external codegen from lowering.

This PR collects the prep for that change:

- TensorRT uses the JSONSerializer visitor to encode each partition function. Previously, when the visitor encountered a Constant it simply generated and recorded a name for the constant. Then, completely separately, and via a callback in TECompiler, the function is visited again in the same order and with the same name-generation convention by a ConstantUpdater to actually collect the bindings, which are then encoded into a ConstLoaderModule to be made available at runtime. However, if all TensorRT compilation is to be done by a stand-alone pass, there's no TECompiler callback hackery available. So I've added a "const_name_to_ndarray" attribute of type Map<String, runtime::NDArray> to the IRModule, so that named constants can be accumulated throughout compilation by any pass which needs to do so (see the sketch after this list). Then the Graph, AOT and VM executors are all updated to merge those constants into the final runtime artifact. (Compare with "Constants", the equivalent attribute for extracting TIR AllocateConsts.)
- The TensorRT tests use the create_executor interface, but it wasn't quite ready for the new, more general form of passing a list of targets.
- I want TensorRT compilation to work out of the box, without the need for any special targets, if all the default options should apply. Go back and make the CUTLASS integration I did follow the same convention.
- To test this I also switched the 'demo' "ccompiler" external codegen target to IRModule-at-a-time style. This means we can test most of the external codegen machinery in one place, without depending on any target which may not be enabled in CI (e.g. TensorRT):
  - Target instances are plumbed correctly so compile-time options are available.
  - External modules are conveyed to the final export library.
  - Constant bindings are conveyed to the metadata module.
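For context, here is a minimal sketch (not code from this PR) of how a stand-alone external-codegen pass might accumulate a named constant into the new IRModule attribute. The "const_name_to_ndarray" key and the Map<String, runtime::NDArray> type come from the description above; the BindExternalConstant helper is hypothetical, and the use of GetAttr/WithAttr assumes they apply to IRModule as they do elsewhere in mainline TVM.

```cpp
// Sketch only: accumulate a named constant into the IRModule-level
// "const_name_to_ndarray" attribute. BindExternalConstant is a hypothetical
// helper; the attribute key and Map type are taken from the PR description.
#include <tvm/ir/module.h>
#include <tvm/runtime/ndarray.h>

namespace tvm {

IRModule BindExternalConstant(IRModule mod, const String& const_name,
                              runtime::NDArray data) {
  // Pick up any bindings recorded by earlier passes, defaulting to an empty map.
  Map<String, runtime::NDArray> const_name_to_ndarray =
      mod->GetAttr<Map<String, runtime::NDArray>>("const_name_to_ndarray")
          .value_or(Map<String, runtime::NDArray>());
  const_name_to_ndarray.Set(const_name, data);
  // IRModule attributes are updated functionally: WithAttr returns a new module
  // carrying the merged bindings for the executors to pick up later.
  return WithAttr(std::move(mod), "const_name_to_ndarray", const_name_to_ndarray);
}

}  // namespace tvm
```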