[Relay] Merge analysis/context_analysis.cc and transforms/device_anno…

…tation.cc Currently LowerTEPass (backend/te_compiler.cc) is a 'special' pass because it depends on a side-input DeviceMap. We'd like to remove that side-input, and instead recover the Device (and, ultimately, Target) for each (fused) primitive call from the AST alone. By doing so we also avoid needing to perform device planning twice: - It needs to be done before lowering so we know which primitives need to be compiled for which devices. - It then needs to be re-done after lowering and optimization as a prelude to memory planning. By baking the device plan into the AST we can simply do device planning before lowering, and run memory planning later, both as ordinary passes. While working on that issue we realized we currently have 3 'device planners': - transforms/device_annotation.cc, which supports only a small subset of Relay and uses a simple top-down algorithm to assign a device to every sub-expression. - analysis/context_analysis.cc, which makes a galant effort to support most of Relay, is based on unification rather than a top-down algorithm, but handles higher order functions by ad-hoc and fragile inlining. - transforms/annotate_target.cc, which works on Targets instead of Devices, but is otherwise like 'device planning'. We'd like to bring these together. In this PR we introduce a new transforms/device_planner.cc intended to replace transforms/device_annotation.cc and analysis/context_analysis.cc. We don't delete those two just yet since we need to switch all users off of them in the next PR. We also leave transforms/annotate_target.cc alone pending a proper RFC to bring devices and targets together sensibly, but have it firmly in our sights. transforms/device_planner.cc is based on analysis/context_analysis.cc, but is heavily reworked to: 1. Handle starting from existing "on_device" annotations as well as existing "device_copy" calls. 2. Be idempotent, with the idea we'll probably need to re-run it to 'refine' device planning to account for storge scopes. 3. Robustly handle all of Relay, particularly higher-order functions. For that we replace the inlining approach in analysis/context_analysis.cc with a higher-order unification domain. 4. Be a little more systematic with defaulting. 5. Capture the result of the analysis within the AST as new "device_copy" calls at device boundaries, and new/replaced "on_device" calls wherever the device for a sub-expression is not already 'obvious' from the sub-expression's lexical scope. 6. Provide helper visitors for passes which need to ask for the device for any sub-expression they are processing and/or preserve device information on rewrites. Those passes include: - backend/aot_executor_codegen.cc (AOTOnDemandAllocator) - backend/graph_plan_memory.cc (StorageAllocaBaseVisitor etc) - backend/te_compiler.cc (LowerTensorExprMutator) - backend/vm/lambda_lift.cc (LambdaLifter) - transforms/memory_alloc.cc (DialectRewriter) - transforms/to_a_normal_form.cc (Fill) - backend/vm/compiler.cc (VMFunctionCompiler) However we won't change any of those in this PR. See the draft #8788 for the end game, I'll be peeling PRs out of that.
apache · Sep 22, 2021 · 2cc7e2f · 2cc7e2f
1 parent adf29f6
commit 2cc7e2f
Show file tree

Hide file tree

Showing 6 changed files with 3,730 additions and 4 deletions.
diff --git a/include/tvm/relay/transform.h b/include/tvm/relay/transform.h
@@ -444,6 +444,17 @@ TVM_DLL Pass RelayToTIRTargetHook();
  */
 TVM_DLL Pass ManifestAlloc(Target target_host, Map<tvm::Integer, tvm::Target> targets);
 
+/*!
+ * \brief Uses existing "on_device" and "device_copy" CallNodes to infer the device on which
+ * every Relay sub-expression should run (and the result stored). Captures the result of that
+ * analysis using new "on_device" and "device_copy" CallNodes. See
+ * tvm::relay::transform::{LexicalOnDeviceMixin,DeviceAwareExprVisitor,DeviceAwareExprMutator}
+ * for help recovering the device for an arbitrary sub-expression in downstream transformations.
+ *
+ * \param default_device_type DLDeviceType for default device.
+ */
+TVM_DLL Pass PlanDevices(DLDeviceType default_device_type);
+
 }  // namespace transform
 
 /*!

diff --git a/python/tvm/relay/transform/transform.py b/python/tvm/relay/transform/transform.py
@@ -1167,6 +1167,16 @@ def SimplifyExpr():
     return _ffi_api.SimplifyExpr()
 
 
+def PlanDevices(default_device):
+    """
+    Uses existing "on_device" and "device_copy" CallNodes to infer the device on which
+    every Relay sub-expression should run (and the result stored). Captures the result of that
+    analysis using new "on_device" and "device_copy" CallNodes. Note that the device_id of
+    the default_device is ignored.
+    """
+    return _ffi_api.PlanDevices(default_device)
+
+
 def FoldExplicitPadding():
     """
     FoldExplicitPadding finds explict padding before an op that can support

diff --git a/python/tvm/target/target.py b/python/tvm/target/target.py
@@ -525,7 +525,7 @@ def hexagon(cpu_ver="v66", **kwargs):
 
     # LLVM target string
     def create_llvm_target(cpu_ver, config):
-        """ Create LLVM target string. """
+        """Create LLVM target string."""
 
         target = " -mtriple=hexagon"
         mcpu = " -mcpu=hexagon" + cpu_ver
@@ -547,7 +547,7 @@ def create_target_features(config):
 
     # Simulator options string
     def create_sim_options(cpu_ver, config):
-        """ Create simulator option string. """
+        """Create simulator option string."""
 
         def validate_hvx_length(codegen_hvx, sim_options):
             if sim_options and "--hvx_length" in sim_options:
@@ -606,7 +606,7 @@ def validate_hvx_length(codegen_hvx, sim_options):
 
     # LLVM options string
     def create_llvm_options(cpu_ver, config):  # pylint: disable=unused-argument
-        """ Create LLVM options string. """
+        """Create LLVM options string."""
 
         llvm_options = config["llvm_options"]
 
@@ -620,7 +620,7 @@ def create_llvm_options(cpu_ver, config):  # pylint: disable=unused-argument
 
     # TVM target attributes string
     def create_tvm_options(cpu_ver, config):  # pylint: disable=unused-argument
-        """ Create TVM target features string. """
+        """Create TVM target features string."""
 
         features = {
             "link_params": "link-params",