
[SLM] Allow modules to define pre-processing of weights #16785

Closed

Conversation

Lunderberg
Contributor

Prior to this commit, the weights used by `nn.Module` instances were required to be `nn.Parameter` instances. This commit allows the weights to instead be `nn.Tensor` instances, defined in terms of other `nn.Parameter` weights. This allows a model to define both the original weights that would be present in an external checkpoint (e.g. a PyTorch or Safetensors file), and the pre-processing that should be performed on those weights.

This is a re-implementation of #16757, which was reverted in #16777. The re-implementation preserves the handling of dynamic shapes specified as python strings, enabling the test cases that were added in #16784.
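The split between checkpoint-level parameters and derived weights can be sketched in plain Python (the `Parameter` class and the fused-weight name here are illustrative stand-ins, not the actual TVM SLM API):

```python
# Minimal sketch of the idea, using plain Python/NumPy rather than TVM's
# nn API.  `Parameter` stands in for nn.Parameter; `gate_up` stands in
# for an nn.Tensor defined in terms of nn.Parameter weights.
import numpy as np

class Parameter:
    """A named weight as it would appear in an external checkpoint."""
    def __init__(self, name, data):
        self.name = name
        self.data = data

class FusedLinear:
    """Declares the original checkpoint weights, plus a derived tensor
    (here, their concatenation) that models the pre-processing step."""
    def __init__(self, gate, up):
        self.gate = Parameter("gate_proj.weight", gate)
        self.up = Parameter("up_proj.weight", up)
        # Derived weight: defined in terms of the parameters above,
        # analogous to an nn.Tensor computed from nn.Parameter weights.
        self.gate_up = np.concatenate([self.gate.data, self.up.data], axis=0)

layer = FusedLinear(np.ones((2, 4)), np.zeros((2, 4)))
print(layer.gate_up.shape)  # (4, 4)
```

The point is that the checkpoint loader only ever sees `gate_proj.weight` and `up_proj.weight`, while the rest of the model consumes the derived, pre-processed tensor.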

If there is a trivial binding of `Var = DataflowVar`, but the
non-dataflow variable is never used outside the dataflow block in
which it is declared, then we should keep the name of the upstream
`DataflowVar`, as it is more likely to be the human-readable
name (e.g. a function parameter).
Follow-up to apache#16777, add unit tests
demonstrating desired behavior.
@Lunderberg Lunderberg marked this pull request as draft March 25, 2024 17:57
@Lunderberg
Contributor Author

This PR is currently marked as a draft, as it depends on a bugfix applied in #16783, and the unit tests added in #16784.

Contributor

@slyubomirsky slyubomirsky left a comment


I am not fully versed on SLM but I did not see anything objectionable in these changes. Some of the TODOs would be good to keep an eye on.

Comment on lines +618 to +621
# TODO(Lunderberg): Make this easier to call. Inferring
# struct info for a nested expression should be doable in
# a free function, without requiring an active
# BlockBuilder and an active FunctionFrame.
Contributor


Yeah... we might be able to decouple some of the functionality and allow for calling things like that separately.

Contributor Author


Long-term, I think it would be nice to distinguish between local struct inference and non-local struct inference. The local inference could be applied when a relax object is constructed, which would avoid the current two-phase initialization of relax objects. Since this step can only perform local struct inference, which would be applied by default, this entire conditional could be removed.

There are some kinks that would need to be worked out first. Some of the struct inference for tensor operations currently throws errors more often than I think it should. (e.g. `R.matmul` throws an exception if the arguments are not `R.Tensor`; the exception is thrown even when the arguments are `R.Object`, despite `R.Tensor` being a subtype of `R.Object`.) These fallbacks would probably get more exercise with local inference, as there may be less information available.
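The subtype-aware fallback being described can be illustrated with a toy inference function (hypothetical classes, not Relax's actual struct-info machinery): instead of raising when an argument is only known to be an unrefined `Object`, the inference degrades gracefully to an unrefined result.

```python
# Toy model of struct inference with a subtype-aware fallback.
# "Tensor" and "Object" are stand-ins for R.Tensor and R.Object.
class Object:
    pass

class Tensor(Object):
    def __init__(self, shape):
        self.shape = shape

def infer_matmul(a, b):
    """Infer the result struct info of a matmul.

    Raise only when an argument is definitely not a tensor; when an
    argument is merely an unrefined Object (a supertype of Tensor),
    fall back to Object rather than raising."""
    if not isinstance(a, Object) or not isinstance(b, Object):
        raise TypeError("matmul arguments must be tensors")
    if isinstance(a, Tensor) and isinstance(b, Tensor):
        return Tensor((a.shape[0], b.shape[1]))
    return Object()  # fallback: not enough information to refine

out = infer_matmul(Tensor((2, 3)), Tensor((3, 4)))
print(out.shape)  # (2, 4)
print(type(infer_matmul(Tensor((2, 3)), Object())).__name__)  # Object
```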

Comment on lines +140 to +143
# TODO(Lunderberg): Make a `ir.transform.ConvertSSA`,
# similar to the existing `tir.transform.ConvertSSA`,
# that converts an entire module to SSA, including TIR
# variable definitions used in either TIR or Relax.
Contributor


What would this conversion do on the Relax side? I thought vars already had exactly one point of definition in Relax.

Contributor Author


Both Relax and TIR require SSA to be well-formed. However, there are a number of cases where a module could be unambiguously converted to SSA. (e.g. Two functions use the same `relax.Var` as a parameter, which can be fixed by substituting a new variable in one of the functions.)

So, it wouldn't be a pass that would be called directly by end users, but would be for internal use. If a pass is most easily written in a way that results in the same symbolic variable occurring in multiple different functions, then this would be used as a post-processing pass. (e.g. Apply BindSymbolicVars to one variable in a function, then save the result as a new function in the same IRModule. Useful, but would duplicate all other symbolic variables.)
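The kind of unambiguous repair being described can be sketched on a toy IR (hypothetical data structures, not TVM's): whenever the same variable object re-appears as a parameter of a second function, substitute a fresh variable with the same human-readable name.

```python
# Sketch of an SSA repair pass over a toy IR, where a "function" is a
# (params, body) pair and variables are compared by object identity.
import itertools

class Var:
    _counter = itertools.count()
    def __init__(self, name):
        self.name = name
        self.uid = next(Var._counter)

def convert_ssa(functions):
    """Ensure no Var object is a parameter of two functions by
    substituting a fresh Var (same name) for later re-uses."""
    seen = set()
    out = []
    for params, body in functions:
        remap = {}
        new_params = []
        for p in params:
            if p in seen:
                remap[p] = Var(p.name)  # fresh definition point
                new_params.append(remap[p])
            else:
                seen.add(p)
                new_params.append(p)
        out.append((new_params, [remap.get(v, v) for v in body]))
    return out

x = Var("x")
f1 = ([x], [x])
f2 = ([x], [x])  # same Var object re-used as a parameter
g1, g2 = convert_ssa([f1, f2])
print(g1[0][0] is g2[0][0])  # False: each function defines its own var
print(g2[0][0].name)         # x (human-readable name preserved)
```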

new_shape = [_get_var(x) if isinstance(x, tir.Var) else x for x in param.shape]
var = core.Tensor.placeholder(new_shape, param.dtype, name)._expr
new_shape = [_normalize_dim(dim) for dim in param._shape]
# var_cls = rx.DataflowVar if mode == "packed" else rx.Var
Contributor


Is this line meant to be used at any point?

Contributor Author


Whoops, that was a test during dev work. Removing the commented-out `var_cls` line.

@@ -676,12 +676,31 @@ def permute_dims(x: Tensor, axes: Optional[List[int]] = None, name: str = None)
result : Tensor
The transposed result.
"""

# TODO(Lunderberg): This is a more extensive auto-naming than
# intended here. Is this still worth it?
Contributor


Do we expect these chains of definitions to be deep? If they can be, this might be undesirable.

Contributor Author


Long-term, I want to move this automatic naming from the nn.Module side to the Relax side, since it could then be performed after removal of trivial bindings. I don't expect these chains to be deep, as it only tracks trivial bindings. The trivial binding from the Relax function parameter to the parameter's param._expr field should be the only one that would be tracked.

Comment on lines +28 to +29
# Relax variable names may contain '.' even though it
# cannot be expressed in TVMScript.
Contributor


I wonder if this is something we should just check for and prohibit.

Contributor Author


I could go either way. It's nice to have the 1:1 mapping between Relax and TVMScript, which would forbid the period within a Relax variable name. However, it's also nice to have a 1:1 mapping between a Relax function parameter and the weight tensor's name in a PyTorch or Safetensors file, where names usually contain periods.

@Lunderberg
Contributor Author

I've done some experimenting, and I think this use case may be handled with `@property` instead. For the unit test `test_generate_parameters`, this would then generate `self.gate_up_proj` when accessed, rather than as part of the `__init__` method. Since the access of `self.gate_up_proj` occurs during the `forward` function, after all overwrites of `param._expr` have been applied, it wouldn't need the extensions to `nn.Module.export_tvm` added in this PR.

Pros:

  • Would be compatible with the current nn.Module.export_tvm, which overwrites param._expr.
  • Would be compatible with the current nn.Module.to implementation, which overwrites param._expr.

Cons:

  • Requires LiftTransformParams as a post-processing step. Would not allow the SLM to Relax conversion to generate the transform_params function directly.
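The `@property` alternative can be sketched in plain Python (the `Parameter` class is illustrative; `gate_up_proj` comes from the `test_generate_parameters` unit test mentioned above): because the fused weight is computed on access rather than in `__init__`, it automatically reflects any later overwrite of the underlying parameters.

```python
# Sketch of the @property approach, in plain Python/NumPy rather than
# the TVM SLM API.
import numpy as np

class Parameter:
    """Stand-in for nn.Parameter, whose backing data may be overwritten
    after construction (as export_tvm and .to() do with param._expr)."""
    def __init__(self, data):
        self.data = data

class MLP:
    def __init__(self):
        self.gate_proj = Parameter(np.ones((2, 4)))
        self.up_proj = Parameter(np.full((2, 4), 2.0))

    @property
    def gate_up_proj(self):
        # Computed on access (i.e. during forward), so it sees any
        # overwrite of the underlying parameters that happened since
        # __init__.
        return np.concatenate([self.gate_proj.data, self.up_proj.data])

m = MLP()
before = m.gate_up_proj.copy()
m.gate_proj.data = np.zeros((2, 4))  # later overwrite of the parameter
after = m.gate_up_proj
print(before[0, 0], after[0, 0])  # 1.0 0.0
```

The trade-off matches the cons above: the fused tensor is recomputed inside the forward computation, so extracting it into a separate `transform_params` function is left to a post-processing pass such as `LiftTransformParams`.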

@Lunderberg
Contributor Author

Closing this PR, as I think the use of LiftTransformParams provides a better long-term solution, especially with the improvements in #16717. (Thank you again to @zxybazh for the implementation!)

@Lunderberg Lunderberg closed this May 9, 2024