[RFC] Relax Upstreaming #89
Conversation
Thanks for submitting this RFC. It has long been mentioned in the community as "something coming up".
Generally, you'll find in my questions that I would like to hear more about the work done to evaluate the impact of bringing the proposed features into our already existing components. Please don't take that as vague criticism of the proposal being made.
The main concern I have is that, as a community, the long-term cost of replacing an existing IR - as opposed to incrementally improving it - seems to be very high, and something we should be mindful of.
Even if it is a long-term plan to fully develop and integrate Relax, we can't deny the fact that in practice it also means we will need to maintain two IRs for some time.
My understanding is that a lot of work has already been invested in prototyping Relax, and I appreciate that many improvements were probably made, but after reading the whole proposal I wanted to stay open minded about alternatives: for example, instead of switching IRs, we could take the learnings from implementing Relax and bring those to Relay, reducing the impact on the whole stack.
# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Can you clarify what "performance" means in this context, specifically?
Performance here means:
- Compilation performance: defined as the effectiveness of Relax’s optimization passes in transforming and optimizing Relax code.
- Runtime performance: defined as the runtime latency for various Relax workloads, whether they are subgraphs or end-to-end models.
It can be found in the Relax roadmap rfc: #69. Maybe I should attach a link to it in this RFC.
I'd like to explain more about the runtime performance.
For a single operator or a single subgraph, Relax uses MetaSchedule, which is the same as what Relay does. However, Relax gives us an opportunity to break the boundary between the graph level and the low level and enable optimization across layers - for example, the layout rewrite (as I mentioned at https://discuss.tvm.apache.org/t/establish-tvm-unity-connection-a-technical-strategy/13344/4).
@jinhongyii has finished a prototype for automatic (data and weight) layout rewrite, which gives about a 10% speedup.
However, Relay has no such ability to support this, because:
- Relay defines the layout at the graph level and cannot get feedback from the low-level operators.
- We would have to register each possible layout to the Relay operator, which is infeasible in practice.
# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Can you please shed some light on where these community needs were raised before? Is there any GitHub issue or discuss post asking for the specific solutions provided by the proposed RFC?
Of course, I'll list some relevant discussion posts that I know of:
Needs for unified abstractions:
- Please see the comments left by community members on this thread: https://discuss.tvm.apache.org/t/establish-tvm-unity-connection-a-technical-strategy/13344
- https://discuss.tvm.apache.org/t/new-relay-operator-based-on-tensorir/11999
Needs for dynamic shapes support:
- https://discuss.tvm.apache.org/t/dynamic-input-output-shapes-batch-size/12689
- https://discuss.tvm.apache.org/t/does-tvm-support-dynamic-input-shape/11069
- https://discuss.tvm.apache.org/t/cannot-allocate-memory-symbolic-tensor-shape/12514
Needs for supporting dataflow blocks and handling side effects:
- **Minimize disruption:** This upstreaming should provide a **non-default** path to enable new capabilities for users/developers who are interested in what Relax brings, so it will not break the current default Relay flow.
- **Minimize complexity:** This upstreaming should reuse existing TVM/Relay infrastructure as much as possible (for example IRModule, runtime Module, TOPI library, etc.) to avoid duplicated effort and code.
As a project, I think this creates some challenges in how to deal with new features being proposed and introduced in Relay by the established set of contributors who were not involved in Relax development.
While it is clear in the RFC that Relay and Relax would share some infrastructure, what I take from this RFC is that it will only establish the core components for Relax, without feature parity with Relay at this point, nor support in other components such as all runtimes.
How does the proposed plan intend to deal with such new features being added to Relay, while the intention here is to - at some point - replace Relay with Relax?
## D0: **Unified abstractions and optimizations across layers**

The first key design point is to allow the high-level graph IR to be able to directly interact and call into lower-level TensorIR and PackedFunc (TVM FFI).

The TensorIR PrimFunc and many external libraries adopt a **destination-passing-style** (DPS) calling convention that both input and output are passed to the function as arguments, and the outputs are mutated directly inside the function:
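Since the code that follows this sentence in the RFC is not reproduced in this diff view, here is a minimal NumPy sketch (illustrative only, not TVM code) of the difference between an ordinary call and a destination-passing-style call:

```python
import numpy as np

# Ordinary (non-DPS) style: the callee allocates and returns its output.
def add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    out = np.empty_like(a)      # allocation happens inside the callee
    np.add(a, b, out=out)
    return out

# Destination-passing style (DPS): the caller pre-allocates `out` and
# passes it in; the callee writes the result into it and returns nothing.
def add_dps(a: np.ndarray, b: np.ndarray, out: np.ndarray) -> None:
    np.add(a, b, out=out)
```

This is why, under DPS, the caller (or the compiler acting on its behalf) is responsible for allocating output storage before invoking a PrimFunc or external library call.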
Can you clarify why having "a graph IR able to directly interact and call into lower-level TensorIR and PackedFunc" is something we can't achieve by adding this feature to the existing Relay?
I think there is a significant technical reason for why such a feature would be difficult to add into Relay, which is that any call directly into a TIR function from Relay would need to add a type relation that describes the tensor shape requirements. Relax adds more capabilities for checking tensor shapes dynamically and is much more flexible in this regard than Relay. That said, I think "is making a new IR better than trying to add these features into the existing one?" is a good question to be asking.
As @slyubomirsky said, it's a decision whether we should make a new IR or add features to Relay.
If we want to add it to relay, we should:
- add new AST nodes
- rewrite the build pipeline (i.e. we should first include tir function to the Module for further analysis)
- rewrite most of the passes (i.e. we need to do optimization with a Module that has both tir and relay function)
Based on that, making a new IR may be more reasonable.
Thanks for the comment! In addition to @slyubomirsky and @Hzfengsy's great points, I will share my thoughts from the perspective of the optimization and compilation pipeline.
Although it might have been possible, the interaction between graph IR and TensorIR/PackedFunc has been quite tricky in Relay world. This has caused significant difficulties and non-trivial engineering efforts, IMO. Here are some representative examples:
- In Relay, there has been no convenient way to optimize graph IR by using the feedback from low-level.
- If TensorIR performs layout transformation for a primfunc, its decision will affect other primfuncs as well. However, Relay cannot provide such feedback back to graph IR-level since two different IRs cannot co-exist.
- Graph-level tuning methods (e.g., TASO, Collage) need a capability to apply a set of passes to the part of the graph, compile/measure its performance, and provide the performance number as a feedback back to Graph-IR level to generate better candidates. Although this could be achieved by nontrivial engineering efforts, it would complicate the compilation pipeline and maintenance efforts. IMHO, joint-optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.
- Lowering should be done at once at the boundary between Relay and TensorIR and customizing lowering has been very challenging (e.g., partial/custom lowering).
- The main pipeline with `OpStrategy` has not been easy to customize to lower part of the graph for your own target, such as BYOC, while keeping other parts still in high-level IR. Therefore, people had to figure out ways to bypass it and apply their own lowering mechanism (e.g., `RelayToTIR`) that bypasses the main pipeline.
- If you only want to apply certain schedule rules on part of the graph IR, you would need to lower those parts and apply the schedule rules to them. However, such freedom has not been allowed in the Relay main pipeline, so people had to find workarounds (e.g., use task extraction and find the primfunc among them; if extraction does not behave as users wanted, it requires extra engineering effort).

Since Relax unifies abstraction, it can deliver those functionalities as compiler passes while providing flexibility and customizability. For example, since both high-level and low-level IRs co-exist, if TensorIR makes an optimization decision that may have a global effect, like layout transformation, we can rewrite the graph-level IR accordingly to express such change and consider its global implication. Also, lowering can be implemented as a RelaxIR->TensorIR transformation pass. If you want to bring your own lowering mechanism, you can write a new pass. I expect you may be able to reuse most of the lowering machinery and only change the part about "how" you want to lower.
I would be happy to discuss further if you are interested in this direction. :)
Thanks for the points @slyubomirsky @Hzfengsy and @sunggg. I won't have the time to reply to all comments today, but wanted to come back on this thread.
IMHO, joint-optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.
@sunggg I know what Collage is, but I'm not familiar with what TASO is in TVM. Can you give a little context about it and what the expected use case involving TASO and Collage is?
If we want to add it to relay, we should:
- add new AST nodes
- rewrite the build pipeline (i.e. we should first include tir function to the Module for further analysis)
- rewrite most of the passes (i.e. we need to do optimization with a Module that has both tir and relay function)
Based on that, making a new IR may be more reasonable.
@Hzfengsy I understand your point, but at the same time, by adding a new IR that plans to deprecate the existing IR with planned overlapping features, are we not basically doing the same things as pointed out in your list, just from a new IR perspective, which impacts - in the long term - all existing ASTs, pipelines and passes?
@YuchenJin, given the complexity of this RFC, I think it would be good to have it amended with a "Rationale and Alternatives" section, similar to what we have in the template: https://github.com/apache/tvm-rfcs/blob/main/0000-template.md#rationale-and-alternatives
Sure. In short, TASO is a graph-level tuning method that generates different forms of an equivalent graph by trying out a set of rewriting rules (e.g., layout transformation, horizontal fusion, peephole optimization, ...). TVM provides some of those optimization heuristics, but they are not enabled by default, probably due to their potential side effects. TASO-like graph tuners can be helpful in this direction.
TASO: https://cs.stanford.edu/~padon/taso-sosp19.pdf
Another graph tuner, but with equality saturation technique: https://arxiv.org/pdf/2101.01332.pdf
In Relay, there has been no convenient way to optimize graph IR by using the feedback from low-level.
- If TensorIR performs layout transformation for a primfunc, its decision will affect other primfuncs as well. However, Relay cannot provide such feedback back to graph IR-level since two different IRs cannot co-exist.
- Graph-level tuning methods (e.g., TASO, Collage) need a capability to apply a set of passes to the part of the graph, compile/measure its performance, and provide the performance number as a feedback back to Graph-IR level to generate better candidates. Although this could be achieved by nontrivial engineering efforts, it would complicate the compilation pipeline and maintenance efforts. IMHO, joint-optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.
It's possible for an IRModule to contain both `tir::PrimFunc` and `relay::Function` at the same time, which would allow the replacement of any given `Call` operator inside of Relay with a lower-level operator to trial as part of a scheduler - this seems to indicate it's entirely possible using an `IRModule` to connect higher and lower level IR, unless I'm missing something?
We've previously done this in CMSIS-NN: `relay::Function`s become `tir::PrimFunc`s as part of our custom lowering in `RelayToTIR`, which results in an `IRModule` with some functions lowered to TIR and some remaining in Relay.
Lowering should be done at once at the boundary between Relay and TensorIR and customizing lowering has been very challenging (e.g., partial/custom lowering).
- The main pipeline with `OpStrategy` has not been easy to customize to lower part of the graph for your own target, such as BYOC, while keeping other parts still in high-level IR. Therefore, people had to figure out ways to bypass it and apply their own lowering mechanism (e.g., `RelayToTIR`) that bypasses the main pipeline.
- If you only want to apply certain schedule rules on part of the graph IR, you would need to lower those parts and apply the schedule rules to them. However, such freedom has not been allowed in the Relay main pipeline, so people had to find workarounds (e.g., use task extraction and find the primfunc among them; if extraction does not behave as users wanted, it requires extra engineering effort).
`RelayToTIR` allows a `Target` to customise how it decides to lower. Every `Target` can register a `RelayToTIR` hook which uses a set of compose-able `Pass`es to create a per-`Target` lowering pipeline; these `Target`s would compose together to form the main pipeline. This indicates we have the capability to create more complex per-`Target` lowerings of not just operators, but also groups of operators, using `RelayToTIR` already, with the flexibility to improve any logic for a specific `Target` or re-use existing `Pass`es where necessary?
The limitation I can see with this is that trying different operator strategies within the per-`Target` lowering isn't well supported? Could this not be supported by providing search information to `RelayToTIR`? 🤔
Thank you for sharing your thoughts, @Mousius. It is interesting to learn that these problems have been attacked in the BYOC world. Like @comaniac, @junrushao and others, I want to clarify that I don't disagree that these functionalities can be achieved by extending the current Relay framework. However, in my PoV, it is a matter of the amount of engineering effort and future extensibility. Making this scale of change, which is related to the framework-level design philosophy, in the main pipeline while providing a guarantee for existing behavior and performance sounds extremely difficult, IMHO. Although I joined the Relax project in the middle, I believe these are part of the motivation to build a new IR with those new design considerations and goals. Since the Relax pipeline does not disturb the Relay pipeline, I think both pipelines can evolve together.
Thanks @leandron for the suggestion! We have added the Rationale and Alternatives section to the RFC, please take a look. :)
A **dataflow graph** means every operation inside is **side-effect free** and there are no **control flows** (such as if-then-else). A **dataflow block** is a way for us to mark the dataflow graph regions of the program in Relax. Specifically, all the operations under the dataflow block are side-effect-free and do not contain control flows (control flow is an advanced semantic that most pass writers do not consider). Outside a dataflow block, operations can contain side effects (for example doing in-place weight update during model training) and control flow. The program below is an example program that contains two dataflow blocks.
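The example program referenced above is not reproduced in this diff view. As a rough TVMScript-style sketch of what a program with two dataflow blocks could look like (the exact syntax is approximate, and the helper PrimFuncs `tir_matmul`, `tir_relu`, `tir_softmax` and the packed function `debug_print` are assumptions for illustration):

```python
@R.function
def main(x: R.Tensor[(n, 784), "float32"],
         w: R.Tensor[(784, 128), "float32"]):
    # First dataflow block: every binding is side-effect free, no control flow.
    with R.dataflow():
        lv0 = R.call_tir(tir_matmul, (x, w), (n, 128), dtype="float32")
        gv0 = R.call_tir(tir_relu, (lv0,), (n, 128), dtype="float32")
        R.output(gv0)
    # Outside a dataflow block, side effects are allowed, e.g. logging.
    R.call_packed("debug_print", gv0)
    # Second dataflow block.
    with R.dataflow():
        gv1 = R.call_tir(tir_softmax, (gv0,), (n, 128), dtype="float32")
        R.output(gv1)
    return gv1
```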
Thanks for providing the context on dataflow graphs, but can you also explain what work was done to compare the cost of introducing this as a new feature in a new IR, when compared to adding it to Relay? What are the fundamental issues?
Determining whether a given function is pure and control flow-free would not be difficult to do in Relay; even marking portions of functions that are pure and control flow-free would not be hard, but it might require a new AST construct.
I think relay is pure except for `Ref`s. And in practice they are not used because they are poorly supported by the compiler.
Yes, `Ref`s and a few operators like `dropout`. The main reason for `Ref` to exist is for AD and training, so I imagine compiler support for such features would improve if training becomes a more important usage for TVM.
- AOT: AOT compilation has a wide range of benefits such as being more space efficient and is necessary for resource-constrained projects like uTVM. We are committed to continuously supporting AOT compilation in Relax, and there is an ongoing effort to connect Relax to the current AOT executor.
- BYOC: We will try to reuse the existing translation spec. In Relax, BYOC can be formalized as a pass that calls external packed functions.
Given this RFC proposed an alternative to a fundamental part in TVM, it would be good to have some idea on how BYOC and AoT will be supported, as well as some idea of the timelines.
I understand that at the present RFC, Relax intends to be an optional compilation flow, but I think we can agree that we don't intend to maintain two overlapping IRs long term, as it would become too costly in terms of infrastructure.
Based on that, can you clarify whether there is an expected timeline for AoT and BYOC to be brought into Relax?
From the BYOC integration perspective, users will need to bring the codegen that converts Relax ops to the BYOC representation (e.g., JSON) while re-using the runtime. We have a demo with TensorRT: https://github.com/tlc-pack/relax/tree/relax/src/relax/backend/contrib
Relax would provide more unified and organized support for BYOC. In Relay, although the functionalities are provided, IMHO, it has been quite fragmented so it has been tricky to use multiple BYOCs together with TVM's pipeline. So, in relax, our goal is to
(1) Support the existing functionalities
(2) Organize and simplify BYOC workflow
@YuchenJin can provide the timeline for this.
This RFC only focuses on the foundation part of Relax. After it, we will incrementally incorporate the additional capabilities and features. Relax aims to achieve parity with the functionality provided by Relay: this means that workloads which are functional on Relay will also be functional on Relax, even though the infrastructure underneath may change.

Future plans that we will bring in future RFCs:
Currently all the framework frontends support Relay. What's the plan with regards to Frontends and Relax? I think it should be covered in this section.
- `relax.runtime.builtin.alloc_heap(size) -> heap`: Allocate the heap (an NDArray) with a specific size to execute shape computation

  (We can use `alloc_tensor` to achieve the same goal)

- `relax.runtime.builtin.store_shape(shape, heap, idx0, ...)`: Store a shape into specific indices in the shape heap.
- `relax.runtime.builtin.load_shape(heap, idx0, ...) -> shape`: Construct a shape from the shape heap according to the indices.
All these runtime features seem pretty much Runtime independent, meaning they could be implemented in the existing runtimes. Taking Relay VM as an example, why couldn't this be implemented as a feature in the existing runtime?
# 5. **Future work**

This RFC only focuses on the foundation part of Relax. After it, we will incrementally incorporate the additional capabilities and features. Relax aims to achieve parity with the functionality provided by Relay: this means that workloads which are functional on Relay will also be functional on Relax, even though the infrastructure underneath may change.
Just for clarity: is the ambition for Relax to completely replace Relay at some point?
If yes, when in terms of timeline would you see Relax being feature compatible with Relay?
# 1. **Summary**

This RFC proposes to upstream the core foundation of Relax (Relay Next). Relax is a new graph-level IR that enables new capabilities to address the [critical needs](https://discuss.tvm.apache.org/t/establish-tvm-unity-connection-a-technical-strategy/13344) identified by the TVM community over the years of using and developing deep learning compilers.
Can you make this paragraph a bit more specific to list the critical needs raised by the community, that are addressed by Relax?
Reading the attached link, it does not clearly describe the needs, as it is also a description of features and milestones in a broader context than Relax.
Hi @leandron, thanks for your feedback! :) We share a common goal of minimizing disruption while incrementally improving TVM. One of the main questions is how to bring in the improvements. That's indeed what we have carefully thought about. One thing we found in building the unity connection is that we need to co-design things together. Take first-class symbolic shape support as an example: suppose we want to flatten `b: Tensor[(m, 224)]` into `a: Tensor[(m * 224, )] = flatten(b)`. In Relay, we have `b: Tensor[(?, 224)]` and `a: Tensor[(?, )] = flatten(b)`. Without symbolic shape, we lose the shape relation between tensors `a` and `b`.

Supporting this requires introducing native symbolic shape support as opposed to a separate mechanism. It's worth noting that, as the example above shows, the shape mechanism in Relax is very different from what we currently have in Relay. Additionally, the other improvements also benefit from first-class symbolic shape. Directly introducing this symbolic shape support to the existing IR would mean a one-stop transition to the current Relax, which is not the kind of incremental improvement we hope for.

Relax can be viewed as complementary to Relay. Relay focuses on high-level op transformations, while the current Relax passes focus on TIR-graph co-transformations that can enable flexible fusion and layout rewrite, which is hard to achieve in Relay. As of this RFC, we do not seek to change the default build pipeline or replace Relay. In this RFC, we only introduce Relax as an optional component for those community members who need it. It is a common practice in other ML frameworks; for example, PyTorch brought in TorchFX as an optional (vertical) component to support graph exporting, while maintaining TorchScript.

We totally agree that TVM default flow evolution is important, and we should carefully discuss that with the community in future RFCs. Evolving the default build has been briefly discussed in Establish TVM Unity Connection — A Technical Strategy, and there will be an upcoming tvm.compile RFC to discuss the long-term strategy to consolidate default build flows in TVM.
Thanks for the RFC, Relax is an impressive piece of work and the features introduced here would be a great addition to TVM's compilation flow!
I will need to spend some more time wrapping my head around Relax, but I did an initial pass and left some comments. I tried not to repeat Leandro too much in this review, but I want to make it clear that I do share the same concerns.
I think Relax is very cool (I especially like the dynamic shape support), but from this RFC it is not clear to me what are the technical reasons for not extending Relay with the new features developed through prototyping Relax. A new IR that doesn't represent a different level of abstraction comes with reputational and technical cost, so it is something that should not be taken lightly.
One more thing - I've been mostly looking at TVM from a hardware support perspective, so I'd be interested to see a more concrete example of a problem that Relax solves that in no way can be solved through improvements on Relay, e.g. a network that we can't currently compile/schedule effectively.
### **call_tir**

In Relax, we introduce `call_tir` to bridge graph-level IR and TIR. `call_tir` is an intrinsic that calls a TIR PrimFunc (that follows DPS) and returns the output. The semantics of `call_tir` can be demonstrated by the code below.
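The demonstration code is not included in this diff view; roughly, the semantics can be written as Python-like pseudocode (the function signature shown here is an assumption; `alloc_tensor` follows the builtin mentioned later in this thread):

```python
def call_tir(tir_primfunc, args, output_shape, output_dtype):
    """Pseudocode sketch of the semantics of call_tir."""
    # 1. Allocate the output according to the (possibly symbolic) shape.
    out_tensor = alloc_tensor(output_shape, output_dtype)
    # 2. Invoke the DPS PrimFunc with the inputs plus the pre-allocated output.
    tir_primfunc(*args, out_tensor)
    # 3. Return the output tensor to the graph level.
    return out_tensor
```

Because the allocation only becomes explicit during lowering, graph-level passes can still treat `call_tir` as an operation that returns a fresh tensor.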
We can already have Relay and TIR in the same `IRModule`, so considering the question of dynamic shape support orthogonal to it, what new functionality does `call_tir` add?
`call_tir` is not only designed for dynamic shape support. It enables optimization/transformations of an IRModule across both graph IR and TensorIR. We do support having Relay and TIR in the same IRModule, but we cannot optimize them together.
How is this different to checking if a `Call` is to a `tir::PrimFunc`?
`call_tir` allows a `PrimFunc` to be called from a Relax program. It allocates the destination tensor first and invokes the `PrimFunc` in destination-passing style. It's implemented as an operator in Relax, so it's a specific kind of call. Normal `Call` nodes do not perform that allocation behavior and do not assume a destination-passing style convention (`call_tir` is really just syntactic sugar for performing the allocation and making the call - as long as the `PrimFunc` has a global symbol, it can be called from Relax using the `ExternFunc` construct in the AST).
Hmm, could this not be solved using a utility function that returns the relevant nodes?

    fn call_and_allocate(prim_func: PrimFunc):
        return SeqStmt[Allocate(), Call(prim_func)];

It'd be good not to have this ambiguity where `call_tir` is a specific way of calling a `tir.PrimFunc` rather than a general way to call TIR.
The main idea is to abstract over memory allocation in the front-end language. Later stages of compilation can, in principle, canonicalize `call_tir` into the form you describe.
This cross-level interaction unlocks many interesting things that were not possible before, including, but not limited to:

- Incrementally lower different parts of a program using different strategies, instead of lowering the entire program to TIR directly from Relay as today.
Can you elaborate more on this? We already can lower separate subgraphs using different strategies
Thanks for the comment! Yes, it has been supported in Relay but there are some nontrivial limitations, imo.
- (1) The Relay main pipeline lowers all Relay IR into TIR at once at their IR boundary. This makes partial lowering (lowering only part of the graph) difficult in the main pipeline.
- (2) The Relay main pipeline supports lowering with `OpStrategy`. However, it is not necessarily easy to customize it (custom lowering).

For these reasons, people introduced `RelayToTIR` and `RelayToRuntime` that essentially bypass the main pipeline. Although this enables the functionalities people want, IMHO, it might not be easy to maintain them as a framework and it is not easy if you want to leverage multiple lowering strategies in an incremental way. Therefore, Relax wants to tackle this problem and provide such support in an organized, systematic way. For example, since Relax provides unified abstraction, we can introduce a GraphIR->TIR transformation into the pipeline, and this is essentially what lowering does. Thus, by introducing such a mechanism as a Relax->TIR transformation pass, Relax can bring those functionalities into the main pipeline in a customizable manner. You can also easily conduct partial lowering within a pass since you have full control. We expect users may be able to reuse most of the lowering machinery, since most of the time you just want to change the "how to lower" part.
Thanks for the comment! Yes, it has been supported in Relay but there are some nontrivial limitations, imo.
- (1) The Relay main pipeline lowers all Relay IR into TIR at once at their IR boundary. This makes partial lowering (lowering only part of the graph) difficult in the main pipeline.
- (2) The Relay main pipeline supports lowering with `OpStrategy`. However, it is not necessarily easy to customize it (custom lowering).

For these reasons, people introduced `RelayToTIR` and `RelayToRuntime` that essentially bypass the main pipeline. Although this enables the functionalities people want, IMHO, it might not be easy to maintain them as a framework and it is not easy if you want to leverage multiple lowering strategies in an incremental way. Therefore, Relax wants to tackle this problem and provide such support in an organized, systematic way. For example, since Relax provides unified abstraction, we can introduce a GraphIR->TIR transformation into the pipeline, and this is essentially what lowering does. Thus, by introducing such a mechanism as a Relax->TIR transformation pass, Relax can bring those functionalities into the main pipeline in a customizable manner. You can also easily conduct partial lowering within a pass since you have full control. We expect users may be able to reuse most of the lowering machinery, since most of the time you just want to change the "how to lower" part.
I've put a reply to this above but I'll quickly summarise in response here. `RelayToTIR` works together with the main pipeline to allow partial lowering of a `Target`; it runs before the main `TECompiler` and therefore seems to be exactly what you're describing as Relax->TIR? We leave the module in a partial TIR/partial Relay state for the pipeline to continue lowering.
I do agree `RelayToRuntime` essentially bypassing the entire compiler is not particularly helpful for re-use; this was a motivation for creating `RelayToTIR` in the first place.
This seems to be related to my reply above: #89 (comment)
- Incrementally lower different parts of a program using different strategies, instead of lowering the entire program to TIR directly from Relay as today.
- Allow for more customized optimizations, such as whole program optimizations, cascading, and other post-schedule optimizations.
- Enable automation (MetaSchedule) to analyze call_tir nodes and the callee TIR programs, perform optimizations and rewrites to one or more call_tir nodes, thus feeding decisions such as layout rewrite directly to the high-level IR.
- By turning subgraphs into calls to PackedFunc (via call_dps_packed), BYOC becomes an IRModule ⇒ IRModule transformation as a natural part of compilation.
Some detail on the new BYOC would be appreciated as well - what does the Relax BYOC offer in terms of functionality that the current one doesn't?
Let me start with my comments from the previous thread :)
Relax would provide more unified and organized support for BYOC. In Relay, although the functionalities are provided, IMHO, it has been quite fragmented so it has been tricky to use multiple BYOCs together with TVM's pipeline. So, in relax, our goal is to
(1) Support the existing functionalities
(2) Organize and simplify BYOC workflow
Personally, my ultimate goal is to unlock Relax to allow easy customization of both the TVM internal and BYOC pipelines. For example, we may want to rewrite the graph IRs while considering low-level optimizations for TIR and multiple BYOC choices (e.g., for `Matmul+add`, we have a bunch of choices: tune with MetaSchedule, or use a BYOC like TensorRT, Cutlass, Cublas...). And some of the recent BYOCs, such as Cutlass, are tunable. To enable these functionalities, the pipeline should be easy to customize, straightforward, and maintainable for future extension.
This has been difficult to achieve in Relay for many reasons: for example, inflexible lowering, each BYOC has its own greedy partitioning without considering other potential BYOCs, no interaction between BYOC and main pipeline, etc.
In Relax, although it is still work-in-progress, we've seen some great signals. I would be happy to discuss further if you are interested.
# 4. **Reference-level explanation**

To achieve the design points described in the last section, this RFC focuses on how to build an **end-to-end MVP** (Minimum Viable Product) which allows the users to construct an end-to-end model (represented by an IRModule), transform/build the IRModule, and run the execution.
Out of interest, can you bring an example where this kind of workflow is applicable? Are the users supposed to write their models directly in Relax? Where should the weights data come from?
Although Relax allows users to write their models directly (you can bind params as well: example), the main path would be frontends for Torch, TF, ONNX, ... as we do in Relay.
## 4.1 Relax AST

To support the key design points in the last section, Relax introduces the following constructs to the AST. In the meantime, we reuse `RelayExpr`, `Call`, `Constant`, `Tuple`, `If`, `Op`, `GlobalVar`, `TupleGetItem` in Relay.
I second to that - if Relax is a superset of Relay, why don't we extend Relay?
<img src='../resources/relax-e2e-flow.png' width='600'>
</p>

## 4.1 Relax AST
It would be good to have a separate RFC for each one of the new nodes presented here; it would make having discussions easier, considering how much discussion we got out of adding one IR node (`tir.constant`).
- Do shape calculations via shape heap (an internal NDArray) manipulation.
- Suppose Tensor A's shape is (m, n) at compile time, and in the Relax program we want to compute (j, k) = (m+1, n+1). At runtime, A's shape will be stored at index 0 and index 1 of a shape heap (which is a TVM NDArray) by calling the vm builtin function `store_shape(A.shape)`. m+1 and n+1 will be computed by a TIR PrimFunc generated in the shape lowering pass, and j and k will be stored at index 2 and 3 of the shape heap. Please refer to the shape lowering pass in the next subsection for more details.

As a future plan, we will consolidate Relay VM and Relax VM, and integrate Relax with the AOT executor (see Section 5).
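A rough pseudocode sketch of the scenario above, using the shape-related builtins listed earlier in this thread (the `shape_func` name and the exact lowered form are assumptions for illustration):

```python
# Tensor A has symbolic shape (m, n); we want (j, k) = (m + 1, n + 1).
shape_heap = alloc_heap(4)                  # NDArray with 4 integer slots
store_shape(A.shape, shape_heap, 0, 1)      # heap[0] = m, heap[1] = n

# A TIR PrimFunc generated by the shape lowering pass computes:
#   heap[2] = heap[0] + 1
#   heap[3] = heap[1] + 1
shape_func(shape_heap)

jk = load_shape(shape_heap, 2, 3)           # construct the shape (j, k)
```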
I don't know the VM executor very well, but the impression I get is that Relax VM won't be hugely different from Relay VM, so why don't we immediately consolidate the two, avoiding the awkward situation of having to maintain two very similar executors in the stack?
I am not as familiar with the Relay VM, but perhaps it might indeed be feasible to combine the VMs (have a single bytecode language) and use it for both compilers. How much additional work would it take compared to the current Relax VM prototype @YuchenJin et al?
I feel we could probably leverage some of the existing VM infra, i.e. the way we manage the frames, the bytecode de/serialization mechanism, and likely the memory pool, etc.
The ultimate goal is to consolidate the Relay VM and Relax VM, which requires work such as refactoring the Relay VM codegen. As a first step we bring the new vm in — which has already reused a bunch of insights from the Relay VM.
## 4.5 PR list

We plan to split the upstreaming into the following manageable PRs for TVM community review:
Introducing a separate compiler into the source tree based on one RFC seems like too much in one go. I'd expect to see one RFC for each Relax feature.
Thanks @leandron and @ekalda for the comments. We all agree that we are trying to improve the graph-level IR of TVM, while the controversial point is whether we can enhance Relay to support the features from Relax. Let's discuss it directly and focus on the technical points themselves. First of all, I'd like to list the most critical features that Relax wants to introduce. In my opinion, it's hard to incrementally update Relay to support them.

**G1: Dynamic shape support**

To be specific, Relax can represent and deduce symbolic shapes rather than use `Any` as Relay does.

**G2: A representation for TVM Unity**

TVM Unity is an important feature for unified optimization for graph, tensor computation, and libraries. The build flow of Relay is a one-way path.

**G3: Customizable compilation flow and operator support**

Customizing operators and backends is really common in production. There are 7 steps to add a new operator to Relay, while we only need 2 steps in Relax. Additionally, other compilation customization skills (e.g. BYOC, AOT, customized fusion) are also more straightforward with Relax. Please see the TVM Unity Connection thread for more details.

In short, a new IR is a reasonable way to support the above features IMO. And I'm open to hearing more ideas from the community.
class Expr(BaseExpr):
    """This is RelayExpr, but we add a shape_ field."""
    checked_type_: Type
    shape_: ObjectRef
Above, `shape_` is defined as an (implicitly optional) `Expr`, but here it is defined as `ObjectRef`. Is this because you want it to be implicitly optional? If so, I'd recommend explicitly optional instead.
Hi @tkonolige, it's a great question! `Expr->shape_` should be an `Expr`; `ObjectRef` is used here in its definition to prevent cyclic typing.
    return gv0
```

While the text format type annotation `lv0: R.Tensor[(n, 4), "float32"]` shows the shape of each value, this is only syntactic sugar. From the IR’s point of view, the `shape_` field `(n, 4)` is not included in the type signature of `lv0`. The type signature of `lv0` is `DynTensor(rank=2, dtype="float32")`, and the shape is a special value field that is attached to each `Expr`. We made this explicit choice to simplify the type inference so that we do not need to get into the [dependent typing](https://en.wikipedia.org/wiki/Dependent_type) land where type depends on value (shape in our case) which requires heavier machinery to handle.
This document contains no section about typing or type inference. How does it work? Would it be worthwhile to add a section about it?
I am working on language specifications, though I will defer to @YuchenJin as to whether they should be part of the RFC text. We actually have some unresolved debates on how it should work: tlc-pack/relax#222
I think it could make sense to include some broad notes about how typing is intended to work in Relax. Overall, the type system is simpler than in Relay: Tensor types track only the rank and dtype of tensors (both optional) and shape is approached separately from type.
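To make that distinction concrete, here is a small hypothetical illustration (the `checked_type_` and `shape_` fields follow the AST definition quoted earlier; the `ShapeExpr` name is an assumption for illustration):

```python
# Text-format annotation (syntactic sugar):
#     lv0: R.Tensor[(n, 4), "float32"]
#
# What the IR actually records for lv0:
lv0.checked_type_   # DynTensor(rank=2, dtype="float32") -- rank and dtype only
lv0.shape_          # a value expression such as ShapeExpr([n, 4]), not part of the type
```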
Thanks for the RFC. Although I wasn't involved in the actual Relax development, I've been attending the weekly open design review meeting for a while and I'm glad that I could share our experience to help improve the Relax design. Thus, I don't have specific questions about the design.

Regarding the point mentioned above about whether we really need a brand new IR to replace Relay: in fact, we at AWS already attempted to build a compiler-based training framework, RAF, by extending Relay. Based on our experience working on RAF for the past 1.5-2 years, we agreed with the Relax group that Relay does have some limitations in its design, and these limitations prevent us from easily adding some new features such as dynamic shape, flexible fusion, etc. Note that I'm not saying it's impossible to implement these features by extending Relay. My point is that even if it's possible, it is hard to keep a clean system/infra without a large-scale refactoring. To me, it is even safer to build a new infra from scratch, so that existing workloads and use cases won't be affected at all. This is also the reason why we noticed Relax and tried our best to be involved in the early-stage design in the first place.

Meanwhile, in terms of maintaining two IRs, I don't really think this would add much overhead to the community, because these two IRs are basically independent and can be developed separately. In other words, it's up to the developers to work on Relay or Relax. As long as there's still a number of Relay developers in the community, we shouldn't deprecate it. On the other hand, if people find that Relax is better over time, then developers will gradually move to Relax. At that point, we can bring the Relay deprecation to the table, just like nnvm in the past.
Thanks for the RFC, I enjoyed reading through it. Maybe we can summarize the list of passes/features/code bases that can be shared or reused between Relay and Relax, since there are some concerns about the migration and how much effort is needed to get familiar with the new frontend. I believe this is essentially not an issue if we can make Relax backward compatible and the migration transparent to downstream users/deployments. Note the migration here involves both functionality and performance.
Thank you @leandron @ekalda for the questions, and @zhiics, @slyubomirsky, @Hzfengsy, @sunggg for the discussion! As a long-term contributor since 2018, the pre-Relay era, and the initiator and one of the top 2 contributors of RAF (https://github.com/awslabs/raf/), the TVM-based training framework, I would love to share my perspective and slight concern about TVM development at this moment, in 2022. While TVM is a decent auto-tuner for static shape workloads, and the latest work on auto-tensorization further boosted its performance with microkernel tuning, there has been strong demand from the community to allow TVM to do more, along the lines of what @YuchenJin listed above.
As a community, we do encourage everyone to understand different perspectives and empower each other, and I believe this is the way for us to grow. Technically, just wanted to address a meta question here: why is it less feasible to gradually upgrade Relay?
- Conflicted design philosophy: Relax follows a completely different design than Relay with mutually conflicting assumptions and ideas. For example, having two conflicting shape mechanisms in the system would effectively mean passes have to handle both of them.
- Engineering challenge: design difference leads to hurdles for incremental updates. For example, if we want to move away from the assumption that the IR is side effect-free, all the passes with the old assumption become automatically invalid or wrong because the assumption is not respected.
- Stability concern: Even if we do surgical incremental enhancement to Relay by introducing breaking changes piece by piece, there is still stability concern. Consider a case where there are downstream vendors whose forks depend on upstream Relay, and Relay’s assumptions break over time, it would be less stable for them to maintain Relay.
Alternatively, we believe having Relax as a separate pass is a cleaner and more maintainable approach - gradually bringing some of the passes from the bottom is engineeringly incremental and guarantees that the Relay code path is always stable.
I like this separation of work between Relay / Relax. We have many Relay passes that work all right and which it doesn't make a lot of sense to reimplement in Relax. But if Relax is supposed to be complementary to Relay, is the name "Relax", as "Relay Next", still a good choice? "Relay Next" strongly suggests that Relax is something that is going to replace Relay, like we did for nnvm. I'm still not entirely clear whether the plan is to eventually deprecate Relay, or whether Relay and Relax are going to coexist for the foreseeable future.
During lowering, this call won't get translated into destination-passing style, because it is impossible to obtain the shape information and pre-allocate the memory. Instead, such calls are directly translated to calls that allocate and return the result tensor.

- `R.unique` can be mapped to a runtime PackedFunc call that takes in an NDArray x and performs a unique operation.
- We can even dispatch to common runtime libraries such as `torch.unique`, for example the above `R.unique(x)` can be lowered to `call_packed("torch.unique", x)`.
Does this mean all data-dependent dynamic ops need to have runtime packed functions, for all targets?
Even in Relay / TE we can implement `unique` / `NMS` perfectly fine and we don't need special runtime support. We do that by essentially treating `unique` and `NMS` etc. as static ops and pushing all dynamism handling to dynamic `strided_slice`.
This is a great question! This does not mean data-dependent ops are required to be implemented using PackedFunc.
In Relax, we also support implementing dynamic ops using TE/TIR. For example, we can implement `unique` by splitting the operation into multiple phases (first outputting a tensor of the same size as the input data) like you said, and this is supported by EmitTE.
One way to quickly support operators in Relax is to fall back to third-party framework libraries, for example by calling `torch.unique` as mentioned in this paragraph. With the first-class support of calling PackedFunc in the graph IR (via `call_dps_packed` and `call_packed`), we can generate direct calls to these third-party libraries to get immediate op coverage.
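As a hedged sketch of the two options described above (the exact call syntax is an assumption; `torch.unique` stands for whatever packed function the library integration registers):

```python
# Option 1: fall back to a library through a packed call. The callee
# allocates and returns its result, so no destination passing is needed.
gv0 = R.call_packed("torch.unique", x)

# Option 2: implement it in TE/TIR in multiple phases -- first emit a
# buffer as large as the input, then trim it with a dynamic
# strided_slice-style op once the true size is known.
```

Option 2 keeps everything inside TVM's codegen, while Option 1 trades portability for immediate operator coverage.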
We can introduce three builtin functions in the runtime:

- `relax.runtime.builtin.alloc_heap(size) -> heap`: Allocate the heap (an NDArray) with a specific size to execute shape computation
Suggest to use a different term than "heap", since it is used very differently from the conventional CS sense ("heap" as in stack / heap).
Thanks for the suggestion, it makes sense!
I updated the "shape heap" to "shape tensor" in the RFC, and mentioned that `alloc_shape_tensor` can be merged with `alloc_tensor` to achieve the same goal.
Technically, just wanted to address a meta question here: why is it less feasible to gradually upgrade Relay?
Hi @junrushao, I agree this is the meta question here, and one which seems to be highlighting the fragility of TVM itself which leads it to resist such incremental change.
- Conflicted design philosophy: Relax follows a completely different design than Relay with mutually conflicting assumptions and ideas. For example, having two conflicting shape mechanisms in the system would effectively mean passes have to handle both of them.
As far as I can tell from the discussion, it's entirely possible to begin replacing the `Any` type with more detailed dynamic shape information? Making breaking changes within software is a good thing as long as we provide users information in release notes etc.
- Engineering challenge: design difference leads to hurdles for incremental updates. For example, if we want to move away from the assumption that the IR is side effect-free, all the passes with the old assumption become automatically invalid or wrong because the assumption is not respected.
Our test cases should be sufficient to ensure that the passes which use the type system or make specific assumptions aren't broken? If we haven't put in place sufficient testing then how do we know that the passes work as expected right now? 😸
- Stability concern: Even if we do surgical incremental enhancement to Relay by introducing breaking changes piece by piece, there is still a stability concern. Consider a case where downstream vendors' forks depend on upstream Relay; if Relay's assumptions break over time, it would be less stable for them to maintain Relay.
Provided we document the changes in release notes, anyone consuming our packages will be able to update and integrate the new version when they have the capacity to do so? We should consider how we do this incremental roll-out between versions so that each release only requires a reasonable amount of change to integrate, rather than trying to avoid breakage altogether.
Alternatively, we believe having Relax as a separate pass is a cleaner and more maintainable approach - gradually bringing up some of the passes from the bottom is incremental from an engineering point of view and guarantees that the Relay code path is always stable.
This is incremental from the point of view of Relax but not of TVM as a whole; with this approach it seems consumers will eventually get to a point where they must change entirely from Relay to Relax, which will be a far more disruptive change.
# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Could you clarify what you mean when you say that Relax maximises expressibility as opposed to Relay? My assumption is that this is the TVMScript addition but that could equally have been done for Relay 🤔
Hi @Mousius, the expressibility here does not mean the support for TVMScript. It means the ability of the IR to express workloads. For example, we need to be able to express symbolic shapes in the IR to support and optimize dynamic shape workloads; we need to be able to express side effects to support training workloads that make inplace updates to the model’s weights during backpropagation.
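As a small analogy for what "expressing symbolic shapes in the IR" buys us, the sketch below uses plain Python/SymPy rather than any TVM API: once the dimensions stay symbolic instead of collapsing to `Any`, a pass can still prove facts about shapes at compile time.

```python
# Plain-Python/SymPy analogy (not a TVM API): with symbolic dimensions n, m kept
# in the IR, the compiler can verify that a reshape preserves the element count
# for every possible n and m, instead of giving up on an erased "Any" shape.
import sympy

n, m = sympy.symbols("n m", positive=True, integer=True)
input_shape = (n, m)
output_shape = (n * m,)

assert sympy.simplify(input_shape[0] * input_shape[1] - output_shape[0]) == 0
```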
Given that training is not in scope of this RFC, can we hold off on adding side effect semantics to a future RFC? That leaves us with just optimizing for the dynamic shape workload.
Side effects could be useful for other purposes, like printing and assertions (both of which we should have had in Relay, IMO). It's also been noted that side effects could be useful for importing things like in-place operations from PyTorch and other frameworks.
## D0: **Unified abstractions and optimizations across layers**

The first key design point is to allow the high-level graph IR to be able to directly interact and call into lower-level TensorIR and PackedFunc (TVM FFI).

The TensorIR PrimFunc and many external libraries adopt a **destination-passing-style** (DPS) calling convention that both input and output are passed to the function as arguments, and the outputs are mutated directly inside the function:
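For example, a DPS-style kernel written against NumPy (an illustrative analogue, not the RFC's exact listing) takes its output buffer as an argument and mutates it in place:

```python
import numpy as np

# Destination-passing style: the caller owns and allocates `out`;
# the callee writes into it and returns nothing.
def add_dps(a: np.ndarray, b: np.ndarray, out: np.ndarray) -> None:
    np.add(a, b, out=out)

a, b = np.ones(4, dtype="float32"), np.ones(4, dtype="float32")
out = np.empty(4, dtype="float32")   # allocated by the caller
add_dps(a, b, out)
```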
In Relay, there has been no convenient way to optimize the graph IR by using feedback from the low level.

- If TensorIR performs a layout transformation for a PrimFunc, its decision will affect other PrimFuncs as well. However, Relay cannot provide such feedback back to the graph IR level since the two different IRs cannot co-exist.
- Graph-level tuning methods (e.g., TASO, Collage) need the capability to apply a set of passes to part of the graph, compile/measure its performance, and provide the performance number as feedback back to the graph IR level to generate better candidates. Although this could be achieved with nontrivial engineering effort, it would complicate the compilation pipeline and its maintenance. IMHO, joint optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.
It's possible for an IRModule to contain both `tir::PrimFunc` and `relay::Function` at the same time, which would allow the replacement of any given `Call` operator inside of Relay with a lower-level operator to trial as part of a scheduler - this seems to indicate it's entirely possible to use an `IRModule` to connect higher- and lower-level IR, unless I'm missing something?

We've previously done this in CMSIS-NN: `relay::Function`s become `tir::PrimFunc`s as part of our custom lowering in `RelayToTIR`, which results in an `IRModule` with some functions lowered to TIR and some remaining in Relay.
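A rough Python illustration of that point (API details vary across TVM versions, and the tiny functions and global names here are made up for the example; a sketch, not a recommended flow): an `IRModule` can indeed carry a `relay.Function` and a `tir.PrimFunc` side by side.

```python
import tvm
from tvm import relay, te

# A tiny Relay function...
x = relay.var("x", shape=(8,), dtype="float32")
relay_fn = relay.Function([x], relay.nn.relu(x))

# ...and a tiny TIR PrimFunc derived from a TE compute.
A = te.placeholder((8,), name="A", dtype="float32")
B = te.compute((8,), lambda i: A[i] + 1.0, name="B")
prim_fn = te.create_prim_func([A, B])

# Both live in one IRModule under different global symbols.
mod = tvm.IRModule({"relay_main": relay_fn, "tir_add_one": prim_fn})
print(mod)
```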
Lowering should be done at once at the boundary between Relay and TensorIR, and customizing lowering has been very challenging (e.g., partial/custom lowering).

- The main pipeline with `OpStrategy` has not been easy to customize to lower part of the graph for your own target, such as BYOC, while keeping other parts still in high-level IR. Therefore, people had to figure out a way to bypass it and apply their own lowering mechanism (e.g., `RelayToTIR`) that bypasses the main pipeline.
- If you only want to apply certain schedule rules to part of the graph IR, you would need to lower those parts and apply the schedule rules to them. However, such freedom has not been allowed in the Relay main pipeline, so people had to find workarounds (e.g., use task extraction and find the PrimFunc among the extracted tasks; however, if extraction does not behave as users wanted, it would require extra engineering effort).
`RelayToTIR` allows a `Target` to customise how it decides to lower. Every `Target` can register a `RelayToTIR` hook which uses a set of composable `Pass`es to create a per-`Target` lowering pipeline; these `Target`s would compose together to form the main pipeline. This indicates we already have the capability to create more complex per-`Target` lowerings of not just operators, but also groups of operators, using `RelayToTIR`, with the flexibility to improve any logic for a specific `Target` or re-use existing `Pass`es where necessary?

The limitation I can see with this is that trying different operator strategies within the per-`Target` lowering isn't well supported? Could this not be supported by providing search information to `RelayToTIR`? 🤔
This cross-level interaction unlocks many interesting things that were not possible before, including, but not limited to:

- Incrementally lower different parts of a program using different strategies, instead of lowering the entire program to TIR directly from Relay as today.
Thanks for the comment! Yes, it has been supported in Relay, but there are some nontrivial limitations, imo:

- (1) The Relay main pipeline lowers all Relay IR into TIR at once at the IR boundary. This makes partial lowering (lowering only part of the graph) difficult in the main pipeline.
- (2) The Relay main pipeline supports lowering with `OpStrategy`. However, it is not necessarily easy to customize it (custom lowering).

For these reasons, people introduced `RelayToTIR` and `RelayToRuntime`, which essentially bypass the main pipeline. Although this enables the functionalities people want, IMHO it might not be easy to maintain them as a framework, and it is not easy to leverage multiple lowering strategies in an incremental way. Therefore, Relax wants to tackle this problem and provide such support in an organized, systematic way. For example, since Relax provides a unified abstraction, we can introduce a graph IR -> TIR transformation into the pipeline, and this is essentially what lowering does. Thus, by introducing such a mechanism as a Relax -> TIR transformation pass, Relax can bring those functionalities into the main pipeline in a customizable manner. You can also easily conduct partial lowering within a pass since you have full control. We expect users may be able to reuse most of the lowering machinery since, most of the time, you just want to change the "how to lower" part.
I've put a reply to this above but I'll quickly summarise in response here. `RelayToTIR` works together with the main pipeline to allow partial lowering for a `Target`; it runs before the main `TECompiler` and therefore seems to be exactly what you're describing as Relax -> TIR? We leave the module in a partial TIR / partial Relay state for the pipeline to continue lowering.

I do agree that `RelayToRuntime` essentially bypassing the entire compiler is not particularly helpful for re-use; this was a motivation for creating `RelayToTIR` in the first place.
### **call_tir**

In Relax, we introduce `call_tir` to bridge graph-level IR and TIR. `call_tir` is an intrinsic that calls a TIR PrimFunc (that follows DPS) and returns the output. The semantics of `call_tir` can be demonstrated by the code below.
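In rough terms, the semantics being described look like the runnable stand-in below (plain Python/NumPy with assumed helper behaviour, not the RFC's exact listing): a DPS callee is wrapped so the graph level sees a pure call that returns a freshly allocated tensor.

```python
import numpy as np

def call_tir(dps_func, output_shape, args):
    out = np.empty(output_shape, dtype="float32")   # plays the role of alloc_tensor
    dps_func(*args, out)                            # DPS call: callee mutates `out` in place
    return out                                      # pure-looking result at the graph level

y = call_tir(lambda a, b, out: np.add(a, b, out=out),
             (4,), (np.ones(4, "float32"), np.ones(4, "float32")))
```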
How is this different to checking if a `Call` is to a `tir::PrimFunc`?
We plan to split the upstreaming into the following manageable PRs for TVM community review:

- Relax IR
`Relax IR` really needs an independent RFC which details the dynamic type system so we can discuss methods of integrating it directly into Relay; this seems to be one of the larger technical hurdles within this RFC (though I agree with @ekalda that really we need RFCs for each independent architectural piece here).
- Relax IR
- Relax VM
- BlockBuilder
- ExprFunctor/ExprVisitor/ExprMutator/IRFunctor
- Relay → Relax translator
- Minimum build (4 passes)
- VM Codegen
- E2E model compilation + execution
The upstreaming plan should be represented in the Relax Roadmap (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0069-relax-roadmap.md), which should also contain references to these more detailed RFCs?
Thank you, everyone, for the discussions here. Let us take a step back and look at the non-technical parts of the conversation. A lot of our discussions come from two goals:
Both goals are very important. G0 ties to our ability to continuously support our current use cases. G1 is also essential to our viability as a solution, so we can grow as a community and stay competitive in a fast-evolving machine learning compilation landscape. Enabling both has always been an important theme of long-living projects. Deep learning frameworks are a common reference to refer back to. Usually, they are done in roughly three phases:
Each stage contains a different level of commitment and would normally entail different levels of gating criteria as we look at them. For example, PyTorch introduced TorchFX as an optional module that supports graph tracing and export. It had some overlapping capabilities with TorchScript. The PyTorch community is collectively evolving some of the compilation machinery (TorchDynamo) to make use of FX. As of now, there is not yet an announcement of S2 from the community.

Encouragement of S0 and making it easy to do helps us to enable G1. Too high a barrier here can discourage community contributions, resulting in mainline lacking the latest features and falling behind our competition. This is especially important given that the field of machine learning compilation still remains open, and the ability to support symbolic shapes and training in a timely manner helps bring in users and contributions who would otherwise turn to alternatives.

G0 is equally important here. In many cases, it boils down to making careful and informed decisions regarding evolution (S1 and S2), and additionally making sure that at the S0 stage there is limited disruptive change to the existing infrastructure. Importantly, not every module/feature has to go through all stages. And in common practice, the decisions in each stage are usually not made at the same time.

We can find examples of S0 cases in TVM as well. For example, USMP was initially designed for specific cases like AOT. We welcomed these improvements to unblock needs in embedded settings early. Through USMP we found the need for tir.alloc_const, which related to evolving the existing infra (S1). As a result, we had a more in-depth discussion. Additionally, we are bringing the effort to further enable USMP in a broader setting as part of S1. At some point, we might consider consolidating all memory allocations as S2 – note that many community members are collectively working toward that goal, but we are not yet at a point to make such a decision. As another example, we enabled cascaders that are specifically designed for the micro-NPU, which had some domain overlap with the arithmetic affine module, but were nevertheless brought in without consolidation because we believed that there is enough interest and maintenance support for the module. Finally, the unpacked_api was specifically enabled for extremely low-resource settings, and we enabled S0-level inclusion despite some inconsistency with the packed func API.

Of course, we do not want to enable random things in the codebase, which ties back to the maintenance overhead concern. One of the questions we want to ask here is whether the module contains enough support from the community to allow continued maintenance. Additionally, we should consider the added engineering support gained by welcoming additional community members who are interested in the needs and would otherwise look elsewhere.

Our overall thought process and decision time point for each stage can be different – they should be, so we can enable both G0 and G1. Nor do all modules have to go through all the stages. For S0, we would expect that there are enough champions in the community with a self-contained plan. For important features, we would expect, say, more than three committers who can champion the module and significant community support to maintain them. Additionally, S0 should be made as minimally disruptive (with respect to the current infrastructure) as possible.
To encourage G1, we can overlook some level of duplication (just like the TorchFX and TorchScript case, and USMP and other allocators when they landed as S0), considering the additional community support we get to maintain them. S1 and S2 would involve more careful discussions and coordination with greater amounts of detail on some of the key points. Likely, they will also happen at a different time point so we can make informed decisions.

This particular RFC is at the S0 stage and intentionally made to be so. As the RFC stated, there is no proposal to make S1/S2 decisions in this RFC. Many of our current discussions are around S1/S2 – the future evolution of the system. They are extremely helpful discussions to have to set up the context and help us improve the design, but not necessarily decisions we have to make immediately.

Let us think about the broader community members we can empower and bring in through enabling the S0 improvement. Thank you, everyone, for the discussions so far, and let us work together to enable our community.
On the point about potentially incorporating symbolic shapes into Relay, I would like to hear more detail about how it can be done with Relay's system of accumulating type constraints and solving them simultaneously. If we were to handle dynamic shapes in Relay, we would need to define semantics for how shape variables are scoped and how assignments are handled, how they can be processed during the solving of type constraints, and what happens if symbolic shape expressions cannot be concluded to be the same at compile time. If this can be neatly incorporated into Relay, then it might make sense to pursue. I would be happy to brainstorm on that issue.
Having taken onboard the feedback from community members (@leandron, @ekalda, @Mousius, @masahi), a number of us involved in this RFC (@YuchenJin, @jwfromm, @tqchen, @areusch, @mbaret, @jroesch, @tmoreau89) feel it’s necessary to be explicit about the scope of this proposal, and we apologize to those reviewing that this was not present in the original text.
The RFC has been accordingly amended to include the above commitments in the "Motivation and goals" section, which we hope addresses some of the valid concerns expressed so far.
Thanks everyone for the discussions so far. There are a lot of conversations around the clarifications and issues being answered. Two of the main concerns raised so far seem to be:

In this case, @YuchenJin has:

I think the above two actions have addressed the concerns raised with respect to the particular scope of this RFC. @leandron @ekalda @Mousius please follow up and see if you have more questions. Let us continue to follow the overall guideline of providing grounded technical justifications and bringing constructive discussions, knowing that some of the high-level taste arguments can be subjective. In those cases, thinking about the community-over-code principle can go a long way. The particular Unity connection strategy already got majority support from the community (from more than 8 organizations), so let us think about the following question: How can we collectively empower those members together?
Thanks @YuchenJin for updating this RFC! From the use cases that I have observed at ByteDance, the symbolic shape capabilities allow TVM to handle dynamic workloads that cannot be handled in other frameworks and to be more widely adopted. And we can quickly enable iterations of the unity connection as stated in RFC #91, by bridging the high-level representation with the low-level TensorIR.
Based on our experience at NIO, dynamic shape support in Relax is extremely important for us. In fact, we have done many things on Relay trying to cover dynamic shape support for our use cases; however, the lack of first-class support for symbolic dynamic shapes still constrains us, and some ops / patterns cannot exist in our models. First-class support for symbolic dynamic shapes is extremely important for us, especially since some models have essentially dynamic inputs / outputs, for example Point Cloud. Relax, this is what I've been expecting for so long. If we have Relax, for Point Cloud or Object Detection models with dynamic outputs / dynamic batch sizes, Relax could solve it perfectly (whether from the view of functionality or performance). For anyone who has doubts, I recommend reading this: https://github.com/tlc-pack/relax/wiki/Relax-Shape-Computation-Design. Thanks @YuchenJin @tqchen and the many Relax authors for bringing it; I really appreciate this work 👍 and in fact, I am evaluating Relax internally and want to let Relax solve our problem ASAP.
In addition to the use cases and experience I've mentioned previously, I want to further highlight that symbolic shape support has become even more important in recent months for us at AWS, mainly due to the requirements of deploying decoder models (e.g., GPT). Since the text generation process is a naturally dynamic shape workload in terms of sequence length and batch size, padding everything is not practical due to its inefficiency, as already shown in recent publications. It is extremely important for TVM to support this case if we want to remain the SOTA deep learning compiler.
Thanks everyone for discussions so far. There are a lot of conversations around the clarifications and issues being answered. Two of the main concern being raised so far seems to be:
- rationales and alternatives.
- overall scope and execution (relation to relay deprecation)
In this case, @YuchenJin have:
- Updated the RFC with the rationales and alternatives
- Clarified that the scope of the proposal being in scoped to checking in the module as optional and non-disruptive or changing existing modules (nor do they imply deprecation of relay).
I think the above two actions have addressed the concerns being raised wrt to the particular scope of this RFC. @leandron @ekalda @Mousius please followup and see if you have more questions
Let us continue follow the overall guideline of providing grounded technical justifications and bring constructive discussions, knowing that some of the high-level tastes argument can be subjective.
I'm in agreement with my past self here; looking at the scale of the discussion on this RFC, it seems prudent to do a functional decomposition exercise on Relax to break it down into the relevant units for integration. It appears that @YuchenJin has given an example of how to sub-divide the functionality into three separate blocks which can be landed independently:
- C0: Add first-class dynamic shape support
- C1: Add first-class support for interactions with TensorIR and TVM FFI
- C2: Add first-class dataflow and side effect handling
I believe by breaking this down and tackling each unit independently we can analyse whether or not we can integrate these features into TVM without an optional module; if it requires breaking changes then we can easily do that now that we have a release cadence and notes to inform our users. It seems to me that we should address dynamic shapes first, given @YuchenJin has already illustrated that the changes required are not significant provided we don't try to maintain both behaviours?
### **call_tir**

In Relax, we introduce `call_tir` to bridge graph-level IR and TIR. `call_tir` is an intrinsic that calls a TIR PrimFunc (that follows DPS) and returns the output. The semantics of `call_tir` can be demonstrated by the code below.
Hmm, could this not be solved using a utility function that returns the relevant nodes?

```
fn call_and_allocate(prim_func: PrimFunc):
    return SeqStmt[Allocate(), Call(prim_func)];
```

It'd be good not to have this ambiguity where `call_tir` is a specific way of calling a `tir.PrimFunc` rather than a general way to call TIR.
# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Given that training is not in scope of this RFC, can we hold off on adding side effect semantics to a future RFC? That leaves us with just optimizing for the dynamic shape workload.
@Mousius In this case, @YuchenJin's reply clearly articulated that there is a close co-design of these factors, and changing to adopt dynamic shape alone would imply a one-step jump to Relax -- which is not incremental. The data structure change would come with a set of supporting infra and co-design of things including shape inference, among others. Of course, if we do not maintain both behaviors and instead allow an incremental transition, then it is equivalent to a relay=>relax bridge that would allow part of the pipeline to go through in a more incremental fashion. This approach is consistent with what is being proposed in the RFC (with clear data structure boundaries).

I would suggest we not trivialize the argument here, as it is easier to say "why not do X" than to really go and do things. The learnings @YuchenJin gained from concrete code are certainly very convincing, as are their learnings of why not to do things otherwise. This is a case where subjective disagreements happen. In such cases, I would encourage again for us to take the community into consideration, and the fact that these things are not disrupting other existing flows.
- Clarified that the scope of the proposal being in scoped to checking in the module as optional and non-disruptive or changing existing modules (nor do they imply deprecation of relay).
I think the above two actions have addressed the concerns being raised wrt to the particular scope of this RFC. @leandron @ekalda @Mousius please followup and see if you have more questions
@tqchen thanks for the ping. I'll have quite limited time to follow-up here this week, however, I wanted just to comment on the last changes brought into the RFC text.
I think that at this point there is no dispute and we all agree that:
- The features proposed in Relax are important features to be incorporated in TVM;
- There are benefits in bringing new contributors to the Community;
At the same time, I feel Relax - as a new IR - is not an ordinary feature like "supporting some new framework as a frontend" or "incorporating a new algorithm for tuning". It is a whole pack of features in the form of a new IR.

Relax is actually part of TVM Unity, which claims to be a long-term view for TVM - which is great. So what I feel (see all my previous questions) is that we need this RFC amended with what that long-term view is, with a focus on Relax.

I'm starting from the thinking that we don't want to maintain two IRs forever, and that if, for complete lack of options, we really need to temporarily incorporate a new IR, we must have a horizon in which we'll consolidate into one.
The main question in my mind is: in the current form, what is the author's view on Relax being a replacement for Relay, as opposed to an alternative partially overlapping workflow?
Based on the answer to that question (which must come before we decide to merge this work), we then need to think about:

- If we commit to migrate as a community to Relax, then what is the authors' view of the rework needed in existing passes/runtimes/frontends/etc.? IMHO the authors are not required to do the actual work, just the due diligence of giving the community a view of the technical work required;
- If we don't commit to migrate as a community to Relax, what are the alternative designs we can make to evolve Relay with the same features proposed here?
- If the authors don't have enough information to take the decision, then I think it is the authors' role to refine the plan so that we get consensus on one of the options above.
That seems like a sensible thing to ask for a feature the size of Relax, in the context of TVM Unity. It just needs some more work to be done, so that we have enough information visible to the whole community on what decision we are taking and why.
I'll emphasise that I think we must have clarity on that decision before we merge this, as once it is merged into the code base and enabled in CI, we are all indirectly committed to it. This is pretty much in line with what other projects in this area do, and I'd be happy with a roadmap based on the information we have today, but I think it will be beneficial to everyone if said roadmap is included in this RFC.
We acknowledge that the migration of the default flow (that involves Relay) would benefit from more in-depth discussions on possible approaches. They are not part of this proposal and we welcome further discussions on this topic. We think, however, that it is useful to provide such a context. This section provides a context on possible difficulties in evolving Relay to bring some of the features.

From the development point of view, we think that the technical reasonings, community needs, and technical support around the module being present is sufficient to justify the inclusion of the optional module. We agree that we need to do further in-depth discussions on default flow evolution, and would be more than happy to continue the discussion after this RFC.
Can you comment on what are the discussions you think need to happen and what are the expected outcomes from these discussions?
The current wording feels a bit vague. Treating Relax as a new IR might generate uncertainty for contributors on where to actually contribute new features: if TVM is moving to Relax, new features should go into Relax; if eventually we're going to stay with Relay, new features should actually go into Relay. And in that case, where does Relax fit in this puzzle?
Presenting optional modules and dialects to the ecosystem is an important way to welcome new improvements, land our technical commitment timely, continue to reinvent ourselves, and welcome new community members who have new use cases. It also aligns with practices we see in other ML projects, such as MLIR dialects, PyTorch FX modules, etc.

As in these previous cases (MLIR dialects and TorchFX), there can be some amount of perceived duplications with existing modules. Nevertheless, there are usually technical reasons for a new module, both from design consistency, time to delivery, and separation of concerns from stable modules during improvements. Additionally, there is also a majority amount of support to help continue maintaining the new module. This new community effort would help us to welcome new members to the community, and enhance the overall development of TVM.
In previous comments I was unfamiliar with the project guidelines for MLIR and Torch, but reading their contribution guidelines, they actually require much more certainty about the future landscape for a given feature to make it into the codebase, so I see no reason why we should have lower standards than those communities at the same time as we mention them as examples to be followed.

Based on that, I disagree with the premise that features should be brought into the codebase on the major justification of welcoming new contributors.
Co-design doesn't mean we have to land everything at once; it's often easier to prototype several features together and then break them down when we get to delivery - I believe that is more incremental and understandable than landing large changes with several dimensions at once. I'm also unsure how it's a one-step jump to Relax, as @YuchenJin has demonstrated that the functionality of Relax is split into several different pieces. I understand that in order to land the entire proposition of Relax it may take several pieces landing, but that's natural for a large change such as this?

They're not equivalent, as we can avoid adding additional variations in our code by making a breaking change early and iteratively integrating the other various pieces of Relax into existing code. @slyubomirsky also expressed an interest in investigating the dynamic shapes in isolation, so I believe this could have some momentum?
As an individual Relax contributor from UCSD, I don't feel our community has created an environment that welcomes new contributors and new contributions. I am happy to see different opinions in the discussion, but it's really shocking and disappointing that we are afraid of moving forward even with a large amount of community support, and for such a technically strong optional module. 😞 According to several community members, Relax can solve the needs that they care about. It's also made clear by @YuchenJin in the RFC that Relax is an optional module which does not disrupt the current TVM flow, and it does not have the intention to replace Relay. In most open source projects, evolution is done in a phased manner. I am very surprised that such a module, one that can solve the current pain points of TVM, has been blocked by subjective opinions. It's hard to imagine that everyone who wants to contribute to TVM, especially those who do foundational work like Relax, needs to make future predictions about existing flow deprecation and timelines, which in my opinion discourages people from contributing.
Thanks everyone for the feedback. One thing that we seem to agree on is that there is a strong need to support symbolic shape use cases for TVM, as represented by many of the folks who chimed into this thread. Hopefully, we all agree that there is a strong need to support robust and high-quality dynamic shape compilations, and that is not being supported in the current TVM. One of the questions is whether or not we can do that by separating the changes of C0, C1, C2 in Rationale and Alternatives. In order to provide values to the user, we need end-to-end dynamic shape compilation for some specific environment of interest (for example it is less relevant at the moment for micro settings). C0, C1, and C2 are necessary to support a minimum but robust dynamic compilation:
In summary, in order to build a minimum viable dynamic shape compilation in the most high-quality, robust, and general fashion, the lowering flow would require stages that contain C1 and C2. We would also like to highlight that having C0, C1, and C2 helps us to achieve the goals faster. For example, running fusion on calls into TensorIR means we no longer need developers to annotate the fusion properties of each op, and we no longer need to handle weight layout rewrites in an ad-hoc way. This reduced engineering complexity, thanks to better architecture, helps us to bring robust, high-quality support for dynamic shapes to the community faster.

Dynamic shape compilation, along with first-class C1 and C2 interaction, are needs that we do not support well today in TVM. They offer clear value to our community, and differentiate from the cases that Relay already supports.

One of the main topics raised was how Relax fits into TVM in general. We acknowledge that it is impossible to provide the full picture immediately. It is also not necessary to make all decisions in the beginning, as making uninformed decisions is worse. It is also common practice for OSS projects to evolve and bring in S1-conversations along the way. For example, when TorchFX got introduced, there was no prediction of TorchDynamo, which got introduced later as things iterated through the codebase and became clearer. Nevertheless, we would try our best to address how Relax fits into TVM: C0, C1, and C2 also highlight how cohesively Relax fits into TVM's current codebase, more than many other modules in the codebase. In particular, the design deeply aligns with TensorIR, topi, symbolic shape, PackedFunc, and many other modules. We also view having Relax and Relay co-exist in the codebase as a positive thing and better than the current state:
These positive parts alone would already bring the community a long way. We also acknowledge that we cannot pin down all the technical decisions at the S1/S2 level in the very beginning. Please see the Future Opportunities section in the TVM Unity connection RFC, which I encourage everyone to check out.

The author also wants to come back to community empowerment. In this case, many in the community have voiced their need for empowerment with grounded technical reasoning. We would like to encourage everyone to take a step back, as most of the technical reasons against are less grounded and we do not need to make all decisions in a single shot. We are looking at a module that is clearly isolated, makes no disruptive change to the code base, and is needed by the majority of community members. Having open-mindedness here would go a long way toward building a welcoming and inclusive community.
At Intellif, people have built, maintained, and extended the DL compilation stack with Relay over the past years. However, we never think the upstreaming of a new module would break existing functionalities or cause confusion; rather, it is a huge opportunity to solve many technical issues which have proven to be not so easy to handle in Relay, as already emphasized in the discussion thread. From my perspective the TVM community is a very inclusive community. We do have modules with partially overlapping functionality co-existing without so much debate. As examples we could see different runtime implementations for Relay, TE schedules and TensorIR schedules, Ansor and meta-schedule, etc. I wish it would also not be a problem for the graph AST infra.
Thanks for this great job!
For those interested, I think this recent paper shows one way that symbolic shapes could be made to work with Relay's type-checking approach (Axon is clearly structured very similarly to Relay), though it would require substantially reworking the existing type relations in Relay. It's rather different from Relax's approach, so it's a possible point of comparison.
I'm a graduate researcher at UW and had been a full-time SDE at AWS AI for years; my work is around deep learning frameworks/compilers. I feel like all of us agree dynamic shapes are essential, so I don't want to spend more time emphasizing how important they are. I'm not a contributor to Relax, but I have been following it for a long time. I don't want to pretend to be neutral: I do think it is quite necessary to welcome Relax, rather than just adding dynamic shape support in Relay.

The main controversy in this thread is about whether to upgrade Relay incrementally or develop a new IR called Relax. I understand hardware companies appreciate stability, and we can see CUDA didn't change its interface drastically over the years - what a miracle! There must have been several times people wanted to develop new languages/compilers for NVIDIA GPUs, but CUDA survived. This is a lesson we should learn: in the beginning, we design things with a vision of the future in mind, then we maintain them to a high standard, improve them incrementally, and stay customer-obsessed. This is the ideal story, but we should not ignore that, though CUDA was invented before the DL era, there were already many high-performance computing workloads its designers could refer to. Fortunately, even in 2022, the operators used in DL still highly align with HPC ones and are actually simpler (it's a world of GEMM).

What about the story of (computational) graph-level IRs? The dominant workload in DL changes over time, and I would say this causes a lot of headaches for framework and compiler designers: first CNNs/RNNs/LSTMs/Tree-LSTMs (structure dynamism is one of the challenges Relay would like to tackle, but unfortunately Tree-LSTMs are used nowhere), then Transformers/GNNs (not as hot as Transformers because of the hardware lottery, but who knows the future). Now we have entered a time where models converge but scale grows significantly: models become larger and larger, and a lot of engineers and researchers propose techniques (checkpointing and rematerialization, quantization, graph substitution, fusion and stitching, sparsification and mixture-of-experts, hybrid parallelism) to optimize DL workloads at compile time. I'm glad to see many of them are developed upon TVM, because TVM's design is always up-to-date and supports new workloads quickly; however, Relay's current design cannot take full advantage of these new techniques, and the system has a tendency to become fragmented. Relax is a great opportunity for us to reconsider the graph-level IR design: prune the redundancies and add new functionalities. It's exciting to see that we can unify different levels of optimizations together in TVM Unity, once Relax is accepted by the community.

Refactoring makes things simpler, rather than more complex. Whenever we found it was time to make some changes, TVM always embraced new designs. This has happened several times in TVM's history: prior to Relay there was NNVM, which was deprecated and completely replaced with Relay. The previous Tensor Expression had limited expressiveness, and the schedule tree data structure could not support tensorization elegantly; then we got TensorIR, which is not only backward compatible but also brings opportunities for developing new dialects (Ruihang and I designed SparseTIR upon it, and it works pretty well). AutoTVM cannot generate scheduling templates automatically; then we got Ansor and the Meta-Schedule.
I would emphasize that the core components of all these updates were upstreamed within several months, and did not break any backward compatibility; credit goes to our hard-working and open-minded contributors and reviewers. Committing to TVM helps these contributors become MLC experts, and some of them are PMC members now. I would say none of these refactorings hurt TVM's reputation; on the contrary, they make people impressed by TVM's speed in adapting to the future, and they are more willing to try TVM because it's open and driven by innovation. I really don't understand what's different this time, when it comes to Relax?

We have a bigger community, which is awesome, and I definitely welcome your input and constructive suggestions on the future of this project. I view the New Scoped Module RFC as a contract between industrial developers and researchers/engineers like me who work on "toy prototypes": we promise not to touch anything that might influence user experience, and we also don't want to be discouraged because our prototypes cannot be upstreamed and only stay in some random GitHub repo as toys. I also think the new S0-S1-S2 process is already the most painless approach to delivering new designs, and the effect is equivalent to incremental change.

If people take a look at the Relax repo, it already has a huge amount of code and well-written documentation (you can compare it with the official Relay documentation); I think it's super inappropriate to ignore these contributors' devotion, especially individual contributors such as @LeshengJin. TVM has a huge user base of researchers; they are an important part of the community, and they also contribute high-quality code instead of just hacking.

Regarding the "lower standard than other communities" issue: TVM has high standards, and we are not talking about standards here. If no fundamental changes were allowed in DL infrastructure, Google should have stayed at TF 1.0 and never developed JAX, and PyTorch should not have created so many different compiler infrastructures (I want to share this slide again). I'm willing to provide more details on every argument if requested.

Best,
Based on my experience at several organizations, dynamic shape support is obviously very important, particularly with the popularity of large language models. Also, efficiently supporting dynamic shapes would be one of the major appealing features of a "modern" DLC. I think the above comments have also reached agreement on the importance of dynamic shapes. The major argument is whether we need separate PRs to land this feature. IMHO, Relax is already one of the components of Unity, and the current proposal again only contains the most valuable part of Relax, which provides a minimum E2E compilation flow to enable the support of a dynamic model. This has worked well before in both TVM and other open source projects, since such a component doesn't block/break the current uses/deployments. For example, the first version of Relay also had the IR, simple lowering, and the necessary passes to quickly unblock the users/developers (e.g. AWS) who wanted to give it a try. Afterwards, we iterated on it many times to improve both the design and implementation. As a contributor to TVM, I would encourage us to focus more on the design itself and spot the design flaws and the missing key features that we should address, so that users (some of whom are already waiting for Relax, as mentioned here) can quickly check it out and bring us more insightful feedback or directly contribute to the project.
There were concerns brought up in RFC #95 that this RFC conversation did not cover "how the proposal fits into TVM". We agree that discussing the fit is important and would like to refer to related conversations and sections:
I learned a lot from reading through the thread, and find most people here are from a systems background: either doing related research in schools or heading an engineering team in companies. I would like to share some of my thoughts from a different perspective, as a TVM user and ML algorithm developer. I am a graduate student at MIT studying efficient deep learning algorithms and co-design (details on my page, lab site, and our recent project that trains NNs on a 256kB MCU). We have been honest TVM users because of its flexibility, high performance and open-source nature. But when we want to dive deeper and make some customizations, things become complex and Relay is no longer friendly.

I understand those who want backward compatibility so existing projects are not broken. But we cannot build a ship of Theseus in the real world, and the above issues cannot be easily "improved" with current Relay. If TVM does not embrace new designs and improve its user-friendliness, then eventually developers will switch to other tools, and this is indeed happening:

I like the TVM project and hope the community can always be active. TVM has a huge user base of researchers, and Relax can allow them to easily contribute their code and ideas to the repo, instead of doing tricky hacking and creating separate repos for each project. This is important for an open-source community -- just recall how MXNet lost its market and why PyTorch could beat TensorFlow even though it was released one year later. TVM should consider Relax's upstreaming given its more thoughtful and user-friendly design, well-written documentation/tutorials, and painless S0/S1/S2 upgrading. I would like to discuss more if there are any comments or questions.
Thanks everyone for the discussions! A brief recap of our discussions so far:
After seeing so many voices in this thread, I think it is important to provide a reply here. I am wearing the Apache TVM hat as an ASF member and Apache TVM PMC member. First of all, I would like to say thank you, everyone, for sharing your voices here. This post has received support from more than eight organizations from both industry and academic backgrounds. Your voices are very important to the community and will not be ignored. As many said, we would love the TVM community to continue being inclusive and innovative while maintaining the stability of existing developed components. I also would like to come out and acknowledge the positions so far. The position that @leandron made so far was:
I would like to explicitly acknowledge that the above positions have valid rationales and can be a possible way of software development. I think the positions raised by @YuchenJin and others were:
These are also valid rationales and can be possible ways of developing things. As a first step, I would like to acknowledge each other's positions as valid rationales. The main difference is that there is a disagreement on how we should do things as a community. Such a decision should be made collectively as a community, considering all the factors involved, including code and community factors. We all make our suggestions taking innovation, stability, and community into account.

When evaluating a proposal and empowering our community members, we expect every one of us to continue having a constructive conversation, considering the latest context. While the initial comment made by @leandron is valid on its own, I would love to see us re-evaluate our positions considering all the factors in the latest context, including community empowerment and the collective views of other members. I want to say that by no means do we simply seek to dismiss the original position -- I would apologize if it feels that way. Instead, we want to acknowledge each view, recognize that we have disagreements on the "hows", and take the community's view into consideration. Let us have constructive public conversations in service of the many who have voiced their support here -- as we continue to empower each other to build an inclusive, welcoming and supportive community. Thank you!
My position:
need to update it quickly
Five years ago, we started with a community that came together around a common vision – enabling machine learning engineers to optimize and run computations efficiently on any hardware backend. Five years later, the fields of machine learning and MLC (ML compilation) have undergone rapid changes. That same vision is still shared among this community. This is why many of us still feel so fresh when writing code patches, bug fixes, architectural refactors, and new features.

We are here today thanks to a diverse community that comes with different perspectives and areas of focus but still aligns around that common vision. As a project, we benefit from different perspectives to survive in the ever-changing and competitive area of ML compilation. Hardly anyone can predict every detail of the future (just observe the set of recent changes such as ChatGPT and Stable Diffusion). Hardly can one definitively assert that one approach will be better than another from the very beginning. Enabling diverse possibilities helps to move the project forward while enabling different needs. As a community, while we care about different subsets of modules and do not always need to work on the same thing, there is always an overlap of interests – regardless of whether it is the graph, FFI, TensorIR, or backend – that sustains collaborations among different people. Most importantly, we come with a mindset of empowering each other under the same vision.

Thank you, everyone, for participating in this thread. This thread arrived at its current state due to different perspectives on possible project procedural operations (whether a detailed migration plan and commitment to migration are necessary for a new module proposal). There is common agreement that migration (if it happens and is being proposed) would require a lot of detail and community buy-in, but different opinions about when and how that should happen. On behalf of the TVM PMC, I would like to recommend an initial step to help us recognize and achieve the following goals from different members of the community:
Specifically, we recommend following an existing practice in projects like Hadoop: empowering related development in a branch. The ASF mechanism allows any committer to create a branch in the Apache repo and do collaborative development there at their own pace. Per our existing process, merging a branch into main still requires lazy consensus. Branch development offers flexibility while accepting the risk of blocking when merging to main. As a result, there are general incentives to keep alignment with the majority of the community and continued engagement to get buy-in. Branch development offers a way to collaborate on a possible but not definitive future of the project, as a branch can come to different outcomes such as being partially merged, continued in development, or abandoned.

Enabling different perspectives is important for us both as a project and as a community. The TVM PMC re-affirmed that branch development can be used as an option for the project and for specific development around TVM Unity. We would like to offer it as a possible option for the community and a first step of execution, with the goal of getting related pieces into main. I wrote down a more detailed post, on which we would love to get everyone's feedback. Of course, this is only one possible option, and community members can freely choose their ways of participation.

Developing in a branch will also give some time buffer to answer G1. It is valuable to answer questions and have grounded conversations to give more information to the members who are not yet on board with the new module. Noticeably, to many community members, detailed code examples, benchmarks, and continued engagement are necessary to get broader community buy-in. We recommend having focused discussions on the questions of interest (e.g. giving concrete code tutorials for BYOC) to help the community members who have related questions. We encourage such continued conversations in forum threads, meetups, and development interactions with the goal of getting as much information as possible. Again, such interactions aim at demonstrating possibilities, but do not warrant deprecation or migration, since that choice should still lie in the hands of the community. Hopefully, they give more comprehensive pictures for us to make follow-up decisions collectively.

As part of the winter break, I started to do more coding, and I was really fascinated to see that the passion is still deep in my heart (and I believe in many of us) after so many years, thanks to this community and our common vision. As a community member, I am really motivated to spend focused energy helping to build concrete code examples and tutorial materials for G1. Please also check out this post for more details.
An update: thanks to the effort of many community members, we are now at a point where the initial foundational items in the unity branch are established. One goal of G1 is to give some time to answer questions, and to provide examples to those who have asked for more detailed evidence and a possible feasibility analysis of migrating some modules. As part of the effort, community members have been posting tutorials on related topics of interest in the forum. I would encourage folks in this thread to feel free to ask for any additional tutorials you want to see and/or ask specific technical questions.
Just another update: it is great to see unity being developed and used for dynamic shapes and emerging use cases. One goal of G1 is to give some time to answer questions. There are more topics of related interest (some might relate to the questions in this thread): https://discuss.tvm.apache.org/c/development/unity/14. Please check it out and feel free to participate in the discussions and technical questions.
Sending another reminder for everyone to chime into the related unity discussion threads at https://discuss.tvm.apache.org/c/development/unity/14; we would love to see your participation in all the technical discussions and to see how we can collectively address your needs.
I want to know: do we have a plan for when to merge the Unity branch into the main branch? As LLMs are so popular now, we cannot support them well without Unity.
Thanks @FrozenGene for bringing this up! To bring broader awareness, we posted a new strategy proposal here https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-emerging-needs/15751 to concretely enable LLMs and other use cases.
It's worth noting that with the merging of Unity into TVM's main branch, Relax has already been de facto upstreamed.
Indeed, check out apache/tvm#16446
🥳
This RFC proposes to upstream the core foundation of Relax including its IR, compilation flow, and runtime, to address the critical needs identified by the TVM community, and enable a cohesive (but optional) TVM Unity Connection milestone.
Rendered version