[RFC] Relax Upstreaming #89

Closed

Conversation

YuchenJin

This RFC proposes to upstream the core foundation of Relax, including its IR, compilation flow, and runtime, to address the critical needs identified by the TVM community and to enable a cohesive (but optional) TVM Unity Connection milestone.

Rendered version

Contributor

@leandron leandron left a comment

Thanks for submitting this RFC. It has long been mentioned in the community as "something coming up".

Generally, you'll find in my questions that I would like to hear more about the work done to evaluate the impact of bringing the proposed features into our already existing components. Please don't take that as vague criticism of the proposal being made.

The main concern I have is that, as a community, the long-term cost of replacing an existing IR - as opposed to incrementally improving it - seems to be very high, and something we should be mindful of.

Even if it is a long-term plan to fully develop and integrate Relax, we can't deny the fact that in practice it also means we will need to maintain two IRs for some time.

My understanding is that a lot of work was already invested in prototyping Relax, and I appreciate that many improvements were probably made, but after reading the whole proposal I want to remain open-minded in the sense that we could, for example, as an alternative to switching IRs, take the learnings from implementing Relax and bring them to Relay, reducing the impact on the whole stack.


# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Contributor

Can you clarify what performance means in this context, specifically?

Author

Performance here means:

  • Compilation performance: defined as the effectiveness of Relax’s optimization passes in transforming and optimizing Relax code.
  • Runtime performance: defined as the runtime latency for various Relax workloads, whether they are subgraphs or end-to-end models.

This is described in the Relax roadmap RFC: #69. Maybe I should attach a link to it in this RFC.

Member

I'd like to explain more about the runtime performance.

For a single operator or a single subgraph, Relax uses MetaSchedule, which is the same as what Relay does. However, Relax gives us an opportunity to break the boundary between graph-level ops and the low-level implementation and enable cross-layer optimization - for example, layout rewrite (as I mentioned at https://discuss.tvm.apache.org/t/establish-tvm-unity-connection-a-technical-strategy/13344/4).

@jinhongyii has finished a prototype for automatic (data and weight) layout rewrite, which gives about a 10% speedup.

However, Relay has no way to support this, because:

  1. Relay defines the layout at the graph level and cannot get feedback from the low-level operators
  2. we would have to register every possible layout on each Relay operator, which is not feasible.


# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Contributor

Can you please shed some light on where these community needs were raised before? Is there any GitHub issue or Discuss post asking for the specific solutions provided by the proposed RFC?

Comment on lines +23 to +24
- **Minimize disruption:** This upstreaming should provide a **non-default** path to enable new capabilities for users/developers who are interested in what Relax brings, so it will not break the current default Relay flow.
- **Minimize complexity:** This upstreaming should reuse existing TVM/Relay infrastructure as much as possible (for example IRModule, runtime Module, TOPI library, etc.) to avoid duplicated effort and code.
Contributor

As a project, I think this creates some challenges around how to deal with new features being proposed and introduced in Relay by the established set of contributors who were not involved in Relax development.

While it is clear in the RFC that Relay and Relax would share some infrastructure, what I take from this RFC is that it only sets up the core components for Relax, without feature parity with Relay at this point, and without support in other components such as all runtimes.

How does the proposed plan intend to deal with such new features being added in Relay, while the intention here is to - at some point - replace Relay with Relax?

Comment on lines +91 to +96
## D0: **Unified abstractions and optimizations across layers**

The first key design point is to allow the high-level graph IR to directly interact with and call into lower-level TensorIR and PackedFunc (TVM FFI).

The TensorIR PrimFunc and many external libraries adopt a **destination-passing-style** (DPS) calling convention, in which both inputs and outputs are passed to the function as arguments, and the outputs are mutated directly inside the function:
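The quoted hunk ends before the RFC's own example, so here is a small sketch of the convention using the classic TVM TE flow (a hypothetical vector-add; `te.create_schedule`/`tvm.build` are the legacy APIs and all names here are illustrative):

```python
import numpy as np
import tvm
from tvm import te

n = 128
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")
fadd = tvm.build(te.create_schedule(C.op), [A, B, C], target="llvm", name="vector_add")

dev = tvm.cpu()
a = tvm.nd.array(np.ones(n, dtype="float32"), dev)
b = tvm.nd.array(np.ones(n, dtype="float32"), dev)
c = tvm.nd.empty((n,), "float32", dev)  # caller allocates the destination
fadd(a, b, c)                           # DPS: the output is written into c, nothing is returned
```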

Contributor

Can you clarify why having "a graph IR able to directly interact and call into lower-level TensorIR and PackedFunc" is something we can't achieve by adding this feature to the existing Relay?


I think there is a significant technical reason for why such a feature would be difficult to add into Relay, which is that any call directly into a TIR function from Relay would need to add a type relation that describes the tensor shape requirements. Relax adds more capabilities for checking tensor shapes dynamically and is much more flexible in this regard than Relay. That said, I think "is making a new IR better than trying to add these features into the existing one?" is a good question to be asking.

Member

As @slyubomirsky said, it's a decision between making a new IR and adding these features to Relay.

If we wanted to add them to Relay, we would have to:

  1. add new AST nodes
  2. rewrite the build pipeline (i.e. we would first need to include the TIR functions in the Module for further analysis)
  3. rewrite most of the passes (i.e. we need to do optimization on a Module that has both TIR and Relay functions)

Based on that, making a new IR may be more reasonable.

@sunggg sunggg Aug 19, 2022

Thanks for the comment! In addition to @slyubomirsky and @Hzfengsy's great points, I will share my thoughts from the perspective of the optimization and compilation pipeline.

Although it might have been possible, the interaction between the graph IR and TensorIR/PackedFunc has been quite tricky in the Relay world. This has caused significant difficulties and required non-trivial engineering effort, IMO. Here are some representative examples:

  • In Relay, there has been no convenient way to optimize the graph IR using feedback from the low level.
    • If TensorIR performs a layout transformation for one PrimFunc, its decision will affect other PrimFuncs as well. However, Relay cannot propagate such feedback back to the graph IR level, since the two IRs cannot co-exist in one module.
    • Graph-level tuning methods (e.g., TASO, Collage) need the capability to apply a set of passes to part of the graph, compile/measure its performance, and feed the performance number back to the graph IR level to generate better candidates. Although this could be achieved with non-trivial engineering effort, it would complicate the compilation pipeline and its maintenance. IMHO, joint-optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.
  • Lowering has to be done all at once at the boundary between Relay and TensorIR, and customizing lowering has been very challenging (e.g., partial/custom lowering).
    • The main pipeline with OpStrategy has not been easy to customize when you want to lower part of the graph for your own target, such as BYOC, while keeping other parts in the high-level IR. Therefore, people had to apply their own lowering mechanism (e.g., RelayToTIR) that bypasses the main pipeline.
    • If you only want to apply certain schedule rules to part of the graph IR, you need to lower those parts and apply the schedule rules to them. However, such freedom has not been allowed in the Relay main pipeline, so people had to find workarounds (e.g., use task extraction and find the PrimFunc among the extracted tasks; if extraction does not behave as the user wanted, it requires extra engineering effort).

Since Relax unifies the abstraction, it can deliver those functionalities as compiler passes while providing flexibility and customizability. For example, since both high-level and low-level IRs co-exist, if TensorIR makes an optimization decision that may have a global effect, like a layout transformation, we can rewrite the graph-level IR accordingly to express that change and consider its global implications. Also, lowering can be implemented as a Relax IR -> TensorIR transformation pass. If you want to bring your own lowering mechanism, you can write a new pass. I expect you may be able to reuse most of the lowering machinery and only change the part about "how" you want to lower.
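To make the "lowering as a pass" idea concrete, here is a minimal sketch of a custom IRModule-to-IRModule pass written with TVM's pass infrastructure (the decorator form is the standard `tvm.transform.module_pass` API, but the pass body is a placeholder, not the actual Relax lowering code):

```python
import tvm

# A function-based module pass: it receives the whole IRModule and returns a new one,
# so it can inspect both graph-level functions and TIR PrimFuncs and decide per call site
# whether to lower to TIR, dispatch to a library, or leave the call for a later pass.
@tvm.transform.module_pass(opt_level=0, name="MyCustomLowering")
def my_custom_lowering(mod, ctx):
    # ... rewrite `mod` here (illustrative no-op) ...
    return mod

# Usage sketch: compose it with other passes in a Sequential pipeline.
pipeline = tvm.transform.Sequential([my_custom_lowering])
```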

I would be happy to discuss further if you are interested in this direction. :)

Contributor

Thanks for the points @slyubomirsky @Hzfengsy and @sunggg. I won't have the time to reply to all comments today, but wanted to come back on this thread.

IMHO, joint-optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.

@sunggg I know what Collage is, but I'm not familiar with what TASO is in TVM. Can you give a little context about it and the expected use case involving TASO and Collage?

If we want to add it to relay, we should:

  • add new AST nodes
  • rewrite the build pipeline (i.e. we should first include tir function to the Module for further analysis)
  • rewrite most of the passes (i.e. we need to do optimization with a Module that has both tir and relay function)

Based on that, making a new IR may be more reasonable.

@Hzfengsy I understand your point, but at the same time, in adding a new IR that plans to deprecate the existing IR with planned overlapping features, are we not basically doing the same things as pointed out in your list, just from a new-IR perspective, which impacts - in the long term - all existing ASTs, pipelines and passes?

@YuchenJin, given the complexity of this RFC, I think it would be good to have it amended with a "Rationale and Alternatives" section, similar to what we have in the template: https://github.com/apache/tvm-rfcs/blob/main/0000-template.md#rationale-and-alternatives


Sure. In short, TASO is a graph-level tuning method that generates different equivalent forms of a graph by trying out a set of rewriting rules (e.g., layout transformation, horizontal fusion, peephole optimization, ...). TVM provides some of those optimizations as heuristics, but they are not enabled by default, probably due to their potential side effects. TASO-like graph tuners can be helpful in this direction.
TASO: https://cs.stanford.edu/~padon/taso-sosp19.pdf
Another graph tuner, but with equality saturation technique: https://arxiv.org/pdf/2101.01332.pdf

Member

In Relay, there has been no convenient way to optimize the graph IR using feedback from the low level.

  • If TensorIR performs a layout transformation for one PrimFunc, its decision will affect other PrimFuncs as well. However, Relay cannot propagate such feedback back to the graph IR level, since the two IRs cannot co-exist in one module.
  • Graph-level tuning methods (e.g., TASO, Collage) need the capability to apply a set of passes to part of the graph, compile/measure its performance, and feed the performance number back to the graph IR level to generate better candidates. Although this could be achieved with non-trivial engineering effort, it would complicate the compilation pipeline and its maintenance. IMHO, joint-optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.

It's possible for an IRModule to contain both tir::PrimFunc and relay::Function at the same time, which would allow the replacement of any given Call operator inside Relay with a lower-level operator to trial as part of a scheduler - this seems to indicate it's entirely possible to use an IRModule to connect higher- and lower-level IR, unless I'm missing something?

We've previously done this in CMSIS-NN: relay::Functions become tir::PrimFuncs as part of our custom lowering in RelayToTIR, which results in an IRModule with some functions lowered to TIR and some remaining in Relay.

Lowering has to be done all at once at the boundary between Relay and TensorIR, and customizing lowering has been very challenging (e.g., partial/custom lowering).

  • The main pipeline with OpStrategy has not been easy to customize when you want to lower part of the graph for your own target, such as BYOC, while keeping other parts in the high-level IR. Therefore, people had to apply their own lowering mechanism (e.g., RelayToTIR) that bypasses the main pipeline.
  • If you only want to apply certain schedule rules to part of the graph IR, you need to lower those parts and apply the schedule rules to them. However, such freedom has not been allowed in the Relay main pipeline, so people had to find workarounds (e.g., use task extraction and find the PrimFunc among the extracted tasks; if extraction does not behave as the user wanted, it requires extra engineering effort).

RelayToTIR allows a Target to customise how it decides to lower. Every Target can register a RelayToTIR hook which uses a set of composable Passes to create a per-Target lowering pipeline; these Targets then compose together to form the main pipeline. This indicates we already have the capability to create more complex per-Target lowerings of not just operators, but also groups of operators, using RelayToTIR, with the flexibility to improve any logic for a specific Target or re-use existing Passes where necessary?

The limitation I can see with this is that trying different operator strategies within the per-Target lowering isn't well supported? Could this not be supported by providing search information to RelayToTIR? 🤔


Thank you for sharing your thoughts, @Mousius. It is interesting to learn that these problems have been attacked in the BYOC world. Like @comaniac, @junrushao and others, I want to clarify that I don't disagree that these functionalities can be achieved by extending the current Relay framework. However, in my PoV, it is a matter of the amount of engineering effort and of future extensibility. Making a change of this scale, which touches the framework-level design philosophy, in the main pipeline while guaranteeing existing behavior and performance sounds extremely difficult, IMHO. Although I joined the Relax project in the middle, I believe these are part of the motivation to build a new IR with those new design considerations and goals. Since the Relax pipeline does not disturb the Relay pipeline, I think both pipelines can evolve together.

Author

Thanks @leandron for the suggestion! We have added the Rationale and Alternatives section to the RFC, please take a look. :)

Comment on lines +345 to +346
A **dataflow graph** means every operation inside is **side-effect free** and there are no **control flows** (such as if-then-else). A **dataflow block** is a way for us to mark the dataflow graph regions of the program in Relax. Specifically, all the operations under the dataflow block are side-effect-free and do not contain control flows (control flow is an advanced semantic that most pass writers do not consider). Outside a dataflow block, operations can contain side effects (for example doing in-place weight update during model training) and control flow. The program below is an example program that contains two dataflow blocks.

Contributor

Thanks for providing the context on dataflow graphs, but can you also explain what work was done to compare the cost of introducing this as a new feature in a new IR, when compared to adding it to Relay? What are the fundamental issues?


Determining whether a given function is pure and control-flow-free would not be difficult to do in Relay; even marking portions of functions that are pure and control-flow-free would not be hard, but it might require a new AST construct.

Contributor

I think Relay is pure except for Refs, and in practice they are not used because they are poorly supported by the compiler.

@slyubomirsky slyubomirsky Aug 18, 2022

Yes, Refs and a few operators like dropout. The main reason for Ref to exist is for AD and training, so I imagine compiler support for such features would improve if training becomes a more important usage for TVM.

Comment on lines +682 to +683
- AOT: AOT compilation has a wide range of benefits, such as being more space efficient, and is necessary for resource-constrained projects like uTVM. We are committed to continuously supporting AOT compilation in Relax, and there is an ongoing effort to connect Relax to the current AOT executor.
- BYOC: We will try to reuse the existing translation spec. In Relax, BYOC can be formalized as a pass that calls external packed functions.
Contributor

Given this RFC proposes an alternative to a fundamental part of TVM, it would be good to have some idea of how BYOC and AoT will be supported, as well as some idea of the timelines.

I understand that in the present RFC, Relax intends to be an optional compilation flow, but I think we can agree that we don't intend to maintain two overlapping IRs long term, as it would become too costly in terms of infrastructure.

Based on that, can you clarify specifically whether there is an expected timeline for AoT and BYOC to be brought into Relax?

Member

@Hzfengsy Hzfengsy Aug 19, 2022

BYOC and AOT are explained here. It would be good to copy that content into this RFC.


From the BYOC integration perspective, users will need to bring the codegen that converts Relax ops to the BYOC representation (e.g., JSON) while re-using the runtime. We have a demo with TensorRT: https://github.com/tlc-pack/relax/tree/relax/src/relax/backend/contrib

Relax would provide more unified and organized support for BYOC. In Relay, although the functionality is provided, IMHO it has been quite fragmented, so it has been tricky to use multiple BYOCs together with TVM's pipeline. So, in Relax, our goals are to
(1) support the existing functionality, and
(2) organize and simplify the BYOC workflow.

@YuchenJin can provide the timeline for this.


This RFC only focuses on the foundation of Relax. After it lands, we will incrementally incorporate additional capabilities and features. Relax aims to achieve parity with the functionality provided by Relay: this means that workloads which are functional on Relay will also be functional on Relax, even though the infrastructure underneath may change.

Future plans that we will bring in future RFCs:
Contributor

Currently all the framework frontends support Relay. What's the plan with regard to frontends and Relax? I think it should be covered in this section.

Comment on lines 592 to 597
- `relax.runtime.builtin.alloc_heap(size) -> heap`: Allocate the heap (an NDArray) with a specific size to execute shape computation

(We can use `alloc_tensor` to achieve the same goal)

- `relax.runtime.builtin.store_shape(shape, heap, idx0, ...)`: Store a shape into specific indices in the shape heap.
- `relax.runtime.builtin.load_shape(heap, idx0, ...) -> shape`: Construct a shape from the shape heap according to the indices.
Contributor

All these runtime features seem pretty much runtime-independent, meaning they could be implemented in the existing runtimes. Taking the Relay VM as an example, why couldn't this be implemented as a feature in the existing runtime?


# 5. **Future work**

This RFC only focuses on the foundation of Relax. After it lands, we will incrementally incorporate additional capabilities and features. Relax aims to achieve parity with the functionality provided by Relay: this means that workloads which are functional on Relay will also be functional on Relax, even though the infrastructure underneath may change.
Contributor

Just for clarity: is the ambition for Relax to completely replace Relay at some point?

If yes, when in terms of timeline would you see Relax being feature compatible with Relay?


# 1. **Summary**

This RFC proposes to upstream the core foundation of Relax (Relay Next). Relax is a new graph-level IR that enables new capabilities to address the [critical needs](https://discuss.tvm.apache.org/t/establish-tvm-unity-connection-a-technical-strategy/13344) identified by the TVM community over the years of using and developing deep learning compilers.
Contributor

Can you make this paragraph a bit more specific and list the critical needs raised by the community that are addressed by Relax?

Reading the attached link, it does not clearly describe the needs, as it is also a description of features and milestones in a broader context than Relax.

@YuchenJin
Author

Hi @leandron, thanks for your feedback! :)

We share a common goal of minimizing disruption while incrementally improving TVM. One of the main questions is how to bring in the improvements. That’s indeed what we have carefully thought about.

One thing we found in building the unity connection is that we need to co-design things together. Take first-class symbolic shape support as an example:

Suppose we want to apply a flatten operation to a tensor; here is the symbolic shape representation in Relax:

```
b: Tensor[(m, 224)]
a: Tensor[(m * 224, )] = flatten(b)
```

In Relay, we have ? to denote an unknown dimension. So the above program in Relay is:

```
b: Tensor[(?, 224)]
a: Tensor[(?, )] = flatten(b)
```

Without symbolic shapes, we lose the shape relation between tensor a and tensor b, which prevents good optimization opportunities, for example memory planning that reuses memory between a and b, since we know at compile time they occupy the same amount of memory.

Supporting this requires introducing native symbolic shape support rather than a separate mechanism. It's worth noting from the example above that the shape mechanism in Relax is very different from what we currently have in Relay.

Additionally, the other improvements also benefit from first-class symbolic shapes. For example, the call_tir signature has an output_shape which can be represented with symbolic shapes (since TensorIR supports symbolic shapes), and the TE language is based on symbolic shapes, hence we can directly generate PrimFuncs with emit_te (see the Relax-TE integration section). Introducing each component separately would make the design less cohesive.

Directly introducing this symbolic shape support into the existing IR would amount to a one-shot transition to the current Relax, which is not the incremental improvement we hope for.

Relax can be viewed as complementary to Relay. Relay focuses on high-level op transformations, while the current Relax passes focus on TIR-graph co-transformations that can enable flexible fusion and layout rewrite, which is hard to achieve in Relay.

As of this RFC, we do not seek to change the default build pipeline or replace Relay. In this RFC, we only introduce Relax as an optional component for those community members who need it. This is a common practice in other ML frameworks; for example, PyTorch brought in TorchFX as an optional (vertical) component to support graph exporting, while maintaining TorchScript. We totally agree that the evolution of TVM's default flow is important, and we should carefully discuss that with the community in future RFCs. Evolving the default build has been briefly discussed in Establish TVM Unity Connection — A Technical Strategy, and there will be an upcoming tvm.compile RFC to discuss the long-term strategy to consolidate the default build flows in TVM.

Contributor

@ekalda ekalda left a comment

Thanks for the RFC, Relax is an impressive piece of work and the features introduced here would be a great addition to TVM's compilation flow!

I will need to spend some more time wrapping my head around Relax, but I did an initial pass and left some comments. I tried not to repeat Leandro too much in this review, but I want to make it clear that I do share the same concerns.

I think Relax is very cool (I especially like the dynamic shape support), but from this RFC it is not clear to me what the technical reasons are for not extending Relay with the new features developed through prototyping Relax. A new IR that doesn't represent a different level of abstraction comes with reputational and technical cost, so it is something that should not be taken lightly.

One more thing - I've been mostly looking at TVM from a hardware support perspective, so I'd be interested to see a more concrete example of a problem that Relax solves that can in no way be solved through improvements to Relay, e.g. a network that we can't currently compile/schedule effectively.


### **call_tir**

In Relax, we introduce `call_tir` to bridge graph-level IR and TIR. `call_tir` is an intrinsic that calls a TIR PrimFunc (that follows DPS) and returns the output. The semantics of `call_tir` can be demonstrated by the code below.
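The RFC's own example is cut off by the quoted hunk; the following is a hedged, runnable Python mock of the semantics as described (NumPy stands in for real tensors, and `alloc_tensor`/`add_dps` are illustrative names, not the actual Relax or VM APIs):

```python
import numpy as np

def alloc_tensor(shape, dtype="float32"):
    # stand-in for the runtime's tensor allocation builtin
    return np.empty(shape, dtype=dtype)

def call_tir(tir_func, output_shape, args):
    """Pseudo-semantics of call_tir: allocate the destination, call in DPS, return the output."""
    out_tensor = alloc_tensor(output_shape)  # allocate the destination buffer first
    tir_func(*args, out_tensor)              # invoke the PrimFunc in DPS: it mutates out_tensor
    return out_tensor                        # the graph-level IR sees an ordinary returned value

# Usage sketch with a DPS "PrimFunc" stand-in:
def add_dps(a, b, out):
    np.add(a, b, out=out)

y = call_tir(add_dps, (4,), (np.ones(4, "float32"), np.ones(4, "float32")))
```

From the graph IR's point of view, `call_tir` therefore looks like a pure call that returns a fresh tensor, while the allocation and in-place mutation are hidden underneath.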
Contributor

We can already have Relay and TIR in the same IRModule, so, considering the question of dynamic shape support orthogonal to this, what new functionality does call_tir add?

Member

call_tir is not only designed for dynamic shape support. It enables optimizations/transformations of an IRModule across both the graph IR and TensorIR. We do support having Relay and TIR in the same IRModule, but we cannot optimize them together.

Member

How is this different from checking if a Call is to a tir::PrimFunc?

@slyubomirsky slyubomirsky Aug 23, 2022

call_tir allows a PrimFunc to be called from a Relax program. It allocates the destination tensor first and invokes the PrimFunc in destination-passing style. It's implemented as an operator in Relax, so it's a specific kind of call. Normal Call nodes do not perform that allocation behavior and do not assume a destination-passing-style convention (call_tir is really just syntactic sugar for performing the allocation and making the call; as long as the PrimFunc has a global symbol, it can be called from Relax using the ExternFunc construct in the AST).

Member

Hmm, could this not be solved using a utility function that returns the relevant nodes?

```
fn call_and_allocate(prim_func: PrimFunc):
  return SeqStmt[Allocate(), Call(prim_func)];
```

It'd be good not to have this ambiguity where call_tir is a specific way of calling a tir.PrimFunc rather than a general way to call TIR.


The main idea is to abstract over memory allocation in the front-end language. Later stages of compilation can, in principle, canonicalize call_tir into the form you describe.


This cross-level interaction unlocks many interesting things that were not possible before, including, but not limited to:

- Incrementally lower different parts of a program using different strategies, instead of lowering the entire program to TIR directly from Relay as today.
Contributor

Can you elaborate more on this? We can already lower separate subgraphs using different strategies.

@sunggg sunggg Aug 19, 2022

Thanks for the comment! Yes, this has been supported in Relay, but there are some non-trivial limitations, IMO.

  • (1) The Relay main pipeline lowers every Relay IR into TIR at once at the IR boundary. This makes partial lowering (lowering only part of the graph) difficult in the main pipeline.
  • (2) The Relay main pipeline supports lowering with OpStrategy. However, it is not easy to customize (custom lowering).

For these reasons, people introduced RelayToTIR and RelayToRuntime, which essentially bypass the main pipeline. Although this enables the functionality people want, IMHO it is not easy to maintain as a framework, and it is not easy to leverage multiple lowering strategies in an incremental way. Therefore, Relax wants to tackle this problem and provide such support in an organized, systematic way. For example, since Relax provides a unified abstraction, we can introduce a graph IR -> TIR transformation into the pipeline, and this is essentially what lowering does. Thus, by introducing such a mechanism as a Relax -> TIR transformation pass, Relax can bring those functionalities into the main pipeline in a customizable manner. You can also easily conduct partial lowering within a pass since you have full control. We expect users will be able to reuse most of the lowering machinery since, most of the time, you just want to change the "how to lower" part.

Member

Thanks for the comment! Yes, this has been supported in Relay, but there are some non-trivial limitations, IMO.

  • (1) The Relay main pipeline lowers every Relay IR into TIR at once at the IR boundary. This makes partial lowering (lowering only part of the graph) difficult in the main pipeline.
  • (2) The Relay main pipeline supports lowering with OpStrategy. However, it is not easy to customize (custom lowering).

For these reasons, people introduced RelayToTIR and RelayToRuntime, which essentially bypass the main pipeline. Although this enables the functionality people want, IMHO it is not easy to maintain as a framework, and it is not easy to leverage multiple lowering strategies in an incremental way. Therefore, Relax wants to tackle this problem and provide such support in an organized, systematic way. For example, since Relax provides a unified abstraction, we can introduce a graph IR -> TIR transformation into the pipeline, and this is essentially what lowering does. Thus, by introducing such a mechanism as a Relax -> TIR transformation pass, Relax can bring those functionalities into the main pipeline in a customizable manner. You can also easily conduct partial lowering within a pass since you have full control. We expect users will be able to reuse most of the lowering machinery since, most of the time, you just want to change the "how to lower" part.

I've put a reply to this above, but I'll quickly summarise in response here. RelayToTIR works together with the main pipeline to allow partial lowering for a Target; it runs before the main TECompiler and therefore seems to be exactly what you're describing as Relax->TIR? We leave the module in a partial-TIR/partial-Relay state for the pipeline to continue lowering.

I do agree that RelayToRuntime essentially bypassing the entire compiler is not particularly helpful for re-use; this was a motivation for creating RelayToTIR in the first place.


This seems to be related to my reply above: #89 (comment)

- Incrementally lower different parts of a program using different strategies, instead of lowering the entire program to TIR directly from Relay as today.
- Allow for more customized optimizations, such as whole program optimizations, cascading, and other post-schedule optimizations.
- Enable automation (MetaSchedule) to analyze call_tir nodes and the callee TIR programs, perform optimizations and rewrites to one or more call_tir nodes, thus feeding decisions such as layout rewrite directly to the high-level IR.
- By turning subgraphs into calls to PackedFunc (via call_dps_packed), BYOC becomes an IRModule ⇒ IRModule transformation as a natural part of compilation.
Contributor

Some detail on the new BYOC would be appreciated as well - what does the Relax BYOC offer in terms of functionality that the current one doesn't?


Let me start from my comments on the previous question :)

Relax would provide more unified and organized support for BYOC. In Relay, although the functionality is provided, IMHO it has been quite fragmented, so it has been tricky to use multiple BYOCs together with TVM's pipeline. So, in Relax, our goals are to
(1) support the existing functionality, and
(2) organize and simplify the BYOC workflow.

Personally, my ultimate goal is to unlock Relax to allow easy customization of both the TVM-internal and BYOC pipelines. For example, we may want to rewrite the graph IR while considering low-level optimizations for TIR and multiple BYOC choices (e.g., for Matmul+add, we have a bunch of choices: tune with MetaSchedule, or use a BYOC like TensorRT, CUTLASS, cuBLAS, ...). Also, some recent BYOCs, such as CUTLASS, are tunable. To enable this functionality, the pipeline should be easy to customize, straightforward, and maintainable for future extension.

This has been difficult to achieve in Relay for many reasons: for example, inflexible lowering, each BYOC having its own greedy partitioning without considering other potential BYOCs, no interaction between BYOC and the main pipeline, etc.

In Relax, although this is still work in progress, we've seen some great signals. I would be happy to discuss further if you are interested.


# 4. **Reference-level explanation**

To achieve the design points described in the last section, this RFC focuses on how to build an **end-to-end MVP** (Minimum Viable Product) which allows users to construct an end-to-end model (represented by an IRModule), transform/build the IRModule, and run the execution.
Contributor

Out of interest, can you give an example where this kind of workflow is applicable? Are users supposed to write their models directly in Relax? Where does the weights data come from?


Although Relax allows users to write their models directly (you can bind params as well: example), the main path would be frontends for Torch, TF, ONNX, ..., as we do in Relay.

Comment on lines +386 to +388
## 4.1 Relax AST

To support the key design points in the last section, Relax introduces the following constructs to the AST. In the meantime, we reuse `RelayExpr`, `Call`, `Constant`, `Tuple`, `If`, `Op`, `GlobalVar`, `TupleGetItem` in Relay.
Contributor

I second that: if Relax is a superset of Relay, why don't we extend Relay?

<img src='../resources/relax-e2e-flow.png' width='600'>
</p>

## 4.1 Relax AST
Contributor

It would be good to have a separate RFC for each one of the new nodes presented here; it would make having discussions easier, considering how much discussion we got out of adding one IR node (tir.constant).

- Do shape calculations via shape heap (an internal NDArray) manipulation.
- Suppose Tensor A's shape is (m, n) at compile time, and in the Relax program we want to compute (j, k) = (m+1, n+1). At runtime, A's shape will be stored at index 0 and index 1 of a shape heap (which is a TVM NDArray) by calling the VM builtin function `store_shape(A.shape)`. m+1 and n+1 will be computed by a TIR PrimFunc generated in the shape lowering pass, and j and k will be stored at index 2 and 3 of the shape heap. Please refer to the shape lowering pass in the next subsection for more details.
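A runnable Python simulation may help make this concrete (the real VM uses the `store_shape`/`load_shape` builtins listed earlier and a generated TIR PrimFunc; the helpers below only mirror that description with NumPy):

```python
import numpy as np

def store_shape(shape, heap, *indices):
    # write each dimension of `shape` into the given slots of the shape heap
    for idx, dim in zip(indices, shape):
        heap[idx] = dim

def load_shape(heap, *indices):
    # reconstruct a shape tuple from the given slots
    return tuple(int(heap[i]) for i in indices)

heap = np.zeros(4, dtype="int64")            # the shape heap is just an int64 NDArray
A_shape = (5, 7)                             # at runtime, m = 5 and n = 7
store_shape(A_shape, heap, 0, 1)             # heap[0] = m, heap[1] = n
heap[2], heap[3] = heap[0] + 1, heap[1] + 1  # stands in for the generated shape PrimFunc
j, k = load_shape(heap, 2, 3)                # (j, k) = (m + 1, n + 1) = (6, 8)
```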

As a future plan, we will consolidate the Relay VM and Relax VM, and integrate Relax with the AOT executor (see Section 5).
Contributor

I don't know the VM executor very well, but the impression I get is that Relax VM won't be hugely different from Relay VM, so why don't we immediately consolidate the two, avoiding the awkward situation of having to maintain two very similar executors in the stack?


I am not as familiar with the Relay VM, but perhaps it might indeed be feasible to combine the VMs (have a single bytecode language) and use it for both compilers. How much additional work would it take compared to the current Relax VM prototype @YuchenJin et al?

Member

I feel we could probably leverage some of the existing VM infra, i.e. the way we manage the frames, the bytecode de/serialization mechanism, and likely the memory pool, etc.

Author

The ultimate goal is to consolidate the Relay VM and Relax VM, which requires work such as refactoring the Relay VM codegen. As a first step we bring the new VM in, and it already reuses a bunch of insights from the Relay VM.


## 4.5 PR list

We plan to split the upstreaming into the following manageable PRs for TVM community review:
Contributor

Introducing a separate compiler into the source tree based on one RFC seems like too much in one go. I'd expect to see one RFC for each Relax feature.

@Hzfengsy
Member

Thanks @leandron and @ekalda for the comments. We all agree that we are trying to improve the graph-level IR of TVM; the controversial point is whether we can enhance Relay to support the features from Relax. Let's discuss it directly and focus on the technical points themselves.

First of all, I'd like to list the most critical features that Relax wants to introduce:

  1. Dynamic shape support, specifically symbolic shape representation;
  2. A representation for TVM Unity, i.e. a cross-layer abstraction for optimization;
  3. Customizable compilation flow and operator support.

In my opinion, it's hard to incrementally update Relay to support them.

G1: Dynamic shape support

To be specific, Relax can represent and deduce symbolic shapes rather than using Any. However, if we introduced symbolic shapes into Relay, there would be two competing representations for shapes (symbolic shape and Any), which is undesirable.

G2: A representation for TVMUnity

TVM Unity is an important feature for unified optimization across the graph, tensor computation, and libraries. The build flow of Relay is a one-way path: Relay -> TIR/libraries -> runtime module, while TVM Unity enables IRModule (graph + TIR + libraries) -> IRModule transformations, which gives users more flexibility to choose the backend (use codegen or call libraries) even after tuning. I'm not sure this is possible in Relay if we keep the original workflow.

G3: Customizable compilation flow and operator support.

Customizing operators and backends is really common in production. There are 7 steps to add a new operator to Relay. However, we only need 2 steps in Relax (a sketch follows the list below):

  1. write how the op is computed (either TIR or a library call is fine),
  2. use call_tir to represent it in the IRModule.
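A minimal sketch of those two steps, assuming the Relax BlockBuilder/emit_te API from the prototype (spellings such as `TensorStructInfo` may differ between the prototype and what finally lands; `topi.add` is just a convenient computation to emit):

```python
import tvm
from tvm import relax, topi

bb = relax.BlockBuilder()
x = relax.Var("x", relax.TensorStructInfo((16, 16), "float32"))
y = relax.Var("y", relax.TensorStructInfo((16, 16), "float32"))

with bb.function("my_add", [x, y]):
    # Step 1: the computation is described in TE (topi.add); emit_te turns it into a TIR PrimFunc.
    # Step 2: the generated PrimFunc is wired into the graph through call_tir automatically.
    out = bb.emit_te(topi.add, x, y)
    bb.emit_func_output(out)

mod = bb.get()  # an IRModule containing both the Relax function and the generated PrimFunc
```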

Additionally, other compilation customizations (e.g. BYOC, AOT, customized fusion) are also more straightforward with Relax. Please see the TVM Unity Connection thread.

In short, a new IR is a reasonable way to support the above features IMO. And I'm open to hearing more ideas from the community.

```python
class Expr(BaseExpr):
    """This is RelayExpr, but we add a shape_ field."""
    checked_type_: Type
    shape_: ObjectRef
```
Contributor

Above, shape_ is defined as an (implicitly optional) Expr, but here it is defined as an ObjectRef. Is this because you want it to be implicitly optional? If so, I'd recommend making it explicitly optional instead.

Author

Hi @tkonolige, it's a great question! Expr->shape_ should be an Expr; ObjectRef is used here in its definition to prevent cyclic typing.

return gv0
```

While the text format type annotation `lv0: R.Tensor[(n, 4), "float32"]` shows the shape of each value, this is only syntactic sugar. From the IR’s point of view, the `shape_` field `(n, 4)` is not included in the type signature of `lv0`. The type signature of `lv0` is `DynTensor(rank=2, dtype="float32")`, and the shape is a special value field that is attached to each `Expr`. We made this explicit choice to simplify the type inference so that we do not need to get into the [dependent typing](https://en.wikipedia.org/wiki/Dependent_type) land where type depends on value (shape in our case) which requires heavier machinery to handle.
Contributor

This document contains no section about typing or type inference. How does it work? Would it be worthwhile to add a section about it?


I am working on language specifications, though I will defer to @YuchenJin as to whether they should be part of the RFC text. We actually have some unresolved debates on how it should work: tlc-pack/relax#222


I think it could make sense to include some broad notes about how typing is intended to work in Relax. Overall, the type system is simpler than in Relay: Tensor types track only the rank and dtype of tensors (both optional) and shape is approached separately from type.

@comaniac
Contributor

Thanks for the RFC. Although I wasn't involved in the actual Relax development, I've been attending the weekly open design review meeting for a while, and I'm glad that I could share our experience to help improve the Relax design. Thus, I don't have specific questions about the design.

Regarding the point mentioned above about whether we really need a brand new IR to replace Relay: we at AWS already attempted to build a compiler-based training framework, RAF, by extending Relay. Based on our experience working on RAF for the past 1.5-2 years, we agree with the Relax group that Relay does have some limitations in its design, and these limitations prevent us from easily adding some new features such as dynamic shapes, flexible fusion, etc. Note that I'm not saying it's impossible to implement these features by extending Relay. My point is that even if it's possible, it is hard to keep a clean system/infra without a large-scale refactoring. To me, it is even safer to build a new infra from scratch, so that existing workloads and use cases won't be affected at all. This is also the reason why we noticed Relax and tried our best to get involved in the early-stage design in the first place.

Meanwhile, in terms of maintaining two IRs, I don't really think this would add much overhead to the community, because these two IRs are basically independent and can be developed separately. In other words, it's up to the developers whether to work on Relay or Relax. As long as there is still a number of Relay developers in the community, we shouldn't deprecate it. On the other hand, if people find that Relax is better over time, then developers will gradually move to Relax. Only then would we bring Relay deprecation to the table, just like nnvm in the past.

Member

@zhiics zhiics left a comment

Thanks for the RFC, I enjoyed reading through it. Maybe we can summarize the list of passes/features/code bases that can be shared or reused between Relay and Relax, since there are some concerns about the migration and how much effort is needed to get familiar with the new frontend. I believe this is essentially not an issue if we can make Relax backward compatible and the migration transparent to downstream users/deployments. Note that the migration here involves both functionality and performance.

@junrushao
Member

Thank you @leandron @ekalda for the questions, and @zhiics, @slyubomirsky, @Hzfengsy, @sunggg for the discussion!

As a long-term contributor since 2018 (the pre-Relay era), and the initiator and one of the top 2 contributors of RAF (https://github.com/awslabs/raf/), the TVM-based training framework, I would love to share my perspective and a slight concern about TVM development at this moment, in 2022.

While TVM is a decent auto-tuner for static-shape workloads, and the latest work on auto-tensorization further boosted its performance with microkernel tuning, there has been strong demand from the community to allow TVM to do more, which, as @YuchenJin listed, includes:

  • Unified abstraction
  • Dynamic shape support
  • Dataflow block and first-class side effect handling
  • Non-inference workloads

As a community, we do encourage everyone to understand different perspectives and empower each other, and I believe this is the way for us to grow.

Technically, just wanted to address a meta question here: why is it less feasible to gradually upgrade Relay?

  • Conflicted design philosophy: Relax follows a completely different design than Relay with mutually conflicting assumptions and ideas. For example, having two conflicting shape mechanisms in the system would effectively mean passes have to handle both of them.
  • Engineering challenge: design difference leads to hurdles for incremental updates. For example, if we want to move away from the assumption that the IR is side effect-free, all the passes with the old assumption become automatically invalid or wrong because the assumption is not respected.
  • Stability concern: Even if we do surgical incremental enhancement to Relay by introducing breaking changes piece by piece, there is still stability concern. Consider a case where there are downstream vendors whose forks depend on upstream Relay, and Relay’s assumptions break over time, it would be less stable for them to maintain Relay.

Alternatively, we believe having Relax as a separate pass is a cleaner and more maintainable approach - gradually bringing some of the passes over from the bottom is incremental from an engineering perspective and guarantees that the Relay code path is always stable.

@masahi
Member

masahi commented Aug 23, 2022

@YuchenJin

Relax can be viewed as complementary to Relay. Relay focuses on high-level op transformations, while the current Relax passes focus on TIR-graph co-transformations that can enable flexible fusion and layout rewrite, which is hard to achieve in Relay.

I like this separation of work between Relay and Relax. We have many Relay passes that work all right and which it doesn't make a lot of sense to reimplement in Relax.

But if Relax is supposed to be complementary to Relay, is the name "Relax", as "Relay Next", still a good choice? "Relay Next" strongly suggests that Relax is something that is going to replace Relay, like we did for nnvm. I'm still not entirely clear whether the plan is to eventually deprecate Relay, or whether Relay and Relax are going to coexist for the foreseeable future.

During lowering, this call won't get translated into destination-passing style, because it is impossible to obtain the shape information and pre-allocate the memory. Instead, it is directly translated to a call that allocates and returns the result tensor.

- `R.unique` can be mapped to a runtime PackedFunc call that takes in an NDArray x and performs a unique operation.
- We can even dispatch to common runtime libraries such as `torch.unique`; for example, the above `R.unique(x)` can be lowered to `call_packed("torch.unique", x)`.
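To make the two calling conventions concrete, here is a small runnable NumPy sketch (illustrative only; `np.unique` stands in for the runtime PackedFunc, and the DPS example mirrors the convention described in the call_tir section):

```python
import numpy as np

# Data-dependent output shape: the size of unique(x) is unknown until runtime,
# so the callee allocates and returns the result (allocate-and-return convention).
def unique_alloc_and_return(x: np.ndarray) -> np.ndarray:
    return np.unique(x)

# Compare with a destination-passing-style op: the output shape is known up front,
# so the caller pre-allocates the buffer and the callee writes into it.
def add_dps(a: np.ndarray, b: np.ndarray, out: np.ndarray) -> None:
    np.add(a, b, out=out)
```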
Member

Does this mean all data-dependent dynamic ops need to have runtime packed functions, for all targets?

Even in Relay / TE we can implement unique / NMS perfectly fine, and we don't need special runtime support. We do that by essentially treating unique, NMS, etc. as static ops and pushing all dynamism handling into a dynamic strided_slice.

Author

This is a great question! This does not mean data-dependent ops are required to be implemented using PackedFuncs.

In Relax, we also support implementing dynamic ops using TE/TIR. For example, we can implement unique by splitting the operation into multiple phases (first outputting a tensor of the same size as the input data), like you said, and this is supported by EmitTE.
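A runnable plain-Python sketch of that multi-phase strategy (NumPy only; this is not the actual TE/TIR implementation, just an illustration of the idea):

```python
import numpy as np

def unique_two_phase(x: np.ndarray) -> np.ndarray:
    # Phase 1: write results into a buffer with the same (static) size as the input,
    # tracking how many slots are actually valid.
    buf = np.empty_like(x)
    count = 0
    for v in x:
        if count == 0 or v not in buf[:count]:
            buf[count] = v
            count += 1
    # Phase 2: a dynamic slice trims the buffer to the data-dependent output size,
    # analogous to pushing the dynamism into a dynamic strided_slice.
    return buf[:count]

print(unique_two_phase(np.array([3, 1, 3, 2, 1])))  # [3 1 2]
```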

One way to quickly support operators in Relax is to fall back to third-party framework libraries, for example by calling torch.unique as mentioned in this paragraph. With first-class support for calling PackedFuncs in the graph IR (via call_dps_packed and call_packed), we can generate direct calls to these third-party libraries to get immediate op coverage.


We can introduce three builtin functions in the runtime:

- `relax.runtime.builtin.alloc_heap(size) -> heap`: Allocate the heap (an NDArray) with a specific size to execute shape computation
Member

@masahi masahi Aug 23, 2022

Suggest to use a different term than "heap", since it is used very differently from the conventional CS sense ("heap" as in stack / heap).

Author

Thanks for the suggestion, it makes sense!

I updated the "shape heap" to "shape tensor" in the RFC, and mentioned alloc_shape_tensor can be merged with alloc_tensor to achieve the same goal.

Member

@Mousius Mousius left a comment

Technically, just wanted to address a meta question here: why is it less feasible to gradually upgrade Relay?

Hi @junrushao, I agree this is the meta question here, and one which seems to highlight a fragility in TVM itself that leads it to resist such incremental change.

  • Conflicted design philosophy: Relax follows a completely different design than Relay with mutually conflicting assumptions and ideas. For example, having two conflicting shape mechanisms in the system would effectively mean passes have to handle both of them.

As far as I can tell from the discussion, it's entirely possible to begin replacing the Any type with more detailed dynamic shape information? Making breaking changes within software is a good thing as long as we provide users information in release notes etc.

  • Engineering challenge: design difference leads to hurdles for incremental updates. For example, if we want to move away from the assumption that the IR is side effect-free, all the passes with the old assumption become automatically invalid or wrong because the assumption is not respected.

Our test cases should be sufficient to ensure that the passes which use the type system or make specific assumptions aren't broken? If we haven't put in place sufficient testing then how do we know that the passes work as expected right now? 😸

  • Stability concern: Even if we do surgical incremental enhancement to Relay by introducing breaking changes piece by piece, there are still stability concerns. Consider a case where downstream vendors' forks depend on upstream Relay: if Relay's assumptions break over time, it becomes less stable for them to maintain Relay.

Provided we document the changes in release notes, anyone consuming our packages will be able to update when they have the capacity to do so and integrate the new version? We should consider how we do this incremental roll-out between versions so that each release only requires a reasonable amount of changes to integrate, rather than trying to avoid breakage altogether.

Alternatively, we believe having Relax as a separate pass is a cleaner and more maintainable approach - gradually bringing some of the passes over from the bottom up is incremental from an engineering perspective and guarantees that the Relay code path always stays stable.

This is incremental from the point of view of Relax but not of TVM as a whole; with this approach it seems consumers will eventually reach a point where they must switch entirely from Relay to Relax, which would be a far more disruptive change.


# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Member

Could you clarify what you mean when you say that Relax maximises expressibility as opposed to Relay? My assumption is that this is the TVMScript addition but that could equally have been done for Relay 🤔

Author

Hi @Mousius, the expressibility here does not mean the support for TVMScript. It means the ability of the IR to express workloads. For example, we need to be able to express symbolic shapes in the IR to support and optimize dynamic shape workloads; we need to be able to express side effects to support training workloads that make inplace updates to the model’s weights during backpropagation.

Member

Given that training is not in scope of this RFC, can we hold off on adding side-effect semantics until a future RFC? That leaves us with just optimizing for the dynamic shape workload.


Side effects could be useful for other purposes, like printing and assertions (both of which we should have had in Relay, IMO). It's also been noted that side effects could be useful for importing things like in-place operations from PyTorch and other frameworks.

Comment on lines +91 to +96
## D0: Unified abstractions and optimizations across layers

The first key design point is to allow the high-level graph IR to directly interact with and call into the lower-level TensorIR and PackedFunc (TVM FFI).

The TensorIR PrimFunc and many external libraries adopt a **destination-passing-style** (DPS) calling convention, in which both inputs and outputs are passed to the function as arguments and the outputs are mutated directly inside the function:
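A minimal stand-alone illustration of the convention, using NumPy in place of a TIR PrimFunc (a sketch under that assumption, not the RFC's own listing):

```python
# Destination-passing style (DPS): the caller allocates `out` and passes it
# in; the callee writes the result into it and returns nothing.
import numpy as np

def matmul_dps(a: np.ndarray, b: np.ndarray, out: np.ndarray) -> None:
    np.matmul(a, b, out=out)    # result lands in the caller-provided buffer

a, b = np.ones((3, 4)), np.ones((4, 5))
out = np.empty((3, 5))          # caller pre-allocates the destination
matmul_dps(a, b, out)
```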

Member

In Relay, there has been no convenient way to optimize the graph IR using feedback from the low level.

  • If TensorIR performs layout transformation for a primfunc, its decision will affect other primfuncs as well. However, Relay cannot provide such feedback back to graph IR-level since two different IRs cannot co-exist.
  • Graph-level tuning methods (e.g., TASO, Collage) need a capability to apply a set of passes to the part of the graph, compile/measure its performance, and provide the performance number as a feedback back to Graph-IR level to generate better candidates. Although this could be achieved by nontrivial engineering efforts, it would complicate the compilation pipeline and maintenance efforts. IMHO, joint-optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.

It's possible for an IRModule to contain both tir::PrimFunc and relay::Function at the same time, which would allow the replacement of any given Call operator inside of Relay with a lower-level operator to trial as part of a scheduler - this seems to indicate it's entirely possible to use an IRModule to connect higher- and lower-level IR, unless I'm missing something?

We've previously done this in CMSIS-NN, relay::Functions become tir::PrimFuncs as part of our custom lowering in RelayToTIR which results in an IRModule with some functions lowered to TIR and some remaining in Relay.

Lowering has to be done all at once at the boundary between Relay and TensorIR, and customizing lowering has been very challenging (e.g., partial/custom lowering).

  • The main pipeline with OpStrategy has not been easy to customize to lower part of the graph for your own target, such as BYOC, while keeping other parts in high-level IR. Therefore, people had to find ways around it and apply their own lowering mechanism (e.g., RelayToTIR) that bypasses the main pipeline.
  • If you only want to apply certain schedule rules to part of the graph IR, you would need to lower those parts and apply the schedule rules to them. However, such freedom has not been allowed in the Relay main pipeline, so people had to find workarounds (e.g., use task extraction and find the primfunc among the extracted tasks; if extraction does not behave as users want, extra engineering effort is required).

RelayToTIR allows a Target to customise how it decides to lower. Every Target can register a RelayToTIR hook which uses a set of compose-able Passes to create a per-Target lowering pipeline; these Targets would compose together to form the main pipeline. This indicates we have the capability to create more complex per-Target lowerings of not just operators, but also groups of operators using RelayToTIR already with the flexibility to improve any logic for a specific Target or re-use existing Passes where necessary?

The limitation I can see with this is that trying different operator strategies within the per-Target lowering isn't well supported? Could this not be supported by providing search information to RelayToTIR? 🤔


This cross-level interaction unlocks many interesting things that were not possible before, including, but not limited to:

- Incrementally lower different parts of a program using different strategies, instead of lowering the entire program to TIR directly from Relay as today.
Member

Thanks for the comment! Yes, it has been supported in Relay, but there are some nontrivial limitations, imo.

  • (1) The Relay main pipeline lowers all Relay IR into TIR at once at the IR boundary. This makes partial lowering (lowering only part of the graph) difficult in the main pipeline.
  • (2) The Relay main pipeline supports lowering with OpStrategy. However, it is not necessarily easy to customize it (custom lowering).

For these reasons, people introduced RelayToTIR and RelayToRuntime, which essentially bypass the main pipeline. Although they enable the functionality people want, IMHO it is not easy to maintain them as a framework, nor to leverage multiple lowering strategies in an incremental way. Therefore, Relax wants to tackle this problem and provide such support in an organized, systematic way. For example, since Relax provides a unified abstraction, we can introduce a GraphIR->TIR transformation into the pipeline, and this is essentially what lowering does. Thus, by introducing such a mechanism as a Relax->TIR transformation pass, Relax can bring those functionalities into the main pipeline in a customizable manner. You can also easily conduct partial lowering within a pass since you have full control. We expect users may be able to reuse most of the lowering machinery since, most of the time, you just want to change the "how to lower" part.

I've put a reply to this above but I'll quickly summarise in response here. RelayToTIR works together with the main pipeline to allow partial lowering for a Target; it runs before the main TECompiler and therefore seems to be exactly what you're describing as Relax->TIR? We leave the module in a partial-TIR/partial-Relay state for the pipeline to continue lowering.

I do agree that RelayToRuntime essentially bypassing the entire compiler is not particularly helpful for re-use; this was a motivation for creating RelayToTIR in the first place.


### call_tir

In Relax, we introduce `call_tir` to bridge graph-level IR and TIR. `call_tir` is an intrinsic that calls a TIR PrimFunc (that follows DPS) and returns the output. The semantics of `call_tir` can be demonstrated by the code below.
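
(The code listing referenced above is not reproduced in this thread view; roughly, the semantics can be sketched as below, with NumPy standing in for the allocator and the PrimFunc, so the names here are illustrative rather than the actual lowering.)

```python
# Rough sketch of call_tir semantics: allocate the output, invoke the DPS
# function with the inputs plus the output buffer, and return the buffer.
import numpy as np

def call_tir(dps_func, args, output_shape, output_dtype):
    out = np.empty(output_shape, dtype=output_dtype)  # stands in for alloc_tensor
    dps_func(*args, out)                              # DPS call writes into `out`
    return out

def add_dps(a, b, out):                               # stands in for a TIR PrimFunc
    np.add(a, b, out=out)

y = call_tir(add_dps, (np.ones(4), np.ones(4)), (4,), "float64")
```
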
Member

How is this different from checking if a Call is to a tir::PrimFunc?


We plan to split the upstreaming into the following manageable PRs for TVM community review:

- Relax IR
Member

Relax IR really needs an independent RFC which details the dynamic type system so we can discuss methods of integrating it directly into Relay; this seems to be one of the larger technical hurdles within this RFC (though I agree with @ekalda that we really need RFCs for each independent architectural piece here).

Comment on lines +667 to +674
- Relax IR
- Relax VM
- BlockBuilder
- ExprFunctor/ExprVisitor/ExprMutator/IRFunctor
- Relay → Relax translator
- Minimum build (4 passes)
- VM Codegen
- E2E model compilation + execution
Member

The upstreaming plan should be represented in the Relax Roadmap (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0069-relax-roadmap.md), which should also contain references to these more detailed RFCs?

@tqchen
Member

tqchen commented Aug 23, 2022

Thank you, everyone, for the discussions here. Let us take a step back and look at the non-technical parts of the conversation. A lot of our discussions come from two goals:

  • G0: Maintaining a stable evolution solution for some of our common use-cases
  • G1: Welcome new improvements, land our technical commitment timely, continue to reinvent ourselves, and welcome new community members who have new use cases.

Both goals are very important. G0 ties to our ability to continuously support our current use cases. G1 is also essential to our viability as a solution, so we can grow as a community and stay competitive in a fast-evolving machine learning compilation landscape.

Enabling both has always been an important theme of long-living projects. Deep learning frameworks are a common reference to refer back to. Usually, they are done in roughly three phases:

  • S0: Introduction of a new feature/component as an optional module.
  • S1: Evolving the overall solutions to make use of the new component.
  • S2: Consider deprecation of some of the existing solutions, or evolve the solutions for a consolidation point.

Each stage contains a different level of commitment and would normally entail different levels of gating criteria as we look at them.

For example, PyTorch introduced TorchFX as an optional module that supports graph tracing and export. It had some overlapping capabilities with TorchScript. The PyTorch community is collectively evolving some of the compilations (TorchDynamo) to make use of FX. As of now, there is not yet an announcement of S2 from the community.

Encouragement of S0 and making it easy to do helps us to enable G1. A too-high barrier here can discourage community contributions, result in mainline lacking the latest features, and undercut our competitiveness. This is especially important given that the landscape of machine learning compilation still remains open, and the ability to support symbolic shape and training in a timely manner helps bring in users and contributions who would otherwise turn to alternatives.

G0 is equally important here. In many cases, it boils down to making careful and informed decisions regarding evolution (S1 and S2), and to making sure that at the S0 stage there is limited disruptive change to the existing infrastructure. Importantly, not every module/feature has to go through all stages. And in common practice, the decisions in each stage are usually not made at the same time.

We can find examples of S0 cases in TVM as well. For example, USMP was initially designed for specific cases like AOT. We welcomed these improvements to unblock needs in embedded settings early. Through USMP we found the need for tir.alloc_const, which relates to evolving the existing infra (S1). As a result, we had a more in-depth discussion. Additionally, we are bringing in the effort to further enable USMP in a broader setting as part of S1. At some point, we might consider consolidating all memory allocations as S2 – note that many community members are collectively working toward that goal, but we are not yet at a point to make such a decision. As another example, we enabled cascaders that are specifically designed for the micro-NPU, which had some domain overlap with the arithmetic affine module, but were nevertheless brought in without consolidation because we believed that there is enough interest and maintenance support for the module. Finally, the unpacked_api was specifically enabled for extremely low-resource settings, and we enabled S0-level inclusion despite some inconsistency with the packed func API.

Of course, we do not want to enable random things in the codebase, which ties back to the maintenance-overhead concern. One of the questions we want to ask here is whether the module has enough support from the community to allow continued maintenance. Additionally, we should consider the added engineering support that comes from welcoming additional community members who are interested in these needs and would otherwise look elsewhere.

Our overall thought process and decision time point for each stage can be different – they should be, so that we can enable both G0 and G1. And not all modules have to go through all the stages.

For S0, we would expect that there are enough champions in the community with a self-contained plan. For important features, we would expect, say, more than three committers who can champion the module and significant community support to maintain it. Additionally, S0 should be made as minimally disruptive (with respect to the current infrastructure) as possible. To encourage G1, we can overlook some level of duplication (just like the TorchFX and TorchScript case, and USMP and other allocators when they landed as S0), considering the additional community support we get to maintain them.

S1 and S2 would involve more careful discussions and coordination with greater amounts of details on some of the key points. Likely, they will also happen at a different time point so we can make informed decisions.

This particular RFC is at the S0 stage and intentionally made to be so. As the RFC stated, there is no proposal to make S1/S2 decisions at this RFC. Many of our current discussions are around S1/S2 – the future evolution of the system. They are extremely helpful discussions to have to set up the context and help us improve the design, but not necessarily decisions we have to make immediately. Let us think about the broader community members we can empower and bring in through enabling the S0 improvement.

Thank you, everyone, for the discussions so far, and let us work together to enable our community.

@slyubomirsky

On the point about potentially incorporating symbolic shapes into Relay, I would like to hear more detail about how it can be done with Relay's system of accumulating type constraints and solving them simultaneously. If we were to handle dynamic shapes in Relay, we would need to define semantics for how shape variables are scoped and how assignments are handled, how they can be processed during the solving of type constraints, and what happens if symbolic shape expressions cannot be concluded to be the same at compile time. If this can be neatly incorporated into Relay, then it might make sense to pursue. I would be happy to brainstorm on that issue.

@YuchenJin
Author

YuchenJin commented Aug 24, 2022

Having taken onboard the feedback from community members (@leandron, @ekalda, @Mousius, @masahi), a number of us involved in this RFC (@YuchenJin, @jwfromm, @tqchen, @areusch, @mbaret, @jroesch, @tmoreau89) feel it’s necessary to be explicit about the scope of this proposal, and we apologize to those reviewing that this was not present in the original text.

  • Acceptance of this RFC doesn't mean there is an agreement to eventually deprecate Relay and replace it with Relax. It only permits bringing the development that's currently occurring on the Relax fork into the TVM repo. This will improve the accessibility of that important work for community stakeholders who rely on it, as well as bring Relax under TVM project governance.

  • If at a later stage it's found that individual features from Relax are desired in the Relay compiler (e.g. dynamic shapes, TVMScript support), design discussions and RFCs must take place to determine the best way to implement those features. Acceptance of this RFC gives no preference to Relax as the solution, and so evolving Relay would remain firmly on the table in those discussions.

The RFC has been accordingly amended to include the above commitments in the "Motivation and goals" section, which we hope addresses some of the valid concerns expressed so far.

@tqchen tqchen self-assigned this Sep 29, 2022
@tqchen
Member

tqchen commented Oct 3, 2022

Thanks everyone for the discussions so far. There have been a lot of conversations, and many clarifications and issues have been answered. The two main concerns raised so far seem to be:

  • rationales and alternatives.
  • overall scope and execution (relation to relay deprecation)

In this case, @YuchenJin has:

  • Updated the RFC with the rationales and alternatives
  • Clarified that the scope of the proposal is limited to checking in the module as optional and non-disruptive, without changing existing modules (nor does it imply deprecation of Relay).

I think the above two actions have addressed the concerns raised with respect to the particular scope of this RFC. @leandron @ekalda @Mousius please follow up and see if you have more questions.

Let us continue to follow the overall guideline of providing grounded technical justifications and bringing constructive discussions, knowing that some of the high-level taste arguments can be subjective.

In those cases, thinking about the community-over-code principle can go a long way. The particular Unity connection strategy already has majority support from the community (from more than 8 organizations), so let us think about the following question: how can we collectively empower those members together?

@ZihengJiang

ZihengJiang commented Oct 4, 2022

Thanks @YuchenJin for updating this RFC!

From the use cases that I have observed at ByteDance, the symbolic shape capabilities allow TVM to handle dynamic workloads that cannot be handled in other frameworks and to be more widely adopted. And we can quickly enable iterations of the Unity connection as stated in RFC #91, by bridging the high-level representation with the low-level TensorIR.

@FrozenGene
Member

Based on our experience at NIO, dynamic shape support in Relax is extremely important for us. In fact, we have done many things on Relay trying to cover dynamic shape support for our use cases; however, the lack of first-class support for symbolic dynamic shapes still constrains us, and some ops / patterns cannot exist in our models. First-class support for symbolic dynamic shapes is extremely important for us, especially since some models have essentially dynamic inputs / outputs, for example Point Cloud. Relax is what I've been expecting for so long. If we have Relax, for Point Cloud or Object Detection models with dynamic outputs / dynamic batch, Relax could solve it perfectly (both from the view of functionality and of performance).

Anyone who has doubts, I recommend reading this: https://github.com/tlc-pack/relax/wiki/Relax-Shape-Computation-Design.

Thanks @YuchenJin, @tqchen, and the many Relax authors for bringing this; I really appreciate this work 👍 and in fact, I am evaluating Relax internally and want to let Relax solve our problems ASAP.

@comaniac
Contributor

comaniac commented Oct 4, 2022

In addition to the use cases and experience I've mentioned previously, I want to further highlight that symbolic shape support has become even more important in recent months for us at AWS, mainly due to the requirements of deploying decoder models (e.g., GPT). Since the text generation process is a naturally dynamic-shape workload in terms of sequence length and batch size, padding everything is not practical due to its inefficiency, as already shown in recent publications. It is extremely important for TVM to support this case if we want to remain the SOTA deep learning compiler.

Member

@Mousius Mousius left a comment

Thanks everyone for the discussions so far. There have been a lot of conversations, and many clarifications and issues have been answered. The two main concerns raised so far seem to be:

  • rationales and alternatives.
  • overall scope and execution (relation to relay deprecation)

In this case, @YuchenJin has:

  • Updated the RFC with the rationales and alternatives
  • Clarified that the scope of the proposal is limited to checking in the module as optional and non-disruptive, without changing existing modules (nor does it imply deprecation of Relay).

I think the above two actions have addressed the concerns raised with respect to the particular scope of this RFC. @leandron @ekalda @Mousius please follow up and see if you have more questions.

Let us continue to follow the overall guideline of providing grounded technical justifications and bringing constructive discussions, knowing that some of the high-level taste arguments can be subjective.

I'm in agreement with my past self here; looking at the scale of the discussion on this RFC, it seems prudent to do a functional decomposition exercise on Relax to break it down into the relevant units for integration. It appears that @YuchenJin has given an example of how to sub-divide the functionality into three separate blocks which can be landed independently:

  • C0: Add first-class dynamic shape support
  • C1: Add first-class support for interactions with TensorIR and TVM FFI
  • C2: Add first-class dataflow and side effect handling

I believe that by breaking this down and tackling each unit independently we can analyse whether or not we can integrate these features into TVM without an optional module; if that requires breaking changes, then we can easily make them now that we have a release cadence and notes to inform our users. It seems to me that we should address dynamic shapes first, given @YuchenJin has already illustrated that the changes required are not significant provided we don't try to maintain both behaviours?


### call_tir

In Relax, we introduce `call_tir` to bridge graph-level IR and TIR. `call_tir` is an intrinsic that calls a TIR PrimFunc (that follows DPS) and returns the output. The semantics of `call_tir` can be demonstrated by the code below.
Member

Hmm, could this not be solved using a utility function that returns the relevant nodes?

fn call_and_allocate(prim_func: PrimFunc):
  return SeqStmt[Allocate(), Call(prim_func)];

It'd be good not to have this ambiguity where call_tir is a specific way of calling a tir.PrimFunc rather than a general way to call TIR.


# 2. **Motivation and goals**

Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs, and lessons the community has learned in ML acceleration through years of using and developing TVM:
Member

Given that training is not in scope of this RFC, can we hold off on adding side-effect semantics until a future RFC? That leaves us with just optimizing for the dynamic shape workload.

@tqchen
Member

tqchen commented Oct 4, 2022

@Mousius In this case, @YuchenJin's reply clearly articulated that there is a close co-design of these factors, and changing to adopt dynamic shapes alone would imply a one-step jump to Relax -- which is not incremental. The data structure change would come with a set of supporting infra and the co-design of things including shape inference.

Of course, if we do not maintain both behaviors and allow an incremental transition, then it is equivalent to a relay=>relax bridge that would allow part of the pipeline to go through in a more incremental fashion. This approach is consistent with what is being proposed in the RFC (with clear data structure boundaries).

I would suggest we not trivialize the argument here, as it is easier to ask why not do X than to really go and do things. The learnings @YuchenJin gained from concrete code are certainly very convincing, as are their learnings about why not to do things otherwise.

This is a case where subjective disagreements happen. In such cases, I would again encourage us to take the community into consideration, along with the fact that these things are not disrupting other existing flows.

Contributor

@leandron leandron left a comment

  • Clarified that the scope of the proposal is limited to checking in the module as optional and non-disruptive, without changing existing modules (nor does it imply deprecation of Relay).

I think the above two actions have addressed the concerns raised with respect to the particular scope of this RFC. @leandron @ekalda @Mousius please follow up and see if you have more questions.

@tqchen thanks for the ping. I'll have quite limited time to follow up here this week; however, I wanted just to comment on the last changes brought into the RFC text.

I think that at this point there is no dispute and we all agree that:

  • The features proposed in Relax are important features to be incorporated in TVM;
  • There are benefits in bringing new contributors to the Community;

At the same time, I feel Relax - as a new IR - is not an ordinary feature like "supporting some new framework as a frontend" or "incorporating a new algorithm for tuning". It is a whole pack of features in the form of a new IR.

Relax is actually part of TVM Unity, which claims to be a long-term view for TVM - which is great. So what I feel (see all my previous questions) is that we need this RFC amended with what this long-term view is, with a focus on Relax.

I'm starting from the thinking that we don't want to maintain two IRs forever, and if, for complete lack of options, we really need to temporarily incorporate a new IR, we must have a horizon in which we'll consolidate into one.

The main question in my mind is: in the current form, what is the author's view on Relax being a replacement for Relay, as opposed to an alternative partially overlapping workflow?

Based on the answer to that question (which must come before we decide to merge this work), we need to think about:

  • If we commit to migrate as a community to Relax, then what's the authors' view of the rework needed in existing passes/runtimes/frontends? IMHO the authors are not required to do the actual work, just the due diligence of giving the community a view of the technical work required;
  • If we don't commit to migrate as a community to Relax, what are the alternative designs we can make to evolve Relay with the same features proposed here?
  • If the authors don't have enough information to take the decision, then I think it is the authors' role to refine the plan so that we get consensus on one of the options above.

That seems like a sensible thing to ask for a feature the size of Relax, in the context of TVM Unity. It just needs some more work to be done, so that we have enough information visible to the whole community on what decision we are taking and why.

I'll emphasise that I think we must have clarity on that decision before we merge this, as once it is merged in the code base and enabled in CI, we are all indirectly committed to it. This is pretty much in line with what other projects in this area do, and I'd be happy with a roadmap based on the information we have today, but I think it will be beneficial to everyone if said roadmap is included in this RFC.

Comment on lines +703 to +705
We acknowledge that the migration of default flow (that involves Relay) would benefit from more in-depth discussions on possible approaches. They are not part of this proposal and we welcome further discussions on this topic. We think however, that it is useful to provide such a context. This section provides a context on possible difficulties in evolving Relay to bring some of the features.

From the development point of view, we think that the technical reasonings, community needs, and technical support around the module being present is sufficient to justify the inclusion of the optional module. We agree that we need to do further in-depth discussions on default flow evolution, and would be more than happy to continue the discussion after this RFC.
Contributor

Can you comment on which discussions you think need to happen and what the expected outcomes from these discussions are?

The current wording feels a bit vague. Considering Relax is a new IR, it might generate uncertainty among contributors about where to actually contribute new features: if TVM is moving to Relax, new features should go into Relax; if we're eventually going to stay with Relay, new features should actually go into Relay, and in that case, where does Relax fit in this puzzle?


Presenting optional modules and dialects to the ecosystem is an important way to welcome new improvements, land our technical commitment timely, continue to reinvent ourselves, and welcome new community members who have new use cases. It also aligns with practices we see in other ML projects, such as MLIR dialects, PyTorch FX modules, etc.

As in these previous cases (MLIR dialects and TorchFX), there can be some amount of perceived duplications with existing modules. Nevertheless, there are usually technical reasons for a new module, both from design consistency, time to delivery, and separation of concerns from stable modules during improvements. Additionally, there is also a majority amount of support to help continue maintaining the new module. This new community effort would help us to welcome new members to the community, and enhance the overall development of TVM.
Contributor

@leandron leandron Oct 4, 2022

In previous comments I was unfamiliar with the project guidelines for MLIR and Torch, but reading their contribution guidelines, they actually require much more certainty about the future landscape for a given feature to make it into the codebase. So I see no reason why we should have lower standards than those communities at the same time as we mention those communities as examples to be followed.

Based on that, I disagree with the premise that features should be brought into the codebase with welcoming new contributors as the major justification.

@Mousius
Member

Mousius commented Oct 4, 2022

@Mousius In this case, @YuchenJin's reply clearly articulated that there is a close co-design of these factors, and changing to adopt dynamic shapes alone would imply a one-step jump to Relax -- which is not incremental. The data structure change would come with a set of supporting infra and the co-design of things including shape inference.

Co-design doesn't mean we have to land everything at once; it's often easier to prototype several features together and then break them down when we get to delivery - I believe that is more incremental and understandable than landing large changes with several dimensions at once.

I'm also unsure how it's a one-step jump to Relax as @YuchenJin has demonstrated that the functionality of Relax is split into several different pieces. I understand that in order to land the entire proposition of Relax it may take several pieces landing, but that's natural for a large change such as this?

Of course, if we do not maintain both behaviors and allow an incremental transition, then it is equivalent to a relay=>relax bridge that would allow part of the pipeline to go through in a more incremental fashion. This approach is consistent with what is being proposed in the RFC (with clear data structure boundaries).

They're not equivalent as we can avoid adding additional variations in our code by making a breaking change early and iteratively integrating the other various pieces of Relax into existing code. @slyubomirsky also expressed an interest in investigating the dynamic shapes in isolation so I believe this could have some momentum?

@LeshengJin

LeshengJin commented Oct 4, 2022

As an individual Relax contributor from UCSD, I don’t feel our community has created an environment that welcomes new contributors and new contributions.

I am happy to see different opinions in the discussion, but it's really shocking and disappointing that we are afraid of moving forward even with a large amount of community support, and for such a technically strong optional module. 😞

According to several community members, Relax can solve the needs that they care about. It's also made clear by @YuchenJin in the RFC that Relax is an optional module which does not disrupt the current TVM flow, and it does not have the intention to replace Relay. In most open source projects, evolution is done in a phased manner. I am very surprised that such a module, one that can solve the current pain points of TVM, has been blocked by subjective opinions.

It's hard to imagine that everyone who wants to contribute to TVM, especially those who do foundational work like Relax, needs to make predictions about the deprecation of existing flows and their timelines, which in my opinion discourages people from contributing.

@YuchenJin
Author

Thanks everyone for the feedback. One thing that we seem to agree on is that there is a strong need to support symbolic shape use cases for TVM, as represented by many of the folks who chimed into this thread.

Hopefully, we all agree that there is a strong need to support robust and high-quality dynamic shape compilations, and that is not being supported in the current TVM.

One of the questions is whether or not we can do that by separating the changes of C0, C1, C2 in Rationale and Alternatives.

In order to provide value to the user, we need end-to-end dynamic shape compilation for some specific environment of interest (for example, it is less relevant at the moment for micro settings). C0, C1, and C2 are necessary to support a minimal but robust dynamic compilation:

  • We need to effectively lower the symbolic shape code to a sequence of calls into symbolic TensorIR functions and libraries (when we cannot generate TensorIR functions). This means we need C1: Add first-class support for interactions with TensorIR and TVM FFI.
  • In order to further lower some of the shape handling (match_shape), we need to generate code that contains side effects (it allocates the buffer and writes into it). This naturally means we need C2: Add first-class dataflow and side effect handling.

In summary, in order to build a minimum viable dynamic shape compilation in a high-quality, robust, and general fashion, the lowering flow would require stages that contain C1 and C2.

We would also like to highlight that having C0, C1, and C2 helps us to achieve the goals faster. For example, running fusion on calls into TensorIR means we no longer need developers to annotate the fusion properties of each op, and we no longer need to handle weight layout rewrites in an ad-hoc way. This reduced engineering complexity thanks to better architecture helps us to bring robust high-quality support for dynamic shape faster to the community.

The dynamic shape compilation, along with first-class C1 and C2 interaction are needs that we do not support well today in TVM. They offer clear values to our community, and differentiate from the cases that Relay already supports.

One of the main topics raised was how Relax fits into TVM in general. We acknowledge that it is impossible to provide the full picture immediately. It is also not necessary to make all decisions in the beginning, as making uninformed decisions is worse. It is also common practice for OSS projects to evolve and bring in S1 conversations along the way. For example, when TorchFX got introduced, there was no prediction of TorchDynamo, which got introduced later as things iterated through the codebase and became clearer.

Nevertheless, we would try our best to address how Relax fits into TVM:

C0, C1, and C2 also highlight how cohesively Relax fits into TVM's current codebase, more so than many other modules. In particular, the design deeply aligns with TensorIR, topi, symbolic shapes, PackedFunc, and many other modules.

We also view having Relax and Relay co-exist in the codebase as a positive thing and better than the current state:

  • We get dynamic shape support for users who needs them, empowering the community members.
  • We get more developers thanks to the empowerment, offsetting the burden brought by the module.
  • The burden of establishing the module is minimized as it will cause no disruption to the existing modules/flows.
  • In many cases, having Relay and Relax work together is a positive thing:
    • We use Relay for importing and graph optimization and use relax for TIR-related co-optimization. It is a great combination and a better pipeline for some of the use cases.

These positive parts already would bring the community a long way. We also acknowledge that we cannot pin down all the technical decisions around S1/S2 level in the very beginning. Please see the Future Opportunities section in the TVM Unity connection RFC, which I encourage everyone to check out.

The author also wants to come back to community empowerment. In this case, many in the community have voiced their need for empowerment with grounded technical reasonings. We would like to encourage us to take a step back, as most of the technical reasons are less grounded and we do not need to make all decisions in a single shot. We are looking at a module that is clearly isolated, makes no disruptive change to the code base, and is needed by the majority of community members. Having open-mindedness here would go a long way toward building a welcoming and inclusive community.

@wrongtest-intellif

At Intellif, people have built, maintained, and extended the DL compilation stack with Relay over the past years. However, we do not think the upstreaming of a new module would break existing functionality or cause confusion; rather, it is a huge opportunity to solve many technical issues that have proven not so easy to handle in Relay, which are already emphasized in the discussion thread.

From my perspective the TVM community is a very inclusive community. We do have modules with somewhat overlapping functionality co-existing without so much debate. As examples, we could see the different runtime implementations for Relay, TE-schedule and TensorIR-schedule, Ansor and meta-schedule, etc. I wish this would also not be a problem for the graph AST infra.

@evanluyifan

Thanks for this great job!
Based on our experience at Meituan, dynamic shape is important for our use cases, e.g. OCR and ASR models with dynamic seq_len. Now we can solve these with Relax and the VM runtime :)

@slyubomirsky

slyubomirsky commented Oct 6, 2022

For those interested, I think this recent paper shows one way symbolic shapes could be made to work with Relay's type-checking approach (Axon is clearly structured very similarly to Relay), though it would require substantially reworking the existing type relations in Relay. It's rather different from Relax's approach, so it's a possible point of comparison.

@yzh119
Member

yzh119 commented Oct 7, 2022

I'm a graduate researcher at UW and was a full-time SDE at AWS AI for years; my work is around deep learning frameworks/compilers. I feel like all of us agree dynamic shapes are essential, so I don't want to spend more time emphasizing how important they are. I'm not a contributor to Relax, but I have been following it for a long time. I don't want to pretend to be neutral: I do think it is quite necessary to welcome Relax, rather than just adding dynamic shape support in Relay.

The main controversy in this thread is about whether to upgrade Relay incrementally or develop a new IR called Relax. I understand hardware companies appreciate stability, and we can see CUDA didn't change its interface drastically over the years: what a miracle! There must have been several times when people wanted to develop new languages/compilers for NVIDIA GPUs, but CUDA survived. This is a lesson we should learn: in the beginning, we design things with a vision of the future in mind; then we maintain them to a high standard, improve them incrementally, and stay customer-obsessed.

This is the ideal story, but we should not ignore that although CUDA was invented before the DL era, there were already many high-performance computing workloads the designers could refer to. Fortunately, even in 2022, the operators used in DL still highly align with HPC ones and are actually simpler (it's a world of GEMM). What about the story of (computational) graph-level IRs? The dominant workloads in DL change over time, and I would say they cause a lot of headaches for framework and compiler designers: first CNNs/RNNs/LSTMs/Tree-LSTMs (the structure dynamism is one of the challenges Relay would like to tackle, but unfortunately they are used nowhere), then Transformers/GNNs (not as hot as Transformers because of the hardware lottery, but who knows the future). Now we have entered a time where models converge but scalability grows significantly: models become larger and larger, and a lot of engineers and researchers propose techniques (checkpointing and rematerialization, quantization, graph substitution, fusion and stitching, sparsification and mixture-of-experts, hybrid parallelism) to optimize DL workloads at compile time, and I'm glad to see many of them are developed upon TVM because TVM's design is always up-to-date and supports new workloads quickly. However, Relay's current design cannot take full advantage of these new techniques, and the system is trending toward fragmentation. Relax is a great opportunity for us to reconsider the graph-level IR design: prune the redundancies and add new functionalities. It's exciting to see that we can unify different levels of optimizations together in TVM Unity once Relax is accepted by the community. Refactoring makes things simpler, rather than more complex.

Whenever we find it's time to make some changes, TVM always embraces new designs. This has happened several times in TVM history: prior to Relay there was NNVM, which was deprecated and completely replaced with Relay. The previous Tensor Expression had limited expressiveness, and the schedule tree data structure could not support tensorization elegantly, so we got TensorIR, which is not only backward compatible but also brings opportunities for developing new dialects (Ruihang and I designed SparseTIR upon it, and it works pretty well). AutoTVM could not generate scheduling templates automatically, so we got Ansor and the Meta-Schedule. I would emphasize that the core components of all these updates were upstreamed within several months and did not break any backward compatibility, which is a credit to our hard-working and open-minded contributors and reviewers. Committing to TVM helps these contributors become MLC experts; some of them are PMC members now. I would say none of these refactorings hurt TVM's reputation; on the contrary, they make people impressed by TVM's speed in adapting to the future, and more willing to try TVM because it's open and driven by innovation.

I really don't understand what's different this time, when it comes to Relax. We have a bigger community, which is awesome, and I definitely welcome your input and constructive suggestions on the future of this project. I view the New Scoped Module RFC as a contract between industrial developers and researchers/engineers like me who work on "toy prototypes": we promise not to touch anything that might influence user experience, and we also don't want to be discouraged because our prototypes cannot be upstreamed and only stay in some random GitHub repo as toys. I also think the new S0-S1-S2 process is already the most painless approach to delivering new designs, and the effect is equivalent to incremental change. If people take a look at the Relax repo, it already has a huge amount of code and well-written documentation (you can compare it with the official Relay documentation); I think it's super inappropriate to ignore these contributors' devotion, especially individual contributors such as @LeshengJin. TVM has a huge user base of researchers; they are an important part of the community, and they also contribute high-quality code instead of just hacking.

Regarding the "lower standard than other communities" issue, TVM has high standards and we are not talking about lowering them. If no fundamental changes were allowed in DL infrastructure, Google would have stayed at TF 1.0 and never developed JAX, and PyTorch would not have created so many different compiler infrastructures (I want to share this slide again).

I'm willing to provide more details on every argument if requested.

Best,
Zihao

@zhiics
Member

zhiics commented Oct 8, 2022

Based on my experience at several organizations, dynamic shape support is obviously very important, particularly with the rising popularity of large language models. Also, efficiently supporting dynamic shapes would be one of the major appealing features of a "modern" DLC. I think the above comments have also reached agreement on the importance of dynamic shapes. The major argument is whether we need separate PRs to land this feature.

IMHO, Relax is already one of the components of Unity, and the current proposal again only contains the most valuable part of Relax, which provides a minimal E2E compilation flow to enable the support of a dynamic model. This has worked well before in both TVM and other open source projects, since the component doesn't block/break current uses/deployments. For example, the first version of Relay also had the IR, simple lowering, and the necessary passes to quickly unblock the users/developers (e.g. AWS) who wanted to give it a try. Afterwards, we iterated on it many times to improve both the design and the implementation.

As a contributor to TVM, I would encourage us to focus more on the design itself and spot the design flaws and the missing key features that we should address, so that users (some of whom are already waiting for Relax, as mentioned here) can quickly check it out and bring us back more insightful feedback or directly contribute to the project.

@YuchenJin
Author

There were concerns brought up in RFC #95 that this RFC conversation did not cover "how the proposal fits into TVM". We agree that discussing the fit is important and would like to refer to related conversations and sections:

@Lyken17

Lyken17 commented Nov 9, 2022

I have learned a lot from reading through the thread, and I find most people here are from a systems background: either doing related research in school or heading an engineering team at a company. I would like to share some of my thoughts from a different perspective, as a TVM user and ML algorithm developer.

I am a graduate student at MIT studying efficient deep learning algorithms and co-designs (details on my page, lab site, and our recent project that trains NNs on a 256kB MCU). We have been honest TVM users because of its flexibility, high performance, and open-source nature. But when we want to dive deeper and make some customizations, things become complex and Relay is no longer friendly:

  • Unnecessarily long call stack between Python and C++: Take relay.build as an example: a Relay graph (in Python) first does shape checking (in C++), then calls into a wrapper (Python), later feeds into Tensor Expression (either in Python or C++), and is then fed into the VM for compilation (packed functions). ANY step in the middle can raise errors, and developers can easily get lost in the pipeline. You can actually find a lot of users reporting similar issues on the forum, and only very few of them are fortunate enough to get an answer from experienced developers.
  • Difficult to add a new operator because of the complex pipeline: In our research, and also in many other users' development, adding new operators is a common request. But in current Relay, even if we just want to add a simple Identity operator (y = x), we need to
    1. declare an attribute node.
    2. write type relation check in CPP.
    3. register OP in CPP.
    4. describe the compute.
    5. describe the schedule.
    6. wrap up with CPP.
    7. wrap up with python.
      Seven steps just to define an identity function? Seriously? In PyTorch it won't cost more than 20 lines (see the sketch after this list). This significantly slows the growth of the TVM community, and if you check the PR history, the numbers of new operators and new contributors are quite limited this year, while PyTorch receives new operator implementations from the community every day.
  • Missing capability to call third-party implementations: Relay syntax does not, at least not easily, let users call 3rd-party backends like cuDNN, OpenVINO, or TensorRT. For the cloud, cuDNN and TensorRT are still SoTA on most benchmarks, and the lack of simple integration means inferior performance, which will make fewer people choose TVM. For the edge, the situation is even more serious because of hardware diversity. Take the Qualcomm DSP as an example: even though TVM Hexagon support is in progress, the best solution is still the manually written kernels in SNPE. It is not trivial to call other backends in current Relay: BYOC is difficult to use, and registering custom operators can be quite complex, as discussed in the last point.
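
To make the comparison in the second bullet concrete, here is the kind of thing the "20 lines" claim refers to: a custom identity operator in PyTorch, sketched for illustration (not code from the comment above):

```python
# Illustrative only: a custom identity op in PyTorch with autograd support.
import torch

class Identity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.randn(3, requires_grad=True)
y = Identity.apply(x)
y.sum().backward()    # gradients flow through the custom op
```
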

I understand those who want backward compatibility so that existing projects are not broken. But we cannot build a ship of Theseus in the real world, and the above issues cannot be easily "improved" with current Relay. If TVM does not embrace new designs and improve its user-friendliness, then eventually developers will switch to other tools, and this is indeed happening:

I like the TVM project and hope the community can always stay active. TVM has a huge user base of researchers, and Relax can allow them to easily contribute their code and ideas to the repo, instead of tricky hacking and creating separate repos for each project. This is important for an open-source community -- just recall how MXNet lost its market and why PyTorch could beat TensorFlow even though it was released one year later. TVM should consider Relax's upstreaming given its more thoughtful and user-friendly design, well-written documentation/tutorials, and the painless S0/S1/S2 upgrade path.

I would like to discuss more if there are any comments or questions.

@YuchenJin
Author

Thanks everyone for the discussions! A brief recap of our discussions so far:

  • We are certain that Relax supports dynamic-shape workloads that are not supported by the current TVM, which can immediately benefit many community members and users.

  • As for why Relax should be brought into the project today, we showed that having Relax and Relay co-exist in the codebase is a positive thing in several aspects ([RFC] Relax Upstreaming #89 (comment)). And the path to moving TVM to a Relax-only project will be long, so Relax and Relay co-existing is necessary for the foreseeable future, just like how TorchFX co-exists with TorchScript in the PyTorch project. We acknowledge the concern that Relax can bring confusion to some members in terms of which IR to contribute to, but we also encourage the community to consider the fact that Relax can directly bring dynamic-shape compilation to TVM while the original workloads can still be compiled by Relay, along with other factors including community empowerment and the scope of this proposed module.

  • It’s been pointed out that it would be helpful if we lay out the ideal scenario for how we see Relax and TVM Unity evolving over time in the TVM project. The reason we have built Relax is that we are confident that Relax both in current and future forms will significantly improve TVM, and we have outlined the future opportunities in https://github.com/tqchen/tvm-rfcs/blob/main/rfcs/0091-establish-tvm-unity-connection.md#4-future-opportunities-after-unity-connection. Nevertheless, it is helpful to explain in more detail given our current state of knowledge, so we will bring in the discussions of integration of Relax in TVM default flows and consolidation/deprecation of Relay and Relax as add-ons to the roadmap.

@tqchen
Copy link
Member

tqchen commented Dec 4, 2022

After seeing so many voices in this thread. I think it is important to provide a reply here.

I am wearing the Apache TVM hat as an ASF member and Apache TVM PMC member.

First of all, I would like to say thank you, everyone, for sharing your voices here. This post has received support from more than eight organizations from both industry and academic backgrounds. Your voices are very important to the community and will not be ignored. As many said, we would love the TVM community to continue being inclusive and innovative while maintaining the stability of existing developed components.

I also would like to come out and acknowledge the positions so far:

The position that @leandron has taken so far is:

  • We do not want to be in a state where Relax and Relay coexist without a decision and commitment on one replacing the other.
  • As a result, due diligence of such a replacement is mandatory before merging the proposal.

I would like to explicitly acknowledge that the above positions rest on valid rationales and represent one possible way of doing software development.

I think the positions raised by @YuchenJin and others were:

  • Relax has the potential to replace Relay, but the proposal as it stands only proposes to have the two modules coexist.
  • This is how most OSS projects bring in modules and evolve (e.g. TorchFX was brought in even though it overlaps with TorchScript, with no plan to immediately phase out TorchScript). The modules can coexist and evolve, and we can continue conversations about future co-evolution.
  • Having Relax and Relay coexist in the codebase is already a positive step that we should take, especially considering community empowerment.

These are also valid rationales and can be possible ways of developing things.

As a first step, I would like us to acknowledge each other's positions, as both rest on valid rationales. The main difference is a disagreement on how we should do things as a community.

Such a decision should be made collectively as a community, considering all the factors involved, including code and community factors. We all make our suggestions taking innovation, stability, and community into account.

When evaluating a proposal and empowering our community members, we expect every one of us to continue having a constructive conversation, considering the latest context.

While the initial comment made by @leandron is valid on its own, I would love to see us re-evaluate our positions in light of all the factors in the latest context, including community empowerment and the collective views of other members. I want to say that by no means do we seek to simply dismiss the original position -- I apologize if it came across that way. Instead, we want to acknowledge each view, recognize that we have disagreements on the hows, and take the community's view into consideration. Let us have constructive public conversations in service of the many who have voiced their support here, as we continue to empower each other and build an inclusive, welcoming, and supportive community.

Thank you!

@junrushao
Copy link
Member

My position:

  • Relay and Relax are going to co-exist as parallel submodules in TVM, and one should not affect the other at all;
  • Committed to keeping the Relay source code in "main" for the foreseeable future without hinting at potential deprecation;
  • Having Relax in "main" >>> having Relax in a separate branch > not having Relax at all.

@HLearning
Copy link

need to update it quickly

@tqchen
Copy link
Member

tqchen commented Jan 25, 2023

Five years ago, we started with a community that came together around a common vision -- enabling machine learning engineers to optimize and run computations efficiently on any hardware backend.

Five years later, the fields of machine learning and MLC (ML compilation) have undergone rapid changes. That same vision is still shared across this community. This is why many of us still feel so energized when writing code patches, bug fixes, architectural refactors, and new features. We are here today thanks to a diverse community that brings different perspectives and areas of focus but still aligns around that common vision.

As a project, we benefit from different perspectives in order to survive in the ever-changing and competitive area of ML compilation. Hardly anyone can predict every detail of the future (just observe recent changes such as ChatGPT and Stable Diffusion), and hardly anyone can definitively assert from the very beginning that one approach will be better than another. Enabling diverse possibilities helps move the project forward while serving different needs.

As a community, while we care about different subsets of modules and do not always need to work on the same thing, there is always an overlap of interests -- whether in the graph level, FFI, TensorIR, or backends -- that sustains collaboration among different people. Most importantly, we come with a mindset of empowering each other under the same vision.

Thank you, everyone, for participating in this thread.

This thread arrived at its current state due to different perspectives on project procedure (whether a detailed migration plan and a commitment to migration are necessary for a new module proposal). There is common agreement that a migration (if it happens and is proposed) would require a lot of detail and community buy-in, but there are different opinions about when and how that should happen.

On behalf of the TVM PMC, I would like to recommend an initial step to help us achieve the following goals, recognizing the needs of different members of the community:

  • G0: Get us out of stagnation and empower the community, including many who shared their support in this thread, to participate in unity development in the TVM community.
  • G1: Give us time to answer questions and provide examples for those who have asked for more detailed evidence and a possible feasibility analysis of migrating some modules.

Specifically, we recommend following an existing practice in projects like Hadoop: empowering related development in a branch. The ASF mechanism allows any committer to create a branch in the Apache repo and do collaborative development there at their own pace. Per our existing process, merging a branch into main still requires lazy consensus. Branch development offers flexibility while accepting the risk of being blocked when merging to main. As a result, there are strong incentives to keep alignment with the majority of the community and to stay engaged to get buy-in. Branch development offers a way to collaborate on a possible but not definitive future of the project, as a branch can have different outcomes: being partially merged, continuing development, or being abandoned. Enabling different perspectives is important for us both as a project and as a community.

The TVM PMC has re-affirmed that branch development can be used as an option for the project and for development around TVM Unity specifically. We would like to offer it as a possible option for the community and as a first step of execution, with the goal of getting the related pieces into main. I wrote down a more detailed post, on which we would love to get everyone's feedback. Of course, this is only one possible option, and community members can freely choose how they participate.

Developing in a branch will also give us a time buffer to answer G1. It is valuable to answer questions and have grounded conversations that give more information to the members who are not yet on board with the new module. Notably, for many community members, detailed code examples, benchmarks, and continued engagement are necessary to get broader buy-in. We recommend having focused discussions on the questions of interest (e.g. concrete code tutorials for BYOC) to help the community members who have related questions. We encourage such continued conversations in forum threads, meetups, and development interactions, with the goal of surfacing as much information as possible. Again, such interactions aim at demonstrating possibilities, but do not warrant deprecation or migration, since that choice should still lie in the hands of the community. Hopefully, they give us a more comprehensive picture for making follow-up decisions collectively.

Over the winter break, I started to do more coding, and I was fascinated to see that the passion still runs deep in my heart (and, I believe, in many of us) after so many years, thanks to this community and our common vision. As a community member, I am really motivated to spend focused energy helping to build concrete code examples and tutorial materials for G1.

Please also check out this post for more details.

@tqchen
Copy link
Member

tqchen commented Apr 4, 2023

An update: thanks to the effort of many community members, we are now at a point where the initial foundational items in the unity branch are established.

One goal of G1 is to give us time to answer questions and provide examples for those who have asked for more detailed evidence and a possible feasibility analysis of migrating some modules.

As part of this effort, community members have been posting tutorials on related topics of interest in the forum. I would encourage folks in this thread to ask for any additional tutorials you would like to see and/or raise specific technical questions.

@tqchen
Copy link
Member

tqchen commented Jul 26, 2023

Just another update: it is great to see unity being developed and used for dynamic shapes and emerging use-cases.

One goal of G1 is to give us time to answer questions. There are more topics of related interest (some may relate to the questions in this thread) at https://discuss.tvm.apache.org/c/development/unity/14. Please check it out and feel free to participate in the discussions and technical questions.

@tqchen
Copy link
Member

tqchen commented Sep 12, 2023

Sending another reminder for everyone to chime in on the related unity discussion threads at https://discuss.tvm.apache.org/c/development/unity/14. We would love to see your participation in all the technical discussions and find out how we can collectively address your needs.

@FrozenGene
Copy link
Member

I want to know whether we have a plan for deciding when to merge the Unity branch into the main branch. As LLMs are so popular now, we cannot support them well without Unity.

@tqchen
Copy link
Member

tqchen commented Sep 26, 2023

Thanks @FrozenGene for bringing this up! To raise broader awareness, we posted a new strategy proposal here: https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-emerging-needs/15751, to concretely enable LLMs and other use cases.

@slyubomirsky
Copy link

It's worth noting that with the merging of Unity into TVM's main branch, Relax has already been de facto upstreamed.

@tqchen tqchen closed this Jan 22, 2024
@tqchen
Copy link
Member

tqchen commented Jan 22, 2024

indeed, check out apache/tvm#16446

@YuchenJin
Copy link
Author

It's worth noting that with the merging of Unity into TVM's main branch, Relax has already been de facto upstreamed.

🥳
