Skip to content
This repository has been archived by the owner on May 22, 2023. It is now read-only.

[DISCUSS] AST Related Discussion #52

Closed
ZihengJiang opened this issue Nov 26, 2021 · 5 comments
Closed

[DISCUSS] AST Related Discussion #52

ZihengJiang opened this issue Nov 26, 2021 · 5 comments

Comments

@ZihengJiang
Copy link
Contributor

ZihengJiang commented Nov 26, 2021

I feel there are several points worth discussion in the AST part. I list here for discussion:

P1. Rename expr.h to ir.h

Since it also contains non-expression objects like Binding

P2. Change the type of Function.body from Expr to SeqExpr.

We already assumed that the body should be SeqExpr. And if we want to allow other expression type like If to be here, we need to handle those cases everywhere. Instead, we could change the type to SeqExpr and make the If nested in the SeqExpr.

P3. Change the name of SeqExpr.body to SeqExpr.result.

This is because the blocks field in the SeqExpr is more like the content of the IR node, it is confused (at least for me) to have another body field but actually represents the value of the sequential expression. There is also some opposing opinion, for example:

Welcome for discussion! cc @mbs-octoml @jroesch @electriclilies

@electriclilies
Copy link
Contributor

electriclilies commented Nov 29, 2021

P1. Since TVM has multiple IRs, I think it would be confusing to call the header file ir.h. Additionally, Relay does have non-expression nodes that live in expr.h (RefWrite is an example of this).
I think we should either be consistent with Relay's convention (i.e., keep expr.h) or rename it something more specific to the project, like relax.h or relax_ast.h or relax_ir.h or relax_expr.h.
My personal preference would be to make it project specific: one thing I don't like about the file structure of TVM today is that we have duplicates of header files that live in different directories-- there's include/tvm/ir/expr.h and include/tvm/relay/expr.h and include/tvm/tir/expr.h. It's hard to keep them all straight when you are reading the code.
P2.
I haven't looked at the AST in a while, but I'm actually a bit confused about how SeqExpr is different from Let. I thought in initial discussions SeqExpr was a sequence of expressions or statements followed by a value. However, on looking at it again it seems that it is literally just a bunch of bindings followed by a body, which is just a Let. If there isn't any difference, can we just call SeqExpr Let?

P3. Again, haven't looked at the AST in a while, but I would think that SeqExpr would inherit from Expr, and therefore would be allowed to be in the body of a function since the body of the function is an expression.

Could you make a inheritance diagram of the current AST before we continue this discussion? I think that would make it much clearer what is going on structurally. Thanks!

@YuchenJin
Copy link
Collaborator

YuchenJin commented Nov 30, 2021

The above discussions contain three key questions with regards to AST/IR design:
Q0. Handling of a sequential list of bindings without control flow
Q1. Handling of structured control flow
Q2. Handling of unstructured control flow

Let's take a look at previous compiler designs:

D0. Functional language

  • Most functional languages handle Q0 implicitly through recursion (let).
  • Functional languages also handle Q1 implicitly recursion in AST. For example, in the case of If then else, then and else branches are simply Expr.
  • Do not support Q2.

D1. LLVM

  • LLVM handles Q0 explicitly through an explicit basic block data structure through a list of bindings.
  • LLVM does not handle structured control flow while supports unstructured control flow such as branching instruction since it's low-level IR.
  • Only support Q2.

D2. MLIR

  • MLIR handles Q0 explicitly quite similarly to LLVM.
  • It supports through a data structure called Region which contains a list of (binding) blocks.
  • Optionally support Q2.

As we can see, there are two major choices for each question:

C0. Handling of binding blocks

  • C0a. implicit through recursion
  • C0b. explicit data structure for binding block

C1. Handling of structured control flow

  • C1a. implicit through recursion, where body of a structured control flow can be any Expr
  • C1b. explicit data structure (such as Region or SeqExpr)

C2. Handling of unstructured control flow

  • C2a. Support Q2
  • C2b. Do not support Q2
    C2a is more low-level than C2b. In our case, we find that C2b is enough and allows for robust AD.

Traditional functional compiler (C0a + C1a) allows us to handle everything with recursion and the rewriting is Expr => Expr.
Another typical choice (C0b + C1b) happens in SSA-based designs that are explicit about binding block and structured control flow.

Right now in Relax, we followed a C0b + C1a approach, and we have already made the Visitor/Mutator handle this general case. Changing function.body to SeqExpr (just like explicit Region in MLIR) means we will adopt a C0b + C1b approach, if we take this approach, then we will need to change all the structured control flow to SeqExpr.

@ZihengJiang
Copy link
Contributor Author

Hi @electriclilies ,

For question you proposed in P1: we have different header files since those expressions are defined for different submodules. For example, ir/expr.h defines basic and common expressions that can be used by relay and tir. tir/expr.h defines expressions that is only used by tir. It is a good point that relay/expr.h also contains non-expressions node.

For question you proposed in P2 and P3, here(https://github.com/tlc-pack/relax/wiki/Relax-AST-Design) is the current AST design, you could take a look and get more context for the discussion.

@YuchenJin
Copy link
Collaborator

For P1, one possible proposal is we refine the header files: we keep the Expr objects in expr.h, and move BindingBlock and Binding data structures to binding_block.h

@electriclilies
Copy link
Contributor

cc @junrushao1994

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants