Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codegen AST related code #3819

Closed
5 of 7 tasks
Boshen opened this issue Jun 22, 2024 · 10 comments
Closed
5 of 7 tasks

Codegen AST related code #3819

Boshen opened this issue Jun 22, 2024 · 10 comments
Assignees
Labels
C-enhancement Category - New feature or request

Comments

@Boshen
Copy link
Member

Boshen commented Jun 22, 2024

Background

For keeping compilation speed down for ourselves and our downstream crate users, I have avoided using macros to generate boilerplate code at all cost.

All performance issues (runtime and compilation speed) are considered as bugs in this project.

But due to the amount of AST related supporting code has expanded over the last few months, and our desire to experiment with different ideas with the AST, our current approach of manually updating everything is no longer functional.

The goal of this task is to provide code generation for all AST node related code.

Requirements

  • no build.rs published to the user
  • all generated code are checked into git
  • no nightly
  • Rust code residing in crates/oxc_ast should be the source of truth

Workflow

  • a user changes the oxc_ast crate
  • a watch change picks it up
  • parse all #[visited_node]
  • saves them into a model, e.g. struct ASTModel { name: String, attributes: Vec<Attributes> }
  • generate code and save file

Code to generate

  • ASTKind (Milestone 1)
  • GetSpan (Milestone 2)
  • Hash, PartialEq implementations (Milestone 3)
  • ASTBuilder (Milestone 4)
  • Visit (Milestone 5)
  • VisitMut (Milestone 5)
  • Traverse, remove current js code (Milestone 6)
  • typescript definition for napi and wasm moving the scope to AST transfer.

Every milestone should be a stack of PRs, since they will involve changing other files as well.

@Boshen Boshen added the C-enhancement Category - New feature or request label Jun 22, 2024
@overlookmotel
Copy link
Contributor

The AST also includes some types which are in oxc_syntax e.g. AssignmentOperator, BinaryOperator etc. Should we move those types into oxc_ast to simplify which files the codegen needs to read?

(I've never been clear why these types are in oxc_syntax, since they are part of AST)

@Boshen
Copy link
Member Author

Boshen commented Jun 22, 2024

Should we move

No, let's not scope creep.

There are other traits and functions that depend on these common operators.

@Boshen
Copy link
Member Author

Boshen commented Jul 2, 2024

Are we able to manually inline @inherit MemberExpression and generate all the other utilities for these inherit_variants! enums?

@rzvxa
Copy link
Contributor

rzvxa commented Jul 2, 2024

Are we able to manually inline @inherit MemberExpression and generate all the other utilities for these inherit_variants! enums?

Can you elaborate on it? Wouldn't we lose some information if we don't express the inheritance in the type system? While the actual code would behave the same, It might be useful for the user to see this as the definition:

inherit_variants! {
/// Argument
///
/// Inherits variants from [`Expression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Argument<'a> {
    SpreadElement(Box<'a, SpreadElement<'a>>) = 64,
    // `Expression` variants added here by `inherit_variants!` macro
    @inherit Expression
}
}

Instead of this:

/// Argument
///
/// Inherits variants from [`Expression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Argument<'a> {
    SpreadElement(Box<'a, SpreadElement<'a>>) = 64,

    BooleanLiteral(Box<'a, BooleanLiteral>) = 0,
    NullLiteral(Box<'a, NullLiteral>) = 1,
    NumericLiteral(Box<'a, NumericLiteral<'a>>) = 2,
    BigIntLiteral(Box<'a, BigIntLiteral<'a>>) = 3,
    RegExpLiteral(Box<'a, RegExpLiteral<'a>>) = 4,
    StringLiteral(Box<'a, StringLiteral<'a>>) = 5,
    TemplateLiteral(Box<'a, TemplateLiteral<'a>>) = 6,

    Identifier(Box<'a, IdentifierReference<'a>>) = 7,

    MetaProperty(Box<'a, MetaProperty<'a>>) = 8,
    Super(Box<'a, Super>) = 9,

    ArrayExpression(Box<'a, ArrayExpression<'a>>) = 10,
    ArrowFunctionExpression(Box<'a, ArrowFunctionExpression<'a>>) = 11,
    AssignmentExpression(Box<'a, AssignmentExpression<'a>>) = 12,
    AwaitExpression(Box<'a, AwaitExpression<'a>>) = 13,
    BinaryExpression(Box<'a, BinaryExpression<'a>>) = 14,
    CallExpression(Box<'a, CallExpression<'a>>) = 15,
    ChainExpression(Box<'a, ChainExpression<'a>>) = 16,
    // .
    // .
    // .
}

On another note, if we get the macro caching to work (related to this) we can have this without any compilation penalty which I think is much superior(And I guess @overlookmotel would agree).

/// Argument
///
/// Inherits variants from [`Expression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[inheritance]
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Argument<'a> {
    SpreadElement(Box<'a, SpreadElement<'a>>) = 64,
    #[inherit] // or maybe `#[inline]` since it is more rust-like?
    Expression(Expression<'a>),
}

@rzvxa
Copy link
Contributor

rzvxa commented Jul 2, 2024

If we want to have the inheritance expanded we should do the expansion with the codegen to keep it consistent and schema close to the ECMAspec. But IMHO, manually inlining the variants is going to be a maintenance nightmare. because they would easily drift apart with some time and careless reviewing(which always happens no matter how many people review the same piece of code).

@rzvxa
Copy link
Contributor

rzvxa commented Jul 8, 2024

Hash, PartialEq implementations (Milestone 3)

@Boshen @overlookmotel I think we should remove this one from our milestones and here's why.

The performance gain from pre-expanding builtin derives is going to chop off a handful of seconds from our build time and even that is an optimistic estimate. It might at first feel unintuitive but let me explain myself.

Syntax Extensions

The macro calls(including the macro_rules itself), attributes, and derives are among syntax extension kinds, A syntax extension is the last form of a macro(usually) before turning into the desired AST.

If you look at the implementation of SyntaxExtensionKind you'd find that there are 2 kinds of any macro kind, There are 2 sets Bang, Attr, Derive and LegacyBang, LegacyAttr, LegacyDerive.

What's the difference?

In a nutshell, legacy kinds have AST -> AST signature as opposed to TokenStream -> TokenStream.
As of right now, Rust registers all of its builtin macros as the former kind, These implementations are the closest to the bare metal and get executed almost as fast as any lowering process happening in the compiler.

What happens if they phase out the legacy kinds?

Well again it might seem intuitive that with this change a pre-expand can improve the build time by a good margin but the truth is that for dead simple macros such as these 2 if written as a procedural macro you are going to mostly wait on the initial build of the macro crate and then spend a good amount of time on the overhead of the proc_macro_server, So if they were to phase out the legacy expanders they still can implement the MultiItemModifier differently to short-circuit to the builtin implementation avoiding all of the shortcomings of normal procedural macros. That's exactly why declarative macros are fast to build.

Conclusion

I guess my conclusion is in the form of a question.
While pre-expanding the built-in derives can speed up the builds by N seconds is it really worth giving up on the built-in macros? It yields more performance if it is done on the derives implemented in the users' land.

Links

https://github.com/rust-lang/rust/blob/32e692681e457b06efedd05c5a4be0d718f39292/compiler/rustc_builtin_macros/src/derive.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_builtin_macros/src/deriving/default.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_builtin_macros/src/deriving/hash.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_builtin_macros/src/deriving/cmp/partial_eq.rs#L1
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_expand/src/proc_macro.rs#L110
https://github.com/rust-lang/rust/blob/35b658fb1071d752d092751d9f17a9fa7b154ec4/compiler/rustc_expand/src/mbe/macro_rules.rs#L366

@overlookmotel
Copy link
Contributor

You make a strong case for why performance wouldn't be a good reason to codegen these 2 traits. Thanks for all the background info. I was completely unaware of the legacy macro kinds, and Rust's macro machinery in general. Really interesting.

But, just to explain, my rationale for putting these 2 on the list wasn't for perf reasons. The reasons were:

Hash

We want Hash to skip span fields, and also symbol_id etc. Skipping Hash is currently achieved by a dummy Hash impl on Span type:

impl Hash for Span {
fn hash<H: Hasher>(&self, _state: &mut H) {
// hash to nothing so all ast spans can be comparable with hash
}
}

I saw this as a bit of a hack, and thought it would be more "proper" solution to implement Hash on all AST types with the span field explicitly skipped. That would be laborious and error-prone to do by hand, but much easier with codegen.

PartialEq

The reason why we are implementing Hash on AST types is for (if I remember right) comparing expressions in a lint rule to identify pointless if branches e.g. if (x) {} else if (x) { /* unreachable */ }. But actually Hash is not ideal for this use - PartialEq would be more efficient.

But it's hard to implement PartialEq because, again, we need to skip span fields, and the no-op hash hack isn't viable here, so we can't use #[derive(PartialEq)].

So I thought we could do that with codegen.

Conclusion

I'm not at all arguing against what you've said @rzvxa. I just wanted to explain that there was another angle.

@rzvxa
Copy link
Contributor

rzvxa commented Jul 9, 2024

Thanks for the explanation, Now I see the full picture for this one.

@Boshen
Copy link
Member Author

Boshen commented Jul 9, 2024

We can defer the implementation of Hash and PartialEq because we don't need them right now (we didn't need them for so long so probably won't need them for a longer time.)

As for Traverse, @rzvxa can decide continue working on it, or come back after research for ast template macro is completed.

@Boshen
Copy link
Member Author

Boshen commented Jul 11, 2024

Consider this is done, we can migrate Traverse at a later time, and raise new issues regarding ast codegen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category - New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants