Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: why is the Visitor trait limited to statements, relations & expressions? #934

Open
freshtonic opened this issue Jul 25, 2023 · 14 comments

Comments

@freshtonic
Copy link

freshtonic commented Jul 25, 2023

What is the reason for that particular design decision versus providing a more general Visitor implementation?

Two options for a generalised Visitor trait come to mind:

  1. expose pre + post trait method variants for every AST node type, or
  2. expose only two trait methods (pre_visit + post_visit) with signatures like fn pre_visit(&mut self, node: &AstNode) -> ControlFlow<Self::Break> - where AstNode is an enum with a wrapper variant for every AST node type found in src/ast/mod.rs and can be matched against.

Would the maintainers be interested in a PR that implements one of the above two approaches?

My preference would be for option 2 because it would not break the trait when node types are added/removed.

Suggested approach:

  1. Define a new RawVisitor trait (and RawVisitorMut trait) like this:
pub trait RawVisitor {
  type Break;
  fn pre_visit(&mut self, node: &AstNode) -> ControlFlow<Self::Break>;
  fn post_visit(&mut self, node: &AstNode) -> ControlFlow<Self::Break>;
}
  1. Define an adapter type (RawVisitorAdapter ?) that accepts a V: Visitor generic argument and implements RawVisitor & RawVisitorMut, which calls the appropriate method on V (or none at all)
struct RawVisitorAdapter<V: Visitor>(v);

impl<V: Visitor> RawVisitor for RawVisitorAdapter<V> {
  type Break = V::Break;

  fn pre_visit(&mut self, node: &AstNode) -> ControlFlow<Self::Break> {
     match node {
        AstNode(Statement) => self.0.pre_visit_statement(...),
        // etc
     }
  }

  fn post_visit(&mut self, node: &AstNode) -> ControlFlow<Self::Break>;
}
  1. Change the Visit derivation macros to generate code in terms of RawVisitor & RawVisitorMut instead of Visitor, like this:
pub trait Visit {
    fn visit_raw<V: RawVisitor>(&self, visitor: &mut V) -> ControlFlow<V::Break>;

    // This has an identical signature to the existing trait, but has a default implementation
    fn visit<V: Visitor>(&self, visitor: &mut V) -> ControlFlow<V::Break> {
       self.visit_raw(RawVisitorAdapter::new(visitor))
    }
}
@alamb
Copy link
Contributor

alamb commented Aug 7, 2023

What is the reason for that particular design decision versus providing a more general Visitor implementation?

The core reasons are to keep this crate easy to extend with new sql syntax by casual contributors and limited maintenance bandwidth.

The current visitor pattern / macros require a single new #[derive] call

In terms of your proposed implementation, it seems reasonable. I would be interested in knowing how it would work when a contributor added a new node, like #897

How would we ensure that these new nodes were added to the AstNode enum? What would happen if we forgot to do so?

perhaps @lovasoa and @tustvold have some other perspectives

Basically I think a general purpose rewriter / visitor would be a great addition, as long as we don't make it too hard to add new code to this repo

@lovasoa
Copy link
Contributor

lovasoa commented Aug 7, 2023

@freshtonic AST visiting in sqlparser-rs has a long history. It took very long for a design to be agreed on and merged. I had proposed an alternative, more general, approach that would have allowed to do what you wanted: #601

But it was rejected, by fear it would take too much work to maintain.

My proposition to embark new maintainers for the feature (myself and @nikis05) was rejected.

sqlparser-rs is at the very core of SQLPage, so I can reiterate my proposition: I wouldn't mind being added as a maintainer here :)

@alamb
Copy link
Contributor

alamb commented Aug 7, 2023

sqlparser-rs is at the very core of SQLPage, so I can reiterate my proposition: I wouldn't mind being added as a maintainer here :)

Indeed -- I am looking for help maintaining it for sure. I think you can do all the time consuming parts of maintenance (PR reviews, answering questions / tickets, etc) without having write access to the repository.

I am waiting for someone to actually show the initiative to do this work (there are plenty of people willing to write code).

I had a promising candidate in @AugustoFKL (see #808) but sadly there were some circumstances that required them to stop working here

I will write up a discussion explaining the current state of affairs

Update: #940

@freshtonic
Copy link
Author

freshtonic commented Aug 7, 2023

In terms of your proposed implementation, it seems reasonable. I would be interested in knowing how it would work when a contributor added a new node, like #897

How would we ensure that these new nodes were added to the AstNode enum? What would happen if we forgot to do so?

That's a great question. Assuming the parent node of the new node has a derived Visitor instance then compilation would fail due to the derived code not being able to find a matching AstNode variant, which is not ideal.

Ideally, the AstNode enum would be generated automatically via a proc macro, by walking the types starting from sqlparser::ast::Statement as the root type, but that would require some deep macro voodoo. I'm not even sure it's possible without using something like rust-analyzer as a library to discover all of the types recursively. Someone please let me know if it's achievable.

@lovasoa
Copy link
Contributor

lovasoa commented Aug 8, 2023

You don't need to create a huge enum. You can use the built in Any trait. Have a look at my PR ;)

@freshtonic
Copy link
Author

@lovasoa depends on if you want to match on it. I'll take a look at you PR though.

@alamb
Copy link
Contributor

alamb commented Aug 22, 2023

FWIW #951 added visiting TableFactor as well

@freshtonic
Copy link
Author

Update: I've been experimenting with various implementation strategies to generalise the AST visitor. Nothing to ready to show yet, but this is business critical for my company so it's a high priority to figure out something that works well.

I'll update this issue with a link to a draft PR hopefully in a matter of days.

@freshtonic
Copy link
Author

Progress update - I have a working generalised visitor implementation. I'll put up a PR later this week when I've done some more testing.

@osaton
Copy link

osaton commented Feb 10, 2024

Is this still happening? Generalised visitor implementation would be appreciated.

@freshtonic
Copy link
Author

freshtonic commented Feb 11, 2024

@osaton @alamb yes this is still happening. Apologies for the lack of comms.

TL;DR I ended up taking quite a different approach to what I originally planned, due to our (the company I work for) evolving needs.

The comprehensive Visitor implementation will be initially released as its own crate. My original intention was to release a PR against sqlparser itself, but the design space is fairly large and I'd like to retain the freedom to iterate on it after releasing the code (at least before committing to a version 1 release).

Summary of features so far:

  • A Visitor implementation is notified of every type in the sqlparser AST (200+ types when you take into account the Vec, Box and Option combinations, plus primitives).
  • The Visitor trait itself is generic over the AST node type of interest (there aren't 200+ trait methods - in fact there are just two: enter and exit).
  • The AST traversal is performed in an optimal order for semantic analysis workloads (instead of the order of AST fields in the source). Specifically, AST nodes that can refer to other AST nodes are visited after the AST node that introduces the referent - e.g. a FROM clause is visited before the a projection that refers to columns in the tables introduced by the FROM clause. This cuts down on the number of passes required for semantic analysis workloads.

In order to pull that off, I needed to write a build.rs script to analyse sqlparser's sources because hand writing AST traversal in semantically-useful order for all of the AST types would have been a lot of work (and error prone).

I need to write up some docs but I would love some feedback on the code and general approach before I publicly publish the repo. I can share it privately if either of you are interested in taking a look.

@freshtonic
Copy link
Author

freshtonic commented Feb 11, 2024

Three features that I really wanted from sqlparser while writing the enhanced Visitor code:

  1. Optional feature to have a unique ID baked into all AST nodes

It's possible for multiple AST nodes at different places in the tree that have the same type to have the same hash and compare as equal. This seems counter intuitive, but two syntactically identical nodes can be different semantically.

Not having robust (and cheap) means of uniquely identifying a node makes building HashMaps of derived metadata keyed by node particularly tricky.

  1. Optional feature to have span information baked into all AST nodes

This would be super useful for debugging and reporting errors nicely (like miette does).

  1. Type per enum variant (for Statement and Expr at least, maybe some additional other types)

It would be nice to not have to match an enum in order to pluck out a variant. Instead I'd like be able to implement Visitor<InsertStatement>.

1 & 2 could be solved by wrapping every struct field or enum variant field in a Node<T> type that Derefs to T but also contains a unique ID and span info. Because of the Deref it might be possible to do this without breaking backwards compatibility but I haven't actually tried it.

@osaton
Copy link

osaton commented Feb 12, 2024

@freshtonic Your visitor crate sounds perfect for my use case. I would love to try it but I'm not sure if I'm able to give any constructive criticism about the code as I'm fairly new to Rust and systems programming languages in general.

  1. Optional feature to have a unique ID baked into all AST nodes

It's possible for multiple AST nodes at different places in the tree that have the same type to have the same hash and compare as equal. This seems counter intuitive, but two syntactically identical nodes can be different semantically.

Not having robust (and cheap) means of uniquely identifying a node makes building HashMaps of derived metadata keyed by node particularly tricky.

  1. Optional feature to have span information baked into all AST nodes

This would be super useful for debugging and reporting errors nicely (like miette does).

  1. Type per enum variant (for Statement and Expr at least, maybe some additional other types)

I don't know if this can be done without breaking changes, but having both 1 and 2 would be amazing.

@ryb73
Copy link

ryb73 commented Apr 9, 2024

+1 to this – would be curious to try out your crate @freshtonic if/when it gets to a publishable state :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants