Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an AST visitor #601

Closed
wants to merge 19 commits into from
Closed

Add an AST visitor #601

wants to merge 19 commits into from

Conversation

lovasoa
Copy link
Contributor

@lovasoa lovasoa commented Sep 4, 2022

This adds an optional "derive-visitor" feature which lets the user visit and mutate the AST easily

Related to #78, #114, and #189

This is not ready to be merged, as this currently depends on nikis05/derive-visitor#9, but it already works using my fork of the repo

The upstream PR has been merged, this is ready for review :)

This lets you write code like

let sql = "SELECT x FROM table_a JOIN table_b";
let mut ast = Parser::parse_sql(&GenericDialect, sql).unwrap();

ast.drive_mut(&mut visitor_enter_fn_mut(|table: &mut TableFactor| {
    if let Table { name: ObjectName(parts), .. } = table {
        parts[0].value = parts[0].value.replace("table_", "");
    }
}));

assert_eq!(ast[0].to_string(), "SELECT x FROM a JOIN b");

This adds an optional "derive-visitor" feature which lets the user visit and mutate the AST easily
@lovasoa lovasoa marked this pull request as ready for review September 5, 2022 17:32
@tokarevart
Copy link

This is cool! I hope this feature will merge soon, exactly what i need for my case. I've been trying to implement a similar thing but as a separate crate.

And my use case is writing a restricted ANSI-like SQL query builder which will be used to build queries of a specified dialect (not only ANSI and with support for query parameters) after some post processing done with help of this feature.

@iskakaushik
Copy link
Contributor

I think this is a great feature overall, but I have some questions about the approach:

  1. Given that each visitor is scoped to the sub-tree it is visiting, can there be affordances to pass along additional context?
  2. I can also imagine use cases where one might want to change the type of the node itself. Consider changing a TableFactor to a Join or something along those lines. Consider something like https://docs.rs/syn/latest/syn/fold/index.html for example.

Also, @tokarevart, you mentioned that you were trying to implement something like this as a separate crate. Curious, are there any challenges to it?

@tokarevart
Copy link

@iskakaushik in case of my solution there were not many challenges, i defined a big enum with a variants for each sqlparser's AST type, then i implemented some traversal traits for each sqlparser's AST type. That was just a lot of lines of redefinition of sqlparser's types, even though i used macroses to make it as less boilerplate as possible. And actually even if i knew about derive_visitor crate beforehand i wouldn't be able to implement @lovasoa's solution as a separate crate, i feel like derives are just too useful for solving traversal/visiting problem in the most concise way (derive_visitor uses derives to reduce boilerplate).

I like the idea and implementation proposed by @lovasoa using derive_visitor crate much more than my own, though.

About your questions i think i can answer them, @lovasoa can correct me if i'm wrong somewhere:

  1. If i got what you meant right it can be solved by implementing visitor on a type that holds the state, like Counter struct here in the example of derive_visitor crate.
  2. The only way i see it work is when node is enum and we're changing that enum's variant, as i can see (by following your link) the parenthesizing example of Fold in syn::fold module is also implemented by changing Expr's variant. I think if a user wants to change some struct (not enum) type to another then maybe they should look for every accurance of those 2 types in the same enum variants arguments and then write their mutation accordingly, that is changing enum variants. Maybe there's a better way to handle this, what everyone thinks about this?

@lovasoa
Copy link
Contributor Author

lovasoa commented Sep 8, 2022

Yes,I would have answered the same as @tokarevart :)

  1. I'm not sure I understand the question, but if it's about passing external data to the visitor, that's not a problem with the current implementation. You can pass a closure that captures its context to drive_mut, or implement the Visitor trait on anything you want.
  2. Yes, changing the type of the node itself is the most interesting. Of course you cannot just replace one type by another in the AST, that would be totally type-unsafe. But my main use case for the visitor in SQLPage, and the reason I wrote this PR is doing things like replacing expressions with different expressions. Here is how you would implement your example of changing a TableFactor to a Join :
    let sql = "select a,b from t1";
    // Table t1 has changed and its fields are now split between tables t1 and t2
    let mut ast = Parser::parse_sql(&GenericDialect, sql).unwrap();
    ast[0].drive_mut(&mut visitor_enter_fn_mut(
        |table: &mut sqlparser::ast::TableWithJoins| {
            let has_t1 = once(&table.relation)
                .chain(table.joins.iter().map(|j| &j.relation))
                .any(|r| {
                    matches!(r, TableFactor::Table {name: ObjectName(idents), ..}
                        if idents[0].value == "t1")
                });
            if has_t1 {
                table.joins.push(Join {
                    relation: Table {
                        name: ObjectName(vec![Ident::from("t2")]),
                        alias: None,
                        args: None,
                        with_hints: vec![],
                    },
                    join_operator: JoinOperator::Inner(JoinConstraint::Using(vec![Ident::from(
                        "t_id",
                    )])),
                });
            }
        },
    ));
    assert_eq!(ast[0].to_string(), "SELECT a, b FROM t1 JOIN t2 USING(t_id)");

@lovasoa
Copy link
Contributor Author

lovasoa commented Sep 13, 2022

Hello :) Any news ? When do you think this can be merged ?
I need this for SQLPage

@lovasoa
Copy link
Contributor Author

lovasoa commented Sep 24, 2022

Hey, can someone please review this, and merge this if possible ?
I would need it for sqlpage. I currently use my own fork of sqlparser, but I can't push to crates.io with a git dependency.
Notifying @andygrove, @alamb, @waitingkuo :)

@alamb
Copy link
Contributor

alamb commented Sep 24, 2022

I will review this today or tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @lovasoa -- this looks like a very cool feature and one I have wanted in my own projects

I am concerned about the new dependency of derive-vistor (which has only been published for 14 days and is not particularly widely used yet). Are there any other options you considered ? What do you think about possibly moving the parts of derive-visitor you want to use into sqlparser itself (aka a little copying rather than dependency)?

I am also a bit concerned a little about the relatively few tests (as in I suspect I could remove most of the #[cfg_attr(feature = "derive-visitor", ...)] lines and all the tests would still pass

The other thing that I noticed while reviewing this PR is that this feature may not be very discoverable (https://docs.rs/sqlparser/latest/sqlparser/ wouldn't show it) -- perhaps as a follow on we can add an example showing it.

@lovasoa
Copy link
Contributor Author

lovasoa commented Sep 24, 2022

I could indeed have done better for tests and documentation, you are right. I'll improve these.

As for vendoring derive-vistor, are you sure about that ? This is a proc macro, s sqlparser would need to distribute an entire new crate just for that. There is not really any part of the crate that is not used here, so we would need to copy the entire codebase. And we wouldn't benefit from the contributions that happen on the original crate.

@alamb
Copy link
Contributor

alamb commented Sep 26, 2022

As for vendoring derive-vistor, are you sure about that ? This is a proc macro, s sqlparser would need to distribute an entire new crate just for that. There is not really any part of the crate that is not used here, so we would need to copy the entire codebase. And we wouldn't benefit from the contributions that happen on the original crate.

What I am thinking about is that the sqlparser crate is used in a wide variety of applications / usecases and has relatively small maintenance budget (as you can probably see from the delayed response).

I think one of the great strengths of sqlparser is its relative simplicity -- it has few dependencies, the code is relatively straightforward and east to contribute to, and is fast to compile. The proc-macro implementation in this PR is elegant and avoids boiler plate code, but I worry about the additional complexity (and now there is that much more code to maintain in sqlparser, higher barrier to new contributors, and a new source of potential issues (if something goes wrong we have to debug it in other crates)

What would you think about making a new crate that provided rewriting functionality (rather than in the sqlparser-rs crate itself)? I would be willing to help maintain this as we may end up using such functionality in my own project.

Perhaps sqlparser-rewriter-rs or something?

One example of how to implement a rewriter without changing sqlparser-rs is iox_query/src/frontend/sql_rewrite.rs in https://github.com/influxdata/influxdb_iox/pull/5438

@coveralls
Copy link

coveralls commented Sep 27, 2022

Pull Request Test Coverage Report for Build 3453111794

  • 1 of 15 (6.67%) changed or added relevant lines in 2 files are covered.
  • 290 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 86.253%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tests/sqlparser_visitor.rs 1 4 25.0%
src/lib.rs 0 11 0.0%
Files with Coverage Reduction New Missed Lines %
src/lib.rs 290 9.98%
Totals Coverage Status
Change from base Build 3448124733: -0.02%
Covered Lines: 11808
Relevant Lines: 13690

💛 - Coveralls

@lovasoa
Copy link
Contributor Author

lovasoa commented Sep 27, 2022

Hey @alamb ! Thanks for your comments.

I added more tests, and documentation to make the feature more discoverable.

Do you really think the added complexity to this repo is a lot to handle ? It is just annotations added to structs and attributes, and associated tests and documentation.

On the other hand, I think the benefit is huge, because looking at code on github, it looks like many user of this crate end up wanting to visit the AST, and currently everyone hand-rolls their own code to do that, and ends up with verbose and fragile code, that often needs to be updated when the sqlparser AST changes.

Manually implementing a visitor that is specific to sqlparser in a different crate would work, but this would be a very large amount of tedious and uninteresting work, both to write initially, and to maintain in the long term. It would be very easy for it to fall out of sync, and there would be no compiler guarantee that all fields are indeed visited.

I do agree that adding a dependency on an external crate that is not widely used is not ideal for a crate such as sqlparser. But maybe we can mitigate the risks by asking some sqlparser maintainers to be added as maintainers of derive-visitor? I for myself would be happy to help maintaining derive-visitor. This way fixing potential issues in the visitor code would be a contribution to the entire rust community, and help a larger set of rust developers. What would you think, @nikis05 ?

@nikis05
Copy link

nikis05 commented Sep 27, 2022

@lovasoa Sure, sounds good, I will be happy to add more maintainers.

@alamb
Copy link
Contributor

alamb commented Sep 28, 2022

Do you really think the added complexity to this repo is a lot to handle ? It is just annotations added to structs and attributes, and associated tests and documentation.

The short answer is yes. The longer answer is that every little feature addition / dependency addition in itself is typically not that complex. However, over time the libraries end up accumulating a large number of them.

Also, sadly, I do not have the time to review the derive visitor crate's code carefully at this time

On the other hand, I think the benefit is huge, because looking at code on github, it looks like many user of this crate end up wanting to visit the AST,

Yes I agree this is a commonly desired pattern.

Manually implementing a visitor that is specific to sqlparser in a different crate would work, but this would be a very large amount of tedious and uninteresting work, both to write initially, and to maintain in the long term. It would be very easy for it to fall out of sync, and there would be no compiler guarantee that all fields are indeed visited.

I think it would be possible to write a code generator to generate visitors / rewriters rather than requiring a bunch of annotations to the sqlparser crate. That is effectively what the proc macro is doing. This would have the additional benefit that users who wanted such a feature could bring in a new crate, and users who didn't could keep using the existing structures.

I do agree that adding a dependency on an external crate that is not widely used is not ideal for a crate such as sqlparser. But maybe we can mitigate the risks by asking some sqlparser maintainers to be added as maintainers of derive-visitor?

Thank you for the offer -- I unfortunately don't have time to maintain another crate.

@alamb
Copy link
Contributor

alamb commented Sep 28, 2022

Perhaps @Dandandan or @andygrove have some opinions they could share?

@nikis05
Copy link

nikis05 commented Sep 29, 2022

Just throwing in a possible option. There is a feature in serde that allows bypassing the orphan rule and using serde with third-party crates that don’t directly integrate it. I could implement a similar feature in derive_visitor. It would then be possible to create a separate crate that enables visitor functionality for sqlparser while leaving sqlparser’s own code free from extra dependencies or annotations.

Updating of said crate to match sqlparser‘s AST schema could be almost fully automated using some kind of github watcher that triggers a codegen run when the upstream crate is updated.

@alamb
Copy link
Contributor

alamb commented Sep 29, 2022

@nikis05 that idea sounds intriguing.

@nikis05
Copy link

nikis05 commented Sep 29, 2022

@alamb but this option has its own problems / downsides,

  1. It makes the feature somewhat less discoverable as it is in a separate crate;
  2. I would need to implement this "remote" attribute in derive-visitor. And from user prospective, having this remote / wrapper-based implementation is slightly more confusing compared to having a direct trait implementation;
  3. A codegen would need to be written to create wrapper types for sqlparser::ast AST types, and changes to sqlparser could hypothetically break said codegen;
  4. Something would need to trigger the codegen on changes. Since a GH workflow in one repo can't watch for pushes in another repo, either sqlparser's action would need to send a triggering request, or some kind of watcher service would have to be running.

This could be way too much effort compared to just adding derive_visitor to the main crate behind a feature flag, or to just implementing visitor from scratch. Even once implemented, it could fall apart at any moment, and would need someone to watch over it.

In any case if you decide to somehow support derive_visitor for this crate I would be happy to assist in any way.

@alamb
Copy link
Contributor

alamb commented Sep 29, 2022

This could be way too much effort compared to just adding derive_visitor to the main crate behind a feature flag, or to just implementing visitor from scratch. Even once implemented, it could fall apart at any moment, and would need someone to watch over it.

In my mind this is the key tradeoff -- basically who will maintain the code and how much effort they are willing to expend to do so. As mentioned, I don't have a lot of time to spend maintaining sqlparser as it is, which is why I have this extreme aversion to making it more complicated, even though I understand the arguments and benefits that the total effort would be lower if the code was in the sqlparser crate. The key difference is who would be doing the effort

@lovasoa
Copy link
Contributor Author

lovasoa commented Sep 29, 2022

@alamb : Would you be ready to add a new maintainer to sqlparser itself ? sqlparser is a core part of SQLPage, so I would be more than happy to help maintaining it, especially for the parts that I use the most myself.

@alamb
Copy link
Contributor

alamb commented Oct 1, 2022

@alamb : Would you be ready to add a new maintainer to sqlparser itself ? sqlparser is a core part of SQLPage, so I would be more than happy to help maintaining it, especially for the parts that I use the most myself.

Thank you for the offer. I have started a discussion on #643 where hopefully we can figure out a path forward

@alamb
Copy link
Contributor

alamb commented Oct 4, 2022

After more thought, I am not willing to take on or subject users of this crate to the additional potential maintenance of a new dependency to sql-parser that is not used anywhere else. I am sure it is awesome but I sadly don't have the time to help debug or polish it

Here is what I am looking at: https://crates.io/crates/derive-visitor/reverse_dependencies

Screen Shot 2022-10-04 at 6 35 09 AM

For this particular usecase, if/when I ever need it, I would likely create a new crate that automatically generates code for a visitor (rather than procedural macros in this crate)

@alamb
Copy link
Contributor

alamb commented Oct 4, 2022

If other maintainers feel differently and are willing to approve and help maintain this feature, I am not opposed. I just don't have the time to devote now

@stuartcarnie
Copy link

Overall, I like the derive-visitor crate, as it reduces a lot of boilerplate code, however, I would like to raise a small concern.

I don't believe it is an issue for the sqlparser-rs crate, as it's AST is not recursive, however, the derive-visitor crate doesn't appear to support exit conditions to handle recursion or more generally, cyclic graphs. Typically this is handled in the visitor by way of the enter or exit functions returning a status.

One possible solution is for the enter function signature to change to return a recursion status, such as the following:

enum Recursion {
    Continue,
    Stop
}

An example of this approach can be observed in DataFusion, where the "pre_" functions return a status:

https://github.com/apache/arrow-datafusion/blob/45fc415daa7028559ef3477e53a184a114149f9e/datafusion/expr/src/expr_visitor.rs#L23-L30

I think it would be important to address this, as it will break consumers if such a change were introduced later.

@nikis05
Copy link

nikis05 commented Oct 4, 2022

@stuartcarnie there is an issue currently open in derive_visitor with the idea to introduce support for early exit: nikis05/derive-visitor#5. I will gladly implement it if you think it would be useful, API proposals are welcome.

I added some design questions in the issue discussion that I need help answering before implementing this.

@lovasoa
Copy link
Contributor Author

lovasoa commented Oct 5, 2022

If other maintainers feel differently and are willing to approve and help maintain this feature, I am not opposed. I just don't have the time to devote now

@alamb , I do understand that. However, I still feel like it's a shame to leave this crate in a "semi-abandoned" state when there are people who are actually ready to do the work !

We are not talking about some obscure feature here. We are talking about something that almost all users would benefit from, and that is implemented in a few hundred lines of code in an actively maintained external crate, the maintainer of which is willing to collaborate with sqlparser.

@lovasoa
Copy link
Contributor Author

lovasoa commented Oct 5, 2022

I think it would be important to address this, as it will break consumers if such a change were introduced later.

I do agree faillible visitors are great to have in general, but in the context of sqlparser, I'm not sure handling infinitely large sql statements is a required feature. The parser does not and probably never should create recursive data structures !

@alamb
Copy link
Contributor

alamb commented Oct 6, 2022

@alamb , I do understand that. However, I still feel like it's a shame to leave this crate in a "semi-abandoned" state when there are people who are actually ready to do the work !

If you are willing to do the work, I recommend taking an alternate strategy and either forking the project, adding the feature there, and then keeping up, or take the alternate path and figure out how to auto generate visitor code without having to modify the crate itself.

Sorry I can't help more

@alamb alamb marked this pull request as draft November 30, 2022 16:36
@lovasoa
Copy link
Contributor Author

lovasoa commented Dec 31, 2022

Closing in favor of #765

I'll make another PR to add a mutable visitor to the existing system

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants