Add example for writing an SQL analysis pass #10871

Closed
Tracked by #7013
alamb opened this issue Jun 11, 2024 · 3 comments · Fixed by #10938
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jun 11, 2024

Is your feature request related to a problem or challenge?

We have many good examples in https://github.com/apache/datafusion/tree/3773fb7fb54419f889e7d18b73e9eb48069eb08e/datafusion-examples

However, we don't have an example for writing some sort of SQL analysis -- for example, what @LorrensP-2158466 mentions about finding cyclic joins in #10808 (comment).

I do know of several DataFusion users doing this kind of analysis (e.g. @sadboy and SDF).

Describe the solution you'd like

I think the ideal solution would be to add a file, sql_analysis.rs, to https://github.com/apache/datafusion/tree/3773fb7fb54419f889e7d18b73e9eb48069eb08e/datafusion-examples

Perhaps the example could show how to create LogicalPlans for several SQL query texts and then use DataFusion structures to analyze them (for instance join counts or predicate analysis). That would be really neat.

I think this would be very valuable.

Describe alternatives you've considered

No response

Additional context

No response

@LorrensP-2158466
Contributor

I like the join count analysis. I had to do this in my project (mentioned in #10808) in a really hacky way. We can have two sorts of counts:

  1. Count every individual join
  2. Count all groups of joins (subsequent joins that are related) with their respective counts (this would have been useful for me)
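
The two kinds of counts can be sketched on a toy plan tree. This is a minimal, std-only illustration and not DataFusion code: the `Plan` enum, `total_joins`, `join_groups`, and `sample_plan` are hypothetical names, and "group" is assumed here to mean joins that are directly connected with no other operator between them, which may not be exactly what the comment has in mind.

```rust
// Toy stand-in for a logical plan. NOT DataFusion's LogicalPlan.
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    Scan(&'static str),
    Filter(Box<Plan>),
    Join(Box<Plan>, Box<Plan>),
}

/// Count 1: every individual join in the tree.
fn total_joins(plan: &Plan) -> usize {
    match plan {
        Plan::Scan(_) => 0,
        Plan::Filter(input) => total_joins(input),
        Plan::Join(l, r) => 1 + total_joins(l) + total_joins(r),
    }
}

/// Returns the size of the still-open join group rooted at `plan`
/// (0 if `plan` is not a join) and pushes finished groups into `groups`.
fn collect_groups(plan: &Plan, groups: &mut Vec<usize>) -> usize {
    match plan {
        Plan::Scan(_) => 0,
        Plan::Filter(input) => {
            // Assumption: a non-join operator closes the group below it.
            let below = collect_groups(input, groups);
            if below > 0 {
                groups.push(below);
            }
            0
        }
        Plan::Join(l, r) => 1 + collect_groups(l, groups) + collect_groups(r, groups),
    }
}

/// Count 2: the size of each group of directly connected joins.
fn join_groups(plan: &Plan) -> Vec<usize> {
    let mut groups = Vec::new();
    let top = collect_groups(plan, &mut groups);
    if top > 0 {
        groups.push(top);
    }
    groups
}

/// Join(Join(a, b), Filter(Join(c, d)))
fn sample_plan() -> Plan {
    let left = Plan::Join(Box::new(Plan::Scan("a")), Box::new(Plan::Scan("b")));
    let right = Plan::Join(Box::new(Plan::Scan("c")), Box::new(Plan::Scan("d")));
    Plan::Join(Box::new(left), Box::new(Plan::Filter(Box::new(right))))
}

fn main() {
    let plan = sample_plan();
    println!("total joins: {}", total_joins(&plan)); // prints "total joins: 3"
    println!("join groups: {:?}", join_groups(&plan)); // prints "join groups: [1, 2]"
}
```

In the sample, the filter separates the join over `c` and `d` from the two joins above it, so the groups come out as one of size 1 and one of size 2.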

I do have a question.
How can we show these results? Analyzer rules only return transformed LogicalPlans, or am I misunderstanding something?

@alamb
Contributor Author

alamb commented Jun 15, 2024

Thank you @LorrensP-2158466 -- sorry for the delay. I have been traveling.

> I do have a question.
> How can we show these results? Analyzer rules only return transformed LogicalPlans, or am I misunderstanding something?

I think in an example we can use println!, which is what we do in other examples:

println!(
    "Unoptimized Logical Plan:\n\n{}\n",
    logical_plan.display_indent()
);

Normally the idea is to give a well-documented example that shows the basic pattern that people can start with.

For this one, maybe you could show how to use TreeNode::apply to walk the tree. Something like (totally untested):

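use datafusion::common::tree_node::{TreeNode, TreeNodeRecursion};
use datafusion::logical_expr::LogicalPlan;

let mut join_count = 0;
plan.apply(|child| {
  // count every Join node visited during the walk
  if matches!(child, LogicalPlan::Join(_)) {
    join_count += 1;
  }
  // the closure has to report whether the walk should continue
  Ok(TreeNodeRecursion::Continue)
})?;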

println!("Found {join_count} joins in the plan");
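
To make the closure-based walk concrete without depending on the DataFusion crate, here is a std-only sketch of the same pattern. Everything in it (`Plan`, `Recursion`, this `apply`, `count_joins`, `has_join`, `sample_plan`) is a hypothetical stand-in; DataFusion's real `TreeNode::apply` is fallible and, as far as I know, expects the closure to return a `Result<TreeNodeRecursion>`, which this toy simplifies to a plain enum.

```rust
// Hypothetical signal telling the walk whether to keep going.
#[derive(Debug, PartialEq)]
enum Recursion {
    Continue,
    Stop,
}

// Toy stand-in for a logical plan. NOT DataFusion's LogicalPlan.
#[allow(dead_code)]
enum Plan {
    Scan(&'static str),
    Filter(Box<Plan>),
    Join(Box<Plan>, Box<Plan>),
}

impl Plan {
    fn children(&self) -> Vec<&Plan> {
        match self {
            Plan::Scan(_) => vec![],
            Plan::Filter(input) => vec![input.as_ref()],
            Plan::Join(l, r) => vec![l.as_ref(), r.as_ref()],
        }
    }

    /// Pre-order walk: call `f` on this node, then on each child,
    /// stopping as soon as `f` returns `Recursion::Stop`.
    fn apply<F>(&self, f: &mut F) -> Recursion
    where
        F: FnMut(&Plan) -> Recursion,
    {
        if f(self) == Recursion::Stop {
            return Recursion::Stop;
        }
        for child in self.children() {
            if child.apply(&mut *f) == Recursion::Stop {
                return Recursion::Stop;
            }
        }
        Recursion::Continue
    }
}

/// Visits every node and counts the joins.
fn count_joins(plan: &Plan) -> usize {
    let mut n = 0;
    plan.apply(&mut |node| {
        if matches!(node, Plan::Join(_, _)) {
            n += 1;
        }
        Recursion::Continue
    });
    n
}

/// Stops at the first join instead of walking the whole tree.
fn has_join(plan: &Plan) -> bool {
    let mut found = false;
    plan.apply(&mut |node| {
        if matches!(node, Plan::Join(_, _)) {
            found = true;
            Recursion::Stop
        } else {
            Recursion::Continue
        }
    });
    found
}

/// Filter(Join(Join(a, b), c))
fn sample_plan() -> Plan {
    let inner = Plan::Join(Box::new(Plan::Scan("a")), Box::new(Plan::Scan("b")));
    let outer = Plan::Join(Box::new(inner), Box::new(Plan::Scan("c")));
    Plan::Filter(Box::new(outer))
}

fn main() {
    let plan = sample_plan();
    // prints "joins: 2, has join: true"
    println!("joins: {}, has join: {}", count_joins(&plan), has_join(&plan));
}
```

Returning a control value from the closure is what lets one walker serve both a full count and an early-exit check.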

@LorrensP-2158466
Contributor

Thanks for the reply!

That's exactly what I have made. I'll open a PR later today or tomorrow.
