Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement General Purpose Constant Folding with the Expression Evaluator #1070

Closed
alamb opened this issue Oct 4, 2021 · 0 comments · Fixed by #1153
Closed

Implement General Purpose Constant Folding with the Expression Evaluator #1070

alamb opened this issue Oct 4, 2021 · 0 comments · Fixed by #1153
Labels
datafusion Changes in the datafusion crate enhancement New feature or request performance Make DataFusion faster

Comments

@alamb
Copy link
Contributor

alamb commented Oct 4, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

A classic part of query optimization is algebraic transformations such as partially evaluating expressions once at plan time rather than over and over for each row during execution time.

For example, a predicate such as where time < date_trunc('2021-10-04Z10:12:13', 'year') can be rewritten to where time < '2021-01-01Z00:00:00' which both saves many redundant evaluations of the date_trunc functions and also unlocks additional optimizations such as parquet row group pruning and using constant comparison kernels.

DataFusion has a basic constant folding implementation here: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/constant_folding.rs

However, as implemented, it has a few drawbacks:

  1. It only covers a few algebraic transformations (like boolean algebra and now() expansion)
  2. It effectively is a second implementation of expression evaluation so
    2a. As new expression support is added, it would also need to be added to constant_folding.rs
    2b. It runs the risk of producing different answers than if the expression had been evaluated at runtime
  3. It does not handle functions and sorting out how to support it for user defined functions would be non trivial

Describe the solution you'd like

Reuse the existing expression evaluation framework (namely PhysicalExpr::evaluate and everything in physical_plan/expressions) to implement constant folding.

This would be beneficial because:

  1. All current and future expression types could be evaluated (including user defined functions)
  2. It would allow more sophisticated expression transformations / optimziations such as WIP: Extended Tokomak optimizer #1066

The high level idea would be to walk the Expr tree bottom up, and if a subtree contained only constants (and non volitalie functions #1069) create and run a PhysicalExpr to produce a single value, and then replace the subtree with that appropriate constant.

Describe alternatives you've considered
I think it is possible to implement expression evaluation as a set of rewrite rules (as is partially done in #1066) but that still has the downside that the behavior can deviate from the actual expression evaluation in PhysicalExpr

Additional context

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request performance Make DataFusion faster
Projects
None yet
1 participant