perf: Improve hive partition pruning with datetime predicates from SQL #19680

Merged
merged 7 commits into from
Nov 8, 2024

nameexhaustion
Collaborator

@nameexhaustion nameexhaustion commented Nov 7, 2024

StatsEvaluator currently only works if the inputs are literals, because it takes them directly from the Expr:

  • col('A') < lit(1) - [ok]
  • col('A') < lit('2024-01-01').str.strptime(..) - [err]
    • This type of predicate is created by LazyFrame.sql() - as it does not have access to the IR context / schema during creation.

This PR adds an evaluate_inline to PhysicalExpr with some basic support for literals, casting and FunctionExprs. It is intended to evaluate to Some(column) only if we consider the operation to be cheap:

  • Cast / FunctionExpr will only evaluate if the input is of length 1, which is generally the case with literals in predicate expressions
  • A depth_limit is used to protect against highly nested expressions
  • Some PhysicalExprs also cache the result in a OnceLock - as they can be called multiple times from the stats evaluator
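
As a rough illustration of the three rules above, here is a small self-contained Rust sketch. It is not the code from this PR: `ToyExpr`, its `Scale` variant, the `lit` helper and the `Vec<i64>` column type are hypothetical stand-ins. Only the name `evaluate_inline`, the `depth_limit` idea and the use of `OnceLock` are taken from the description.

```rust
use std::sync::OnceLock;

/// Toy stand-in for a column (assumption: the real type is a Polars column).
type Column = Vec<i64>;

/// Hypothetical miniature expression tree, not Polars' `PhysicalExpr`.
#[allow(dead_code)]
enum ToyExpr {
    /// A literal; the materialized column is cached because the
    /// stats evaluator may ask for it more than once.
    Literal { values: Vec<i64>, cache: OnceLock<Column> },
    /// A cast-like unary function (here: multiply by a constant factor).
    Scale { input: Box<ToyExpr>, factor: i64 },
    /// A reference to a real column; never considered cheap.
    ColumnRef(String),
}

/// Hypothetical helper that builds a literal expression.
fn lit(values: Vec<i64>) -> ToyExpr {
    ToyExpr::Literal { values, cache: OnceLock::new() }
}

impl ToyExpr {
    /// Returns `Some(column)` only when evaluation is considered cheap.
    fn evaluate_inline(&self, depth_limit: u8) -> Option<Column> {
        match self {
            // Literals are always cheap; materialize once, then reuse.
            ToyExpr::Literal { values, cache } => {
                Some(cache.get_or_init(|| values.clone()).clone())
            }
            ToyExpr::Scale { input, factor } => {
                // Give up once the nesting budget is exhausted.
                let remaining = depth_limit.checked_sub(1)?;
                let col = input.evaluate_inline(remaining)?;
                // Only evaluate when the input has length 1.
                if col.len() != 1 {
                    return None;
                }
                Some(col.iter().map(|v| v * factor).collect())
            }
            // Anything that needs real data is not evaluated inline.
            ToyExpr::ColumnRef(_) => None,
        }
    }
}
```

In this sketch a `Scale` over a single-element literal evaluates inline, while a `Scale` over a column reference, a longer literal, or an exhausted depth budget returns `None`, so the caller can fall back to not using the predicate for pruning.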

With this change the following query should now be able to prune files on hive partitions:

  • scan_parquet(...).sql("select * from self where hive_date1 < '2024-01-02'")
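
To make the effect concrete, the sketch below shows what pruning on a hive partition means once the string literal has been turned into a date. It is a toy model under stated assumptions, not Polars' implementation: the file paths, `parse_date`, `partition_value` and `prune_files` are all hypothetical, and it assumes the usual `key=value` directory layout with ISO `YYYY-MM-DD` values.

```rust
/// Parse an ISO `YYYY-MM-DD` string into a (year, month, day) tuple,
/// which compares in calendar order.
fn parse_date(s: &str) -> Option<(i32, u32, u32)> {
    let mut parts = s.splitn(3, '-');
    let y: i32 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let d: u32 = parts.next()?.parse().ok()?;
    Some((y, m, d))
}

/// Extract the value of a `key=value` path segment, if present.
fn partition_value<'a>(path: &'a str, key: &str) -> Option<&'a str> {
    path.split('/').find_map(|segment| {
        let (k, v) = segment.split_once('=')?;
        (k == key).then_some(v)
    })
}

/// Keep only files whose partition date is strictly below `upper_bound`.
/// Files whose partition value cannot be determined are kept, so that
/// pruning never drops data it is unsure about.
fn prune_files<'a>(paths: &[&'a str], key: &str, upper_bound: &str) -> Vec<&'a str> {
    let bound = match parse_date(upper_bound) {
        Some(bound) => bound,
        // An unparseable bound means no pruning at all.
        None => return paths.to_vec(),
    };
    paths
        .iter()
        .copied()
        .filter(|path| match partition_value(path, key).and_then(parse_date) {
            Some(date) => date < bound,
            None => true,
        })
        .collect()
}
```

With three hypothetical files partitioned on `hive_date1` for 2024-01-01, 2024-01-02 and 2024-01-03, calling `prune_files` with the bound `2024-01-02` keeps only the first file, which is the kind of file skipping the query above is expected to enable.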

@nameexhaustion changed the title from "feat: Improve hive partition pruning with datetime types" to "perf: Improve hive partition pruning with datetime types" Nov 7, 2024
@github-actions bot added labels Nov 7, 2024: enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), rust (Related to Rust Polars), performance (Performance issues or improvements)
@nameexhaustion changed the title from "perf: Improve hive partition pruning with datetime types" to "perf: Improve hive partition pruning with datetime predicates from SQL" Nov 7, 2024
codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 87.83784% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.74%. Comparing base (4b03406) to head (51ab112).
Report is 20 commits behind head on main.

Files with missing lines Patch % Lines
crates/polars-expr/src/expressions/alias.rs 0.00% 6 Missing ⚠️
crates/polars-expr/src/expressions/binary.rs 81.81% 2 Missing ⚠️
crates/polars-expr/src/expressions/literal.rs 91.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #19680   +/-   ##
=======================================
  Coverage   79.73%   79.74%           
=======================================
  Files        1541     1541           
  Lines      212227   212262   +35     
  Branches     2449     2449           
=======================================
+ Hits       169226   169259   +33     
- Misses      42446    42448    +2     
  Partials      555      555           

@coastalwhite
Collaborator

I like this. Maybe in the future we can add a general Context argument to this so that we can also deal with #19240 possibly.

@ritchie46 ritchie46 merged commit 235f240 into pola-rs:main Nov 8, 2024
25 checks passed
@c-peters added the accepted (Ready for implementation) label Nov 11, 2024
@nameexhaustion nameexhaustion deleted the io-filter-cast branch November 18, 2024 08:21