perf: Improve hive partition pruning with datetime predicates from SQL #19680
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff            @@
##             main   #19680   +/-   ##
=======================================
  Coverage   79.73%   79.74%
=======================================
  Files        1541     1541
  Lines      212227   212262     +35
  Branches     2449     2449
=======================================
+ Hits       169226   169259     +33
- Misses      42446    42448      +2
  Partials      555      555
I like this. Maybe in the future we can add a general …
StatsEvaluator currently only works if the inputs are literals, as it takes them directly from the Expr:
- col('A') < lit(1): works
- col('A') < lit('2024-01-01').str.strptime(..): fails
- LazyFrame.sql(): fails, since it has no access to the IR context / schema during creation

This PR adds evaluate_inline to PhysicalExpr, with some basic support for literals, casting and FunctionExprs. The function is intended to evaluate to Some(column) only if we consider the operation to be cheap:
- Cast / FunctionExpr will only evaluate if the input is of length 1, which is generally the case with literals in predicate expressions.
- Inline-evaluated results are cached in a OnceLock, as they can be called multiple times from the stats evaluator.

With this change, the following query should now be able to prune files on hive partitions:
scan_parquet(...).sql("select * from self where hive_date1 < '2024-01-02'")
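A minimal, stdlib-only Python sketch of the idea, under stated assumptions. The classes Expr, Lit, Col and Func, the method evaluate_inline and the helpers literal_only_bound and can_skip_lt are hypothetical toy stand-ins, not the polars Rust PhysicalExpr API. A sentinel-guarded attribute plays the role of OnceLock, and the right-hand side of hive_date1 < '2024-01-02' is assumed to reach the stats evaluator as a strptime-like function applied to a string literal. It shows how folding a cheap, length-1 function of a literal into a constant lets a min/max check skip hive partitions, with the fold computed only once.

```python
from datetime import date, datetime
from typing import Callable, List, Optional

_UNSET = object()  # sentinel for "not evaluated yet" (plays the role of an empty OnceLock)


class Expr:
    """Hypothetical toy physical expression, NOT the polars PhysicalExpr API."""

    def __init__(self) -> None:
        self._inline = _UNSET
        self.eval_count = 0  # how often the inline value was actually computed

    def evaluate_inline(self) -> Optional[List]:
        # Compute at most once and reuse the result (OnceLock-style caching).
        if self._inline is _UNSET:
            self.eval_count += 1
            self._inline = self._compute_inline()
        return self._inline

    def _compute_inline(self) -> Optional[List]:
        return None  # default: cannot be evaluated cheaply without data


class Lit(Expr):
    def __init__(self, value) -> None:
        super().__init__()
        self.value = value

    def _compute_inline(self) -> Optional[List]:
        return [self.value]  # a literal behaves like a length-1 column


class Col(Expr):
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name  # column data is unknown here, so the default None applies


class Func(Expr):
    """A cast or function (e.g. a strptime) applied to a single input."""

    def __init__(self, inner: Expr, fn: Callable) -> None:
        super().__init__()
        self.inner = inner
        self.fn = fn

    def _compute_inline(self) -> Optional[List]:
        value = self.inner.evaluate_inline()
        # Only treat it as cheap when the input folded to exactly one value.
        if value is None or len(value) != 1:
            return None
        return [self.fn(value[0])]


def literal_only_bound(rhs: Expr):
    # Models a literal-only evaluator: a bound exists only for a bare literal.
    return rhs.value if isinstance(rhs, Lit) else None


def can_skip_lt(col_min, rhs: Expr) -> bool:
    # For `col < rhs`, no row can match when the file's minimum is >= the bound.
    folded = rhs.evaluate_inline()
    if folded is None:
        return False  # unknown bound: the file must be read
    return col_min >= folded[0]


# Each hive partition holds one value, so its min equals that value.
partitions = {
    "hive_date1=2024-01-01": date(2024, 1, 1),
    "hive_date1=2024-01-02": date(2024, 1, 2),
    "hive_date1=2024-01-03": date(2024, 1, 3),
}

# Assumed shape of the SQL predicate's right-hand side: strptime on a string literal.
rhs = Func(Lit("2024-01-02"), lambda s: datetime.strptime(s, "%Y-%m-%d").date())

kept = [path for path, v in partitions.items() if not can_skip_lt(v, rhs)]
print(kept)  # ['hive_date1=2024-01-01']
```

Here only the hive_date1=2024-01-01 partition survives, and the Func node is computed once even though three partitions are checked. The literal-only model, by contrast, gets no bound from the Func node (literal_only_bound(rhs) is None), so it would have to keep every partition.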