-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: Resolve function calls by name during planning #8447
Conversation
Thanks @edmondop -- I hope to check this out in more detail tomorrow |
Do you mean follow the model of try_optimize /// `OptimizerRule` transforms one [`LogicalPlan`] into another which
/// computes the same results, but in a potentially more efficient
/// way. If there are no suitable transformations for the input plan,
/// the optimizer can simply return it as is.
pub trait OptimizerRule {
/// Try and rewrite `plan` to an optimized form, returning None if the plan cannot be
/// optimized by this rule.
fn try_optimize(
&self,
plan: &LogicalPlan,
config: &dyn OptimizerConfig,
) -> Result<Option<LogicalPlan>>;
... Maybe we could do something similar for AnalayzerRules like pub trait AnalyzerRule {
/// Rewrite `plan`
fn analyze(&self, plan: LogicalPlan, config: &AnalyzerConfig) -> Result<LogicalPlan>;
...
}
/// Options to control DataFusion Analyzer Passes.
pub trait AnalyzerConfig {
/// Return a function registry for resolving names
fn function_registry(&self) -> &dyn FunctionRegistry;
/// return datafusion configuration options
fn options(&self) -> &ConfigOptions;
} I am not sure if the compiler will be happy with that construction or not so we may have to play around with it Let me know if you want me to give it a try Thanks for pushing this along cc @2010YOUY01 |
@alamb apologies, I would need some guidance also on what to do on some unit tests. For example, this test https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1533-L1549 How do you think we should proceed? |
I think these tests will need to be updated to resolve the expressions However, I think this brings up a good point about making sure the For any API on the However, if we create the exprs directly like in Maybe we could add a function like So it could be used like let expr = now()
.cast(DataType::Integer) // cast now to integer
.resolve(®istry) // recursively resolve all functions calls, like `now()` into actual function referenes 🤔 I can try and sketch up something like this later today if you want? |
That would be great. I am not sure how to do that without duplicating the
code from the Analyzer that resolves built in functions
…On Mon, 11 Dec 2023 at 07:21, Andrew Lamb ***@***.***> wrote:
For example, this test
https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1533-L1549
will never pass now, because it uses the now function, that uses call_fn
which doesn't get resolved if they analyzer is not applied
I think these tests will need to be updated to resolve the expressions
"now" to the actual now() function
However, I think this brings up a good point about making sure the expr_fns
can be easily used with the lower level expr api (like simplification).
For any API on the SessionContext or DataFrame, expr_fn::now() and
related functions will be run through the analyzer.
However, if we create the exprs directly like in expr_simplifer (and
there are some examples) we have to arrange for the expressions to be
resolved.
Maybe we could add a function like Expr::resolve to do this more easily?
So it could be used like
let expr = now()
.cast(DataType::Integer) // cast now to integer
.resolve() // recursively resolve all functions calls, like `now()` into actual function referenes
🤔
I can try and sketch up something like this later today if you want?
—
Reply to this email directly, view it on GitHub
<#8447 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGGOKITQEOGLT7F5IF4AKTYI4QIBAVCNFSM6AAAAABAKNZHHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJQGI4TIMBXGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Helping with this PR is on my list, but I am behind on other reviews and bug fixes. Sorry for the delay |
Thank you for taking this issue! @edmondop Looks like
I found there are several issues for this rewrite to resolve approach:
Is it possible to just do Reference: current let pow = df.registry().udf("pow")?;
let expr1 = pow.call(vec![col("a"), col("b")]); |
🤔 I played around with this PR some this afternoon. I came to the same conclusion that we would have to make non trivial API changes to the expr_fn() APIs if we go the "must resolve string names to functions" as part of an analysis pass. The only other thing I can think of is to change the dependency structure Right now, the expr_fns are in the We could potentially move them from Let me give that a whirl to see how it would look |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Addresses #8157
Implement an AnalyzerRules that rewrites Expr::ScalarFunction replacing
ScalarFunctionDefinition::Name
intoScalarFunctionDefinition::BuiltIn
orScalarFunctionDefinition:ScalarUDF
at plan time.Introduces a new
AnalyzerConfig
to wrap aFunctionRegistry
andConfigOptions
Ignore from here, left for reference
Unclear how to pass the FunctionRegistry down to the analyzer rule.
Option 1: When creating the analyzer
Using this approach means giving up the Default implementation, as well as doing some refactoring in https://github.com/apache/arrow-datafusion/blob/c8e1c84e6b4f1292afa6f5517bc6978b55758723/datafusion/core/src/execution/context/mod.rs#L1348, since the analyzer needs a FunctionRegistry and the SessionState implements that trait
Option 2: When invoking the analyzer
Using this approach means changing the
pub trait Analyzer
adding a fourth parameter.However this creates on its own a problem
Maybe the best solution is to create an intermediate struct that contains the various HashMaps of SessionState, an pass that struct to the Analyzer, as well as the SessionState constructor? Suggestions are welcome :)