-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support UDAF in substrait producer/consumer #8119
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @waynexia -- this makes sense to me and given the test coverage I think it is ready to merge!
for arg in args { | ||
arguments.push(FunctionArgument { arg_type: Some(ArgType::Value(to_substrait_rex(arg, schema, 0, extension_info)?)) }); | ||
} | ||
let function_name = fun.name.to_lowercase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why lowercase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a specific reason, and I guess this is a convention from translating SQL ident, where strings are preprocessed with normalize_ident()
before mapping to concrete functions. But at this stage the input is all logical plans. to_lowercase()
seems unnecessary. How do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I found that to_lowercase
is unnecessary since _register_function()
will do that. I've removed all to_lowercase
in producer.rs
except the one in register_function()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @waynexia
filter, | ||
order_by, | ||
}))) | ||
// try udaf first, then built-in aggr fn. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically the same function resolution logic that we have for sql. I wonder if we should consolidate it somewhere 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering putting this logic into SessionContext
, as well as resolving scalar and window functions. However sql_function_to_expr
uses ContextProvider
to do this resolving. Maybe we have to do some refactors on ContextProvider
at first.
By the way, I noticed that ContextProvider
has lots of mock impl but those function resolving methods are only used in sql_function_to_expr
. It also looks like an indication that refactoring is needed to me.
Co-authored-by: Andrew Lamb <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Thanks @waynexia ! |
Which issue does this PR close?
Closes #.
Rationale for this change
Support to serialize and deserialize UDAF in substrait. Unlike extension plan, UDAF has more information in substrait proto, it's wrapped in Aggregate Rex so we only need to implement the
name
->impl
part.What changes are included in this PR?
Add support of UDAF to substrait logical producer and consumer
Are these changes tested?
Yes.
Are there any user-facing changes?