Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support UDAF in substrait producer/consumer #8119

Merged
merged 3 commits into from
Nov 12, 2023

Conversation

waynexia
Copy link
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Support to serialize and deserialize UDAF in substrait. Unlike extension plan, UDAF has more information in substrait proto, it's wrapped in Aggregate Rex so we only need to implement the name -> impl part.

What changes are included in this PR?

Add support of UDAF to substrait logical producer and consumer

Are these changes tested?

Yes.

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @waynexia -- this makes sense to me and given the test coverage I think it is ready to merge!

for arg in args {
arguments.push(FunctionArgument { arg_type: Some(ArgType::Value(to_substrait_rex(arg, schema, 0, extension_info)?)) });
}
let function_name = fun.name.to_lowercase();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lowercase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a specific reason, and I guess this is a convention from translating SQL ident, where strings are preprocessed with normalize_ident() before mapping to concrete functions. But at this stage the input is all logical plans. to_lowercase() seems unnecessary. How do you think about it?

Copy link
Member Author

@waynexia waynexia Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I found that to_lowercase is unnecessary since _register_function() will do that. I've removed all to_lowercase in producer.rs except the one in register_function()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @waynexia

filter,
order_by,
})))
// try udaf first, then built-in aggr fn.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically the same function resolution logic that we have for sql. I wonder if we should consolidate it somewhere 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering putting this logic into SessionContext, as well as resolving scalar and window functions. However sql_function_to_expr uses ContextProvider to do this resolving. Maybe we have to do some refactors on ContextProvider at first.

By the way, I noticed that ContextProvider has lots of mock impl but those function resolving methods are only used in sql_function_to_expr. It also looks like an indication that refactoring is needed to me.

datafusion/substrait/src/logical_plan/consumer.rs Outdated Show resolved Hide resolved
@alamb alamb merged commit 824bb66 into apache:main Nov 12, 2023
22 checks passed
@alamb
Copy link
Contributor

alamb commented Nov 12, 2023

Thanks @waynexia !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants