-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve consistency and documentation on error handling in in UDFs #11618
Comments
We should definitely explain them better in the doc. Planning Error
Execution Error
|
I agree with what @2010YOUY01 says Some additional comments about "Planning Error" are that for most UDFs that provide a signature DataFusion will ensure that the function is called with one of the valid types. The UDF can assume the inputs are one of the valid combinations If it wants to check and validate, it should return an internal_err if the planner provides arguments that don't match one of its proposed signatures |
I had done some work in aligning the errors in the past, see #9164 for details |
@2010YOUY01 just to confirm, for example in this case
it should return a planner error, not an exec_error. Also the check on the length of the arguments should be done in return_type and not in
Is this correct? |
I agree. The first error can be triggered by bad input, so it's a planning error
The second one is redundant, but it's also okay to change it to |
Ideally argument check should be handled by But before the |
I had file an issue to improve on signature for coercion and length + type checking for functions #10507 Although I'm not sure does it really help or not since |
@jayzhan211 do you think we should start by adding an helper macro for pattern matching branches of UDF udf_unreachable!() that might create an InternalError DataFusionError::Internal("This branch of the UDF should never be executed since pre-requisites are checked through type_signature and return_type") |
IMO we should improve on signature instead of adding check in different function. This not only add unnecessary overhead and let the code less readable. And ideally we don't need any check in |
Wouldn't inevitably be necessary if you do pattern matching to handle the branches that are impossible? (Because of signature enforcement) Only if we use refined types it would be possible to avoid I think |
I guess at least we could verified it with |
Is your feature request related to a problem or challenge?
When writing a new UDF, a developer needs to decide how to perform error management in functions that return
Result
, such asreturn_type
andinvoke
. Looking at the existing codebase it is not obvious what are the error management best practicesErrors in return type
Regexp like uses an
plan_err
data types of args do not match the expected typesdatafusion/datafusion/functions/src/regex/regexplike.rs
Line 74 in 77311a5
Except expects two arguments but doesn't check the length, just the type
datafusion/datafusion/functions-array/src/except.rs
Line 55 in 77311a5
Errors in invoke
In the resize function, there is a check on argument lengths in invoke which is not present in the
return_type
functiondatafusion/datafusion/functions-array/src/resize.rs
Line 27 in 77311a5
The same function also returns
exec_err
, andinternal_datafusion_err
Describe the solution you'd like
As a developers of custom UDF I would like to know:
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: