-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate ScalarUDF output rows and fix nulls for array_has
and get_field
for Map
#10148
Changes from 8 commits
c94cac8
5d67c73
96f4fe2
5587b03
bba9bfe
9be1a5a
2a93b8c
9743382
7eebcf2
c972d7d
e5bbfaf
ed41d3a
6603135
cc53bd3
cf7fac3
fc304ae
e70245e
cda3e3b
da77fb2
efb1c5f
6c397e8
c1458c2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -114,6 +114,8 @@ impl ScalarUDFImpl for GetFieldFunc { | |
let keys = arrow::compute::kernels::cmp::eq(&key_scalar, map_array.keys())?; | ||
let entries = arrow::compute::filter(map_array.entries(), &keys)?; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. using filter will reduce the number of input rows to the number of rows that have keys matching the input key. But we want to respect the number of input rows, and give null for any rows not having the matching key There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand this If the input is like this (two rows, each three elements)
An expression like
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Previous implememtation map_array.entries() has type of
With the example above, the layout of field "fields" will be a vector of 2 array, where first array is a list of key, and second array is a list of value
with this computation, the result is a boolean aray where "key" = "c"
and thus this operation will reduce the number of rows into
Problem However, let's add a row where the map does not have key "c" in between
map_array.entries() underneath is represented as
and the return result will be
instead of
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would expect the result of evaluating
to be:
For example, in duckdb: D create table foo as values (MAP {'a':1, 'b':2, 'c':100}), (MAP{ 'a':1, 'b':2}), (MAP {'a':1, 'b':2, 'c':200});
D select * from foo;
┌───────────────────────┐
│ col0 │
│ map(varchar, integer) │
├───────────────────────┤
│ {a=1, b=2, c=100} │
│ {a=1, b=2} │
│ {a=1, b=2, c=200} │
└───────────────────────┘
D select col0['c'] from foo;
┌───────────┐
│ col0['c'] │
│ int32[] │
├───────────┤
│ [100] │
│ [] │
│ [200] │
└───────────┘ Basically a scalar function has the invarant that each input row produces exactly 1 output row There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, the previous implementation will not return null (according to slt https://github.com/apache/datafusion/pull/10148/files#diff-4db02ec06ada20062cbb518eeb713c2643f5a51e89774bdadd9966410baa7d1dR47) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i also explained in this discussion: #10148 (comment) |
||
let entries_struct_array = as_struct_array(entries.as_ref())?; | ||
let st = entries_struct_array.column(1).clone(); | ||
println!("{:?}", st.len()); | ||
Ok(ColumnarValue::Array(entries_struct_array.column(1).clone())) | ||
} | ||
(DataType::Struct(_), ScalarValue::Utf8(Some(k))) => { | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -5169,8 +5169,9 @@ false false false true | |||||||||||||||||||||||
true false true false | ||||||||||||||||||||||||
true false false true | ||||||||||||||||||||||||
false true false false | ||||||||||||||||||||||||
false false false false | ||||||||||||||||||||||||
false false false false | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test result does not look correct, because it ignore some null rows in between |
||||||||||||||||||||||||
NULL NULL false false | ||||||||||||||||||||||||
false false NULL false | ||||||||||||||||||||||||
false false false NULL | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
query BBBB | ||||||||||||||||||||||||
select array_has(arrow_cast(column1, 'LargeList(List(Int64))'), make_array(5, 6)), | ||||||||||||||||||||||||
|
@@ -5183,8 +5184,9 @@ false false false true | |||||||||||||||||||||||
true false true false | ||||||||||||||||||||||||
true false false true | ||||||||||||||||||||||||
false true false false | ||||||||||||||||||||||||
false false false false | ||||||||||||||||||||||||
false false false false | ||||||||||||||||||||||||
NULL NULL false false | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I double checked and the datafusion/datafusion/sqllogictest/test_files/array.slt Lines 58 to 68 in cc53bd3
|
||||||||||||||||||||||||
false false NULL false | ||||||||||||||||||||||||
false false false NULL | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
query BBBB | ||||||||||||||||||||||||
select array_has(column1, make_array(5, 6)), | ||||||||||||||||||||||||
|
@@ -5197,8 +5199,9 @@ false false false true | |||||||||||||||||||||||
true false true false | ||||||||||||||||||||||||
true false false true | ||||||||||||||||||||||||
false true false false | ||||||||||||||||||||||||
false false false false | ||||||||||||||||||||||||
false false false false | ||||||||||||||||||||||||
NULL NULL false false | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Iikewise I agree this should have 7 output rows datafusion/datafusion/sqllogictest/test_files/array.slt Lines 80 to 90 in cc53bd3
|
||||||||||||||||||||||||
false false NULL false | ||||||||||||||||||||||||
false false false NULL | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
query BBBBBBBBBBBBB | ||||||||||||||||||||||||
select array_has_all(make_array(1,2,3), make_array(1,3)), | ||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌 -- very nice