-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate ScalarUDF output rows and fix nulls for array_has
and get_field
for Map
#10148
Validate ScalarUDF output rows and fix nulls for array_has
and get_field
for Map
#10148
Conversation
ScalarFunctionDefinition::UDF(ref fun) => fun.invoke(&inputs), | ||
ScalarFunctionDefinition::UDF(ref fun) => { | ||
let output = fun.invoke(&inputs)?; | ||
let output_count = match &output { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datafusion/datafusion/functions/src/utils.rs
Line 108 in 19356b2
let result = (inner)(&args); |
I think we should check it inside the function wrapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then anyone who implement ScalarUDFImpl trait and not using this util can miss this validation right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example this function:
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then anyone who implement ScalarUDFImpl trait and not using this util can miss this validation right?
True, but in another way, all the function is forced to check the length, even if it does not expect to.
But, I didn't think of any example that the rows length may change, probably only aggregate func like arrayagg maybe change the rows length 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the initial purpose of this validation is to announce users who define their own udf to follow this constraint (maybe they provide a wrong implementation, and our code just panic with ambiguous error, for example the error in the reported issue #5635
after adding this validation, i found one udf violating this rule which is arrow_typeof.
Can we consider this function a udaf, i need your opinion here @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnarValue::Scalar is used for single row
hmm, maybe this is the point i got confused, because the way i look at how this ColumnarValue work, this is not correct. e.g in projection stream above. How i understand is: ColumnarValue::Scalar looks like a single value represent a single shared value of n rows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip the validation for arrow_typeof
is fine to me, but skip it for ScalarValue
in general is not. I think most of the function expects scalar in scalar out, so we still need to check if the result is scalar (single row).
will be done some where else, depends on the plan
The assumption here is possible to change in the future, if it changes, the code breaks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the review :D understood
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw I also tested with Spark and Duckdb for their version of "typeof"
Duckdb
toai@ToaiPC:~/proj/rust/arrow-datafusion/datafusion/core/tests/data$ duckdb
v0.10.2 1601d94f94
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D select typeof(ints) from parquet_map.parquet
;
┌──────────────────────┐
│ typeof(ints) │
│ varchar │
├──────────────────────┤
│ MAP(VARCHAR, BIGINT) │
│ MAP(VARCHAR, BIGINT) │
│ MAP(VARCHAR, BIGINT) │
│ MAP(VARCHAR, BIGINT) │
│ MAP(VARCHAR, BIGINT) │
│ MAP(VARCHAR, BIGINT) │
and Spark
val data = Seq(
("Alice", Map("age" -> "25", "email" -> "[email protected]")),
("Bob", Map("age" -> "30", "email" -> "[email protected]")),
("Carol", Map("age" -> "35", "email" -> "[email protected]"))
)
val df = data.toDF("name", "attributes")
val types = df.select($"name", typeof($"name").as("name_type"), $"attributes", typeof($"attributes").as("attributes_type"))
types.show(false)
+-----+---------+---------------------------------------+------------------+
|name |name_type|attributes |attributes_type |
+-----+---------+---------------------------------------+------------------+
|Alice|string |{age -> 25, email -> [email protected]}|map<string,string>|
|Bob |string |{age -> 30, email -> [email protected]} |map<string,string>|
|Carol|string |{age -> 35, email -> [email protected]}|map<string,string>|
+-----+---------+---------------------------------------+------------------+
Look like they also respect number of input row
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh actually current arrow_typeof also respect this behavior (added extra test case in arrow_typeof.slt)
query T
select arrow_typeof(col) from (select 1 as col union select 2 as col) as table_a;
----
Int64
Int64
=> ArrowTypeOfFunc always return a Scalar
Ok(ColumnarValue::Scalar(ScalarValue::from(format!(
"{input_data_type}"
))))
but the rendered record batch shows n rows anyway, does this mean "wrong implementation but with correct result"?
The omitted row is where column1 = null |
@@ -5169,8 +5169,9 @@ false false false true | |||
true false true false | |||
true false false true | |||
false true false false | |||
false false false false | |||
false false false false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test result does not look correct, because it ignore some null rows in between
ScalarFunctionDefinition::UDF(ref fun) => { | ||
let output = fun.invoke(&inputs)?; | ||
// Only arrow_typeof can bypass this rule | ||
if fun.name() != "arrow_typeof" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should introduce validate_number_of_rows
to ScalarUDFImpl with the default value is true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added to the trait and the wrapper
} | ||
// respect null input | ||
(_, _) => { | ||
boolean_builder.append_null(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -44,6 +44,7 @@ DELETE 24 | |||
query T | |||
SELECT strings['not_found'] FROM data LIMIT 1; | |||
---- | |||
NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with Map. Why should we return null here?
Without the change in Map, what is the error like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will throws the invalidation error i added in this PR. I think the correct behavior is to return null for every input rows not having the associated key. I took a look at duckdb and spark and they also have this behavior
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Spark SQL Map Example").getOrCreate()
import spark.implicits._
val data = Seq(
("Alice", Map("age" -> "25", "email" -> "[email protected]")),
("Bob", Map("age" -> "30", "email" -> "[email protected]")),
("Carol", Map("age" -> "35", "email" -> "[email protected]"))
)
val df = data.toDF("name", "attributes")
val result = df.select($"name", $"attributes.email".as("email"),$"attributes.notfound".as("should_be_null"))
// Show the DataFrame
result.show(false)
+-----+-----------------+--------------+
|name |email |should_be_null|
+-----+-----------------+--------------+
|Alice|[email protected]|NULL |
|Bob |[email protected] |NULL |
|Carol|[email protected]|NULL |
+-----+-----------------+--------------+
And also, a similar implementation in Datafusion is array_element also return null if the index goes out of range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is how it works on main:
> select strings['not_found'] from 'datafusion/core/tests/data/parquet_map.parquet';
0 row(s) fetched.
Elapsed 0.006 seconds.
Here is how it works on this PR (aka has a single row for each input row)
DataFusion CLI v37.1.0
> select strings['not_found'] from '../datafusion/core/tests/data/parquet_map.parquet';
+----------------------------------------------------------------------+
| ../datafusion/core/tests/data/parquet_map.parquet.strings[not_found] |
+----------------------------------------------------------------------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| . |
| . |
| . |
+----------------------------------------------------------------------+
209 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 0.033 seconds.
let map_array = as_map_array(array.as_ref())?; | ||
let key_scalar = Scalar::new(StringArray::from(vec![k.clone()])); | ||
let keys = arrow::compute::kernels::cmp::eq(&key_scalar, map_array.keys())?; | ||
let entries = arrow::compute::filter(map_array.entries(), &keys)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using filter will reduce the number of input rows to the number of rows that have keys matching the input key. But we want to respect the number of input rows, and give null for any rows not having the matching key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this
If the input is like this (two rows, each three elements)
{ a: 1, b: 2, c: 100}
{ a: 3, b: 4, c: 200}
An expression like col['c']
will still return 2 rows (but each row will have only a single element)
{ c: 100 }
{ c: 200 }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous implememtation
map_array.entries() has type of
pub struct StructArray {
len: usize,
data_type: DataType,
nulls: Option<NullBuffer>,
fields: Vec<ArrayRef>,
}
With the example above, the layout of field "fields" will be a vector of 2 array, where first array is a list of key, and second array is a list of value
[0]: ["a","b","c","a","b",c"]
[1]: [1,2,100,3,4,200]
let keys = arrow::compute::kernels::cmp::eq(&key_scalar, map_array.keys())?;
with this computation, the result is a boolean aray where "key" = "c"
[false,false,true,false,false,true]
and thus this operation will reduce the number of rows into
let entries = arrow::compute::filter(map_array.entries(), &keys)?;
[0]: ["c,"c"]
[1]: [100,200]
Problem
However, let's add a row where the map does not have key "c" in between
{ a: 1, b: 2, c: 100}
{ a: 1, b: 2}
{ a: 3, b: 4, c: 200}
map_array.entries() underneath is represented as
[0]: ["a,"b","c","a","b","a","b","c"]
[1]: [1,2,100,1,2,3,4,200]
let entries = arrow::compute::filter(map_array.entries(), &keys)?;
Now rows after filtered will be
[0]: ["c","c"]
[1]: [100,200]
and the return result will be
{ c: 100 }
{ c: 200 }
instead of
{ c: 100 }
null
{ c: 200 }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the result of evaluating col[b]
on
{ a: 1, b: 2, c: 100}
{ a: 1, b: 2}
{ a: 3, b: 4, c: 200}
to be:
{ c: 100 }
null
{ c: 200 }
For example, in duckdb:
D create table foo as values (MAP {'a':1, 'b':2, 'c':100}), (MAP{ 'a':1, 'b':2}), (MAP {'a':1, 'b':2, 'c':200});
D select * from foo;
┌───────────────────────┐
│ col0 │
│ map(varchar, integer) │
├───────────────────────┤
│ {a=1, b=2, c=100} │
│ {a=1, b=2} │
│ {a=1, b=2, c=200} │
└───────────────────────┘
D select col0['c'] from foo;
┌───────────┐
│ col0['c'] │
│ int32[] │
├───────────┤
│ [100] │
│ [] │
│ [200] │
└───────────┘
Basically a scalar function has the invarant that each input row produces exactly 1 output row
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the previous implementation will not return null (according to slt https://github.com/apache/datafusion/pull/10148/files#diff-4db02ec06ada20062cbb518eeb713c2643f5a51e89774bdadd9966410baa7d1dR47)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i also explained in this discussion: #10148 (comment)
array_has
and get_field
for Map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the work here @duongcongtoai and @jayzhan211 . I don't think this PR handles scalar values correctly -- I left some comments explaining why. Let me know what you think
if fun.validate_number_of_rows() { | ||
let output_size = match &output { | ||
ColumnarValue::Array(array) => array.len(), | ||
ColumnarValue::Scalar(_) => 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this logic is correct -- specifically ColumnarValue::Scalar
represents (logically) a single value for all the rows.
Thus I think if the function returns ColumnarValue::Scalar
that implicitly can be any number of rows.
Therefore I think the check could be something like
if let ColumnarValue::Array(array) = &output {
if output_size != array.len() {
return internal_err...
}
}
I'll try and make a PR to improve the documentation on ColumnarValue to make this clearer: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.ColumnarValue.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, so we don't need to check the length if the output is a scalar value, we actually had a discussion here: #10148 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a PR to improve the docs #10265
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh then perhaps the current implementation of arrow_typeof can be wrong, if we have a map like this:
{id:1,name:"bob"}
{id:"string_id",name:"alice"}
then arrow_typeof(map.id) will returns
int64
int64
Is it possible to have this kind of input, because a map's schema can be dynamic per row
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I agree that we can check with Array only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is map key allowed to have different types, which db has the similar model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha i 'v read duckdb's map definition again
A dictionary of multiple named values, each key having the same type and each value having the same type. Keys and values can be any type and can be different types from one another
I am new to this kind of project, but i think we don't have and should not have that use case 🤔. Else it will affect alot of execution logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Duckdb, map keys can be different value for each row, but the types should not be different.
In OLAP which has the columnar format (we use arrow), it does not make sense to have different types on different rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MAPs must have a single type for all keys, and a single type for all values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -69,4 +69,8 @@ impl ScalarUDFImpl for ArrowTypeOfFunc { | |||
"{input_data_type}" | |||
)))) | |||
} | |||
|
|||
fn validate_number_of_rows(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct -- arrow_typeof always returns the same number of rows as its input...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then i think this new method is no longer needed
let map_array = as_map_array(array.as_ref())?; | ||
let key_scalar = Scalar::new(StringArray::from(vec![k.clone()])); | ||
let keys = arrow::compute::kernels::cmp::eq(&key_scalar, map_array.keys())?; | ||
let entries = arrow::compute::filter(map_array.entries(), &keys)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this
If the input is like this (two rows, each three elements)
{ a: 1, b: 2, c: 100}
{ a: 3, b: 4, c: 200}
An expression like col['c']
will still return 2 rows (but each row will have only a single element)
{ c: 100 }
{ c: 200 }
Requested change made, please help me check again @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @duongcongtoai -- this looks good to me. Thank you @jayzhan211 for the review
All in all this is a really nice change 🏆 : thank you for fixing the underlying problem that the supposedly easy-to-add check exposed
.err() | ||
.unwrap() | ||
.to_string(), | ||
"UDF returned a different number of rows than expected" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌 -- very nice
@@ -44,6 +44,7 @@ DELETE 24 | |||
query T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to remind myself what this data looked like. Here it is for anyone else who may be interested
DataFusion CLI v37.1.0
> select * from 'datafusion/core/tests/data/parquet_map.parquet';
+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
| ints | strings | timestamp |
+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+
| {bytes: 38906} | {host: 198.194.132.41, method: GET, protocol: HTTP/1.0, referer: https://some.com/this/endpoint/prints/money, request: /observability/metrics/production, status: 400, user-identifier: shaneIxD} | 06/Oct/2023:17:53:45 |
| {bytes: 44606} | {host: 140.115.224.194, method: PATCH, protocol: HTTP/1.0, referer: https://we.org/user/booperbot124, request: /booper/bopper/mooper/mopper, status: 304, user-identifier: jesseddy} | 06/Oct/2023:17:53:45 |
| {bytes: 23517} | {host: 63.69.43.67, method: GET, protocol: HTTP/2.0, referer: https://random.net/booper/bopper/mooper/mopper, request: /booper/bopper/mooper/mopper, status: 550, user-identifier: jesseddy} | 06/Oct/2023:17:53:45 |
| {bytes: 44876} | {host: 69.4.253.156, method: PATCH, protocol: HTTP/1.1, referer: https://some.net/booper/bopper/mooper/mopper, request: /user/booperbot124, status: 403, user-identifier: Karimmove} | 06/Oct/2023:17:53:45 |
| {bytes: 34122} | {host: 239.152.196.123, method: DELETE, protocol: HTTP/2.0, referer: https://for.com/observability/metrics/production, request: /apps/deploy, status: 403, user-identifier: meln1ks} | 06/Oct/2023:17:53:45 |
| {bytes: 37438} | {host: 95.243.186.123, method: DELETE, protocol: HTTP/1.1, referer: https://make.de/wp-admin, request: /wp-admin, status: 550, user-identifier: Karimmove} | 06/Oct/2023:17:53:45 |
| {bytes: 45784} | {host: 66.142.251.66, method: PUT, protocol: HTTP/2.0, referer: https://some.org/apps/deploy, request: /secret-info/open-sesame, status: 403, user-identifier: benefritz} | 06/Oct/2023:17:53:45 |
| {bytes: 27788} | {host: 157.85.140.215, method: GET, protocol: HTTP/1.1, referer: https://random.de/booper/bopper/mooper/mopper, request: /booper/bopper/mooper/mopper, status: 401, user-identifier: devankoshal} | 06/Oct/2023:17:53:45 |
| {bytes: 5344} | {host: 62.191.179.3, method: POST, protocol: HTTP/1.0, referer: https://random.org/booper/bopper/mooper/mopper, request: /observability/metrics/production, status: 400, user-identifier: jesseddy} | 06/Oct/2023:17:53:45 |
| {bytes: 9136} | {host: 237.213.221.20, method: PUT, protocol: HTTP/2.0, referer: https://some.us/this/endpoint/prints/money, request: /observability/metrics/production, status: 304, user-identifier: ahmadajmi} | 06/Oct/2023:17:53:46 |
| {bytes: 5640} | {host: 38.148.115.2, method: GET, protocol: HTTP/1.0, referer: https://for.net/apps/deploy, request: /do-not-access/needs-work, status: 301, user-identifier: benefritz} | 06/Oct/2023:17:53:46 |
...
@@ -44,6 +44,7 @@ DELETE 24 | |||
query T | |||
SELECT strings['not_found'] FROM data LIMIT 1; | |||
---- | |||
NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is how it works on main:
> select strings['not_found'] from 'datafusion/core/tests/data/parquet_map.parquet';
0 row(s) fetched.
Elapsed 0.006 seconds.
Here is how it works on this PR (aka has a single row for each input row)
DataFusion CLI v37.1.0
> select strings['not_found'] from '../datafusion/core/tests/data/parquet_map.parquet';
+----------------------------------------------------------------------+
| ../datafusion/core/tests/data/parquet_map.parquet.strings[not_found] |
+----------------------------------------------------------------------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| . |
| . |
| . |
+----------------------------------------------------------------------+
209 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 0.033 seconds.
@@ -5197,8 +5199,9 @@ false false false true | |||
true false true false | |||
true false false true | |||
false true false false | |||
false false false false | |||
false false false false | |||
NULL NULL false false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iikewise I agree this should have 7 output rows
datafusion/datafusion/sqllogictest/test_files/array.slt
Lines 80 to 90 in cc53bd3
statement ok | |
CREATE TABLE fixed_size_arrays | |
AS VALUES | |
(arrow_cast(make_array(make_array(NULL, 2),make_array(3, NULL)), 'FixedSizeList(2, List(Int64))'), arrow_cast(make_array(1.1, 2.2, 3.3), 'FixedSizeList(3, Float64)'), arrow_cast(make_array('L', 'o', 'r', 'e', 'm'), 'FixedSizeList(5, Utf8)')), | |
(arrow_cast(make_array(make_array(3, 4),make_array(5, 6)), 'FixedSizeList(2, List(Int64))'), arrow_cast(make_array(NULL, 5.5, 6.6), 'FixedSizeList(3, Float64)'), arrow_cast(make_array('i', 'p', NULL, 'u', 'm'), 'FixedSizeList(5, Utf8)')), | |
(arrow_cast(make_array(make_array(5, 6),make_array(7, 8)), 'FixedSizeList(2, List(Int64))'), arrow_cast(make_array(7.7, 8.8, 9.9), 'FixedSizeList(3, Float64)'), arrow_cast(make_array('d', NULL, 'l', 'o', 'r'), 'FixedSizeList(5, Utf8)')), | |
(arrow_cast(make_array(make_array(7, NULL),make_array(9, 10)), 'FixedSizeList(2, List(Int64))'), arrow_cast(make_array(10.1, NULL, 12.2), 'FixedSizeList(3, Float64)'), arrow_cast(make_array('s', 'i', 't', 'a', 'b'), 'FixedSizeList(5, Utf8)')), | |
(NULL, arrow_cast(make_array(13.3, 14.4, 15.5), 'FixedSizeList(3, Float64)'), arrow_cast(make_array('a', 'm', 'e', 't', 'x'), 'FixedSizeList(5, Utf8)')), | |
(arrow_cast(make_array(make_array(11, 12),make_array(13, 14)), 'FixedSizeList(2, List(Int64))'), NULL, arrow_cast(make_array(',','a','b','c','d'), 'FixedSizeList(5, Utf8)')), | |
(arrow_cast(make_array(make_array(15, 16),make_array(NULL, 18)), 'FixedSizeList(2, List(Int64))'), arrow_cast(make_array(16.6, 17.7, 18.8), 'FixedSizeList(3, Float64)'), NULL) | |
; |
@@ -5183,8 +5184,9 @@ false false false true | |||
true false true false | |||
true false false true | |||
false true false false | |||
false false false false | |||
false false false false | |||
NULL NULL false false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked and the arrays
table has 7 rows, so I agree the correct answer has 7 output rows as well
datafusion/datafusion/sqllogictest/test_files/array.slt
Lines 58 to 68 in cc53bd3
statement ok | |
CREATE TABLE arrays | |
AS VALUES | |
(make_array(make_array(NULL, 2),make_array(3, NULL)), make_array(1.1, 2.2, 3.3), make_array('L', 'o', 'r', 'e', 'm')), | |
(make_array(make_array(3, 4),make_array(5, 6)), make_array(NULL, 5.5, 6.6), make_array('i', 'p', NULL, 'u', 'm')), | |
(make_array(make_array(5, 6),make_array(7, 8)), make_array(7.7, 8.8, 9.9), make_array('d', NULL, 'l', 'o', 'r')), | |
(make_array(make_array(7, NULL),make_array(9, 10)), make_array(10.1, NULL, 12.2), make_array('s', 'i', 't')), | |
(NULL, make_array(13.3, 14.4, 15.5), make_array('a', 'm', 'e', 't')), | |
(make_array(make_array(11, 12),make_array(13, 14)), NULL, make_array(',')), | |
(make_array(make_array(15, 16),make_array(NULL, 18)), make_array(16.6, 17.7, 18.8), NULL) | |
; |
array_has
and get_field
for Map
array_has
and get_field
for Map
Which issue does this PR close?
Closes #5735.
Add a small constraint for each UDF to have the same input and output size, with "arrow_typeof" as an exception
2 implementation failed the constraint (also fixed inside this PR)
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?