-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the validation of Decimal256
#2320
Comments
This also can speed up the construction of decimal arrays like #2313 |
Maybe we should add iter<Option<[u8]>> for decimal array and reduce construct the decimal128 and decimal258 which contain the useless precision/scale for validation. |
Also just iterate the array like |
I think it is important to:
Performance wise I haven't checked if we are validating things twice |
also add the benchmark result of this function. |
I find it unexpected that a variable length byte array outperforms a fixed-length i128?
I think it would help to take a step back here, as I at least am very confused as to what is being validated when. For most array types we have an invariant that For example, |
@tustvold Have you benched these two implementation? In current logic, we should get the byte array and convert the byte array to i128 or bigint/string first, and then compare with the bound min/max value. I guess we can just compare the fixed length byte array with the bound min/max with the format of fixed length byte array. |
We can get the decimal array from decimal array builder. But in the arrow-rs, I find some case in which we can skip the validation. For example, reading decimal value from parquet file(INT32,INT64, OR byte array), taking data from decimal array. We can refile these cases. |
Currently we only do decimal validation where:
I think it is somehow selective right now. Actually we do decimal validation less than necessary, as mentioned above |
When we create the decimal array from iter<Option>, and need to do validation. Because the default precision is the max precision(38).
|
Maybe we'd better update the title if we don't update the decimal128 validation. |
Title is updated |
Decimal256
: compare data using the byte array instead of string
Decimal256
: compare data using the byte array instead of stringDecimal256
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When arrow construct a decimal array, we need to verify the data of decimal with the precision.
From current implemention
arrow-rs/arrow/src/datatypes/datatype.rs
Line 456 in d87f6a4
arrow-rs/arrow/src/datatypes/datatype.rs
Line 485 in d87f6a4
We compare the decimal using i128 or string, it may be inefficient.
optimize the validation using the byte array instead of i128 or string.
Describe the solution you'd like
For exmaple, precision 3, The max value is i128.to_le_byte(999), and the min value if i128.to_le_byte(-999).
Comparing the decimal128(value,precision,scale) just need to compare the value ([u8;16]) with the min and max.
Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered: