-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve DecimalArray
API ergonomics: add iter()
, FromIterator
, with_precision_and_scale
#1223
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @liukun4515 I coded this one up to try and improve the code for working with DecimalArray
s (and I wanted to write some code this weekend rather than just reviewing code!)
/// builder.append_null().unwrap(); | ||
/// builder.append_value(-8_887_000_000).unwrap(); | ||
/// let decimal_array: DecimalArray = builder.finish(); | ||
/// // Create a DecimalArray with the default precision and scale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an example of how the new API workrs -- I think it is easier to understand and use
} | ||
|
||
#[test] | ||
fn test_decimal_iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also included a basic iter()
and into_iter()
functions -- while there is room for performance improvement I think the API is solid
Codecov Report
@@ Coverage Diff @@
## master #1223 +/- ##
==========================================
+ Coverage 83.02% 83.04% +0.01%
==========================================
Files 180 180
Lines 52289 52418 +129
==========================================
+ Hits 43413 43529 +116
- Misses 8876 8889 +13
Continue to review full report at Codecov.
|
arrow/src/array/array_binary.rs
Outdated
/// The default precision and scale used when not specified | ||
pub fn default_type() -> DataType { | ||
// Keep maximum precision | ||
DataType::Decimal(38, 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can replace 38 to DECIMAL_MAX_PRECISION
, and 10 to OTHER_DEFAULT_VALUE
.
@@ -189,6 +194,12 @@ impl fmt::Display for DataType { | |||
} | |||
} | |||
|
|||
/// The maximum precision for [DataType::Decimal] values | |||
pub const DECIMAL_MAX_PRECISION: usize = 38; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the datafusion, we use
https://github.com/apache/arrow-datafusion/blob/e92225d6f4660363eec3b1e767a188fccebb7ed9/datafusion/src/scalar.rs#L39 .
Maybe we can replace the const in the datafusion.
If we have the plan to support the decimal256, we may need to use some labels to identify diff decimal type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should remove the duplication in DataFusion -- when this PR is merged, I can file a ticket in DataFusion to upgrade to the new DecimalAPI and remove the duplication
Do you want to refactor some codes like arrow-rs/arrow/src/compute/kernels/cast.rs Line 2075 in 66e029e
If you have the plan to refactor this code, I will review this later. LGTM, expect the #1223 (comment) |
I was planning to do that in a follow on PR to keep this PR small and focused on the API changes. So I will plan to make the changes you suggest in this PR, and then prepare a new one that refactors the rest of the code. Thank you for the review. |
Update: I have not forgotten about this PR but I have been busy with other tasks this week. If I don't get to it later this week I'll work on it over the weekend |
🤔 while working to port some of the code over, it turns out that DecimalBuilder also does value validation on each value (so it is certain that the decimal values are within correct range. I need to think about this 🤔 |
back to draft while I think about this a bit |
This PR is now ready for (re) review @liukun4515 cc @sweb as you contributed the initial You can see examples of how the API is used in #1247 and #1249 |
@@ -1153,87 +1153,6 @@ pub struct FixedSizeBinaryBuilder { | |||
builder: FixedSizeListBuilder<UInt8Builder>, | |||
} | |||
|
|||
pub const MAX_DECIMAL_FOR_EACH_PRECISION: [i128; 38] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to datatype
))); | ||
} | ||
Ok(result) | ||
validate_decimal_precision(result, precision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consolidated the bounds checking (so that I could also use it in with_precision_and_scale
)
FYI @liukun4515 this is ready for review. I know you away for a few days so no worries. I plan to wait for a review prior to merging this PR; If it gets reviewed by tomorrow's cutoff for arrow 9.0.0 I'll include it there, otherwise this can wait for 2 weeks and perhaps be included in arrow 10.0.0 |
I will review it later today. |
@@ -1153,87 +1153,6 @@ pub struct FixedSizeBinaryBuilder { | |||
builder: FixedSizeListBuilder<UInt8Builder>, | |||
} | |||
|
|||
pub const MAX_DECIMAL_FOR_EACH_PRECISION: [i128; 38] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is an API breaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 'api-change' label
@@ -2297,7 +2298,7 @@ mod tests { | |||
let array = Arc::new(array) as ArrayRef; | |||
let casted_array = cast(&array, &DataType::Decimal(3, 1)); | |||
assert!(casted_array.is_err()); | |||
assert_eq!("Invalid argument error: The value of 1000 i128 is not compatible with Decimal(3,1)", casted_array.unwrap_err().to_string()); | |||
assert_eq!("Invalid argument error: 1000 is too large to store in a Decimal of precision 3. Max is 999", casted_array.unwrap_err().to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice error message.
assert_eq!(actual, expected); | ||
} | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Sorry for the delayed review.
No problem -- thanks for the review @liukun4515 |
Thanks again for the help @liukun4515 |
DecimalArray
API ergonomics: add iter(), create from iter(), change precision / scaleDecimalArray
API ergonomics: add iter()
, FromIterator
, change precision / scale
DecimalArray
API ergonomics: add iter()
, FromIterator
, change precision / scaleDecimalArray
API ergonomics: add iter()
, FromIterator
, with_precision_and_scale
Which issue does this PR close?
Closes #1009
Closes #1083
Rationale for this change
It is somewhat awkward to create
DecimalArray
as well as iterate through their values.This leads to (repeated) code like
create_decimal_array
https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/apache/arrow-rs%24+create_decimal_array&patternType=literal(there is similar repetition in datafusion)
There are also more bounds checks than necessary (given
DecimalArray::value()
checks on each accessWhat changes are included in this PR?
DecimalArray::from
andDecimalArray::from_iter_values
(mirroringPrimitiveArray
)DecimalArray::into_iter()
andDecimalArray::iter()
for iterating through valuesDecimalArray::with_precision_and_scale()
for changing the relevant precision and scale, and validate precisionAre there any user-facing changes?
DecimalArray
sdatatype
Follow on PRs:
DecimalArray
creation API in parquet crate #1247