Improve `DecimalArray` API ergonomics: add `iter()`, `FromIterator`, `with_precision_and_scale` #1223

alamb · 2022-01-23T12:25:27Z

Which issue does this PR close?

Rationale for this change

It is somewhat awkward to create a DecimalArray and to iterate through its values.

This leads to (repeated) code like create_decimal_array https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/apache/arrow-rs%24+create_decimal_array&patternType=literal

    fn create_decimal_array(
        array: &[Option<i128>],
        precision: usize,
        scale: usize,
    ) -> DecimalArray {
        let mut decimal_builder = DecimalBuilder::new(array.len(), precision, scale);
        for value in array {
            match value {
                None => {
                    decimal_builder.append_null().unwrap();
                }
                Some(v) => {
                    decimal_builder.append_value(*v).unwrap();
                }
            }
        }
        decimal_builder.finish()
    }

(there is similar repetition in datafusion)

There are also more bounds checks than necessary, given that DecimalArray::value() checks bounds on each access.

What changes are included in this PR?

Add DecimalArray::from and DecimalArray::from_iter_values (mirroring PrimitiveArray)
Add DecimalArray::into_iter() and DecimalArray::iter() for iterating through values
Add DecimalArray::with_precision_and_scale() for changing the precision and scale, with validation of the precision
Add documentation
Refactor some existing code to show how the new APIs can be used
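
As a rough illustration of the API shape listed above, here is a minimal, self-contained sketch. `ToyDecimalArray` is a hypothetical stand-in written for this example and is not the arrow `DecimalArray` implementation; only the idea of `FromIterator`, `iter()`, and `with_precision_and_scale()` comes from this PR. The starting precision and scale of (38, 10), the `String` error type, and the exact validation rules are assumptions.

```rust
use std::iter::FromIterator;

/// Hypothetical stand-in for a decimal array: optional values plus precision and scale.
#[derive(Debug, PartialEq)]
pub struct ToyDecimalArray {
    values: Vec<Option<i128>>,
    precision: usize,
    scale: usize,
}

impl FromIterator<Option<i128>> for ToyDecimalArray {
    /// Collect optional values, starting from an assumed default of (38, 10).
    fn from_iter<I: IntoIterator<Item = Option<i128>>>(iter: I) -> Self {
        Self {
            values: iter.into_iter().collect(),
            precision: 38,
            scale: 10,
        }
    }
}

impl ToyDecimalArray {
    /// Iterate over the values in order; `None` marks a null slot.
    pub fn iter(&self) -> impl Iterator<Item = Option<i128>> + '_ {
        self.values.iter().copied()
    }

    pub fn precision(&self) -> usize {
        self.precision
    }

    pub fn scale(&self) -> usize {
        self.scale
    }

    /// Change precision and scale, rejecting values that need more digits.
    pub fn with_precision_and_scale(
        mut self,
        precision: usize,
        scale: usize,
    ) -> Result<Self, String> {
        if precision == 0 || precision > 38 {
            return Err(format!("precision {} must be between 1 and 38", precision));
        }
        if scale > precision {
            return Err(format!("scale {} is greater than precision {}", scale, precision));
        }
        // Largest magnitude representable with `precision` decimal digits.
        let max = 10_i128.pow(precision as u32) - 1;
        for v in self.values.iter().flatten() {
            if *v > max || *v < -max {
                return Err(format!("{} does not fit in a Decimal of precision {}", v, precision));
            }
        }
        self.precision = precision;
        self.scale = scale;
        Ok(self)
    }
}
```

With an API of this shape, a helper like `create_decimal_array` reduces to collecting an iterator of `Option<i128>` and calling `with_precision_and_scale(precision, scale)` on the result.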

Are there any user-facing changes?

Nicer APIs for creating and working with DecimalArrays
Constants for working with Decimal values are moved to datatype

Follow on PRs:

alamb

cc @liukun4515 I coded this one up to try and improve the code for working with DecimalArrays (and I wanted to write some code this weekend rather than just reviewing code!)

arrow/src/array/array_binary.rs

alamb · 2022-01-23T12:27:10Z

arrow/src/array/array_binary.rs

-///    builder.append_null().unwrap();
-///    builder.append_value(-8_887_000_000).unwrap();
-///    let decimal_array: DecimalArray = builder.finish();
+///    // Create a DecimalArray with the default precision and scale


this is an example of how the new API works -- I think it is easier to understand and use

alamb · 2022-01-23T12:27:59Z

arrow/src/array/array_binary.rs

+    }
+
+    #[test]
+    fn test_decimal_iter() {


Also included basic iter() and into_iter() functions -- while there is room for performance improvement, I think the API is solid

codecov-commenter · 2022-01-23T12:38:45Z

Codecov Report

Merging #1223 (cd83d94) into master (c7e36ea) will increase coverage by 0.01%.
The diff coverage is 91.78%.

@@            Coverage Diff             @@
##           master    #1223      +/-   ##
==========================================
+ Coverage   83.02%   83.04%   +0.01%     
==========================================
  Files         180      180              
  Lines       52289    52418     +129     
==========================================
+ Hits        43413    43529     +116     
- Misses       8876     8889      +13

| Impacted Files | Coverage Δ | |
|---|---|---|
| arrow/src/array/mod.rs | `100.00% <ø> (ø)` | |
| arrow/src/datatypes/datatype.rs | `66.40% <66.66%> (+0.01%)` | ⬆️ |
| arrow/src/array/array_binary.rs | `93.53% <93.16%> (-0.12%)` | ⬇️ |
| arrow/src/array/builder.rs | `86.73% <100.00%> (-0.04%)` | ⬇️ |
| arrow/src/array/data.rs | `83.13% <100.00%> (+0.10%)` | ⬆️ |
| arrow/src/compute/kernels/cast.rs | `95.21% <100.00%> (+<0.01%)` | ⬆️ |
| arrow/src/csv/reader.rs | `88.12% <100.00%> (ø)` | |
| parquet/src/encodings/encoding.rs | `93.52% <0.00%> (-0.20%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7e36ea...cd83d94. Read the comment docs.

liukun4515 · 2022-01-24T02:00:14Z

arrow/src/array/array_binary.rs

+    /// The default precision and scale used when not specified
+    pub fn default_type() -> DataType {
+        // Keep maximum precision
+        DataType::Decimal(38, 10)


We can replace 38 with DECIMAL_MAX_PRECISION, and 10 with OTHER_DEFAULT_VALUE.
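
A small sketch of what this suggestion could look like. `DECIMAL_MAX_PRECISION` is the constant named in this PR; `DECIMAL_DEFAULT_SCALE` is a hypothetical name standing in for the "OTHER_DEFAULT_VALUE" placeholder above, and `ToyDataType` is a toy enum used so the example compiles on its own rather than arrow's `DataType`.

```rust
/// Maximum precision for decimal values, as named in this PR.
pub const DECIMAL_MAX_PRECISION: usize = 38;

/// Hypothetical name for the default scale (the review only says "OTHER_DEFAULT_VALUE").
pub const DECIMAL_DEFAULT_SCALE: usize = 10;

/// Toy stand-in for the `Decimal(precision, scale)` variant of a data type enum.
#[derive(Debug, PartialEq)]
pub enum ToyDataType {
    Decimal(usize, usize),
}

/// The default precision and scale used when not specified,
/// written with named constants instead of the literals 38 and 10.
pub fn default_type() -> ToyDataType {
    ToyDataType::Decimal(DECIMAL_MAX_PRECISION, DECIMAL_DEFAULT_SCALE)
}
```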

liukun4515 · 2022-01-24T02:09:38Z

arrow/src/datatypes/datatype.rs

@@ -189,6 +194,12 @@ impl fmt::Display for DataType {
    }
 }

+/// The maximum precision for [DataType::Decimal] values
+pub const DECIMAL_MAX_PRECISION: usize = 38;


In DataFusion, we use
https://github.com/apache/arrow-datafusion/blob/e92225d6f4660363eec3b1e767a188fccebb7ed9/datafusion/src/scalar.rs#L39 .
Maybe we can replace that const in DataFusion.
If we plan to support decimal256, we may need some labels to distinguish the different decimal types.

I agree that we should remove the duplication in DataFusion -- when this PR is merged, I can file a ticket in DataFusion to upgrade to the new DecimalAPI and remove the duplication

liukun4515 · 2022-01-24T02:13:51Z

Do you want to refactor some code like

arrow-rs/arrow/src/compute/kernels/cast.rs

Line 2075 in 66e029e

fn create_decimal_array(

in this pull request?

If you plan to refactor this code, I will review it later.

LGTM, except for the #1223 (comment)

alamb · 2022-01-24T13:57:28Z

Do you want to refactor some code like ... in this pull request?

I was planning to do that in a follow on PR to keep this PR small and focused on the API changes.

So I will plan to make the changes you suggest in this PR, and then prepare a new one that refactors the rest of the code.

Thank you for the review.

alamb · 2022-01-26T22:06:22Z

Update: I have not forgotten about this PR but I have been busy with other tasks this week. If I don't get to it later this week I'll work on it over the weekend

alamb · 2022-01-29T17:01:22Z

🤔 While working to port some of the code over, I found that DecimalBuilder also validates each value (so the decimal values are guaranteed to be within the correct range). I need to think about this 🤔

alamb · 2022-01-29T20:10:30Z

back to draft while I think about this a bit

alamb · 2022-01-30T12:41:52Z

This PR is now ready for (re) review @liukun4515

cc @sweb as you contributed the initial DecimalArray implementation (thanks again!)

You can see examples of how the API is used in #1247 and #1249

alamb · 2022-01-30T12:48:25Z

arrow/src/array/builder.rs

@@ -1153,87 +1153,6 @@ pub struct FixedSizeBinaryBuilder {
    builder: FixedSizeListBuilder<UInt8Builder>,
 }

-pub const MAX_DECIMAL_FOR_EACH_PRECISION: [i128; 38] = [


moved to datatype

alamb · 2022-01-30T12:49:42Z

arrow/src/csv/reader.rs

-            )));
-        }
-        Ok(result)
+        validate_decimal_precision(result, precision)


I consolidated the bounds checking (so that I could also use it in with_precision_and_scale)
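
A rough sketch of what a single shared check could look like. This is an illustration under stated assumptions, not the code in the PR: the function name `validate_decimal_precision_sketch` is made up, the bound is computed as 10^precision - 1 instead of being read from the `MAX_DECIMAL_FOR_EACH_PRECISION` table, the error is a plain `String` rather than an arrow error type, and the message text is modeled on the one in the updated `cast.rs` test.

```rust
/// Assumed upper limit on precision (38, matching `DECIMAL_MAX_PRECISION`).
const MAX_PRECISION: usize = 38;

/// Sketch: return `value` unchanged if it fits in `precision` decimal digits,
/// otherwise return a descriptive error.
pub fn validate_decimal_precision_sketch(
    value: i128,
    precision: usize,
) -> Result<i128, String> {
    if precision == 0 || precision > MAX_PRECISION {
        return Err(format!(
            "Invalid precision {}: must be between 1 and {}",
            precision, MAX_PRECISION
        ));
    }
    // Largest magnitude with `precision` digits, e.g. 999 for precision 3.
    let max = 10_i128.pow(precision as u32) - 1;
    if value > max {
        Err(format!(
            "{} is too large to store in a Decimal of precision {}. Max is {}",
            value, precision, max
        ))
    } else if value < -max {
        Err(format!(
            "{} is too small to store in a Decimal of precision {}. Min is {}",
            value, precision, -max
        ))
    } else {
        Ok(value)
    }
}
```

Keeping the check in one function means a reader, a builder, and `with_precision_and_scale` can all report out-of-range values with the same message.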

alamb · 2022-02-03T16:25:48Z

FYI @liukun4515 this is ready for review. I know you are away for a few days, so no worries.

I plan to wait for a review prior to merging this PR; if it gets reviewed by tomorrow's cutoff for arrow 9.0.0 I'll include it there, otherwise this can wait for 2 weeks and perhaps be included in arrow 10.0.0

liukun4515 · 2022-02-07T02:33:06Z

FYI @liukun4515 this is ready for review. I know you are away for a few days, so no worries.

I plan to wait for a review prior to merging this PR; if it gets reviewed by tomorrow's cutoff for arrow 9.0.0 I'll include it there, otherwise this can wait for 2 weeks and perhaps be included in arrow 10.0.0

I will review it later today.

liukun4515 · 2022-02-08T13:11:39Z

arrow/src/array/builder.rs

@@ -1153,87 +1153,6 @@ pub struct FixedSizeBinaryBuilder {
    builder: FixedSizeListBuilder<UInt8Builder>,
 }

-pub const MAX_DECIMAL_FOR_EACH_PRECISION: [i128; 38] = [


This change is an API break.

Added 'api-change' label

liukun4515 · 2022-02-08T13:12:28Z

arrow/src/compute/kernels/cast.rs

@@ -2297,7 +2298,7 @@ mod tests {
        let array = Arc::new(array) as ArrayRef;
        let casted_array = cast(&array, &DataType::Decimal(3, 1));
        assert!(casted_array.is_err());
-        assert_eq!("Invalid argument error: The value of 1000 i128 is not compatible with Decimal(3,1)", casted_array.unwrap_err().to_string());
+        assert_eq!("Invalid argument error: 1000 is too large to store in a Decimal of precision 3. Max is 999", casted_array.unwrap_err().to_string());


Nice error message.

liukun4515 · 2022-02-08T13:19:43Z

arrow/src/array/array_binary.rs

+        assert_eq!(actual, expected);
+    }
+
+    #[test]


good tests.

liukun4515

LGTM
Sorry for the delayed review.

alamb · 2022-02-08T13:41:57Z

No problem -- thanks for the review @liukun4515

alamb · 2022-02-08T20:11:02Z

Thanks again for the help @liukun4515

alamb added 4 commits January 23, 2022 05:58

DecimalArray: create from iter, iter(), docs

62b34b1

Add with_precision and scale

1368e18

Implement iter() and into_iter() for DecimalArray

7a77846

Clean up and tests

41f2cbc

github-actions bot added the arrow Changes to the arrow crate label Jan 23, 2022

alamb commented Jan 23, 2022

This was referenced Jan 23, 2022

Add creator from Iterator of i128 to get the decimalarray #1009

Closed

Add into iter for decimal array #1083

Closed

liukun4515 reviewed Jan 24, 2022

alamb added 2 commits January 29, 2022 10:54

Merge remote-tracking branch 'apache/master' into alamb/decimal_creation

cd12f67

Return Result rather than panic

89f9cee

alamb mentioned this pull request Jan 29, 2022

Use new DecimalArray creation API in parquet crate #1247

Merged

alamb marked this pull request as draft January 29, 2022 20:10

alamb added 4 commits January 30, 2022 06:16

Refactor error handling into separate function

529fc04

Validate data in with_precision_and_scale

27fe607

Use named constant values

62e1ca8

clippy

e4a1d65

alamb mentioned this pull request Jan 30, 2022

Use new DecimalArray creation API in arrow crate #1249

Merged

alamb marked this pull request as ready for review January 30, 2022 12:41

alamb commented Jan 30, 2022

liukun4515 reviewed Feb 8, 2022

liukun4515 approved these changes Feb 8, 2022

Merge remote-tracking branch 'apache/master' into alamb/decimal_creation

cd83d94

alamb merged commit 35e16be into apache:master Feb 8, 2022

alamb added the api-change Changes to the arrow API label Feb 8, 2022

alamb deleted the alamb/decimal_creation branch February 8, 2022 20:10

alamb removed the api-change Changes to the arrow API label Feb 16, 2022

alamb changed the title ~~DecimalArray API ergonomics: add iter(), create from iter(), change precision / scale~~ Improve DecimalArray API ergonomics: add iter(), FromIterator, change precision / scale Feb 16, 2022

alamb changed the title ~~Improve DecimalArray API ergonomics: add iter(), FromIterator, change precision / scale~~ Improve DecimalArray API ergonomics: add iter(), FromIterator, with_precision_and_scale Feb 16, 2022

alamb mentioned this pull request Feb 21, 2022

[Minor] Clean up DecimalArray API Usage apache/datafusion#1869

Merged

alamb mentioned this pull request May 23, 2022

Optimize take kernel for Decimal #1190

Closed
