Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Decimal256 API #1914

Merged
merged 10 commits into from
Jun 23, 2022
Merged

Add Decimal256 API #1914

merged 10 commits into from
Jun 23, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented Jun 20, 2022

Which issue does this PR close?

Closes #1913.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 20, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2022

Codecov Report

Merging #1914 (32be11a) into master (ded6316) will increase coverage by 0.01%.
The diff coverage is 90.72%.

@@            Coverage Diff             @@
##           master    #1914      +/-   ##
==========================================
+ Coverage   83.41%   83.43%   +0.01%     
==========================================
  Files         214      214              
  Lines       56991    57061      +70     
==========================================
+ Hits        47541    47610      +69     
- Misses       9450     9451       +1     
Impacted Files Coverage Δ
arrow/src/util/decimal.rs 91.50% <90.62%> (+11.91%) ⬆️
arrow/src/array/array_binary.rs 94.18% <100.00%> (ø)
arrow/src/array/array_dictionary.rs 91.53% <0.00%> (-0.39%) ⬇️
parquet_derive/src/parquet_field.rs 65.98% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ded6316...32be11a. Read the comment docs.

self.value.partial_cmp(&other.value)
impl Decimal128 {
/// Creates `Decimal128` from an `i128` value.
pub fn new_from_i128(precision: usize, scale: usize, value: i128) -> Self {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_from_i128 is still convenient to use. So I'm not sure if we want to remove it. I keep it so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mostly used in test code internally. Another option is to not expose it, maybe we can make it crate public.

/// returning an error. The bytes should be stored in little-endian order.
///
/// Safety:
/// This method doesn't validate if the decimal value represented by the bytes
Copy link
Member Author

@viirya viirya Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DecimalArray and DecimalBuilder already do value validation. So I skip validation here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DecimalArray and DecimalBuilder already do value validation. So I skip validation here.

If we can make sure this, I think it looks good me.

Comment on lines +74 to +75
/// If the string representation cannot be fitted with the precision of the decimal,
/// the string will be truncated.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've considered if the truncation is needed here. I added it eventually because as_string outputting a string larger than specified precision looks weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DecimalArray and DecimalArray already do value validation. So I skip validation here.

Why do we need the truncation if we can always make sure that the value can be stored within the precision?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wonder if users might take it to use separately (i.e., without the value validation in DecimalArray and DecimalBuilder) and getting confused.

@alamb
Copy link
Contributor

alamb commented Jun 21, 2022

I will try and find time to review this tomorrow

if sign.len() == 1 {
bound += 1;
}
let value_str = value_str[0..bound].to_string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we slice from the most significant digit?
If value_str = 1000, precision = 3, scale = 1, then the expected output is 10.0?

Copy link
Contributor

@liukun4515 liukun4515 Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the precision is 3, and the max value is 999

@liukun4515
Copy link
Contributor

I will review this PR tomorrow. @viirya

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good to me after resolving comments from @HaoYang670 and @liukun4515

arrow/src/util/decimal.rs Outdated Show resolved Hide resolved
arrow/src/util/decimal.rs Show resolved Hide resolved
@viirya
Copy link
Member Author

viirya commented Jun 22, 2022

Seems some unrelated errors in interop test between Go and C#:

################# FAILURES #################
FAILED TEST: datetime Go producing,  C# consuming
1 failures
1
Error: `docker-compose --file /home/runner/work/arrow-rs/arrow-rs/docker-compose.yml run --rm -e ARCHERY_INTEGRATION_WITH_RUST=1 conda-integration` exited with a non-zero exit code 1, see the process log above.

assert_eq!(value.to_string(), "-0.01");

bytes = i128::MAX.to_le_bytes();
let value = Decimal128::try_new_from_bytes(38, 2, &bytes).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The width of i128::MAX.to_le_bytes() is 39.
The input value of i128 or bytes are larger than the precision, and it looks a bit wired to me.
For decimal(3,2), if we put 12345 and 12344 to the decimal(3,2), we will get the same value of tostring
But the ord and eq is not same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i128 maximum is larger than the precision 38 so truncation is happened. I thought either I put value validation into Decimal structs, or do string truncation. As I mentioned earlier, DecimalArray/DecimalBuilder already do value validation, I feel it is redundant to do it here again. Ideally these Decimal structs are used only in input/output of DecimalArray. Truncation is used to for sure that we won't produce a string longer than its precision. For its usage, it should not be given a value larger than its precision because it will be caught by DecimalArray/DecimalBuilder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively it is an option to get error from to_string if string length is larger than precision. But as DecimalBuilder can optionally skip value validation, it means we probably can have Decimal structs with such case. Then having an error seems not fitting with it.

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s almost looks good to me.

@viirya
Copy link
Member Author

viirya commented Jun 23, 2022

@alamb Can we add this into 17.0.0 too? This changes Decimal128 so it will be an api change in 18.0.0 if we put this to 18.0.0. If it is okay to change, then that's fine.

@viirya
Copy link
Member Author

viirya commented Jun 23, 2022

If this is not needed to catch up 17.0.0, I will leave it open for a while in case others want to comment.

@alamb
Copy link
Contributor

alamb commented Jun 23, 2022

Thanks @viirya and @liukun4515 for the careful review

I think this look good enough to go for me -- given @liukun4515 approved this PR I will assume he is ok with merging as is (and we can improve it more in future PRs).

I don't quite follow all the discussion on #1914 (comment) so I don't know if there is any outstanding issues there we should be tracking or not 🤔 Let me know if it would help if I filed follow on tasks.

@alamb alamb merged commit f0df5e0 into apache:master Jun 23, 2022
@alamb alamb changed the title Add Decimal256 API Add Decimal256 API Jun 23, 2022
@viirya
Copy link
Member Author

viirya commented Jun 23, 2022

Thank you @alamb @liukun4515 @HaoYang670

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Decimal256 API
5 participants