Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite Decimal and DecimalArray using const_generic #2383

Merged
merged 3 commits into from
Aug 9, 2022

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented Aug 9, 2022

Signed-off-by: remzi [email protected]

Which issue does this PR close?

This is a subtask of #2384 .

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

  1. From developer's view: Yes
  2. From user's view: No.

@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels Aug 9, 2022
@HaoYang670 HaoYang670 changed the title Rewrite Decimal and DecimalArray using const_generic Rewrite Decimal and DecimalArray using const_generic Aug 9, 2022
@HaoYang670
Copy link
Contributor Author

HaoYang670 commented Aug 9, 2022

Seems like there are some unrelated errors in cargo doc.
Filed #2385 to track this.

Signed-off-by: remzi <[email protected]>
fn from(data: ArrayData) -> Self {
assert_eq!(
data.buffers().len(),
1,
"Decimal128Array data should contain 1 buffer only (values)"
"DecimalArray data should contain 1 buffer only (values)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may show Decimal128Array or Decimal256Array based on byte width.

Comment on lines 78 to 82
mod private_decimal {
pub trait DecimalArrayPrivate {
fn raw_value_data_ptr(&self) -> *const u8;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this trait?

const DEFAULT_TYPE: DataType;
const MAX_PRECISION: usize;
const MAX_SCALE: usize;
pub struct BasicDecimalArray<const BYTE_WIDTH: usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have no idea how to constrain byte width, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also seal the byte_width in this way:
https://users.rust-lang.org/t/how-to-seal-the-const-generic/77947/2

pub const DEFAULT_TYPE: DataType = BasicDecimal::<BYTE_WIDTH>::DEFAULT_TYPE;
pub const MAX_PRECISION: usize = BasicDecimal::<BYTE_WIDTH>::MAX_PRECISION;
pub const MAX_SCALE: usize = BasicDecimal::<BYTE_WIDTH>::MAX_SCALE;
pub const TYPE_CONSTRUCTOR: fn(usize, usize) -> DataType =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can TYPE_CONSTRUCTOR be non-public?

DataType::Decimal256,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, DECIMAL_DEFAULT_SCALE),
),
_ => panic!("invalid byte width"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could constrain the byte width here. When compile the constant items, the compiler will give error if byte width != 16 or 32.
@viirya

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention what is valid byte width in the message.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool, was going to do something similar so awesome you've already done it. Will review in more detail later today

arrow/src/util/decimal.rs Outdated Show resolved Hide resolved

impl<const BYTE_WIDTH: usize> BasicDecimal<BYTE_WIDTH> {
#[allow(clippy::type_complexity)]
const _MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE: (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is not pub, I think MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE might be okay. A _ prefix seems redundant.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comments as above. Otherwise it looks good to me. Thanks for the refactoring.

@HaoYang670
Copy link
Contributor Author

I assume that this PR will not introduce performance regression. But we need more test because weird things often happen.

@liukun4515
Copy link
Contributor

I assume that this PR will not introduce performance regression. But we need more test because weird things often happen.

Thanks for your concern about performance, that is why I submit these two pr about decimal optimization. #2360 #2357

In our service, there are many columns with decimal data type.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, will run benchmarks to double check

impl Decimal128Array {
/// Creates a [Decimal128Array] with default precision and scale,
/// based on an iterator of `i128` values without nulls
pub fn from_iter_values<I: IntoIterator<Item = i128>>(iter: I) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record this method is unsound, but was unsound before. There is a broader issue here

Signed-off-by: remzi <[email protected]>
@tustvold
Copy link
Contributor

tustvold commented Aug 9, 2022

So we don't actually have any decimal benchmarks that I can find... Created #2388

@tustvold
Copy link
Contributor

tustvold commented Aug 9, 2022

I'm going to get this in as I think it is a valuable cleanup, and we can continue to iterate on performance in subsequent PRs

@tustvold tustvold merged commit 77c814c into apache:master Aug 9, 2022
@ursabot
Copy link

ursabot commented Aug 9, 2022

Benchmark runs are scheduled for baseline = 56f7904 and contender = 77c814c. 77c814c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@HaoYang670
Copy link
Contributor Author

HaoYang670 commented Aug 9, 2022

Thank you for your review @viirya @tustvold @liukun4515

@HaoYang670 HaoYang670 deleted the const_generic_decimal branch August 9, 2022 10:46
@alamb
Copy link
Contributor

alamb commented Aug 9, 2022

It wasn't 100% clear to me -- but @liukun4515 this PR may have improved decimal validation performance. Perhaps you can run your benchmarks again now to see if things have improved.

@viirya
Copy link
Member

viirya commented Aug 9, 2022

I guess one point from @tustvold is that using fixed length slices when constructing decimals would help the compiler elide bounds checks, the performance gain might come from it.

@viirya
Copy link
Member

viirya commented Aug 9, 2022

Opened a minor PR #2389 to clean up some comments not addressed yet.

@HaoYang670
Copy link
Contributor Author

I guess one point from @tustvold is that using fixed length slices when constructing decimals would help the compiler elide bounds checks, the performance gain might come from it.

Yes, I found we have discussion about bound checking in #2360.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants