-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite Decimal
and DecimalArray
using const_generic
#2383
Conversation
Signed-off-by: remzi <[email protected]>
Decimal
and DecimalArray
using const_generic
Decimal
and DecimalArray
using const_generic
Seems like there are some unrelated errors in |
Signed-off-by: remzi <[email protected]>
fn from(data: ArrayData) -> Self { | ||
assert_eq!( | ||
data.buffers().len(), | ||
1, | ||
"Decimal128Array data should contain 1 buffer only (values)" | ||
"DecimalArray data should contain 1 buffer only (values)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may show Decimal128Array
or Decimal256Array
based on byte width.
mod private_decimal { | ||
pub trait DecimalArrayPrivate { | ||
fn raw_value_data_ptr(&self) -> *const u8; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this trait?
const DEFAULT_TYPE: DataType; | ||
const MAX_PRECISION: usize; | ||
const MAX_SCALE: usize; | ||
pub struct BasicDecimalArray<const BYTE_WIDTH: usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have no idea how to constrain byte width, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also seal the byte_width
in this way:
https://users.rust-lang.org/t/how-to-seal-the-const-generic/77947/2
pub const DEFAULT_TYPE: DataType = BasicDecimal::<BYTE_WIDTH>::DEFAULT_TYPE; | ||
pub const MAX_PRECISION: usize = BasicDecimal::<BYTE_WIDTH>::MAX_PRECISION; | ||
pub const MAX_SCALE: usize = BasicDecimal::<BYTE_WIDTH>::MAX_SCALE; | ||
pub const TYPE_CONSTRUCTOR: fn(usize, usize) -> DataType = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can TYPE_CONSTRUCTOR be non-public?
DataType::Decimal256, | ||
DataType::Decimal256(DECIMAL256_MAX_PRECISION, DECIMAL_DEFAULT_SCALE), | ||
), | ||
_ => panic!("invalid byte width"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could constrain the byte width here. When compile the constant items, the compiler will give error if byte width != 16 or 32.
@viirya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention what is valid byte width in the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool, was going to do something similar so awesome you've already done it. Will review in more detail later today
|
||
impl<const BYTE_WIDTH: usize> BasicDecimal<BYTE_WIDTH> { | ||
#[allow(clippy::type_complexity)] | ||
const _MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE: ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is not pub, I think MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE
might be okay. A _
prefix seems redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few comments as above. Otherwise it looks good to me. Thanks for the refactoring.
I assume that this PR will not introduce performance regression. But we need more test because weird things often happen. |
Thanks for your concern about performance, that is why I submit these two pr about decimal optimization. #2360 #2357 In our service, there are many columns with decimal data type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, will run benchmarks to double check
impl Decimal128Array { | ||
/// Creates a [Decimal128Array] with default precision and scale, | ||
/// based on an iterator of `i128` values without nulls | ||
pub fn from_iter_values<I: IntoIterator<Item = i128>>(iter: I) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record this method is unsound, but was unsound before. There is a broader issue here
Signed-off-by: remzi <[email protected]>
So we don't actually have any decimal benchmarks that I can find... Created #2388 |
I'm going to get this in as I think it is a valuable cleanup, and we can continue to iterate on performance in subsequent PRs |
Benchmark runs are scheduled for baseline = 56f7904 and contender = 77c814c. 77c814c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Thank you for your review @viirya @tustvold @liukun4515 |
It wasn't 100% clear to me -- but @liukun4515 this PR may have improved decimal validation performance. Perhaps you can run your benchmarks again now to see if things have improved. |
I guess one point from @tustvold is that using fixed length slices when constructing decimals would help the compiler elide bounds checks, the performance gain might come from it. |
Opened a minor PR #2389 to clean up some comments not addressed yet. |
Signed-off-by: remzi [email protected]
Which issue does this PR close?
This is a subtask of #2384 .
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?