-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parquet reader: Support reading decimals from parquet BYTE_ARRAY
type
#2160
parquet reader: Support reading decimals from parquet BYTE_ARRAY
type
#2160
Conversation
@@ -76,16 +76,6 @@ impl DecimalArrayConverter { | |||
pub fn new(precision: i32, scale: i32) -> Self { | |||
Self { precision, scale } | |||
} | |||
|
|||
fn from_bytes_to_i128(b: &[u8]) -> i128 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this method out and as a public method for this mod
d497681
to
c7268f4
Compare
Codecov Report
@@ Coverage Diff @@
## master #2160 +/- ##
==========================================
- Coverage 82.85% 82.83% -0.02%
==========================================
Files 237 237
Lines 61381 61461 +80
==========================================
+ Hits 50856 50913 +57
- Misses 10525 10548 +23
Help us with your feedback. Take ten seconds to tell us how you rate us. |
BYTE_ARRAY
type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me -- thank you @liukun4515
Field::new("decimal1", DataType::Decimal(4,2), false), | ||
Field::new("decimal2", DataType::Decimal(12,2), false), | ||
Field::new("decimal3", DataType::Decimal(30,2), false), | ||
Field::new("decimal4", DataType::Decimal(33,2), false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would this have previously failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this has previously failed as I just ran this test to confirm.
---- arrow::schema::tests::test_decimal_fields stdout ----
thread 'arrow::schema::tests::test_decimal_fields' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError("Unable to convert parquet BYTE_ARRAY logical type Some(Decimal { scale: 2, precision: 33 }) or converted type DECIMAL")', parquet/src/arrow/schema.rs:1734:60
@@ -233,6 +225,18 @@ fn build_primitive_reader( | |||
column_desc, | |||
arrow_type, | |||
), | |||
Some(DataType::Decimal(precision, scale)) => { | |||
// read decimal data from parquet binary physical type | |||
let convert = DecimalByteArrayConvert::new(DecimalArrayConverter::new(precision as i32, scale as i32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the core change?
(namely to support reading arrow DataType::Decimal
from parquet BYTE_ARRAY
type where previously the code only supported DataType::Decimal
in `FIXED_LEN_BYTE_ARRAY fields)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
previously, decimal just can be read from int32,int64,byte_array
, but in the definition of parquet decimal can be read from int32,int64,byte_array,fixed_length_byte_array
.
Other reviewers might find the context on https://github.com/apache/arrow-datafusion/pull/2960/files#r928343239 helpful |
// read decimal data from parquet binary physical type | ||
let convert = DecimalByteArrayConvert::new(DecimalArrayConverter::new(precision as i32, scale as i32)); | ||
Ok(Box::new( | ||
ComplexObjectArrayReader::<ByteArrayType,DecimalByteArrayConvert>::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style?
ComplexObjectArrayReader::<ByteArrayType,DecimalByteArrayConvert>::new( | |
ComplexObjectArrayReader::<ByteArrayType, DecimalByteArrayConvert>::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code has been formatted.
Benchmark runs are scheduled for baseline = ec3530d and contender = 37dd037. 37dd037 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2159
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?