-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuzz test different parquet encodings #1156
Conversation
let encodings = &[ | ||
Encoding::PLAIN, | ||
Encoding::RLE_DICTIONARY, | ||
//Encoding::DELTA_LENGTH_BYTE_ARRAY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to not be supported at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that came in with the original array reader in #384 from @yordan-pavlov
Do you think it would be a good exercise to support DELTA_LENGTH_BYTE_ARRAY in the reader? If so, I can file a ticket and see if anyone else is interested in picking it up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1082 will add support for it
36a70db
to
96a2d4a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
ConvertedType::NONE, | ||
None, | ||
&converter, | ||
&[Encoding::PLAIN, Encoding::RLE_DICTIONARY], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reason not to include DELTA_BINARY_PACKED
here that that encoding is not supported for FixedLengthByteArrays
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was at least my interpretation of https://github.com/apache/parquet-format/blob/master/Encodings.md
FIXED_LEN_BYTE_ARRAY only seems support PLAIN and dictionary encoding
let encodings = &[ | ||
Encoding::PLAIN, | ||
Encoding::RLE_DICTIONARY, | ||
//Encoding::DELTA_LENGTH_BYTE_ARRAY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that came in with the original array reader in #384 from @yordan-pavlov
Do you think it would be a good exercise to support DELTA_LENGTH_BYTE_ARRAY in the reader? If so, I can file a ticket and see if anyone else is interested in picking it up
max_data_page_size: 1024 * 1024, | ||
max_dict_page_size: 1024 * 1024, | ||
writer_version: WriterVersion::PARQUET_1_0, | ||
..Default::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked that these are the same values as in impl Default for TestOptions
👍
Which issue does this PR close?
Relates to #1053
Rationale for this change
Improved test coverage
What changes are included in this PR?
#1110 extended the fuzz tests to support different array types, nulls and multiple page row groups. This builds on this to also test different encodings
Are there any user-facing changes?
No, this PR only adds more tests