Describe the bug, including details regarding any error messages, version, and platform.
As part of adding Parquet encryption to arrow-rs (apache/arrow-rs#6637), @rok and I found that arrow-rs could not read the example files in parquet-testing due to invalid repetition levels. arrow-rs complains that:
Parquet error: first repetition level of batch must be 0
This is because the int64 list column data is written with the repetition levels flipped: 0 should indicate the start of a new list, but 1 is used.
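To illustrate the convention, here is a minimal Python sketch of repetition levels for a single-level list column. It is not the Arrow or arrow-rs code: the helper names (`repetition_levels`, `flipped`, `check_first_level`) are hypothetical, the sketch assumes non-empty, non-null lists so definition levels can be ignored, and the check only mirrors the error message quoted above.

```python
from typing import List


def repetition_levels(lists: List[List[int]]) -> List[int]:
    """Repetition levels for a list column whose max repetition level is 1.

    Assumes every list is non-empty and non-null (illustration only).
    """
    levels = []
    for values in lists:
        for i, _ in enumerate(values):
            # 0 opens a new list (a new row); 1 continues the current list.
            levels.append(0 if i == 0 else 1)
    return levels


def flipped(levels: List[int]) -> List[int]:
    """Swap 0 and 1, reproducing the kind of mistake described in this issue."""
    return [1 - level for level in levels]


def check_first_level(levels: List[int]) -> None:
    """Hypothetical reader-side check modelled on the arrow-rs error text."""
    if levels and levels[0] != 0:
        raise ValueError("first repetition level of batch must be 0")


correct = repetition_levels([[10, 11], [20, 21, 22]])
broken = flipped(correct)
print(correct)  # [0, 1, 0, 1, 1]
print(broken)   # [1, 0, 1, 0, 0]
```

With these inputs, `check_first_level(correct)` passes, while `check_first_level(broken)` raises, because the flipped data begins with 1 and so never marks the start of the first list.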
Related to this, is it also a bug that Arrow reads these files without complaining? When I read one of these files into Arrow format with PyArrow, the first leaf value is skipped.
Component(s)
C++, Parquet
…yption test data (#45074)
### Rationale for this change
This makes the test data readable by other Parquet implementations that validate the repetition levels.
### What changes are included in this PR?
* Corrects the generation of encryption test files so that the int64 list columns correctly start lists with repetition level 0.
* Updates the parquet-testing submodule to use the corrected files.
### Are these changes tested?
Yes, covered by existing tests.
### Are there any user-facing changes?
No
* GitHub Issue: #45073
Authored-by: Adam Reeve <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Referenced code: arrow/cpp/src/parquet/encryption/test_encryption_util.cc, line 121 at commit b655852