-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose bloom filter offset in ColumnChunkMetadata
#1309
Conversation
Thank you @shanisolomon for helping to push the rust reader implementation along 🎉 -- I think the tests are failing here because the reference to the new example data in apache/parquet-testing#22 hasn't been incorporated yet. Once we get the new example data in parquet-testing, we can then update the submodule reference on this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the code contribution @shanisolomon -- the code looks great to me 🏅 -- a very nice first contribution.
Once we sort out the getting the appropriate test file in parquet-data I think we'll be good to go.
If you don't mind my asking, do you have larger plans for supporting index pages and bloom filters? I think there would be substantial interest in the larger Rust/arrow community for Bloom filter support and I would be willing to find time to help design / review it if that would be helpful
Thanks again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I am now preparing a PR to update the arrowrs to apache/parquet-testing#22 |
Thanks @alamb. Regarding further bloom filters support - yes, we would eventually be interested in reading bloom filters (and other optional metadata) in order to optimize our reads. Currently we're just interested in knowing whether or not such bloom filters exists, hence this PR. I'll be sure to reach out to you when and if we choose to do so :). Thanks! |
I verified that this branch passes all the tests when I cherry-picked 1bf62e3 locally So the next steps should be:
|
Codecov Report
@@ Coverage Diff @@
## master #1309 +/- ##
==========================================
- Coverage 83.04% 83.03% -0.01%
==========================================
Files 180 180
Lines 52846 52865 +19
==========================================
+ Hits 43884 43895 +11
- Misses 8962 8970 +8
Continue to review full report at Codecov.
|
Thanks again @shanisolomon |
ColumnChunkMetadata
Closes #1308.
Exposing the bloom filter offset in in the column chunk metadata so parquet engines could optimize their reads.
Test file
data_index_bloom.parquet
is added to parquet-testing via apache/parquet-testing#22.