perf(rust): faster decode on Parquet HybridRLE #17208

Merged: 2 commits, merged on Jun 26, 2024

coastalwhite (Collaborator)

This PR makes several optimizations to HybridRLE decoding:

  • Replace many iterators with collecting and buffering.
  • Flatten nested iterators into a single iterator.
  • Remove many memcpys in `bitpacked::decode`.
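To make these ideas concrete, here is a minimal Rust sketch of decoding the Parquet RLE/bit-packing hybrid encoding. It is not the Polars code from this PR. It decodes into one pre-allocated buffer with a single flat loop over runs (rather than nested per-run iterators), and reads bit-packed values straight from the input slice instead of first copying them into a scratch buffer. The names `decode_hybrid_rle`, `read_uleb128` and `unpack_into` are hypothetical. The sketch assumes `data` holds only the encoded runs (no length prefix or leading bit-width byte) and that bit widths are at most 32.

```rust
/// Hypothetical helper: read an unsigned LEB128 varint (the run header).
/// Returns (value, bytes consumed), or None if truncated or too long.
fn read_uleb128(data: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in data.iter().enumerate() {
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift += 7;
    }
    None
}

/// Hypothetical helper: unpack `count` LSB-first values of `bit_width` bits
/// directly from `packed` into `out`, without copying the packed bytes.
/// Simple rather than fast: it gathers up to 5 bytes per value.
fn unpack_into(packed: &[u8], bit_width: u32, count: usize, out: &mut Vec<u32>) {
    let mask: u64 = if bit_width == 0 { 0 } else { (1u64 << bit_width) - 1 };
    let mut bit_pos = 0usize;
    for _ in 0..count {
        let byte = bit_pos / 8;
        let shift = bit_pos % 8;
        // shift < 8 and bit_width <= 32, so 5 bytes (40 bits) always suffice.
        let mut word = 0u64;
        for (i, &b) in packed[byte..].iter().take(5).enumerate() {
            word |= u64::from(b) << (8 * i);
        }
        out.push(((word >> shift) & mask) as u32);
        bit_pos += bit_width as usize;
    }
}

/// Hypothetical decoder: decode `num_values` values into a single Vec.
fn decode_hybrid_rle(data: &[u8], bit_width: u32, num_values: usize) -> Result<Vec<u32>, String> {
    if bit_width > 32 {
        return Err("bit_width must be <= 32".into());
    }
    let mut out = Vec::with_capacity(num_values);
    let rle_bytes = ((bit_width + 7) / 8) as usize; // RLE value width in bytes
    let mut pos = 0;
    while out.len() < num_values {
        let (header, n) = read_uleb128(&data[pos..]).ok_or("truncated run header")?;
        pos += n;
        let remaining = num_values - out.len();
        if header & 1 == 0 {
            // RLE run: header = count << 1, then one little-endian value.
            let count = (header >> 1) as usize;
            let bytes = data.get(pos..pos + rle_bytes).ok_or("truncated RLE value")?;
            let mut value = 0u32;
            for (i, &b) in bytes.iter().enumerate() {
                value |= u32::from(b) << (8 * i);
            }
            pos += rle_bytes;
            out.extend(std::iter::repeat(value).take(count.min(remaining)));
        } else {
            // Bit-packed run: header = (groups of 8 values) << 1 | 1.
            let num_groups = (header >> 1) as usize;
            let run_bytes = num_groups.checked_mul(bit_width as usize).ok_or("run too long")?;
            let end = pos.checked_add(run_bytes).ok_or("run too long")?;
            let packed = data.get(pos..end).ok_or("truncated bit-packed run")?;
            pos = end;
            // The last run may be padded past num_values, so cap the count.
            let count = num_groups.saturating_mul(8).min(remaining);
            unpack_into(packed, bit_width, count, &mut out);
        }
    }
    Ok(out)
}
```

For example, with bit width 3, the bytes `[0x03, 0x88, 0xC6, 0xFA]` encode one bit-packed group holding 0 through 7, and `[0x08, 0x05]` encodes the value 5 repeated four times. The per-value byte gathering keeps the sketch short; a tuned decoder would more likely unpack whole groups of values at once with width-specialized code.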

Overall, my microbenchmarks showed a roughly 10x speed-up for HybridRLE decoding. To get a better picture of how much this affects Parquet decoding as a whole, I ran a benchmark on the Yellow NYC Taxi dataset, which showed a roughly 1.8x speed-up for general Parquet decoding.
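As a rough consistency check on these two numbers, suppose (a simplifying assumption) that the entire end-to-end gain comes from the ~10x HybridRLE speed-up $s$. Amdahl's law then relates the overall speed-up $S$ to the share $f$ of the original decode time spent in HybridRLE decoding:

$$
S = \frac{1}{(1-f) + f/s}
\quad\Longrightarrow\quad
f = \frac{1 - 1/S}{1 - 1/s} = \frac{1 - 1/1.8}{1 - 1/10} \approx \frac{0.444}{0.9} \approx 0.49
$$

Under that assumption, HybridRLE decoding would have accounted for roughly half of the baseline Parquet decode time on this dataset.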

github-actions bot added the performance (Performance issues or improvements) and rust (Related to Rust Polars) labels on Jun 26, 2024

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 92.30769% with 20 lines in your changes missing coverage. Please review.

Project coverage is 80.85%. Comparing base (7a1be3b) to head (93a6ee6).
Report is 11 commits behind head on main.

Current head 93a6ee6 differs from pull request most recent head 882e362

Please upload reports for the commit 882e362 to get more accurate results.

| File | Patch % | Lines |
|---|---|---|
| ...s-parquet/src/parquet/encoding/bitpacked/decode.rs | 89.77% | 9 Missing ⚠️ |
| ...ars-parquet/src/parquet/encoding/hybrid_rle/mod.rs | 97.01% | 4 Missing ⚠️ |
| ...quet/src/arrow/read/deserialize/binary/decoders.rs | 25.00% | 3 Missing ⚠️ |
| ...rquet/src/arrow/read/deserialize/dictionary/mod.rs | 50.00% | 2 Missing ⚠️ |
| .../arrow/read/deserialize/fixed_size_binary/basic.rs | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17208      +/-   ##
==========================================
+ Coverage   80.84%   80.85%   +0.01%     
==========================================
  Files        1464     1465       +1     
  Lines      192016   192190     +174     
  Branches     2743     2745       +2     
==========================================
+ Hits       155243   155404     +161     
- Misses      36262    36281      +19     
+ Partials      511      505       -6     


ritchie46 merged commit cc4a0df into pola-rs:main on Jun 26, 2024
21 checks passed
coastalwhite deleted the perf-parquet-hybridrle-unnest branch on June 26, 2024, 14:45
c-peters added the accepted (Ready for implementation) label on Jul 1, 2024
Labels: accepted (Ready for implementation), performance (Performance issues or improvements), rust (Related to Rust Polars)
Projects: Archived in project
3 participants