feat: Support hive partitioning in scan_ipc #17434

Merged 4 commits on Jul 5, 2024

Conversation

nameexhaustion
Collaborator

@nameexhaustion nameexhaustion commented Jul 5, 2024

Resolves #14885
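
For context, hive partitioning encodes column values as key=value directory names in each file's path. The following is a minimal sketch of that idea using only the Rust standard library. The function name `hive_partitions` and the example paths are hypothetical, and this is not the parser in crates/polars-io/src/hive.rs.

```rust
/// Extract `key=value` pairs from the directory components of a path.
/// Illustrative assumption only: real hive parsing also deals with
/// schemas, value types, and escaping, none of which is modelled here.
fn hive_partitions(path: &str) -> Vec<(String, String)> {
    let mut parts: Vec<&str> = path.split('/').collect();
    // The last component is the file name, so it is never a partition key.
    parts.pop();
    parts
        .into_iter()
        .filter_map(|segment| {
            // Directories without `=` (or with an empty key) are skipped.
            let (key, value) = segment.split_once('=')?;
            if key.is_empty() {
                return None;
            }
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```

Under these assumptions, a path such as data/year=2024/month=07/part-0.ipc would contribute a year column and a month column to the rows read from that file.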


// Used to determine the next file to open. This guarantees the order.
let path_index = AtomicUsize::new(0);
let row_counter = RwLock::new(ConsecutiveCountState::new(self.paths.len()));
Collaborator Author

@nameexhaustion nameexhaustion Jul 5, 2024

Drive-by: simplify this a bit by removing these atomics, since into_par_iter guarantees result ordering.
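
The point about ordering can be illustrated without rayon. The sketch below is a standard-library analogue, not the PR's code: it runs one scoped thread per input and collects results by position, so no shared atomic index is needed to keep the output in input order. The name `ordered_parallel_map` is hypothetical.

```rust
use std::thread;

/// Apply `f` to every input on its own scoped thread and return the
/// results in input order. Hypothetical helper, std-only stand-in for
/// an order-preserving parallel map.
fn ordered_parallel_map<T, R, F>(inputs: &[T], f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Share the closure by reference so every worker can call it.
    let f = &f;
    thread::scope(|scope| {
        // Spawn in input order and keep the handles in that same order.
        let handles: Vec<_> = inputs
            .iter()
            .map(|item| scope.spawn(move || f(item)))
            .collect();
        // Joining by position yields results in input order, regardless
        // of which worker finished first.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because each result is tied to its position rather than to a counter claimed at run time, the output order does not depend on thread scheduling.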

// Used to determine the next file to open. This guarantees the order.
let path_index = AtomicUsize::new(0);
let row_counter = RwLock::new(ConsecutiveCountState::new(self.paths.len()));
let mut dfs = if let Some(mut n_rows) = self.file_options.n_rows {
Collaborator Author

@nameexhaustion nameexhaustion Jul 5, 2024

Drive-by: use a sequential read if there is a row limit. This is usually what we do in other places (union, CSV scan).
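
A rough sketch of why a sequential read suits a row limit: files are visited in order, the remaining budget shrinks as rows are taken, and later files are never opened once the budget is spent. This is an assumption-laden illustration, not the executor in crates/polars-mem-engine; the function name `read_with_row_limit` is hypothetical and in-memory vectors stand in for files.

```rust
/// Read "files" one at a time until `n_rows` rows have been collected.
/// Returns the per-file slices that were read and how many files were
/// opened. Each `Vec<i64>` is a stand-in for the rows of one file.
fn read_with_row_limit(files: &[Vec<i64>], mut n_rows: usize) -> (Vec<Vec<i64>>, usize) {
    let mut frames = Vec::new();
    let mut files_opened = 0;
    for file in files {
        // Stop before touching the next file once the limit is reached.
        if n_rows == 0 {
            break;
        }
        files_opened += 1;
        // Take at most the remaining budget from this file.
        let take = n_rows.min(file.len());
        frames.push(file[..take].to_vec());
        n_rows -= take;
    }
    (frames, files_opened)
}
```

With three files of three rows each and a limit of four, only the first two files are opened under this sketch; a parallel read would have started on all three before knowing how many rows were needed.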

@nameexhaustion nameexhaustion changed the title feat: scan_ipc(hive_partitioning=True) feat: S scan_ipc(hive_partitioning=True) Jul 5, 2024
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jul 5, 2024
@nameexhaustion nameexhaustion changed the title feat: S scan_ipc(hive_partitioning=True) feat: Support hive partitioning in scan_ipc Jul 5, 2024
@nameexhaustion nameexhaustion marked this pull request as ready for review July 5, 2024 06:15

codecov bot commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 98.40426% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.63%. Comparing base (f803053) to head (a851221).

| Files | Patch % | Lines |
|---|---|---|
| crates/polars-io/src/hive.rs | 95.65% | 1 Missing ⚠️ |
| crates/polars-io/src/ipc/ipc_file.rs | 98.03% | 1 Missing ⚠️ |
| crates/polars-mem-engine/src/executors/scan/ipc.rs | 98.82% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17434      +/-   ##
==========================================
- Coverage   80.68%   80.63%   -0.06%     
==========================================
  Files        1476     1477       +1     
  Lines      193490   193457      -33     
  Branches     2760     2760              
==========================================
- Hits       156126   155996     -130     
- Misses      36854    36951      +97     
  Partials      510      510              


Member

@ritchie46 ritchie46 left a comment

Nice!

@ritchie46 ritchie46 merged commit c390fd7 into pola-rs:main Jul 5, 2024
28 checks passed
@nameexhaustion nameexhaustion deleted the hive-ipc branch July 8, 2024 12:39
Development

Successfully merging this pull request may close these issues.

Support hive partitioning in scan_ipc
2 participants