Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make RowSelection's from_consecutive_ranges public #5848

Merged
merged 3 commits into from
Jun 11, 2024

Conversation

advancedxy
Copy link
Contributor

Which issue does this PR close?

Closes #5846

Rationale for this change

This constructor method should be easier to use.

What changes are included in this PR?

  1. make the from_consecutive_ranges function public
  2. tweak a bit and makes the total rows optional
  3. update the tests.

Are there any user-facing changes?

NO.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 6, 2024
@advancedxy
Copy link
Contributor Author

@alamb @tustvold it would be great if you can take a look at this when you are free.

ranges: I,
total_rows: usize,
total_rows_opt: Option<usize>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use-case for not knowing the total number of rows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that making total_rows optional will make it very hard to use this API correctly. Specifically, the `total_rows1 almost always needs to be necessary, otherwise it will not be possible to represent skipping the last rows in the selection

For example, a range of 100..200 for 400 total rows needs to look ike

Skip(100)
Scan(100)
Skip(100). <--- you can't figure this out from the ranges, you need the total row count

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I thought the final skip is optional based on the trim function. I will revert the API change later. I am traveling now, so it might take some days for me to get back with access to a computer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I thought the final skip is optional based on the trim function.

It probably didn't help that several test cases in apache/datafusion#10738 didn't do this correctly either. I have fixed them in apache/datafusion#10813

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank all for your suggestions. Addressed in the latest commit. Please let me know what you think.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @advancedxy

ranges: I,
total_rows: usize,
total_rows_opt: Option<usize>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that making total_rows optional will make it very hard to use this API correctly. Specifically, the `total_rows1 almost always needs to be necessary, otherwise it will not be possible to represent skipping the last rows in the selection

For example, a range of 100..200 for 400 total rows needs to look ike

Skip(100)
Scan(100)
Skip(100). <--- you can't figure this out from the ranges, you need the total row count

@advancedxy
Copy link
Contributor Author

The CI failure seems unrelated to this PR?

Run cargo msrv verify
  cargo msrv verify
  shell: sh -e {0}
  env:
    RUSTFLAGS: -C debuginfo=1
    RUST_BACKTRACE: 1
Fetching index
Verifying the Minimum Supported Rust Version (MSRV) for toolchain x86_64-unknown-linux-gnu
Using check command cargo check
   Failed check command cargo check didn't succeed
Crate source was found to be incompatible with its MSRV '1.6[2](https://github.com/apache/arrow-rs/actions/runs/9462178785/job/26064555740?pr=5848#step:11:2).1', as defined in '/__w/arrow-rs/arrow-rs/object_store/Cargo.toml'
Error: Process completed with exit code 1.

@alamb
Copy link
Contributor

alamb commented Jun 11, 2024

The CI failure seems unrelated to this PR?

@korowa fixed this in #5866 earlier today

I merged up from main to get a clean CI run

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @advancedxy

@@ -81,6 +81,13 @@ impl RowSelector {
///
/// let actual: Vec<RowSelector> = selection.into();
/// assert_eq!(actual, expected);
///
/// // you can also create a selection from consecutive ranges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit a20116e into apache:master Jun 11, 2024
16 checks passed
@alamb
Copy link
Contributor

alamb commented Jun 11, 2024

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make RowSelection::from_consecutive_ranges public
3 participants