Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Duplicate Offset Index on RowGroupMetadata #4142

Merged
merged 3 commits into from
Apr 27, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Relates to #4090

Rationale for this change

Noticed this whilst working on #4090, it is peculiar to be storing a subset of the page index on RowGroupMetadata in addition to at the top-level on ParquetMetadata. An argument could perhaps be made for storing the offset and column indices at the ColumnChunkMetadata level, but duplicating it at the top-level and the row group level is confusing and does not seem to be necessary.

What changes are included in this PR?

Are there any user-facing changes?

@tustvold tustvold added the api-change Changes to the arrow API label Apr 26, 2023
@github-actions github-actions bot added the parquet Changes to the parquet crate label Apr 26, 2023
@@ -999,29 +1004,9 @@ mod tests {
requests: Default::default(),
};

let options = ArrowReaderOptions::new().with_page_index(true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a copy-paste of the test above and was unnecessary, so I opted to just remove it

@@ -132,13 +132,25 @@ impl ParquetMetaData {
}

/// Returns page indexes in this file.
#[deprecated(note = "Use Self::column_index")]
pub fn page_indexes(&self) -> Option<&ParquetColumnIndex> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These accessors were very confusingly named, so I opted to rename them for clarity

@tustvold tustvold merged commit d8a3b1c into apache:master Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants