IPC writer truncated sliced list/map values #5071

Merged: tustvold merged 3 commits into apache:master from the ipc_writer_truncate_sliced_lists branch on Nov 13, 2023

Jefffrey
Contributor

Which issue does this PR close?

Closes #4409

Rationale for this change

Serialize only the child data values that a sliced ListArray, LargeListArray, or MapArray actually references, rather than the entire child array.

What changes are included in this PR?

Apply truncation logic similar to that already used for Binary/Utf8 to List/LargeList/Map arrays during IPC write. The difference is that the truncation is applied to the child array rather than to a values buffer (as is done for Binary/Utf8).
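As an illustration of the idea only, here is a minimal, standard-library-only sketch of re-encoding offsets for a sliced list. It is not the arrow-rs implementation: the helper name `rebase_offsets`, the use of plain `Vec<i32>` in place of Arrow buffers, and the sample data are all assumptions made for this example.

```rust
/// Hypothetical helper (not the arrow-rs code): rebase the offsets of a sliced
/// list so they start at zero, and report the start and length of the range of
/// child values that the slice actually references.
fn rebase_offsets(offsets: &[i32]) -> (Vec<i32>, usize, usize) {
    // An offsets buffer for n lists has n + 1 entries. If it is empty,
    // treat it as zero lists that reference no child values.
    let (first, last) = match (offsets.first(), offsets.last()) {
        (Some(f), Some(l)) => (*f, *l),
        _ => return (vec![0], 0, 0),
    };
    // Subtract the first offset so the re-encoded offsets begin at zero.
    let rebased: Vec<i32> = offsets.iter().map(|o| o - first).collect();
    (rebased, first as usize, (last - first) as usize)
}

fn main() {
    // Child values of a list array holding [[1, 2, 3], [4, 5], [6, 7, 8, 9]].
    let child = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
    let offsets = vec![0, 3, 5, 9];

    // Slicing off the first list leaves offsets [3, 5, 9] over the same child.
    let sliced = &offsets[1..];

    // Rebase the offsets, then keep only the child values still referenced.
    let (new_offsets, start, len) = rebase_offsets(sliced);
    let truncated = &child[start..start + len];

    println!("{:?} {:?}", new_offsets, truncated);
}
```

With this sample data the sliced offsets `[3, 5, 9]` become `[0, 2, 6]`, and only the six child values `[4, 5, 6, 7, 8, 9]` would need to be written, instead of all nine.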

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 13, 2023
Contributor

@tustvold tustvold left a comment

Looks good, thank you.

This PR also appears to include some parquet-testing submodule changes; it might be worth backing these out.

@@ -1139,6 +1139,29 @@ fn get_buffer_element_width(spec: &BufferSpec) -> usize {
}
}

/// Common functionality for re-encoding offsets. Returns the new offsets as well as
/// original start offset and length for use in slicing child data.
fn reencode_offsets<O: OffsetSizeTrait>(

Nice 👌

@Jefffrey
Contributor Author

This PR also appears to include some parquet-testing submodule changes; it might be worth backing these out.

Ah my mistake, reverted

@tustvold tustvold merged commit 924b6e9 into apache:master Nov 13, 2023
25 checks passed
@tustvold
Contributor

Thanks once again

@Jefffrey Jefffrey deleted the ipc_writer_truncate_sliced_lists branch November 13, 2023 21:12
Labels
arrow Changes to the arrow crate
Development

Successfully merging this pull request may close these issues.

IPC Writer Truncate Sliced List Values
2 participants