Pass Decompressed Size to Parquet Codec::decompress #2956
Labels: enhancement, good first issue, help wanted, parquet
Comments
tustvold added the enhancement label on Oct 27, 2022
Then, I guess that the internal behavior of the
marioloko pushed a commit to marioloko/arrow-rs that referenced this issue on Oct 27, 2022
Added optional argument `uncompressed_size` to `Codec::decompress` to better estimate the required decompressed size.

* snappy: Probably not much improvement, as `decompress_len` is already accurate.
* gzip: No improvement. Ignores the size hint.
* brotli: Probably not much improvement. The buffer size will be equal to `uncompressed_size`.
* lz4: No improvement. As the buffer is located on the stack, there are no extra allocations, so it is probably better to keep it working as it is.
* zstd: No improvement. Ignores the size hint.
* lz4_raw: Improvement. The estimation method over-estimates, so knowing the uncompressed size reduces allocations.
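The lz4_raw case can be illustrated with a toy run-length decoder. This is a hypothetical stand-in, not one of the parquet codecs, and `capacity_to_reserve` and `rle_decompress` are illustrative names. Without a hint it must reserve a worst-case upper bound; with the hint it reserves exactly the decompressed size.

```rust
/// Hypothetical fallback estimate: with no hint, assume every (count, byte)
/// run has the maximum count of 255. This is an upper bound that usually
/// over-allocates, like the over-estimating lz4_raw path described above.
pub fn capacity_to_reserve(uncompressed_size: Option<usize>, input_len: usize) -> usize {
    match uncompressed_size {
        Some(size) => size,
        None => (input_len / 2) * 255,
    }
}

/// Toy run-length decoder over (count, byte) pairs, standing in for a codec
/// whose own size estimate is loose. Returns the number of bytes appended.
pub fn rle_decompress(
    input: &[u8],
    output: &mut Vec<u8>,
    uncompressed_size: Option<usize>,
) -> Result<usize, String> {
    if input.len() % 2 != 0 {
        return Err("truncated run: input length must be even".to_string());
    }
    // Reserve once up front: exact with a hint, worst case without one.
    output.reserve(capacity_to_reserve(uncompressed_size, input.len()));
    let start = output.len();
    for run in input.chunks_exact(2) {
        let (count, byte) = (run[0] as usize, run[1]);
        output.extend(std::iter::repeat(byte).take(count));
    }
    Ok(output.len() - start)
}
```

For the 4-byte input `[3, b'a', 2, b'b']`, the fallback reserves 510 bytes to produce 5, while the hint reserves 5.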
tustvold pushed a commit that referenced this issue on Oct 29, 2022
* Pass decompressed size to parquet Codec::decompress (#2956)

  Added optional argument `uncompressed_size` to `Codec::decompress` to better estimate the required decompressed size.
  * snappy: Probably not much improvement, as `decompress_len` is already accurate.
  * gzip: No improvement. Ignores the size hint.
  * brotli: Probably not much improvement. The buffer size will be equal to `uncompressed_size`.
  * lz4: No improvement. As the buffer is located on the stack, there are no extra allocations, so it is probably better to keep it working as it is.
  * zstd: No improvement. Ignores the size hint.
  * lz4_raw: Improvement. The estimation method over-estimates, so knowing the uncompressed size reduces allocations.

* Do not include header size in uncompressed_size.

  A page may contain a header, and the page's uncompressed size includes the header size. The `decompress` method expects to receive the `uncompress_size` of the compressed block, that is, without the page header.

Co-authored-by: Adrián Gallego Castellanos <[email protected]>
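The second commit's adjustment amounts to subtracting the header bytes from the page's uncompressed size before handing it to the codec. A minimal sketch, assuming the "header" is a prefix stored uncompressed ahead of the compressed block (for example, the repetition and definition levels of a Parquet V2 data page); `compressed_block_hint` is a hypothetical name:

```rust
/// Hypothetical helper: size hint for the codec when the first `header_len`
/// bytes of a page are stored uncompressed and only the rest is compressed.
pub fn compressed_block_hint(uncompressed_page_size: i32, header_len: usize) -> Option<usize> {
    // Page sizes arrive as i32 in the Thrift metadata; a negative value is
    // invalid, so give no hint rather than wrap around.
    let page_size = usize::try_from(uncompressed_page_size).ok()?;
    // The page-level size counts the header, but the codec only produces the
    // remainder; an underflow means inconsistent metadata, so again no hint.
    page_size.checked_sub(header_len)
}
```

Returning `None` on inconsistent metadata lets the codec fall back to its own estimate instead of trusting a wrong hint.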
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We know the size of the decompressed data, as this information is provided in the page header. We can potentially avoid unnecessary reallocations by passing this down to the codecs.
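To illustrate the reallocation cost in question, here is a minimal Rust sketch. The helper `count_reallocations` is hypothetical and not part of the parquet crate. It counts how often a `Vec<u8>` grows while bytes are appended: a buffer created with `Vec::with_capacity` for the known size never grows, while one started empty grows repeatedly.

```rust
/// Hypothetical helper for illustration: appends `n` bytes to `buf` one at a
/// time and counts how many times its capacity grows. For an empty vector the
/// first growth is the initial allocation; later ones are reallocations that
/// copy the existing data.
pub fn count_reallocations(mut buf: Vec<u8>, n: usize) -> usize {
    let mut growths = 0;
    let mut cap = buf.capacity();
    for i in 0..n {
        buf.push(i as u8); // byte value is irrelevant; wrapping is fine
        if buf.capacity() != cap {
            growths += 1;
            cap = buf.capacity();
        }
    }
    growths
}
```

With the page header's uncompressed size in hand, a codec could size its output once with `Vec::with_capacity(size)` or `reserve(size)` instead of relying on repeated growth.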
Describe the solution you'd like
Add an optional `uncompressed_size: Option<usize>` argument to `Codec::decompress`.
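A minimal sketch of the proposed signature, assuming a simplified trait. The trait shape, the `Passthrough` type and the `String` error type are illustrative assumptions, not the parquet crate's actual API. Callers that know the size from the page header pass `Some(size)`; callers that don't pass `None`, and the codec keeps its existing estimate.

```rust
/// Hypothetical, simplified codec trait for illustration only; the parquet
/// crate's real `Codec` trait may differ in naming and error type.
pub trait Codec {
    /// Decompresses `input`, appending the result to `output`, and returns the
    /// number of bytes written. `uncompressed_size`, when `Some`, is the
    /// decompressed length taken from the page header.
    fn decompress(
        &mut self,
        input: &[u8],
        output: &mut Vec<u8>,
        uncompressed_size: Option<usize>,
    ) -> Result<usize, String>;
}

/// Toy "codec" that stores bytes uncompressed, standing in for a real one.
pub struct Passthrough;

impl Codec for Passthrough {
    fn decompress(
        &mut self,
        input: &[u8],
        output: &mut Vec<u8>,
        uncompressed_size: Option<usize>,
    ) -> Result<usize, String> {
        // Use the hint when present; otherwise fall back to the codec's own
        // estimate (trivially the input length for a passthrough).
        output.reserve(uncompressed_size.unwrap_or(input.len()));
        output.extend_from_slice(input);
        Ok(input.len())
    }
}
```

Making the argument an `Option` keeps the hint optional, so codecs that cannot use it (the linked commits note that gzip and zstd ignore it) need no behavioral change.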
Describe alternatives you've considered
Additional context