-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: we may have a chance to avoid some copies when decoding with some alignment adjustment(page alignment, chunk alignment) #3115
Comments
We already align each chunk to 8 bytes but there was a bug preventing this from working correctly.
How much stronger? 64 byte alignment for a 4KiB chunk seems pretty extensive (up to 6% of the block is wasted space) but I suppose it is manageable. Still, if we want to require this then I'd prefer changing the |
I did the math wrong. 64/4096 is 1/64th so more like 1-2%. |
We align buffers in the file writer but we were not doing the same thing in the test utility. This forced encodings to do extra copies. We remove one such copy in this PR. Closes #3115
instantiate a copy for the data using .to_vec doesn't guarantee the data starts at a 64 byte aligned position either, we may be able to get rid of this copy with the currently page layout padding, if so, changes in bitpack mini-block compression logic needed.
|
PR #3101 added alignment in page layout and chunk layout, but PR #3099 still need to do a copy of the raw data read from disk to start decoding, see the code here
for fastlanes bitpacking, there is also a copy
lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
Line 1727 in c237bcb
in
binary + miniblock
, I think this copy can be avoided if we align each chunk to 4 bytes.in
fastlanes bitpacking
, because the use of SIMD instruction, the alignment requirement is stronger, reference here, we may also need to change the compression logic to allow itThe text was updated successfully, but these errors were encountered: