-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: reduce copies and skip zeroing memory we read into #1238
Conversation
Benchmark results (using benchmarks in #1235):
|
@@ -230,9 +231,22 @@ impl<'a, T: ByteArrayType> BinaryDecoder<'a, T> { | |||
.null_bit_buffer(Some(null_buf.into())); | |||
} | |||
|
|||
// TODO: replace this with safe method once arrow-rs 47.0.0 comes out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when is 47.0.0 coming out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a week or two? And then we have to wait another two weeks for the next Datafusion release cycle. So we can do this in like 4 weeks I think.
// Zero-copy conversion from bytes | ||
// Safety: the bytes are owned by the `data` value, so the pointer | ||
// will be valid for the lifetime of the Arc we are passing in. | ||
let buf = unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need the alignment check because this is already byte_width aligned?
also, technically bytes
can be empty right? Would that cause any issues with NonNull::new(bytes.as_ptr() as _).unwrap()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah for bytes I think we just require need it to be aligned with u8, which is true of any allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our simd code uses unaligned load now, so it is unnecessary to be word-aligned (i.e., 8 bytes). About 10% overhead due to misaligned memory loading in SIMD computation, much less in overall e2e latency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I align the primitive buffers because arrow-rs errors otherwise
When reading primitive and binary data, we are copying data. We can skip the copies if the buffer is already correctly aligned.
Also, for local filesystem we are spending a lot of time zeroing memory that we immediately write into. This PR change to just initialize without zeroing and write into the data. A check is added to make sure that we don't expose any uninitialized data in the output buffer.