-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move FFI to sub-crates #3687
Move FFI to sub-crates #3687
Conversation
arrow-data/src/ffi.rs
Outdated
/// Returns the buffer at the provided index | ||
/// | ||
/// # Panic | ||
/// Panics if index exceeds the number of buffers or the buffer is not correctly aligned | ||
pub fn buffer(&self, index: usize) -> *const u8 { | ||
assert!(!self.buffers.is_null()); | ||
assert!(index < self.num_buffers()); | ||
// SAFETY: | ||
// If buffers is not null must be valid for reads up to num_buffers | ||
unsafe { std::ptr::read_unaligned((self.buffers as *mut *const u8).add(index)) } | ||
} | ||
|
||
/// Returns the number of buffers | ||
pub fn num_buffers(&self) -> usize { | ||
self.n_buffers as _ | ||
} | ||
|
||
/// Returns the child at the provided index | ||
pub fn child(&self, index: usize) -> &FFI_ArrowArray { | ||
assert!(!self.children.is_null()); | ||
assert!(index < self.num_children()); | ||
// Safety: | ||
// If children is not null must be valid for reads up to num_children | ||
unsafe { | ||
let child = std::ptr::read_unaligned(self.children.add(index)); | ||
child.as_ref().unwrap() | ||
} | ||
} | ||
|
||
/// Returns the number of children | ||
pub fn num_children(&self) -> usize { | ||
self.n_children as _ | ||
} | ||
|
||
/// Returns the dictionary if any | ||
pub fn dictionary(&self) -> Option<&Self> { | ||
// Safety: | ||
// If dictionary is not null should be valid for reads of `Self` | ||
unsafe { self.dictionary.as_ref() } | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods are new, but just inline what was in arrow::ffi
to avoid needing access to private members
// Safety: | ||
// If children is not null must be valid for reads up to num_children | ||
unsafe { | ||
let child = std::ptr::read_unaligned(self.children.add(index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the implementation did a blind dereference here, I couldn't find anything to suggest alignment was actually guaranteed, so changed to use read_unaligned
. On x86 there is no difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the rationale for this change?
https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html
Is it because the data (may) come from an unaligned location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because the data (may) come from an unaligned location?
Yes, the C Data interface does not specify what alignment the various pointers may or may not have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I cannot think that how/why the pointers come from unaligned location, but assuming it could be seems safer.
schema.release = None; | ||
Ok(schema) => { | ||
unsafe { std::ptr::copy(addr_of!(schema), out, 1) }; | ||
std::mem::forget(schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a "nicer" way to implement the move, that avoids needing access to the private internals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @viirya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By forget
ing it, will some non-pointer fields like flags
and n_children
be released? Supposed that it should be but wanted to verify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flags
and n_children
aren't separate allocations, they are all stored inline with the rest of the struct members on the stack allocated schema
. So no they aren't freed, but they also don't need to be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I suppose that you meant they are deallocated from stack?
.map(|ptr| Buffer::from_custom_allocation(ptr, len, owner)) | ||
} | ||
|
||
fn create_child( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was previously incorrect as it coerced non-static borrows to static borrows. Fortunately it can be removed without any change to the public interface
@@ -827,7 +423,7 @@ pub struct ArrowArray { | |||
pub struct ArrowArrayChild<'a> { | |||
array: &'a FFI_ArrowArray, | |||
schema: &'a FFI_ArrowSchema, | |||
owner: Arc<FFI_ArrowArray>, | |||
owner: &'a Arc<FFI_ArrowArray>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just avoids an unnecessary clone
@@ -1526,24 +1108,6 @@ mod tests { | |||
Ok(()) | |||
} | |||
|
|||
#[test] | |||
fn null_array_n_buffers() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to arrow-data
cbd1c03
to
1122caf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also add arrow-ffi
to the crate topology here https://docs.rs/arrow/32.0.0/arrow/ ?
I think the movement parts of the PR make sense to me,
I am not as confident about the other changes (like read_unaligned
and forget
) and would like someone else to review them too prior to merge
Another alternative would be split the PR into 2 pices: 1 "code movement into crates" and 2 "change in code" parts
// Safety: | ||
// If children is not null must be valid for reads up to num_children | ||
unsafe { | ||
let child = std::ptr::read_unaligned(self.children.add(index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the rationale for this change?
https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html
Is it because the data (may) come from an unaligned location?
schema.release = None; | ||
Ok(schema) => { | ||
unsafe { std::ptr::copy(addr_of!(schema), out, 1) }; | ||
std::mem::forget(schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @viirya
It doesn't add a new crate, just moves the ffi logic into the existing sub-crates
These were workarounds to avoid making mutation methods public, which would be problematic on many levels 😄 |
17e8299
to
c32b0a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me (✅ ) thanks
Benchmark runs are scheduled for baseline = 560ebaa and contender = 3761ac5. 3761ac5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
As part of pola-rs/polars#6735 I would like to use the FFI interfaces without needing to bring in the top-level crate, this PR therefore moves FFI into arrow-schema and arrow-data.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?