Add dictionary support for C data interface #1407
Codecov Report
@@            Coverage Diff             @@
##           master    #1407      +/-   ##
==========================================
+ Coverage   83.13%   83.16%   +0.02%
==========================================
  Files         182      182
  Lines       53321    53394      +73
==========================================
+ Hits        44330    44404      +74
+ Misses       8991     8990       -1
cc @alamb @jorgecarleitao @kszucs could you help review this? Thanks!
We should also be able to pass |
@@ -122,14 +123,6 @@ def test_type_roundtrip_raises(pyarrow_type):
    with pytest.raises(pa.ArrowException):
        rust.round_trip_type(pyarrow_type)


def test_dictionary_type_roundtrip():
🎉
// verify
let new_values = vec!["a", "aaa", "aaa", "a", "aaa", "aaa"];
let expected: DictionaryArray<Int8Type> = new_values.into_iter().collect();
assert_eq!(actual, &expected);
🆗
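For readers unfamiliar with the encoding exercised by the test above, here is a minimal std-only sketch of dictionary encoding with `i8` keys. The function `dictionary_encode` is hypothetical and is not part of the arrow crate; it only illustrates how a list of strings splits into keys and distinct values.

```rust
use std::collections::HashMap;

/// Dictionary-encode a sequence of strings: every distinct string is stored once
/// in `values`, and `keys` holds one index into `values` per input element.
/// Hypothetical helper for illustration only (not an arrow API).
fn dictionary_encode(input: &[&str]) -> (Vec<i8>, Vec<String>) {
    let mut index_of: HashMap<&str, i8> = HashMap::new();
    let mut keys: Vec<i8> = Vec::with_capacity(input.len());
    let mut values: Vec<String> = Vec::new();
    for &s in input {
        // Reuse the existing key if the string was seen before, otherwise append it.
        let key = *index_of.entry(s).or_insert_with(|| {
            // This sketch assumes the distinct values fit into an i8 key.
            assert!(values.len() <= i8::MAX as usize, "too many distinct values for i8 keys");
            values.push(s.to_string());
            (values.len() - 1) as i8
        });
        keys.push(key);
    }
    (keys, values)
}
```

Calling it with the six strings from the test yields two distinct values and six keys that index into them.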
Thanks a lot @sunchao. Overall looks great! Left 3 comments below.
arrow/src/ffi.rs
let data_type = &self.data_type()?;
// Special handling for dictionary type as we only care about the key type in the case.
let data_type = match &self.data_type()? {
    DataType::Dictionary(key_data_type, _) => key_data_type.as_ref().clone(),
maybe it would be possible to not clone here by returning a reference?
Yea we can do this. I need to move `&self.data_type()?` into a separate assignment though, since otherwise I'm getting a "temporary value dropped while borrowed" error.
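The borrow issue described here can be reproduced without arrow. The following is a sketch under stated assumptions: `DataType` is a cut-down stand-in enum, and `SchemaStub`, `data_type`, and `key_bit_width` are hypothetical names, not the real `ffi.rs` code. It shows why binding the owned value first lets the `match` return a reference instead of a clone.

```rust
// Cut-down stand-in for arrow's DataType; assumed for illustration only.
#[derive(Debug, Clone)]
enum DataType {
    Int8,
    Int32,
    Utf8,
    Dictionary(Box<DataType>, Box<DataType>),
}

// Hypothetical holder type, not the real FFI struct.
struct SchemaStub {
    data_type: DataType,
}

impl SchemaStub {
    // Returns an owned value, so callers receive a fresh DataType on each call.
    fn data_type(&self) -> Result<DataType, String> {
        Ok(self.data_type.clone())
    }

    fn key_bit_width(&self) -> Result<usize, String> {
        // Writing `match &self.data_type()? { ... }` directly in the `let` below
        // would borrow from a temporary that is dropped at the end of that
        // statement. Binding the owned value first keeps it alive long enough.
        let owned = self.data_type()?;
        let data_type: &DataType = match &owned {
            // For dictionaries only the key type matters here; no clone needed.
            DataType::Dictionary(key_type, _) => key_type.as_ref(),
            other => other,
        };
        match data_type {
            DataType::Int8 => Ok(8),
            DataType::Int32 => Ok(32),
            other => Err(format!("no fixed bit width for {:?}", other)),
        }
    }
}
```

The reference returned from the `match` borrows from `owned`, which lives until the end of the function, so the compiler accepts it.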
@@ -127,4 +128,14 @@ mod tests {
        let data = array.data();
        test_round_trip(data)
    }

    #[test]
    fn test_dictionary() -> Result<()> {
Would it be worth having an example with validity (in both the keys and values), so that we cover the most complex use case?
Good idea. Will add.
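To make the "validity in both the keys and values" case concrete, here is a small std-only sketch. It models nulls with `Option` rather than arrow's validity bitmaps, and `decode` is a hypothetical helper, not an arrow function. A slot decodes to null either when its key is null or when the key points at a null dictionary entry.

```rust
/// Decode dictionary keys into logical values, propagating nulls from either
/// the keys or the dictionary values. Hypothetical helper for illustration.
/// Assumes every non-null key is a valid, non-negative index into `values`.
fn decode<'a>(keys: &[Option<i8>], values: &[Option<&'a str>]) -> Vec<Option<&'a str>> {
    keys.iter()
        .map(|key| key.and_then(|k| values[k as usize]))
        .collect()
}
```

A round-trip test covering this case would need at least one null key and one null dictionary entry, so that both paths to a null result are exercised.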
    """
    Python -> Rust -> Python
    """
    a = pa.array(["a", "b", "a"], type=pa.dictionary(pa.int8(), pa.string()))
same here - with validities?
    .map(|i| {
        let child = self.child(i);
        child.to_data()
    })
    .map(|d| d.unwrap())
    .collect();

if let Some(d) = self.dictionary() {
    // For dictionary type there should only be a single child, so we don't need to worry if
Should we add an assert for that?
Sure
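One way the requested assertion could look, sketched with stand-in types. `ArrayDataSketch` and `collect_child_data` are hypothetical, and the exact check that was merged is not shown in this thread. The assumption is that a dictionary-typed array arrives with no regular children, so the dictionary becomes the only child.

```rust
// Stand-in for arrow's ArrayData; assumed for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct ArrayDataSketch {
    name: String,
}

/// Combine regular children with an optional dictionary into one child list.
/// Hypothetical helper mirroring the shape of the snippet under review.
fn collect_child_data(
    children: Vec<ArrayDataSketch>,
    dictionary: Option<ArrayDataSketch>,
) -> Vec<ArrayDataSketch> {
    let mut child_data = children;
    if let Some(d) = dictionary {
        // Assumed invariant: a dictionary array has no other children, so the
        // dictionary values end up as the single child.
        assert!(
            child_data.is_empty(),
            "dictionary array should not have other children"
        );
        child_data.push(d);
    }
    child_data
}
```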
Looks good.
Hmm you are right. Looks like |
Thanks @jorgecarleitao @alamb @viirya! I've addressed your comments. Let me know if you have more feedback. Otherwise I'll merge this tonight.
Merged, thanks!
Which issue does this PR close?
Closes #1397.
Rationale for this change
Currently the Rust implementation of the C data interface doesn't support the dictionary type, which is necessary if we want to pass dictionary-encoded data between Rust and other languages such as Java/C++.
What changes are included in this PR?
I kept the `DictionaryArray` untouched, so the dictionary (i.e., values) is still stored as the only child of the `ArrayData` it is associated with.
Are there any user-facing changes?
Yes, now we can use the C data interface with the dictionary type. There is one API change on `ArrowSchema.try_new`: this PR added one parameter, `dictionary`.
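As a rough picture of what the extra parameter carries, here is a safe, simplified model of a C data interface schema. `FfiSchemaSketch` and its `try_new` signature are hypothetical; the real `ArrowSchema::try_new` signature is not shown in this thread and works with raw C structs. The sketch relies only on the C data interface convention that a dictionary-encoded field uses the format string of its index type and points to the values schema through its `dictionary` member.

```rust
// Simplified, safe model of a C data interface schema; illustration only.
#[derive(Debug, Clone, PartialEq)]
struct FfiSchemaSketch {
    format: String,
    children: Vec<FfiSchemaSketch>,
    // Present only for dictionary-encoded fields: the schema of the values.
    dictionary: Option<Box<FfiSchemaSketch>>,
}

impl FfiSchemaSketch {
    // Hypothetical constructor with a `dictionary` parameter.
    fn try_new(
        format: &str,
        children: Vec<FfiSchemaSketch>,
        dictionary: Option<FfiSchemaSketch>,
    ) -> Result<Self, String> {
        if format.is_empty() {
            return Err("format string must not be empty".to_string());
        }
        Ok(Self {
            format: format.to_string(),
            children,
            dictionary: dictionary.map(Box::new),
        })
    }
}

/// Schema for dictionary(int8, utf8): format "c" (int8 keys) with a "u"
/// (utf8) schema attached as the dictionary.
fn int8_utf8_dictionary() -> Result<FfiSchemaSketch, String> {
    let values = FfiSchemaSketch::try_new("u", vec![], None)?;
    FfiSchemaSketch::try_new("c", vec![], Some(values))
}
```

Non-dictionary fields simply pass `None` for the new parameter, which is why adding it is an API change for existing callers.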