[REVIEW] Fix issue with extracting nested column data & dtype preservation #11671
expected = pd.read_parquet(buffer)
actual = cudf.read_parquet(buffer)
assert_eq(expected, actual)
assert_eq(df.a.dtype, actual.a.dtype)
The current branch-22.10 issue can be seen in this error:
(Pdb) assert_eq(df.a.dtype, actual.a.dtype)
*** AssertionError:
Items are not equal:
ACTUAL: StructDtype({'Domain': StructDtype({'Id': StructDtype({'Name': dtype('O'), 'Value': dtype('O')}), 'Name': dtype('O')}), 'Duration': dtype('int64'), 'Offset': dtype('int64'), 'Resource': ListDtype(StructDtype({'Name': dtype('O'), 'Value': dtype('O')})), 'StreamId': dtype('O')})
DESIRED: StructDtype({'Domain': StructDtype({'Id': StructDtype({'Name': dtype('O'), 'Value': dtype('O')}), 'Name': dtype('O')}), 'Duration': dtype('int64'), 'Offset': dtype('int64'), 'Resource': ListDtype(StructDtype({'0': dtype('O'), '1': dtype('O')})), 'StreamId': dtype('O')})
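The two dtype reprs above differ in only one place, which is hard to spot by eye. As a rough aid, the sketch below walks two nested dtype descriptions and reports the first path where they diverge. It is plain Python, not cudf API: structs are modelled as dicts, lists as one-element Python lists, and leaf dtypes as strings, and the helper name `first_mismatch` is hypothetical.

```python
def first_mismatch(actual, desired, path="a"):
    """Return a description of the first difference between two nested
    dtype descriptions, or None if they match.

    Assumed encoding (illustration only, not cudf's own types):
    dict -> struct fields, one-element list -> list element type,
    str -> leaf dtype name.
    """
    if isinstance(actual, dict) and isinstance(desired, dict):
        # Field names and their order are part of a struct dtype.
        if list(actual) != list(desired):
            return f"{path}: fields {list(actual)} != {list(desired)}"
        for name in actual:
            found = first_mismatch(actual[name], desired[name], f"{path}.{name}")
            if found:
                return found
        return None
    if isinstance(actual, list) and isinstance(desired, list):
        # Descend into the element type of a list column.
        return first_mismatch(actual[0], desired[0], f"{path}[]")
    return None if actual == desired else f"{path}: {actual!r} != {desired!r}"


# The ACTUAL and DESIRED dtypes from the error, in the assumed encoding.
actual = {
    "Domain": {"Id": {"Name": "O", "Value": "O"}, "Name": "O"},
    "Duration": "int64",
    "Offset": "int64",
    "Resource": [{"Name": "O", "Value": "O"}],
    "StreamId": "O",
}
desired = dict(actual, Resource=[{"0": "O", "1": "O"}])

print(first_mismatch(actual, desired))
```

Run on the two dtypes from the error, this points at the struct inside the `Resource` list: one side carries the field names `Name` and `Value`, the other carries `0` and `1`.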
Codecov Report
Base: 86.39% // Head: 86.41% // Increases project coverage by +0.02%
Additional details and impacted files
@@              Coverage Diff               @@
##           branch-22.10   #11671    +/-   ##
================================================
+ Coverage       86.39%   86.41%   +0.02%
================================================
  Files             145      145
  Lines           23005    23009      +4
================================================
+ Hits            19875    19884      +9
+ Misses           3130     3125      -5
Overall looks good, with minor queries about some aspects, and then a more general question about whether the API of interop.to_arrow is the most natural one.
Co-authored-by: Lawrence Mitchell <[email protected]>
…into nested_issue
Thanks!
@gpucibot merge
Description
This PR:
Fixes: #11670
column_metadata for nested scenarios.
children in a ListColumn. See the pytest below.
Checklist