-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: don't raise NotImplementedError
when reading empty array from Parquet
#2194
fix: don't raise NotImplementedError
when reading empty array from Parquet
#2194
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a different solution from the one I had been thinking of, and if it were not for the fact that the extensionarray=True
case leads to
File "<stdin>", line 1, in <module>
File "pyarrow/table.pxi", line 2460, in pyarrow.lib.RecordBatch.from_arrays
File "pyarrow/table.pxi", line 1406, in pyarrow.lib._sanitize_arrays
File "pyarrow/array.pxi", line 343, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 317, in pyarrow.lib.array
File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: extension
I would have gone with it (with the minor correction of passing the arguments generate_bitmasks
and pass_empty_field
through the recursive call to handle_arrow
). It's a good idea.
The other method, #2193 (comment), doesn't involve Arrow, so at least we won't run into any ArrowNotImplementedError
that way.
Also, the PR needs tests. Other Arrow and Parquet conversion tests are parameterized for extensionarray in (False, True)
so this would have been caught.
It might be enough to just return this: form_handle_arrow(obj.schema, pass_empty_field=pass_empty_field).length_zero_array() |
Codecov Report
Additional details and impacted files
|
Here is a test that triggers the error: ak.from_arrow(ak.to_arrow_table(ak.Array([{"x": 1, "y": 2.2}])[0:0], extensionarray=extensionarray)) (not actually Parquet-related). Tests would either go in a file named
or into the existing
Your choice, although we usually just make new files for new issues, and you can use the above as a template for parameterizing by |
That does seems to do the trick! Looks like it needs to be
Right, but putting the test in |
You're right about both points! Since I think you're at an inflection point, I'm going to do an "update branch" to make sure that it will work when up-to-date on |
I've approved the changes, so if it passes tests, it's ready to be merged! |
@all-contributors please add @dsavoiu for code |
I've put up a pull request to add @dsavoiu! 🎉 |
This PR proposes an implementation for reading an zero-length
ak.RecordArray
/ ArrowTable
from Parquet.Since for zero-length there are no batches to use for conversion, an empty
RecordBatch
is created matching the input schema, and is propagated through thehandle_arrow
function.There might be better ways to do this (maybe avoiding the list-of-empty lists?), so let me know.
Closes #2193.