[C++][Python] Unable to cast date{32,64} to date{32,64} #43183
Comments
Fokko added a commit to Fokko/arrow that referenced this issue Jul 8, 2024
pitrou pushed a commit that referenced this issue Jul 10, 2024
### Rationale for this change
This one seems to be missing, see #43183
### What changes are included in this PR?
### Are these changes tested?
I'm not sure what the best place is to test this, please advise
### Are there any user-facing changes?
* GitHub Issue: #43183

Lead-authored-by: Fokko <[email protected]>
Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: Hyunseok Seo <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Issue resolved by pull request 43192
Fokko added a commit to Fokko/iceberg-python that referenced this issue Feb 16, 2025
When reading a Parquet file using PyArrow, metadata stored in the Parquet file determines whether a column is read as a large type (e.g. `large_string`) or a normal type (`string`). The difference is that the large types use 64-bit offsets to encode their arrays. This is not always needed; we could first check the types in which the data is stored and let PyArrow decide here: https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/io/pyarrow.py#L1579

In PyArrow today we just bump everything to a large type, which might lead to additional memory consumption because it allocates an int64 array for the offsets instead of an int32 one. I thought we would be good to go for this now with the new lower bound of PyArrow 17. But it looks like we still have to wait for Arrow 18 to fix the issue with the `date` types: apache/arrow#43183

Fixes: apache#1049
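To make the memory argument in the commit message concrete, here is a rough back-of-the-envelope sketch. It assumes only the Arrow layout rule that a variable-length array with N values carries N + 1 offsets, 4 bytes each for `string` and 8 bytes each for `large_string`. The helper name `offsets_buffer_bytes` is hypothetical, and buffer padding and alignment are ignored.

```python
def offsets_buffer_bytes(num_values: int, large: bool) -> int:
    """Approximate size in bytes of the offsets buffer of a string column.

    Hypothetical helper for illustration only: an array with N values
    stores N + 1 offsets, int32 for `string` and int64 for `large_string`.
    Padding and alignment of the buffer are not accounted for.
    """
    bytes_per_offset = 8 if large else 4
    return (num_values + 1) * bytes_per_offset


rows = 1_000_000
normal = offsets_buffer_bytes(rows, large=False)
large = offsets_buffer_bytes(rows, large=True)
print(f"string offsets:       {normal} bytes")
print(f"large_string offsets: {large} bytes")
print(f"extra for large type: {large - normal} bytes")
```

For a column whose character data fits within 32-bit offsets, the large variant doubles the offsets buffer without any benefit, which is the extra consumption the message refers to.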
Describe the bug, including details regarding any error messages, version, and platform.
It looks like I'm able to cast ints/string:
But it seems to fail with a `date32`:
Same for `date64`:
This looks like a valid cast operation to me. Please advise. I'm happy to create a PR; since I'm not familiar with the codebase, it would be very helpful if someone could point out where I should add the test :)
Component(s)
C++