feat: support type coercion in Parquet Reader #6458
Thanks @e1ijah1! This is looking very cool.
I think to really complete this feature we should have an "end to end test" -- like actually creating parquet files with two different schemas and showing how they can be read as a single table using this feature
Perhaps we could add a test to https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/parquet, following the model of
@tustvold since you filed #6427 do you have some idea about how this feature would be used?
Marking as draft as we are waiting on feedback.
The use case is where you have one or more parquet files with different schemas. You can provide a file_schema to FileScanConfig (or to ListingTableConfig) and have the underlying data coerced to that schema on read. An example might be if a column has changed from a
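To make the idea of coercing on read concrete, here is a minimal, standard-library-only sketch. It does not use the DataFusion or Arrow APIs: `ColType`, `Column`, `coerce`, and `adapt_batch` are hypothetical names invented for illustration, and the only cast modeled is a 32-bit to 64-bit integer widening, which is an assumed example rather than the one described in the comment above.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a column's data type (hypothetical, not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Int32,
    Int64,
}

/// Simplified stand-in for a nullable column of values (hypothetical, not an Arrow array).
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Int64(Vec<Option<i64>>),
}

/// Cast `col` to `target`, or return `None` for casts this sketch does not support.
fn coerce(col: &Column, target: ColType) -> Option<Column> {
    match (col, target) {
        // Types already match: pass the data through unchanged.
        (Column::Int32(v), ColType::Int32) => Some(Column::Int32(v.clone())),
        (Column::Int64(v), ColType::Int64) => Some(Column::Int64(v.clone())),
        // Lossless widening, preserving nulls.
        (Column::Int32(v), ColType::Int64) => Some(Column::Int64(
            v.iter().map(|&x| x.map(i64::from)).collect(),
        )),
        // Narrowing could lose data, so this sketch rejects it.
        (Column::Int64(_), ColType::Int32) => None,
    }
}

/// Adapt one file's batch (column name -> data) to the table schema.
/// Columns are looked up by name and coerced to the table's type.
/// Returns `None` if a column is missing or cannot be coerced.
fn adapt_batch(
    table_schema: &[(&str, ColType)],
    batch: &HashMap<String, Column>,
) -> Option<Vec<Column>> {
    table_schema
        .iter()
        .map(|(name, ty)| batch.get(*name).and_then(|col| coerce(col, *ty)))
        .collect()
}
```

With a table schema of `[("id", ColType::Int64)]`, a batch from an older file whose `id` column is `Int32` and a batch from a newer file whose `id` column is already `Int64` both come back from `adapt_batch` as `Int64` columns, so the two files can be presented as a single table.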
Thank you, this looks good to me.
Getting this in to avoid conflicts with #6374.
I filed a follow-on PR, #6563, that avoids needing to recompute the mapping for each batch.
Which issue does this PR close?
Closes #6427.
Rationale for this change
What changes are included in this PR?
Support type coercion in Parquet Reader
Are these changes tested?
Are there any user-facing changes?