ProjectionSchema self-inconsistency with partitioned source #354
Comments
That's pretty expected behaviour, because the |
Thank you for the answer and explanations. I have looked through the code and I understand how this "unexpected" value ends up in the result row. My question is: wouldn't it be better to refer to the projection schema (if any) in com.github.mjakubowski84.parquet4s.ParquetReader.BuilderImpl#setPartitionValues?
The issue is more the other way around. If the user wants to see partition values in the output, they have to specify them in the schema. Parquet4s then has to remove those columns from the projection when reading the files, because partition values come from the directory path rather than from the file contents.
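To make the two steps in the comment above concrete, here is a minimal, library-free sketch in Scala. It is not Parquet4s code: `ProjectionSketch`, `splitProjection` and `readRow` are hypothetical names, and rows are modelled as plain `Map[String, String]` values. It only illustrates the idea of splitting the requested columns into those read from the file and those taken from the partition path.

```scala
// Hypothetical model, not the Parquet4s API.
object ProjectionSketch {
  type Row = Map[String, String]

  // Split the requested columns into (columns stored in the file,
  // columns encoded in the partition path).
  def splitProjection(
      requested: Seq[String],
      partitionColumns: Set[String]
  ): (Seq[String], Seq[String]) =
    requested.partition(column => !partitionColumns.contains(column))

  // Read one row: project the file content onto the file columns only,
  // then attach just those partition values that were requested.
  def readRow(fileRow: Row, partitionValues: Row, requested: Seq[String]): Row = {
    val (fileColumns, pathColumns) =
      splitProjection(requested, partitionValues.keySet)
    val fromFile = fileRow.filter { case (name, _) => fileColumns.contains(name) }
    val fromPath = partitionValues.filter { case (name, _) => pathColumns.contains(name) }
    fromFile ++ fromPath
  }
}
```

Under this model a partition column appears in the output only when the caller lists it in the requested schema, which is the behaviour the question above is asking about.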
Hi,
I have run into a self-inconsistency in the handling of the projection schema.
When reading partitioned Parquet using ParquetReader.projectedGeneric(expectedSchema).options(...).read, the output rows contain the partitioning columns even if expectedSchema doesn't.
When reading non-partitioned Parquet, the output rows contain only the columns listed in expectedSchema.
I am not sure what the "proper" behavior is, but the observed one does not look self-consistent.
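The inconsistency can be illustrated with a small model that does not depend on Parquet4s. Everything in it is an assumption made for illustration: `ObservedBehaviourSketch` and its methods are hypothetical, rows are plain maps, and the "append partition values after projection" step is only a simplified picture of what is described in this thread, not the library's actual implementation.

```scala
// Hypothetical model of the observed behaviour, not the Parquet4s API.
object ObservedBehaviourSketch {
  type Row = Map[String, String]

  // Projection applied to the columns physically stored in a file.
  def project(fileRow: Row, expected: Set[String]): Row =
    fileRow.filter { case (name, _) => expected.contains(name) }

  // Non-partitioned source: the output has only the expected columns.
  def readUnpartitioned(fileRow: Row, expected: Set[String]): Row =
    project(fileRow, expected)

  // Partitioned source, as observed: partition values are added after
  // projection without consulting the expected schema.
  def readPartitioned(fileRow: Row, partitionValues: Row, expected: Set[String]): Row =
    project(fileRow, expected) ++ partitionValues
}
```

With the same expected schema, the two read paths return rows with different sets of columns, which is the asymmetry reported here.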
Parquet4s version 2.18.0
Here is a test case:
The output is: