-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Parquet writer versions V1 and V2 #20957
Support Parquet writer versions V1 and V2 #20957
Conversation
|
9d2fc6d
to
43348ee
Compare
@yingsu00 Can you please take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nmahadevuni Thanks for the nice work. It'll be best if the release note can include an example so the users know exactly how the session property should be configured. Other than that it's just a couple of nits.
implements ColumnWriter | ||
{ | ||
private final Type type; | ||
private final ColumnDescriptor columnDescriptor; | ||
protected final ColumnDescriptor columnDescriptor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the private ones to be after the protected ones
|
||
if (getBufferedBytes() >= pageSizeThreshold) { | ||
flushCurrentPageToBuffer(); | ||
} | ||
} | ||
|
||
protected abstract void writeDefinitionAndRepetitionLevels(ColumnChunk current) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this method to be after all public ones
@nmahadevuni there is an easyCLA issue. |
That's for the co-author who is not registered in LF easyCLA. There is no way to fix that as my previous issue created with Linux foundation about it was closed saying that its expected and there is no way to remove that. |
43348ee
to
86db17d
Compare
@nmahadevuni Make yourself as the main author. If that doesn't work, remove the co-author information and add a line in the commit message "co-authored by ..." |
86db17d
to
f037a07
Compare
Cherry-pick of trinodb/trino#9497 and trinodb/trino#9611 co-authored by Josh Howard <[email protected]>
f037a07
to
7cbbc0e
Compare
@zhenxiao @shangxinli Can you please review this PR? |
This change seems to have collided with another change and together they are causing compilation failures. See #19983 (comment) |
prestodb#20957 and prestodb#19983 were merged at similar times, and when taken together broke the build (as changes required by prestodb#20957 aren't present in prestodb#19983).
I raised #21043 to fix the build |
prestodb#20957 and prestodb#19983 were merged at similar times, and when taken together broke the build (as changes required by prestodb#20957 aren't present in prestodb#19983).
@nmahadevuni, @yingsu00 I see from the Trino commit in the trinodb/trino#9611 description that Trino switched the default to PARQUET_1_0. Is there a reason why Presto chose PARQUET_2_0? |
This PR merely added a config property that allows the user to choose v1 or v2. Before this change, Presto always write V2 and user cannot change it. To avoid disruption to existing users we didn't change the default version. |
Cherry-pick of trinodb/trino#9497 and trinodb/trino#9611
Description
Adds support for writing Parquet data versions V1 and V2 using a config and session property.
Motivation and Context
Fixes #20926
Impact
No external changes. Default Parquet version is set to V2 which is currently used in Presto.
Test Plan
Amended the ParquetTester.java to run round trip tests for both writer versions.