Build: Bump Apache Parquet 1.14.4 #11502

Fokko · 2024-11-08T20:56:13Z

No description provided.


singhpk234 · 2024-11-09T05:23:48Z

...18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java

+    Row booleanCol = Row.of(36L, 4L, 0L, null, false, true);
+    Row decimalCol = Row.of(91L, 4L, 1L, null, new BigDecimal("1.00"), new BigDecimal("2.00"));
+    Row doubleCol = Row.of(91L, 4L, 0L, 1L, 1.0D, 2.0D);


[optional] Should we refactor this to pick file_size from the data files themselves, like we did in the JDK 17 upgrade PR #7391 (comment)?

Nevertheless, it looks like the size in bytes is increasing in this version. Is it because the sizes are more accurate now?
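A minimal sketch of the lookup approach suggested above, assuming the per-column sizes are available as a map from field ID to bytes. In Iceberg a data file exposes this via `ContentFile#columnSizes()`; here a plain `Map` stands in for it so the example is self-contained. The `ReadableMetrics` record and `expectedMetrics` helper are hypothetical and simply mirror the six values the test passes to `Row.of(...)`: column size, value count, null count, NaN count, lower bound, upper bound.

```java
import java.util.Map;
import java.util.Objects;

public class ReadableMetricsSketch {

  // Hypothetical stand-in for one readable_metrics row, in the order the test
  // passes values to Row.of(...): size, value count, null count, NaN count, bounds.
  record ReadableMetrics(
      long columnSize, long valueCount, long nullCount, Long nanCount, Object lower, Object upper) {}

  // Takes the column size from the written file's per-column sizes
  // (field ID -> bytes) instead of a hardcoded literal, so the expectation
  // follows whatever the Parquet writer actually produced.
  static ReadableMetrics expectedMetrics(
      Map<Integer, Long> columnSizes,
      int fieldId,
      long valueCount,
      long nullCount,
      Long nanCount,
      Object lower,
      Object upper) {
    Long size = columnSizes.get(fieldId);
    Objects.requireNonNull(size, "no column size recorded for field " + fieldId);
    return new ReadableMetrics(size, valueCount, nullCount, nanCount, lower, upper);
  }

  public static void main(String[] args) {
    // Hypothetical field IDs and sizes; a real test would read them from the data file.
    Map<Integer, Long> columnSizes = Map.of(1, 36L, 2, 91L);
    ReadableMetrics booleanCol = expectedMetrics(columnSizes, 1, 4L, 0L, null, false, true);
    System.out.println(booleanCol);
  }
}
```

Only the size is looked up; counts and bounds stay literal because they depend on the test data, not on the writer version.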

Fokko · Nov 9, 2024

Hey @singhpk234, that's an excellent suggestion; I've copied your approach here as well. Parquet now also tracks how large the data is in memory once uncompressed (handy for strings, where you don't know that upfront), so you can allocate buffers of the right size directly.
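To illustrate why a recorded uncompressed size helps, here is a hypothetical sketch (not Parquet's actual reader code): if the total byte length of a string column is known upfront, a reader can allocate one buffer of exactly that size; without it, the buffer has to grow as values arrive. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BufferSizingSketch {

  // Without a size hint: start small and double the capacity whenever the next
  // value does not fit, copying the bytes written so far on every resize.
  static ByteBuffer copyWithoutHint(List<String> values) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    for (String v : values) {
      byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
      while (buf.remaining() < bytes.length) {
        ByteBuffer bigger = ByteBuffer.allocate(buf.capacity() * 2);
        buf.flip();
        bigger.put(buf);
        buf = bigger;
      }
      buf.put(bytes);
    }
    buf.flip();
    return buf;
  }

  // With a size hint (e.g. a recorded uncompressed byte count for the column):
  // allocate once at exactly the right size. Assumes the hint is accurate;
  // an undersized hint would throw BufferOverflowException.
  static ByteBuffer copyWithHint(List<String> values, int totalBytes) {
    ByteBuffer buf = ByteBuffer.allocate(totalBytes);
    for (String v : values) {
      buf.put(v.getBytes(StandardCharsets.UTF_8));
    }
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    List<String> values = List.of("iceberg", "parquet", "flink", "a much longer string value");
    int hint = 0;
    for (String v : values) {
      hint += v.getBytes(StandardCharsets.UTF_8).length;
    }
    ByteBuffer exact = copyWithHint(values, hint);
    ByteBuffer grown = copyWithoutHint(values);
    System.out.println("exact capacity: " + exact.capacity() + ", grown capacity: " + grown.capacity());
  }
}
```

Both paths yield the same bytes; the hinted path just avoids the intermediate allocations and copies, and the over-allocation left by doubling.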

singhpk234 · Nov 11, 2024

This is precisely what we needed in Redshift as well: our CBO was falling behind with variable-length data types. I'll give them a heads-up! Thank you @Fokko

jbonofre

The Parquet update looks good; I'm just wondering about the row size increase in the test. I would at least add a comment in the test explaining the reason.

build.gradle

...18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java

build.gradle

singhpk234

LGTM, thanks @Fokko!

nastra · 2024-11-19T12:12:52Z

...18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java

+    // size of the column to increase. For example, with Parquet 1.14.x the
+    // uncompressed size has been added to allow for better allocation of memory upfront.
+    // Therefore, we look the sizes up, rather than hardcoding them
+    DataFile dataFile = table.currentSnapshot().addedDataFiles(table.io()).iterator().next();


It seems that we're assuming only a single file, so we might as well use Iterables.getOnlyElement(table.currentSnapshot().addedDataFiles(table.io())).
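Guava's `Iterables.getOnlyElement` (which Iceberg ships as relocated Guava) returns the single element of an iterable and throws `NoSuchElementException` if it is empty or `IllegalArgumentException` if it has more than one. The sketch below uses a hypothetical `onlyElement` helper with that same contract, in plain Java so it runs without Guava, to show why it is stricter than `iterator().next()`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class OnlyElementSketch {

  // Hypothetical helper mirroring the contract of Guava's Iterables.getOnlyElement:
  // return the single element; fail if there are none or more than one.
  static <T> T onlyElement(Iterable<T> iterable) {
    Iterator<T> it = iterable.iterator();
    if (!it.hasNext()) {
      throw new NoSuchElementException("expected exactly one element, found none");
    }
    T first = it.next();
    if (it.hasNext()) {
      throw new IllegalArgumentException("expected exactly one element, found more");
    }
    return first;
  }

  public static void main(String[] args) {
    // iterator().next() silently picks the first file even if the snapshot added several;
    // onlyElement turns that violated assumption into an immediate failure.
    List<String> addedFiles = List.of("file-a.parquet", "file-b.parquet");
    System.out.println(addedFiles.iterator().next());
    try {
      onlyElement(addedFiles);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```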

* Revert "Revert "Build: Bump parquet from 1.13.1 to 1.14.3 (apache#11264)" (apache#11462)"
  This reverts commit 7cc16fa.
* Bump to Parquet 1.14.4
* Lookup sizes instead
* Update build.gradle

Fokko added 2 commits November 8, 2024 21:45

Revert "Revert "Build: Bump parquet from 1.13.1 to 1.14.3 (apache#11264)" (apache#11462)"

d5f6087

This reverts commit 7cc16fa.

Bump to Parquet 1.14.4

665487a

github-actions bot added flink build labels Nov 8, 2024

Fokko changed the title ~~Test out Apache Parquet 1.14.4~~ Test out Apache Parquet 1.14.4 RC2 Nov 8, 2024

singhpk234 reviewed Nov 9, 2024


jbonofre self-requested a review November 9, 2024 09:38

jbonofre reviewed Nov 9, 2024


Lookup sizes instead

68645be

Fokko force-pushed the fd-parq branch from 057a067 to 68645be November 9, 2024 21:55

Fokko mentioned this pull request Nov 11, 2024

Build: Bump parquet from 1.13.1 to 1.14.3 #11507

Closed

Fokko changed the title ~~Test out Apache Parquet 1.14.4 RC2~~ Build: Bump Apache Parquet 1.14.4 Nov 12, 2024

Fokko marked this pull request as ready for review November 12, 2024 12:45

Fokko commented Nov 12, 2024


build.gradle (outdated, resolved)

Update build.gradle

48e9524

Fokko mentioned this pull request Nov 13, 2024

Remove iceberg-pig #11380

Merged

Merge branch 'main' into fd-parq

7d43a19

singhpk234 approved these changes Nov 17, 2024


Fokko mentioned this pull request Nov 17, 2024

Build: Bump parquet from 1.13.1 to 1.14.4 #11570

Closed

Merge branch 'main' into fd-parq

51cda6e

Fokko requested a review from nastra November 19, 2024 11:14

nastra reviewed Nov 19, 2024


nastra approved these changes Nov 19, 2024


Fokko merged commit 657fa86 into apache:main Nov 20, 2024
49 checks passed

Fokko deleted the fd-parq branch November 20, 2024 08:35

nastra mentioned this pull request Nov 20, 2024

TestMetadataTableReadableMetrics get expected size from the underlyin… #11598

Closed

ajreid21 mentioned this pull request Dec 9, 2024

Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes #11465

Open

3 tasks


Fokko commented Nov 8, 2024

singhpk234 Nov 9, 2024

Fokko Nov 9, 2024

singhpk234 Nov 11, 2024

jbonofre left a comment

singhpk234 left a comment

nastra Nov 19, 2024
