[SPARK-34542][BUILD] Upgrade Parquet to 1.12.0
What changes were proposed in this pull request?

Parquet 1.12.0 new features:
- PARQUET-41 - Add bloom filters to parquet statistics
- PARQUET-1373 - Encryption key management tools
- PARQUET-1396 - Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory
- PARQUET-1622 - Add BYTE_STREAM_SPLIT encoding
- PARQUET-1784 - Column-wise configuration
- PARQUET-1817 - Crypto Properties Factory
- PARQUET-1854 - Properties-Driven Interface to Parquet Encryption

Parquet 1.12.0 release notes:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/CHANGES.md

Why are the changes needed?

- Bloom filters to improve filter performance
- ZSTD enhancement
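
As a rough illustration of why a Bloom filter can improve filter performance, the sketch below shows the general idea: a reader asks the filter whether a value might be present and can skip the data when the answer is a definite "no". This is a minimal, generic Bloom filter and not Parquet's implementation (Parquet specifies a split-block Bloom filter with xxHash); the class `BloomFilterSketch`, its methods, and the sizing parameters are hypothetical names chosen for this example.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical, simplified Bloom filter; NOT the parquet-mr implementation.
public class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: derive the i-th probe position from two base hashes.
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (h1 >>> 16) | 1; // force an odd step so probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(value, i));
        }
    }

    // false => the value was definitely never added (data can be skipped);
    // true  => the value may be present (data must still be read).
    public boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        for (String v : List.of("a", "b", "c")) {
            filter.add(v);
        }
        // Added values are never reported absent (no false negatives).
        System.out.println("a: " + filter.mightContain("a"));
        // Absent values are usually, but not always, reported absent.
        System.out.println("zzz: " + filter.mightContain("zzz"));
    }
}
```

The trade-off is that a Bloom filter never gives false negatives but may give false positives, so it can only rule data out, never confirm a match.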

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests.

Closes apache#31649 from wangyum/SPARK-34542.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
2 people authored and shipenglei committed Feb 14, 2022
1 parent 2460a1e commit 25a53f1
Showing 4 changed files with 15 additions and 15 deletions.
12 changes: 6 additions & 6 deletions dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -201,12 +201,12 @@ orc-shims/1.6.11//orc-shims-1.6.11.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.10.1//parquet-column-1.10.1.jar
-parquet-common/1.10.1//parquet-common-1.10.1.jar
-parquet-encoding/1.10.1//parquet-encoding-1.10.1.jar
-parquet-format/2.4.0//parquet-format-2.4.0.jar
-parquet-hadoop/1.10.1//parquet-hadoop-1.10.1.jar
-parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
+parquet-column/1.12.0//parquet-column-1.12.0.jar
+parquet-common/1.12.0//parquet-common-1.12.0.jar
+parquet-encoding/1.12.0//parquet-encoding-1.12.0.jar
+parquet-format-structures/1.12.0//parquet-format-structures-1.12.0.jar
+parquet-hadoop/1.12.0//parquet-hadoop-1.12.0.jar
+parquet-jackson/1.12.0//parquet-jackson-1.12.0.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9//py4j-0.10.9.jar
 pyrolite/4.30//pyrolite-4.30.jar
12 changes: 6 additions & 6 deletions dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -214,12 +214,12 @@ orc-shims/1.6.11//orc-shims-1.6.11.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.10.1//parquet-column-1.10.1.jar
-parquet-common/1.10.1//parquet-common-1.10.1.jar
-parquet-encoding/1.10.1//parquet-encoding-1.10.1.jar
-parquet-format/2.4.0//parquet-format-2.4.0.jar
-parquet-hadoop/1.10.1//parquet-hadoop-1.10.1.jar
-parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
+parquet-column/1.12.0//parquet-column-1.12.0.jar
+parquet-common/1.12.0//parquet-common-1.12.0.jar
+parquet-encoding/1.12.0//parquet-encoding-1.12.0.jar
+parquet-format-structures/1.12.0//parquet-format-structures-1.12.0.jar
+parquet-hadoop/1.12.0//parquet-hadoop-1.12.0.jar
+parquet-jackson/1.12.0//parquet-jackson-1.12.0.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9//py4j-0.10.9.jar
 pyrolite/4.30//pyrolite-4.30.jar
4 changes: 2 additions & 2 deletions pom.xml
@@ -136,7 +136,7 @@
 <!-- note that this should be compatible with Kafka brokers version 0.10 and up -->
 <kafka.version>2.6.0</kafka.version>
 <derby.version>10.12.1.1</derby.version>
-<parquet.version>1.10.1</parquet.version>
+<parquet.version>1.12.0</parquet.version>
 <orc.version>1.6.11</orc.version>
 <jetty.version>9.4.36.v20210114</jetty.version>
 <jakartaservlet.version>4.0.3</jakartaservlet.version>
@@ -2131,7 +2131,7 @@
 <groupId>${hive.group}</groupId>
 <artifactId>hive-service-rpc</artifactId>
 </exclusion>
-<!-- parquet-hadoop-bundle:1.8.1 conflict with 1.10.1 -->
+<!-- parquet-hadoop-bundle:1.8.1 conflict with 1.12.0 -->
 <exclusion>
 <groupId>org.apache.parquet</groupId>
 <artifactId>parquet-hadoop-bundle</artifactId>
@@ -1529,7 +1529,7 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
 Seq(tbl, ext_tbl).foreach { tblName =>
 sql(s"INSERT INTO $tblName VALUES (1, 'a', '2019-12-13')")
 
-val expectedSize = 601
+val expectedSize = 657
 // analyze table
 sql(s"ANALYZE TABLE $tblName COMPUTE STATISTICS NOSCAN")
 var tableStats = getTableStats(tblName)