[Feature Request] Add support for Storage Partitioned Joins (SPJ) introduced in Spark v3.3 #1698

shrukul · 2023-04-18T17:12:05Z

Feature request

Feature request to support Storage Partitioned Joins (SPJ) introduced in Spark 3.3.

Overview

Spark 3.3 added support for Storage Partitioned Joins (SPJ). A partitioned join (or partition-wise join) uses the tables' data partitions to split a join into a series of smaller, independent joins.

Motivation

If two tables are partitioned by the same set of columns (or possibly a subset?), this feature can improve the performance of join and merge operations. For example, two tables that are partitioned by hour could be joined hour by hour. This can be especially helpful for MERGE INTO operations.

This has the potential to benefit users who perform joins or merges on partitioned tables.
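
To illustrate the hour-by-hour idea, here is a minimal pure-Python sketch of a partition-wise inner join. It is not Spark or Delta code: the helper names `partition_by` and `partition_wise_join`, the sample rows, and the column names are all made up for illustration. `partition_by` only simulates a layout that, in a real table, would already exist in storage. The sketch assumes both tables are partitioned on the same column and that the join condition includes equality on that column.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key`.

    Stands in for a table that is already laid out by partition in storage.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return partitions


def partition_wise_join(left, right, partition_key, join_key):
    """Inner-join two tables one partition at a time.

    Rows in different partitions can never match (the join is assumed to
    include equality on `partition_key`), so each pair of matching
    partitions is joined independently of all the others.
    """
    left_parts = partition_by(left, partition_key)
    right_parts = partition_by(right, partition_key)
    joined = []
    # Only partitions present on both sides can produce output rows.
    for part in sorted(left_parts.keys() & right_parts.keys()):
        # Build a hash table over this single partition of the right side.
        lookup = defaultdict(list)
        for r in right_parts[part]:
            lookup[r[join_key]].append(r)
        # Probe it with the rows of the matching left partition.
        for l in left_parts[part]:
            for r in lookup.get(l[join_key], []):
                joined.append({**r, **l})
    return joined


# Hypothetical sample data: both tables are partitioned by `hour`.
events = [
    {"hour": 0, "user": "a", "clicks": 3},
    {"hour": 0, "user": "b", "clicks": 1},
    {"hour": 1, "user": "a", "clicks": 5},
]
updates = [
    {"hour": 0, "user": "a", "country": "US"},
    {"hour": 1, "user": "a", "country": "DE"},
    {"hour": 2, "user": "c", "country": "FR"},
]

result = partition_wise_join(events, updates, "hour", "user")
for row in result:
    print(row)
```

Each iteration of the outer loop touches only one hour's worth of data from each side, which is why an engine that knows the storage layout could run these small joins as independent tasks instead of shuffling both tables first.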

Further details

The feature SPIP documents the feature very well.
Spark support Umbrella JIRA: https://issues.apache.org/jira/browse/SPARK-37375
YouTube demo: https://www.youtube.com/watch?v=ioLeHZDMSuU
Iceberg supports SPJ; see PR apache/iceberg#6371 (Spark 3.3: Support storage-partitioned joins).

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

Yes. I can contribute this feature independently.
Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
No. I cannot contribute this feature at this time.


gzagarwal · 2023-10-31T07:14:30Z

Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.

hjohnss6 · 2024-01-16T20:13:12Z

Any more work on this? It would be much appreciated.

shrukul added the enhancement (New feature or request) label on Apr 18, 2023

melin mentioned this issue on May 28, 2024: [Feature] Add support for Storage Partitioned Joins (SPJ) apache/paimon#3410 (Open, 2 tasks)
