[Feature Request] Add support for Storage Partitioned Joins (SPJ) introduced in Spark v3.3 #1698
Open
1 of 3 tasks
Labels
enhancement
New feature or request
Feature request
Feature request to support Storage Partitioned Joins (SPJ) introduced in Spark 3.3.
Overview
Spark 3.3 added support for Storage Partitioned Joins (SPJ). A partitioned join (or partition-wise join) uses the existing data partitions to split a join into a series of smaller, independent joins.
Motivation
If two tables are partitioned by the same set of columns (or possibly a subset of them), this feature can improve the performance of join and merge operations. For example, two tables that are partitioned by hour could be joined hour by hour. This can be especially helpful for MERGE INTO operations, and it has the potential to benefit users who perform joins or merges on partitioned tables.
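To make the hour-by-hour idea concrete, here is a minimal plain-Python sketch of a partition-wise inner join. It does not use Spark or Delta Lake APIs. The function names (`partition_by`, `partition_wise_join`), the sample tables, and the column names are all hypothetical and chosen only for illustration. It assumes the join condition includes the partition column, so rows in different partitions can never match.

```python
from collections import defaultdict


def partition_by(rows, part_key):
    """Group rows into partitions keyed by the value of part_key."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[part_key]].append(row)
    return parts


def partition_wise_join(left, right, part_key, join_key):
    """Inner-join two row lists one partition at a time.

    Assumes the join condition is (part_key, join_key), so only
    partitions present on both sides can produce output.
    """
    left_parts = partition_by(left, part_key)
    right_parts = partition_by(right, part_key)
    out = []
    # Each shared partition is joined independently of the others.
    for p in sorted(left_parts.keys() & right_parts.keys()):
        # Small hash join restricted to this single partition.
        index = defaultdict(list)
        for r in right_parts[p]:
            index[r[join_key]].append(r)
        for l in left_parts[p]:
            for r in index.get(l[join_key], []):
                out.append({**l, **r})
    return out


# Hypothetical tables, both partitioned by "hour".
events = [
    {"hour": 0, "id": 1, "event": "click"},
    {"hour": 0, "id": 2, "event": "view"},
    {"hour": 1, "id": 1, "event": "buy"},
]
users = [
    {"hour": 0, "id": 1, "name": "a"},
    {"hour": 1, "id": 1, "name": "b"},
    {"hour": 2, "id": 3, "name": "c"},
]

joined = partition_wise_join(events, users, "hour", "id")
for row in joined:
    print(row)
```

In this sketch the partition for hour 2 is never touched because it exists on only one side, and no rows are moved between partitions before joining. That is the property a storage partitioned join relies on to avoid a shuffle, although how Spark and Delta Lake would implement it is outside what this example shows.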
Further details
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?