[Feature Request] Add support for Storage Partitioned Joins (SPJ) introduced in Spark v3.3 #1698

Open
1 of 3 tasks
shrukul opened this issue Apr 18, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

shrukul commented Apr 18, 2023

Feature request

Feature request to support Storage Partitioned Joins (SPJ) introduced in Spark 3.3.

Overview

Spark 3.3 added support for Storage Partitioned Joins (SPJ). A partitioned join (or partition-wise join) uses the tables' data partitions to split a join into a series of smaller, independent joins.

Motivation

If two tables are partitioned by the same set of columns (or possibly a subset?), this feature can improve the performance of join and merge operations. For example, two tables that are partitioned by hour could be joined hour-by-hour. This can be especially helpful for MERGE INTO operations.

This has the potential to benefit users who perform joins or merges on partitioned tables.
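
To make the hour-by-hour idea concrete, here is a minimal in-memory sketch in plain Python. It is only an illustration of the partition-wise join concept, not how Spark or Delta Lake implements SPJ. The helper names (`partition_by`, `hash_join`, `partition_wise_join`) and the sample rows are hypothetical, and the sketch assumes the partition column (`hour`) is part of the join key, which is what makes joining partition pairs independently safe.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows by the value of `key`, mimicking a partitioned storage layout."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return parts


def hash_join(left, right, on):
    """Plain inner hash join of two row lists on the columns listed in `on`."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[c] for c in on)].append(r)
    return [
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[c] for c in on), [])
    ]


def partition_wise_join(left_parts, right_parts, on):
    """Join only partitions that exist on both sides, one pair at a time."""
    out = []
    for pkey in left_parts.keys() & right_parts.keys():
        out.extend(hash_join(left_parts[pkey], right_parts[pkey], on))
    return out


# Hypothetical sample data: both tables are partitioned by `hour`.
events = [
    {"hour": 0, "user": "a", "clicks": 3},
    {"hour": 0, "user": "b", "clicks": 1},
    {"hour": 1, "user": "a", "clicks": 7},
]
updates = [
    {"hour": 0, "user": "a", "score": 10},
    {"hour": 1, "user": "a", "score": 20},
    {"hour": 2, "user": "c", "score": 30},
]

on = ["hour", "user"]
joined = partition_wise_join(
    partition_by(events, "hour"), partition_by(updates, "hour"), on
)
```

Each per-hour join touches only the rows of that hour, and the `hour=2` partition is never read on the `events` side because it has no counterpart there. In an engine, the same property is what allows the join to proceed without first redistributing both tables by the join key.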

Further details

  1. The feature's SPIP documents it well.
  2. Spark umbrella JIRA: https://issues.apache.org/jira/browse/SPARK-37375
  3. YouTube demo: https://www.youtube.com/watch?v=ioLeHZDMSuU
  4. Iceberg already supports SPJ; see the PR "Spark 3.3: Support storage-partitioned joins" (apache/iceberg#6371).

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@shrukul shrukul added the enhancement New feature or request label Apr 18, 2023
@gzagarwal

Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.

@hjohnss6

Has there been any more work on this? It would be much appreciated.
