Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: normalize the join types #662

Merged
merged 12 commits into from
Aug 8, 2024
17 changes: 12 additions & 5 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -173,11 +173,12 @@ message JoinRel {
JOIN_TYPE_OUTER = 2;
JOIN_TYPE_LEFT = 3;
JOIN_TYPE_RIGHT = 4;
JOIN_TYPE_SEMI = 5;
JOIN_TYPE_ANTI = 6;
// This join is useful for nested sub-queries where we need exactly one record in output (or throw exception)
// See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf
JOIN_TYPE_SINGLE = 7;
JOIN_TYPE_LEFT_SEMI = 5;
JOIN_TYPE_LEFT_ANTI = 6;
JOIN_TYPE_LEFT_SINGLE = 7;
JOIN_TYPE_RIGHT_SEMI = 8;
JOIN_TYPE_RIGHT_ANTI = 9;
EpsilonPrime marked this conversation as resolved.
Show resolved Hide resolved
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand Down Expand Up @@ -654,6 +655,8 @@ message HashJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_LEFT_SINGLE = 9;
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand Down Expand Up @@ -700,6 +703,8 @@ message MergeJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_LEFT_SINGLE = 9;
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand All @@ -726,6 +731,8 @@ message NestedLoopJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_LEFT_SINGLE = 9;
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand Down
13 changes: 8 additions & 5 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -222,15 +222,18 @@ The join operation will combine two separate inputs into a single output, based

### Join Types

| Type | Description |
| ----- | ------------------------------------------------------------ |
| Type | Description |
| ----- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Inner | Return records from the left side only if they match the right side. Return records from the right side only when they match the left side. For each cross input match, return a record including the data from both sides. Non-matching records are ignored. |
| Outer | Return all records from both the left and right inputs. For each cross input match, return a record including the data from both sides. For any remaining non-match records, return the record from the corresponding input along with nulls for the opposite input. |
| Left | Return all records from the left input. For each cross input match, return a record including the data from both sides. For any remaining non-matching records from the left input, return the left record along with nulls for the right input. |
| Right | Return all records from the right input. For each cross input match, return a record including the data from both sides. For any remaining non-matching records from the right input, return the right record along with nulls for the left input. |
| Semi | Returns records from the left input. These are returned only if the records have a join partner on the right side. |
| Anti | Return records from the left input. These are returned only if the records do not have a join partner on the right side. |
| Single | Returns one join partner per entry on the left input. If more than one join partner exists, there are two valid semantics. 1) Only the first match is returned. 2) The system throws an error. If there is no match between the left and right inputs, NULL is returned. |
| Left Semi | Returns records from the left input. These are returned only if the records have a join partner on the right side. |
| Right Semi | Returns records from the right input. These are returned only if the records have a join partner on the left side. |
jacques-n marked this conversation as resolved.
Show resolved Hide resolved
| Left Anti | Return records from the left input. These are returned only if the records do not have a join partner on the right side. |
| Right Anti | Return records from the right input. These are returned only if the records do not have a join partner on the left side. |
| Left Single | Return all records from the left input with no join expansion. If at least one record from the right input matches the left, return one arbitrary matching record from the right input. For any left records without matching right records, return the left record along with nulls for the right input. Similar to a left outer join but only returns one right match at most. Useful for nested sub-queries where we need exactly one record in output (or throw exception). See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf for more information. |
| Right Single | Same as left single except that the right and left inputs are switched. |


=== "JoinRel Message"
Expand Down
Loading