started updating docs for mark join type
EpsilonPrime committed Jul 11, 2024
1 parent 22930e8 commit 0f615a9
Showing 3 changed files with 5 additions and 38 deletions.
19 changes: 4 additions & 15 deletions proto/substrait/algebra.proto
@@ -178,6 +178,7 @@ message JoinRel {
// This join is useful for nested sub-queries where we need exactly one record in output (or throw exception)
// See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf
JOIN_TYPE_SINGLE = 7;
JOIN_TYPE_MARK = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
@@ -450,7 +451,6 @@ message Rel {
HashJoinRel hash_join = 13;
MergeJoinRel merge_join = 14;
NestedLoopJoinRel nested_loop_join = 18;
MarkJoinRel mark_join = 23;
ConsistentPartitionWindowRel window = 17;
ExchangeRel exchange = 15;
ExpandRel expand = 16;
@@ -655,6 +655,7 @@ message HashJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_MARK = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
@@ -701,6 +702,7 @@ message MergeJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_MARK = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
@@ -727,25 +729,12 @@ message NestedLoopJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_MARK = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
}

// A mark join internally scans the left side, constructing a hash table that
// is used to mark the right side as having a join partner on the left side. A
// mark is a nullable boolean field. The mark join operator is used to
// implement semi-joins, anti-joins, and other join types that are not equijoins.
message MarkJoinRel {
RelCommon common = 1;
Rel left = 2;
Rel right = 3;
// optional, defaults to true (a cartesian join)
Expression expression = 4;

substrait.extensions.AdvancedExtension advanced_extension = 10;
}

// The argument of a function
message FunctionArgument {
oneof arg_type {
1 change: 1 addition & 0 deletions site/docs/relations/logical_relations.md
@@ -231,6 +231,7 @@ The join operation will combine two separate inputs into a single output, based
| Semi | Returns records from the left input. These are returned only if the records have a join partner on the right side. |
| Anti | Return records from the left input. These are returned only if the records do not have a join partner on the right side. |
| Single | Returns one join partner per entry on the left input. If more than one join partner exists, there are two valid semantics. 1) Only the first match is returned. 2) The system throws an error. If there is no match between the left and right inputs, NULL is returned. |
| Mark | Returns one record for each of the left inputs??? |


=== "JoinRel Message"
23 changes: 0 additions & 23 deletions site/docs/relations/physical_relations.md
@@ -71,29 +71,6 @@ The merge equijoin does a join by taking advantage of two sets that are sorted o
| Post Join Predicate | An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. | Optional, defaults true. |
| Join Type | One of the join types defined in the Join operator. | Required |



## Mark Join Operator

A mark join internally scans the left side, constructing a hash table that is used to mark the right side as having a join partner on the left side. This mark can end up being True, False, or NULL. The NULL mark is used to indicate that the right side does not have a join partner on the left side. The mark join operator is used to implement semi-joins, anti-joins, and other join types that are not equijoins.

| Signature | Value |
| -------------------- | ------------------------------------------------------------ |
| Inputs | 2 |
| Outputs | 1 |
| Property Maintenance | Distribution is maintained. Orderedness is eliminated. |
| Direct Output Order | Same as the [Join](logical_relations.md#join-operator) operator. |

### Mark Join Properties

| Property | Description | Required |
|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|
| Left Input | A relational input. | Required |
| Right Input | A relational input. | Required |
| Join Expression | A boolean condition that describes whether each record from the left set "match" the record from the right set. Field references correspond to the direct output order of the data. | Required. Can be (but not expected to be) the literal True. |



## Exchange Operator

The exchange operator will redistribute data based on an exchange type definition. Applying this operation will lead to an output that presents the desired distribution.
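
Note on mark join semantics: the commit leaves the new `Mark` row in `logical_relations.md` unfinished ("???"), and the text removed from `physical_relations.md` says only that the mark can be True, False, or NULL. The Python sketch below is one possible reading, not the semantics defined by Substrait. It assumes that each left record is emitted exactly once with an added nullable boolean mark, and that NULL means "no definite match, but at least one comparison was unknown" under three-valued logic, which differs from how the removed text describes NULL. The names `mark_join`, `eq_on_key`, and the column `k` are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Optional[int]]
# A join predicate returns True, False, or None (unknown, e.g. a NULL comparison).
Predicate = Callable[[Row, Row], Optional[bool]]


def mark_join(
    left: Iterable[Row], right: Iterable[Row], predicate: Predicate
) -> List[Tuple[Row, Optional[bool]]]:
    """Emit each left row once, paired with a nullable boolean mark.

    Assumed semantics (not taken from the Substrait spec):
      True  -> at least one right row definitely matches
      None  -> no definite match, but some comparison was unknown
      False -> every comparison was definitely false (or right is empty)
    """
    right_rows = list(right)  # materialize so the right side can be rescanned
    output = []
    for left_row in left:
        mark: Optional[bool] = False
        for right_row in right_rows:
            result = predicate(left_row, right_row)
            if result is True:
                mark = True
                break  # a definite match settles the mark
            if result is None:
                mark = None  # remember the unknown, keep looking for a match
        output.append((left_row, mark))
    return output


def eq_on_key(left_row: Row, right_row: Row) -> Optional[bool]:
    """Hypothetical equality predicate on column `k` with SQL-style NULL handling."""
    if left_row["k"] is None or right_row["k"] is None:
        return None
    return left_row["k"] == right_row["k"]


if __name__ == "__main__":
    left_input = [{"k": 1}, {"k": 2}]
    right_input = [{"k": 1}, {"k": None}]
    for row, mark in mark_join(left_input, right_input, eq_on_key):
        print(row, mark)
```

Under these assumptions a semi join keeps the rows whose mark is True and an anti join keeps the rows whose mark is False, which is consistent with the removed text's statement that the mark join is used to implement semi-joins and anti-joins. The nested loop is used only for brevity; the removed text describes a hash-table-based implementation.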