Commit

Updated documentation to be more clear, removed the mark reference as the act of marking is internal to the join.
EpsilonPrime committed Jul 11, 2024
1 parent a0c23e6 commit 22930e8
Showing 2 changed files with 5 additions and 7 deletions.
9 changes: 4 additions & 5 deletions proto/substrait/algebra.proto
@@ -732,18 +732,17 @@ message NestedLoopJoinRel {
substrait.extensions.AdvancedExtension advanced_extension = 10;
}

-// A mark join utilizes a previously applied mark to greatly reduce the
-// input to be processed. A mark is a nullable boolean field.
+// A mark join internally scans the left side, constructing a hash table that
+// is used to mark the right side as having a join partner on the left side. A
+// mark is a nullable boolean field. The mark join operator is used to
+// implement semi-joins, anti-joins, and other join types that are not equijoins.
message MarkJoinRel {
RelCommon common = 1;
Rel left = 2;
Rel right = 3;
// optional, defaults to true (a cartesian join)
Expression expression = 4;

-// A reference to the mark field.
-Expression.FieldReference mark_field = 6;
-
substrait.extensions.AdvancedExtension advanced_extension = 10;
}

3 changes: 1 addition & 2 deletions site/docs/relations/physical_relations.md
@@ -75,7 +75,7 @@ The merge equijoin does a join by taking advantage of two sets that are sorted o

## Mark Join Operator

-A mark join utilizes a previously applied mark to greatly reduce the input to be processed.
+A mark join internally scans the left side, constructing a hash table that is used to mark the right side as having a join partner on the left side. This mark can end up being True, False, or NULL. The NULL mark is used to indicate that the right side does not have a join partner on the left side. The mark join operator is used to implement semi-joins, anti-joins, and other join types that are not equijoins.

| Signature | Value |
| -------------------- | ------------------------------------------------------------ |
@@ -90,7 +90,6 @@ A mark join utilizes a previously applied mark to greatly reduce the input to be
|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|
| Left Input | A relational input. | Required |
| Right Input | A relational input. | Required |
-| Mark Reference | A nullable boolean field reference that is used to filter the right input. If the mark is null, the row is not included in the join. | Required. |
| Join Expression | A boolean condition that describes whether each record from the left set "match" the record from the right set. Field references correspond to the direct output order of the data. | Required. Can be (but not expected to be) the literal True. |
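
To make the marking step in the new description concrete, here is a minimal Python sketch, assuming a single-column equality condition and SQL-style three-valued logic. The function name `mark_join`, the key-only row representation, and the rule that an unknown comparison yields a `None` mark are assumptions for illustration. They are not taken from the Substrait specification, whose text describes the NULL mark in somewhat different terms.

```python
from typing import Hashable, Iterable, List, Optional, Tuple


def mark_join(
    left_keys: Iterable[Optional[Hashable]],
    right_keys: Iterable[Optional[Hashable]],
) -> List[Tuple[Optional[Hashable], Optional[bool]]]:
    """Hypothetical sketch: attach a nullable boolean mark to each right row."""
    # Build phase: scan the left side once and collect its non-NULL keys.
    table = set()
    left_has_null = False
    for key in left_keys:
        if key is None:
            left_has_null = True
        else:
            table.add(key)
    left_empty = not table and not left_has_null

    # Probe phase: mark each right row (assumed three-valued semantics).
    marked = []
    for key in right_keys:
        if left_empty:
            mark = False   # nothing on the left to match against
        elif key is None:
            mark = None    # NULL compared with anything is unknown
        elif key in table:
            mark = True    # a join partner exists on the left
        elif left_has_null:
            mark = None    # a left NULL might have matched: unknown
        else:
            mark = False   # definitely no join partner
        marked.append((key, mark))
    return marked


rows = mark_join([1, 2, None], [1, 3, None])
semi = [k for k, m in rows if m is True]    # semi-join keeps marked rows
anti = [k for k, m in rows if m is False]   # NOT IN style anti-join
```

Under these assumptions, a semi-join keeps the right rows whose mark is `True`, while a `NOT IN`-style anti-join keeps only the rows whose mark is `False`, so rows with an unknown mark appear in neither result.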

