Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: define SetRel output nullability derivation (#558) #654

Merged
merged 7 commits into from
Jul 5, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 37 additions & 11 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -249,7 +249,7 @@ The set operation encompasses several set-level operations that support combinin
| Inputs | 2 or more |
| Outputs | 1 |
| Property Maintenance | Maintains distribution if all inputs have the same ordinal distribution. Orderedness is not maintained. |
| Direct Output Order | The field order of the inputs. All inputs must have identical fields. |
| Direct Output Order | The field order of the inputs. All inputs must have identical field *types*, but field nullabilities may vary. |

### Set Properties

Expand All @@ -261,14 +261,42 @@ The set operation encompasses several set-level operations that support combinin

### Set Operation Types

| Property | Description |
| ----------------------- | ------------------------------------------------------------ |
| Minus (Primary) | Returns the primary input excluding any matching records from secondary inputs. |
| Minus (Multiset) | Returns the primary input minus any records that are included in all sets. |
| Intersection (Primary) | Returns all rows primary rows that intersect at least one secondary input. |
| Intersection (Multiset) | Returns all rows that intersect at least one record from each secondary inputs. |
| Union Distinct | Returns all the records from each set, removing any rows that are duplicated (within or across sets). |
| Union All | Returns all records from each set, allowing duplicates. |
The set operation type determines both the records that are emitted and the type of the output record.

| Property | Description | Output Nullability
| ----------------------- | ------------------------------------------------------------------------------------------------------------- | ----------------------------- |
| Minus (Primary) | Returns all records from the primary input excluding any matching records from secondary inputs. | The same as the primary input.
| Minus (Multiset) | Returns all records from the primary input excluding any records that are included in *all* secondary inputs. | The same as the primary input.
| Intersection (Primary) | Returns all records from the primary input that match at least one record from *any* secondary inputs. | If a field is nullable in the primary input and in any of the secondary inputs, it is nullable in the output.
| Intersection (Multiset) | Returns all records from the primary input that match at least one record from *all* secondary inputs. | If a field is required in any of the inputs, it is required in the output.
| Union Distinct | Returns all the records from each set, removing any rows that are duplicated (within or across sets). | If a field is nullable in any of the inputs, it is nullable in the output.
| Union All | Returns all records from each set, allowing duplicates. | If a field is nullable in any of the inputs, it is nullable in the output. |

Note that for set operations, NULL matches NULL. That is
```
{NULL, 1, 3} MINUS {NULL, 2, 4} === (1), (3)
{NULL, 1, 3} INTERSECTION {NULL, 2, 3} === (NULL)
{NULL, 1, 3} UNION DISTINCT {NULL, 2, 4} === (NULL), (1), (2), (3), (4)
```

#### Output Type Derivation Examples
Given the following inputs, where R is Required and N is Nullable:
```
Input 1: (R, R, R, R, N, N, N, N) Primary Input
Input 2: (R, R, N, N, R, R, N, N) Secondary Input
Input 3: (R, N, R, N, R, N, R, N) Secondary Input
```

The output type is as follows for the various operations

| Property | Output Type
| ----------------------- | -----------------------------------------------------------------------------------------------------
| Minus (Primary) | (R, R, R, R, N, N, N, N)
| Minus (Multiset) | (R, R, R, R, N, N, N, N)
| Intersection (Primary) | (R, R, R, R, R, N, N, N)
| Intersection (Multiset) | (R, R, R, R, R, R, R, N)
| Union Distinct | (R, N, N, N, N, N, N, N)
| Union All | (R, N, N, N, N, N, N, N)


=== "SetRel Message"
Expand All @@ -289,8 +317,6 @@ The fetch operation eliminates records outside a desired window. Typically corre
| Property Maintenance | Maintains distribution and orderedness. |
| Direct Output Order | Unchanged from input. |



### Fetch Properties

| Property | Description | Required |
Expand Down