-
Notifications
You must be signed in to change notification settings - Fork 164
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: clarify the meaning of plans (#616)
#612 and #613 highlighted a number of ambiguities in plan interpretation. This should address those ambiguities. Co-authored-by: Ingo Müller <[email protected]>
- Loading branch information
1 parent
e4f5b68
commit c1553df
Showing
3 changed files
with
38 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
arrange: | ||
- binary_serialization.md | ||
- text_serialization.md | ||
|
||
- basics.md | ||
- binary_serialization.md | ||
- text_serialization.md |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Basics | ||
|
||
Substrait is designed to be serialized into various different formats. Currently we support a binary serialization for | ||
transmission of plans between programs (e.g. IPC or network communication) and a text serialization for debugging and human readability. Other formats may be added in the future. | ||
|
||
These formats serialize a collection of plans. Substrait does not define how a collection of plans is to be interpreted. | ||
For example, the following scenarios are all valid uses of a collection of plans: | ||
|
||
- A query engine receives a plan and executes it. It receives a collection of plans with a single root plan. The | ||
top-level node of the root plan defines the output of the query. Non-root plans may be included as common subplans | ||
which are referenced from the root plan. | ||
- A transpiler may convert plans from one dialect to another. It could take, as input, a single root plan. Then | ||
it could output a serialized binary containing multiple root plans. Each root plan is a representation of the | ||
input plan in a different dialect. | ||
- A distributed scheduler might expect 1+ root plans. Each root plan describes a different stage of computation. | ||
|
||
Libraries should make sure to thoroughly describe the way plan collections will be produced or consumed. | ||
|
||
## Root plans | ||
|
||
We often refer to query plans as a graph of nodes (typically a DAG unless the query is recursive). However, we | ||
encode this graph as a collection of trees with a single root tree that references other trees (which may also | ||
transitively reference other trees). Plan serializations all have some way to indicate which plan(s) are "root" | ||
plans. Any plan that is not a root plan and is not referenced (directly or transitively) by some root plan | ||
can safely be ignored. |