Commit

revert spec change as the general consensus is that file scan task JSON serialization shouldn't be included in table spec.

we are still discussing removing the existing spec on file scan task
stevenzwu committed Feb 21, 2024
1 parent 14c4cbe commit b8b581d
Showing 1 changed file with 11 additions and 31 deletions.
42 changes: 11 additions & 31 deletions format/spec.md
@@ -1237,37 +1237,17 @@ Content file (data or delete) is serialized as a JSON object according to the fo
| **`equality-ids`** |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
| **`sort-order-id`** |`JSON int`|`1`|

-### Task Serialization
-
-There are different task implementations, e.g. `BaseFileScanTask` and `StaticDataTask` in Java.
-A `task-type` field is needed to distinguish different task types.
-
-| Metadata field | JSON representation | Example |
-|-----------------|---------------------|------------------------------------------------------------------------------------------------|
-| **`task-type`** | `JSON string` | `file-scan-task`, `data-task`. Absence of this field should be interpreted as `file-scan-task` |
-
-`file-scan-task` represents a scan task with a data file and
-optional delete files that should be applied to the data file.
-It is serialized according to the following table (in addition to the `task-type` field).
-
-| Metadata field |JSON representation| Example |
-|------------------------|--- |-----------------------------------------------------------|
-| **`schema`** |`JSON object`| `See "Schemas" section above` |
-| **`spec`** |`JSON object`| `See "Partition Specs" section above` |
-| **`data-file`** |`JSON object`| `See "Content File" section above` |
-| **`delete-files`** |`JSON list of objects`| `See "Content File" section above for delete file object` |
-| **`residual-filter`** |`JSON object: residual filter expression`| `{"type":"eq","term":"id","value":1}` |
-
-`data-task` represents a task with data rows embedded.
-It is typically used for metadata tables rows (like manifests, snapshots, partitions etc.).
-It is serialized according to the following table (in addition to the `task-type` field).
-
-| Metadata field | JSON representation | Example |
-|---------------------|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| **`schema`** | `JSON object: row schema` | `See "Schemas" section above` |
-| **`projection`** | `JSON object: read schema` | `See "Schemas" section above` |
-| **`metadata-file`** | `JSON object: Iceberg root metadata file` | `See "Content File" section above` |
-| **`rows`** | `JSON list of objects: each row is serialized via struct JSON single-value serialization below (using field ID as JSON field name)` | `[`<br />&nbsp;&nbsp;`{`<br />&nbsp;&nbsp;`"1": 2023-11-16T22:31:08.123456+00:00,`<br />&nbsp;&nbsp;`"2": 3051729675574597004,`<br />&nbsp;&nbsp;`"3": 12345678901234567,`<br />&nbsp;&nbsp;`"4": "append"`<br />&nbsp;&nbsp;`"5": "s3://b/wh/.../s1.avro"`<br />&nbsp;&nbsp;`}`<br />`]` |
+### File Scan Task Serialization
+
+File scan task is serialized as a JSON object according to the following table.
+
+| Metadata field |JSON representation|Example|
+|--------------------------|--- |--- |
+| **`schema`** |`JSON object`|`See above, read schemas instead`|
+| **`spec`** |`JSON object`|`See above, read partition specs instead`|
+| **`data-file`** |`JSON object`|`See above, read content file instead`|
+| **`delete-files`** |`JSON list of objects`|`See above, read content file instead`|
+| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|

## Appendix D: Single-value serialization
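
As a rough illustration of the file scan task table in this diff, the sketch below assembles the five listed fields into a single JSON object with Python's standard `json` module. The function name `file_scan_task_to_json`, the abbreviated nested objects, and the file path are hypothetical placeholders, not Iceberg's implementation; the real nested layouts are whatever the schema, partition spec, and content file sections of the spec define.

```python
import json

# Hypothetical, heavily abbreviated nested objects used only for illustration.
# The actual layouts are defined by the schema, partition spec, and content
# file sections of the spec, not by this sketch.
EXAMPLE_SCHEMA = {
    "type": "struct",
    "fields": [{"id": 1, "name": "id", "required": True, "type": "long"}],
}
EXAMPLE_SPEC = {"spec-id": 0, "fields": []}
EXAMPLE_DATA_FILE = {"file-path": "s3://example-bucket/data/file-a.parquet"}


def file_scan_task_to_json(schema, spec, data_file, delete_files, residual_filter):
    """Assemble the five fields from the table into one JSON object string."""
    task = {
        "schema": schema,
        "spec": spec,
        "data-file": data_file,
        "delete-files": delete_files,        # JSON list of content file objects
        "residual-filter": residual_filter,  # residual filter expression
    }
    return json.dumps(task)


# The residual filter below is the example value given in the table.
serialized = file_scan_task_to_json(
    EXAMPLE_SCHEMA,
    EXAMPLE_SPEC,
    EXAMPLE_DATA_FILE,
    [],
    {"type": "eq", "term": "id", "value": 1},
)
print(serialized)
```

Only the top-level field names and the `residual-filter` value come from the table; everything nested inside `schema`, `spec`, and `data-file` is an assumption made to keep the example self-contained.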
