Merge branch 'main' into extensions
jcamachor authored Jan 8, 2025
2 parents 9069630 + 7434e2f commit 72092fe
Showing 2 changed files with 38 additions and 0 deletions.
24 changes: 24 additions & 0 deletions proto/substrait/algebra.proto
@@ -114,6 +114,7 @@ message ReadRel {
LocalFiles local_files = 6;
NamedTable named_table = 7;
ExtensionTable extension_table = 8;
IcebergTable iceberg_table = 9;
}

// A base table. The list of string is used to represent namespacing (e.g., mydb.mytable).
@@ -123,6 +124,29 @@ message ReadRel {
substrait.extensions.AdvancedExtension advanced_extension = 10;
}

// Read an Iceberg Table
message IcebergTable {
oneof table_type {
MetadataFileRead direct = 1;
// future: add catalog table types (e.g. rest api, latest metadata in path, etc)
}

// Read an Iceberg table using a metadata file. Implicit assumption: required credentials are already known by plan consumer.
message MetadataFileRead {
// the specific uri of a metadata file (e.g. s3://mybucket/mytable/<ver>-<uuid>.metadata.json)
string metadata_uri = 1;

// snapshot options. if none set, uses the current snapshot listed in the metadata file
oneof snapshot {
// the snapshot id to read.
string snapshot_id = 2;

// the timestamp that should be used to select the snapshot (Time passed in microseconds since 1970-01-01 00:00:00.000000 in UTC)
int64 snapshot_timestamp = 3;
}
}
}

// A table composed of expressions.
message VirtualTable {
repeated Expression.Literal.Struct values = 1 [deprecated = true];
14 changes: 14 additions & 0 deletions site/docs/relations/logical_relations.md
@@ -95,6 +95,20 @@ possible approach is that a chunk should only be read if the midpoint of the chu
%%% proto.algebra.ReadRel %%%
```

#### Iceberg Table Type

An Iceberg table is a table built on [Apache Iceberg](https://iceberg.apache.org/). Iceberg tables can be read either by directly reading a [metadata file](https://iceberg.apache.org/spec/#table-metadata) or by consulting a [catalog](https://iceberg.apache.org/concepts/catalog/).

##### Metadata File Reading

Points to an [Iceberg metadata file](https://iceberg.apache.org/spec/#table-metadata) and uses it as the starting point for reading an Iceberg table. This is the simplest form of Iceberg table access, but it should be used only for reads. (Writes often also need to update an external catalog.) The plan consumer is assumed to already have any credentials required to access the file.

| Property | Description | Required |
| -------- | ---------------------------------------------------------------- | ----------------------- |
| metadata_uri | A URI for an Iceberg metadata file. Unless a snapshot is specified, the current snapshot listed in this file is read. | Required |
| snapshot_id | The ID of the snapshot to read. If neither snapshot_id nor snapshot_timestamp is set, the current snapshot is read. Only one of snapshot_id or snapshot_timestamp should be set. | Optional |
| snapshot_timestamp | The timestamp used to select the snapshot to read, in microseconds since 1970-01-01 00:00:00.000000 UTC. If neither snapshot_id nor snapshot_timestamp is set, the current snapshot is read. Only one of snapshot_id or snapshot_timestamp should be set. | Optional |
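
The snapshot rules in the table above could be applied by a plan consumer roughly as in the following Python sketch. It is illustrative only and not part of Substrait: the function name `select_snapshot_id` is hypothetical, the metadata file is assumed to be already parsed into a dict, and the rule "latest snapshot at or before the timestamp" is an assumption, since the specification text only says the timestamp is used to select the snapshot. The keys `current-snapshot-id`, `snapshots`, `snapshot-id`, and `timestamp-ms` come from the Iceberg table metadata format.

```python
from typing import Optional


def select_snapshot_id(
    metadata: dict,
    snapshot_id: Optional[str] = None,
    snapshot_timestamp_us: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: choose which Iceberg snapshot to read.

    `metadata` is the parsed JSON of an Iceberg metadata file.
    """
    # The two options live in a protobuf `oneof`, so at most one may be set.
    if snapshot_id is not None and snapshot_timestamp_us is not None:
        raise ValueError("set at most one of snapshot_id or snapshot_timestamp")

    snapshots = metadata.get("snapshots", [])

    if snapshot_id is not None:
        # The proto field is a string; Iceberg snapshot ids are 64-bit integers.
        wanted = int(snapshot_id)
        if wanted not in {s["snapshot-id"] for s in snapshots}:
            raise LookupError(f"snapshot {wanted} not found in metadata file")
        return wanted

    if snapshot_timestamp_us is not None:
        # The proto field is in microseconds; Iceberg stores milliseconds.
        cutoff_ms = snapshot_timestamp_us // 1000
        # Assumption: pick the latest snapshot committed at or before the cutoff.
        candidates = [s for s in snapshots if s["timestamp-ms"] <= cutoff_ms]
        if not candidates:
            raise LookupError("no snapshot at or before the requested timestamp")
        return max(candidates, key=lambda s: s["timestamp-ms"])["snapshot-id"]

    # Neither option set: fall back to the current snapshot in the file.
    return metadata.get("current-snapshot-id")
```

For simplicity the sketch scans the `snapshots` list directly. A real reader may prefer to consult the table's snapshot history instead, so that time travel follows the lineage of the current table state.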


## Filter Operation
