Read in the Iceberg metadata #28

Fokko · 2023-08-09T07:35:12Z

Every query in Iceberg starts with the metadata. This is the JSON file that's created at each commit on an Iceberg table.

There are two versions (number three is underway):

1. Describes Iceberg tables.
2. Everything from version 1, plus support for merge-on-read deletes.

What I would suggest is reading both V1 and V2 and merging them into a common structure in memory. This includes merging some fields:

- `schemas` is optional in V1, and `schema` is removed in V2. V1 kept only the current schema, whereas V2 also preserves all historical schemas. When reading a V1 table, the schema from `schema` is added to `schemas`, and `current-schema-id` is set to the newly added schema.
- The same applies to `partition-specs`.
- When reading a V1 table, we add a `main` ref to the `refs` dict, pointing to the current snapshot.
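
The merge rules above could be sketched roughly as follows. This is only an illustration, not the implementation that ended up in the repository: the type names (`MetadataV1`, `TableMetadata`, `Schema`, `SnapshotRef`) and the function `normalize_v1` are hypothetical, the structs carry only the fields mentioned in this issue, and JSON parsing is left out entirely.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified types; the real spec has many more fields.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    schema_id: i32,
    fields: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct SnapshotRef {
    snapshot_id: i64,
}

// V1 metadata as read from the JSON file (assumed shape).
#[derive(Debug, Default)]
struct MetadataV1 {
    schema: Option<Schema>,       // single current schema
    schemas: Option<Vec<Schema>>, // optional in V1
    current_schema_id: Option<i32>,
    current_snapshot_id: Option<i64>,
}

// Common in-memory structure shared by V1 and V2.
#[derive(Debug, PartialEq)]
struct TableMetadata {
    schemas: Vec<Schema>,
    current_schema_id: i32,
    refs: HashMap<String, SnapshotRef>,
}

fn normalize_v1(v1: MetadataV1) -> Result<TableMetadata, String> {
    let mut schemas = v1.schemas.unwrap_or_default();
    let mut current_schema_id = v1.current_schema_id;

    // If `schemas` is absent, move `schema` into it and make it current.
    if schemas.is_empty() {
        let schema = v1
            .schema
            .ok_or("V1 metadata has neither `schema` nor `schemas`")?;
        current_schema_id = Some(schema.schema_id);
        schemas.push(schema);
    }
    let current_schema_id = current_schema_id.ok_or("missing `current-schema-id`")?;

    // Add a `main` ref pointing to the current snapshot, if there is one.
    let mut refs = HashMap::new();
    if let Some(snapshot_id) = v1.current_snapshot_id {
        refs.insert("main".to_string(), SnapshotRef { snapshot_id });
    }

    Ok(TableMetadata {
        schemas,
        current_schema_id,
        refs,
    })
}
```

Partition specs would presumably follow the same pattern as the schema handling, so they are omitted here. Returning one `TableMetadata` type means the rest of the reader never has to branch on the format version.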

There are also example manifests available from the Java repository: https://github.com/apache/iceberg/tree/master/core/src/test/resources

PS: On a tangent, but related: I'm also thinking of creating a JSON Schema. Would that be helpful for Rust?


liurenjie1024 · 2023-08-09T08:48:47Z

@Fokko Thanks for writing this up. I think we are quite close to defining table metadata. About JSON Schema: I think the idea is great, and there are some Rust tools for it:
1. The jsonschema crate can help validate data and provide better error messages.
2. This crate can help generate code for us.

But I haven't used them before, so I have no idea how much they would help. cc @JanKaul @Xuanwo Any ideas?

JanKaul · 2023-08-09T13:47:46Z

If no one else is interested I could prepare a PR for the table metadata.

Regarding the JSON Schema, I don't see the benefit of using it. Aren't we already specifying the schema by defining the types that constitute the Iceberg spec?

liurenjie1024 · 2023-08-10T02:36:18Z

Hi @JanKaul I think we are missing two parts before starting table metadata:

PartitionSpec
SortOrder

Both depend on #26; once they are complete, we can start with TableMetadata.

liurenjie1024 mentioned this issue Aug 12, 2023

feat: Table metadata #29

Merged

liurenjie1024 closed this as completed Aug 22, 2023
