Skip to content

Commit

Permalink
GH-459: Add Variant logical type annotation (#460)
Browse files Browse the repository at this point in the history
* Add Variant as logical type

* Update shredding to use

* Revert changes to the spec

* Update logical types Variant details

* Clarify BYTE_ARRAY and remove underscore fields
  • Loading branch information
gene-db authored Nov 6, 2024
1 parent 1d81b7a commit dff0b3e
Show file tree
Hide file tree
Showing 2 changed files with 44 additions and 0 deletions.
36 changes: 36 additions & 0 deletions LogicalTypes.md
Original file line number Diff line number Diff line change
Expand Up @@ -563,6 +563,42 @@ defined by the [BSON specification][bson-spec].

The sort order used for `BSON` is unsigned byte-wise comparison.

### VARIANT

`VARIANT` is used for a Variant value. It must annotate a group. The group must
contain a field named `metadata` and a field named `value`. Both fields must have
type `binary`, which is also called `BYTE_ARRAY` in the Parquet thrift definition.
The `VARIANT` annotated group can be used to store either an unshredded Variant
value, or a shredded Variant value.

* The Variant group must be annotated with the `VARIANT` logical type.
* Both fields `value` and `metadata` must be of type `binary` (called `BYTE_ARRAY`
in the Parquet thrift definition).
* The `metadata` field is required and must be a valid Variant metadata component,
as defined by the [Variant binary encoding specification](VariantEncoding.md).
* When present, the `value` field must be a valid Variant value component,
as defined by the [Variant binary encoding specification](VariantEncoding.md).
* The `value` field is required for unshredded Variant values.
* The `value` field is optional and may be null only when parts of the Variant
value are shredded according to the [Variant shredding specification](VariantShredding.md).

This is the expected representation of an unshredded Variant in Parquet:
```
optional group variant_unshredded (VARIANT) {
required binary metadata;
required binary value;
}
```

This is an example representation of a shredded Variant in Parquet:
```
optional group variant_shredded (VARIANT) {
required binary metadata;
optional binary value;
optional int64 typed_value;
}
```

## Nested Types

This section specifies how `LIST` and `MAP` can be used to encode nested types
Expand Down
8 changes: 8 additions & 0 deletions src/main/thrift/parquet.thrift
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,12 @@ struct JsonType {
struct BsonType {
}

/**
* Embedded Variant logical type annotation
*/
struct VariantType {
}

/**
* LogicalType annotations to replace ConvertedType.
*
Expand Down Expand Up @@ -410,6 +416,7 @@ union LogicalType {
13: BsonType BSON // use ConvertedType BSON
14: UUIDType UUID // no compatible ConvertedType
15: Float16Type FLOAT16 // no compatible ConvertedType
16: VariantType VARIANT // no compatible ConvertedType
}

/**
Expand Down Expand Up @@ -980,6 +987,7 @@ union ColumnOrder {
* ENUM - unsigned byte-wise comparison
* LIST - undefined
* MAP - undefined
* VARIANT - undefined
*
* In the absence of logical types, the sort order is determined by the physical type:
* BOOLEAN - false, true
Expand Down

0 comments on commit dff0b3e

Please sign in to comment.