
GH-459: Add Variant logical type annotation #460

Merged: 5 commits, merged Nov 6, 2024


@gene-db (Contributor) commented Oct 19, 2024

Rationale for this change

Add a Variant logical type.

What changes are included in this PR?

Additions to the Thrift type definitions and a description of the new logical type. The Variant spec documents themselves are unchanged and will be addressed later in a separate PR.

Closes #459

LogicalTypes.md (outdated diff)

* The top level must be a group annotated with `VARIANT` that contains a
`binary` field named `metadata`, and a `binary` field named `value`.
* Additional fields which start with `_` (underscore) can be ignored.
Contributor

Why is this needed? None of the other types allow writing columns that should be ignored.

Contributor Author

This was intended for cases where we might store additional (but redundant) metadata or values, while still keeping the group a valid Variant value.

Contributor

I don't think that we want to add ignored columns. If we need to update the spec because something is missing, we should just do that directly instead of working around it with unspecified columns that only work in certain proprietary cases.

Contributor Author

I see. I was worried that future evolution could break existing stored Variants, but simply adding a new field with optional or redundant semantics achieves the same compatibility story. This is removed.
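To make the structural rule in this thread concrete, here is a minimal Python sketch of a check that a schema node has the shape described in the quoted bullet: a group annotated `VARIANT` that contains a `metadata` field and a `value` field, both stored with the `BYTE_ARRAY` physical type (the type behind `binary`). `SchemaNode` and `is_valid_variant_group` are hypothetical names for illustration, not part of any Parquet library. The sketch assumes only what this thread states: it gives no special treatment to `_`-prefixed fields, since that rule was removed, and it does not decide how any other extra fields should be handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical, simplified model of a Parquet schema node (not a real library API).
@dataclass
class SchemaNode:
    name: str
    physical_type: Optional[str] = None  # e.g. "BYTE_ARRAY" for leaves; None for groups
    logical_type: Optional[str] = None   # e.g. "VARIANT"
    children: List["SchemaNode"] = field(default_factory=list)


def is_valid_variant_group(node: SchemaNode) -> bool:
    """Return True if `node` is a group annotated VARIANT that contains
    BYTE_ARRAY leaf fields named `metadata` and `value`."""
    # The annotated node must be a group (no physical type) marked VARIANT.
    if node.physical_type is not None or node.logical_type != "VARIANT":
        return False
    by_name = {child.name: child for child in node.children}
    for required in ("metadata", "value"):
        child = by_name.get(required)
        # Each required field must exist and be a BYTE_ARRAY leaf.
        if child is None or child.physical_type != "BYTE_ARRAY" or child.children:
            return False
    return True
```

For example, a group named `v` annotated `VARIANT` with `metadata` and `value` leaves of type `BYTE_ARRAY` passes, while the same group without the annotation, or with `metadata` stored as `INT32`, does not.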

Resolved review threads (outdated): LogicalTypes.md (2), VariantEncoding.md (1)
/**
* Embedded Variant logical type annotation
*/
struct VariantType {
Contributor

Looks good to me.

@gene-db (Contributor Author) left a comment

@rdblue Thanks! I updated the PR.

@gene-db requested a review from @rdblue on October 22, 2024 15:45
@rdblue (Contributor) commented Oct 23, 2024

This looks close to me. I think we just need to fix two things:

  1. The use of `binary` vs. `BYTE_ARRAY` that @wgtmac pointed out
  2. The addition of ignored fields starting with `_`

@gene-db (Contributor Author) left a comment

@rdblue Thanks! I updated the PR.


@gene-db requested a review from @wgtmac on October 24, 2024 16:49
@rdblue (Contributor) left a comment

Thanks for the updates! This looks good to me.

@wgtmac (Member) left a comment

+1

@aihuaxu (Contributor) left a comment

LGTM.

@Fokko merged commit dff0b3e into apache:master on Nov 6, 2024 (3 checks passed)
@Fokko (Contributor) commented Nov 6, 2024

Thanks @gene-db for adding this, and thanks @rdblue @wgtmac @aihuaxu for the review 🙌
