Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table features interfaces/utilities for Kernel #4

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

vkorukanti
Copy link
Owner

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

How was this patch tested?

Does this PR introduce any user-facing changes?

Copy link

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Super detailed and thorough. Thanks for this! Left some comments.

true);

/** Whether commands modifying this Delta table are allowed to create new deletion vectors. */
public static final TableConfig<Boolean> ENABLE_DELETION_VECTORS_CREATION =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it be nice if all of our table-feature-enablement-related TableConfigs ended with _ENABLED?

e.g. APPEND_ONLY_ENABLED, CHANGE_DATA_FEED_ENABLED, DELETION_VECTORS_CREATION_ENBABLED

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

*
* <p>Separately, a feature can be automatically supported by a table's metadata when certain
* feature-specific table properties are set. For example, `changeDataFeed` is automatically
* supported when there's a table property `delta.enableChangeDataFeed=true`. See {@link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that in the metadata / protocol that is written to the lob, delta.enableChangeDataFeed=true is in the metadata (table property) but changeDataFeed is not in the protocol's reader/writer features?

so do we just, implicitly, when we create the Protocol action, we add that changeDataFeed reader/write feature?

and then do we write out the new-and-improved-and-corrected Protocol next time we write?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that in the metadata / protocol that is written to the lob, delta.enableChangeDataFeed=true is in the metadata (table property) but changeDataFeed is not in the protocol's reader/writer features?

Thats is correct.

so do we just, implicitly, when we create the Protocol action, we add that changeDataFeed reader/write feature?

Yep, if all the metadata requirements are satisfied.

and then do we write out the new-and-improved-and-corrected Protocol next time we write?

Yes

* @param featureName a globally-unique string indicator to represent the feature. All characters
* must be letters (a-z, A-Z), digits (0-9), '-', or '_'. Words must be in camelCase.
* @param minReaderVersion the minimum reader version this feature requires. For a feature that
* can only be explicitly supported, this is either `0` or `3` (the reader protocol version

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also remind us what 0 means -->

0 (this is a writer feature, not a reader feature) or 3 (the reader protocol ....)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calrified.

checkArgument(
featureName.chars().allMatch(c -> Character.isLetterOrDigit(c) || c == '-' || c == '_'),
"name contains invalid characters: " + featureName);
this.minReaderVersion = minReaderVersion;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkArgument >= 0?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

featureName.chars().allMatch(c -> Character.isLetterOrDigit(c) || c == '-' || c == '_'),
"name contains invalid characters: " + featureName);
this.minReaderVersion = minReaderVersion;
this.minWriterVersion = minWriterVersion;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the min? checkArgument >= 2? or whaterver it is supposed to be

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to have writer version 1.

}
}

public static final TableFeature VARIANT_FEATURE = new VariantTypeTableFeature("variantType");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are your thoughts on VARIANT_RW_FEATURE?

and then DOMAIN_METADATA_W_FEATURE ?

i.e. include the readerWriter or writer-ness in the variable name itself? this could help throughout the code to re-enforce understanding of the features?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no prefernece. Added the prefixes.

// can be explicitly added to the protocol's `readerFeatures`
supportsReaderFeatures();

boolean shouldAddToWriterFeatures = supportsWriterFeatures();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to check if the feature is a writer feature?

* Determine whether this protocol can be safely upgraded to a new protocol `to`. This means: -
* all features supported by this protocol are supported by `to`.
*
* <p>Examples regarding feature status: - from `[appendOnly]` to `[appendOnly]` => allowed. -

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the - bullet points should be on newlines

*/
public Protocol normalized() {
// Normalization can only be applied to table feature protocols.
if (!supportsWriterFeatures()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about readerFeatures? is there a relationship between them? i.e. does supportsReaderFeatures imply supportsWriterFeatures?

* <p>For example: (1, 7, AppendOnly, Invariants, CheckConstraints) -> (1, 3) (3, 7, RowTracking)
* -> (1, 7, RowTracking)
*/
public Protocol normalized() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is normalize the right/best word here? wasn't obvious to me. what about minimized?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then maximized for denormalized

}

/** A base class for all table legacy reader-writer features. */
public abstract static class LegacyReaderWriterFeature extends TableFeature

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this implement ReaderWriterFeatureType?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it must? Otherwise isReaderWriterFeature is wrong?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, added it.

Comment on lines +92 to +94
public boolean isReaderWriterFeature() {
return this instanceof TableFeatures.ReaderWriterFeatureType;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a silly question, but do we need this interface at all? Can we just infer this from minReaderVersion?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in your doc you mentioned this might do a check and require that minReaderVersion is overriden to be non-zero?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I guess I see it is used below in validate. Still wondering what value the interface uses though and if we would just always use this method rather than checking instanceof

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a descriptive way. When you see a table feature and what is inheirting, we know it is a readerwriter or writer only feature.

we could depend on minReaderVersion = 0, but that's just hacky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants