Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feast Metadata #363

Closed
zhilingc opened this issue Dec 13, 2019 · 9 comments
Closed

Feast Metadata #363

zhilingc opened this issue Dec 13, 2019 · 9 comments

Comments

@zhilingc
Copy link
Collaborator

zhilingc commented Dec 13, 2019

Currently there is no distinction between user set values and system information about a feature set.

Proposing to add a larger FeatureSet object that holds both the user-specified spec and system-populated metadata of the feature set.

message FeatureSet {
    // User-specified specifications of this feature set.
    FeatureSetSpec spec = 1;

    // System-populated metadata for this feature set.
    FeatureSetMeta metadata = 2;
}

message FeatureSetSpec {
    // same
}

message FeatureSetMeta {
    // Created timestamp of this specific feature set.
    google.protobuf.Timestamp created_timestamp = 1;

    // Status of the feature set.
    // Used to indicate whether the feature set is ready for consumption or ingestion.
    // Currently supports 2 states:
    // 1) PENDING - A feature set is in pending state if Feast has not spun up the jobs
    // necessary to push rows for this feature set to stores subscribing to this feature set.
    // 2) READY - Feature set is ready for consumption or ingestion
    string status = 2;

    // .. project, etc
}
@woop
Copy link
Member

woop commented Dec 13, 2019

This is great, thank you. A few comments

  1. Should we change the metadata field name to meta for convenience?
  2. Is there a reason why status is a string and not an enum?

@woop
Copy link
Member

woop commented Dec 13, 2019

@ches @smadarasmi

@zhilingc
Copy link
Collaborator Author

  1. sure
  2. No special reason. It could be an enum as well

@woop
Copy link
Member

woop commented Dec 13, 2019

  1. sure
  2. No special reason. It could be an enum as well

I think overall it looks good. So fields would be mapped like

Spec

  • Any user provided (writable) fields or objects.
  • Name can only be provided once, and not updated.

Meta

  • Any system provided (read-only) fields or objects.
  • Version
  • Status
  • Project

The implication of this is that project would not typically be bundled inside the FeatureSet object when it is sent to ApplyFeatureSet, but would be external. This is currently how its being developed.

@ches
Copy link
Member

ches commented Dec 16, 2019

Separating system (meta)data from user specs makes sense to me, but I'm not sure about details of what's proposed if system-set versus user-set is the delineating factor.

Do we consider Project to be metadata? It's part of schema and changing the project of existing features/sets is probably disallowed, so I can see it in that sense, but aren't the features and entities of a feature set then very similar? Except that more members maybe be added to them over time. Project isn't a property or child of a feature set—closer to other way around—but project is set by the user, no? And it feels odd to me that some schema parts of the canonical address of a feature are "metadata" (project) and others are not (feature and entity names).

Also still thinking about the possibility of user metadata—I'm due to write up a feature request for this, but the rough idea I've mentioned in chat is for loosely-coupled extensibility of the system: possible use case examples are controlling replication selectively, tagging PII, etc.

That needs a more fleshed-out spec to take into account fully here, but I think the two are orthogonal enough that this should not need to wait for it, with possibly one exception: naming.

The example cases I've thought of so far for user metadata could be stored in the spec registry only, not a serving store. So in that sense I think calling it "metadata" is apt.

There could be meta fields of different types at different levels, but I worry it's a bit confusing (pseudo-IDL with nesting for brevity):

message FeatureSet {

    // User-specified specifications of this feature set.
    FeatureSetSpec spec {
        repeated EntitySpec entities;
        repeated FeatureSpec features;
        // etc.

        // User-specified custom metadata for this feature set.
        repeated SpecMeta meta;
    }

    // System-populated metadata for this feature set.
    FeatureSetMeta meta;
}

Thoughts? Maybe that isn't so bad. Or maybe there are better names for either one of them, but I believe I need to understand my questions at the start of this post better to suggest any.

Could we update the issue title to specify something like "system metadata", to distinguish for now?

@woop
Copy link
Member

woop commented Dec 17, 2019

Separating system (meta)data from user specs makes sense to me, but I'm not sure about details of what's proposed if system-set versus user-set is the delineating factor.

Do we consider Project to be metadata? It's part of schema and changing the project of existing features/sets is probably disallowed, so I can see it in that sense, but aren't the features and entities of a feature set then very similar? Except that more members maybe be added to them over time. Project isn't a property or child of a feature set—closer to other way around—but project is set by the user, no? And it feels odd to me that some schema parts of the canonical address of a feature are "metadata" (project) and others are not (feature and entity names).

Also still thinking about the possibility of user metadata—I'm due to write up a feature request for this, but the rough idea I've mentioned in chat is for loosely-coupled extensibility of the system: possible use case examples are controlling replication selectively, tagging PII, etc.

That needs a more fleshed-out spec to take into account fully here, but I think the two are orthogonal enough that this should not need to wait for it, with possibly one exception: naming.

The example cases I've thought of so far for user metadata could be stored in the spec registry only, not a serving store. So in that sense I think calling it "metadata" is apt.

There could be meta fields of different types at different levels, but I worry it's a bit confusing (pseudo-IDL with nesting for brevity):

message FeatureSet {

    // User-specified specifications of this feature set.
    FeatureSetSpec spec {
        repeated EntitySpec entities;
        repeated FeatureSpec features;
        // etc.

        // User-specified custom metadata for this feature set.
        repeated SpecMeta meta;
    }

    // System-populated metadata for this feature set.
    FeatureSetMeta meta;
}

Thoughts? Maybe that isn't so bad. Or maybe there are better names for either one of them, but I believe I need to understand my questions at the start of this post better to suggest any.

Could we update the issue title to specify something like "system metadata", to distinguish for now?

Perhaps the wording is a bit off in terms of the delineation. The proposal takes inspiration from Kubernetes manifests, where things like name and namespace are user provided but then become fixed metadata. A better delineation might be

  • Spec: Configuration used to describe the functional aspects of the feature set. Feature sets with identical specifications are functionally identical.
  • Meta: Non-functional or supplementary information about a feature set.

Project is part of the registration of a feature set and will become fixed metadata, just like the name. It has no bearing on functional aspects. Features and entities would. They are different in that copying and registering the spec into a different project with a different name shouldnt affect how it functions.

User coupled metadata is similar to how either annotations or labels work in Kube, so that would be metadata I believe.

@woop woop changed the title Add Feature set metadata Feast Metadata Dec 17, 2019
@woop
Copy link
Member

woop commented Dec 17, 2019

So to my final point: I don't see a need to differentiate system and user metadata. I think we should only have feature set metadata, but these could come from the system (version, status), or they could come from the user name, project, tag. At the end of the day this information will be propagated with the feature set wherever it goes throughout a Feast deployment. I don't see a strong case to split user and system metadata.

@woop
Copy link
Member

woop commented Dec 19, 2019

Update on this

We have decided to leave the following keys within the spec for the time being

  • Version
  • Status
  • Project

At least until we are sure that moving them out of the FeatureSetSpec message is necessary. Currently FeatureSetMeta will only contain the created timestamp and status.

@zhilingc
Copy link
Collaborator Author

Closing this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants