Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add storage interfaces, basic file structure #529

Merged
merged 9 commits into from
Mar 18, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,6 @@
*/
package feast.ingestion.transform;

import static org.junit.Assert.*;

import feast.core.FeatureSetProto.EntitySpec;
import feast.core.FeatureSetProto.FeatureSet;
import feast.core.FeatureSetProto.FeatureSetSpec;
Expand Down
2 changes: 2 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@
<module>core</module>
<module>serving</module>
<module>sdk/java</module>
<module>storage/api</module>
<module>storage/connectors</module>
</modules>

<properties>
Expand Down
72 changes: 72 additions & 0 deletions storage/api/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

<parent>
<groupId>dev.feast</groupId>
<artifactId>feast-parent</artifactId>
<version>${revision}</version>
<relativePath>../..</relativePath>
</parent>

<modelVersion>4.0.0</modelVersion>
<artifactId>feast-storage-api</artifactId>

<name>Feast Storage API</name>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<!-- Required for generated code to compile; annotations are common false positive -->
<ignoredUnusedDeclaredDependencies>
javax.annotation
</ignoredUnusedDeclaredDependencies>
</configuration>
</plugin>
</plugins>
</build>

<dependencies>

<dependency>
<groupId>dev.feast</groupId>
<artifactId>datatypes-java</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-core</artifactId>
<version>${org.apache.beam.version}</version>
</dependency>

<dependency>
<groupId>com.google.auto.value</groupId>
<artifactId>auto-value-annotations</artifactId>
<version>1.6.6</version>
</dependency>

<dependency>
<groupId>com.google.auto.value</groupId>
<artifactId>auto-value</artifactId>
<version>1.6.6</version>
<scope>provided</scope>
ches marked this conversation as resolved.
Show resolved Hide resolved
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.9</version>
</dependency>

<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can omit the JUnit version from all of these, 4.12 is the one applied by our <dependencyManagement> via the imported one from Spring.

Apache commons-lang3 is also there, at version 3.7.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha

<scope>test</scope>
</dependency>
</dependencies>

</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
/*
* SPDX-License-Identifier: Apache-2.0
* Copyright 2018-2020 The Feast Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package feast.storage.api.retrieval;

import feast.serving.ServingAPIProto;
import java.util.List;

/** Interface for implementing user defined retrieval functionality from Batch/historical stores. */
public interface BatchRetriever {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: "Interface for implementing" is redundant, it is (statically) an interface and those are always for implementing ☺️

Personally I find it helpful, as a reader and writer, to read/think in terms of the abstraction that's represented, as a noun—what an instance would be/do. Some of the first sentences of thejava.nio.channels.Channel interface Javadoc, for instance:

A channel represents an open connection to an entity such as a hardware device, a file, a network socket, or a program component that is capable of performing one or more distinct I/O operations, for example reading or writing.

This is harder when you get into things that are architectural glue more than domain concepts. I try to defer defining those for as long as possible… we're necessarily near the edge of it here.

I'll take a riff:

A BatchRetriever fetches historical feature values from storage.

Thinking about it, I wonder if HistoricalRetriever is a more expressive name. It's really what conceptually—and technically—distinguishes it from online retrieval in Feast. "Offline" and "batch" are more overloaded, fuzzier terms. Something like Druid or various streaming SQL engines could make online historical usage feasible. Whether that has a place in Feast is beyond the subject here, but hopefully my point stands regardless.

Honestly I liked your first iteration with getFeatures naming and I agree with your instinct that single implementations like class SQLLiteRetriever implements BatchRetriever, OnlineRetriever are going to be the exception much more than the norm. And they would still be possible with the overloads, even elegant. It's subjective taste at that point though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, I wonder if HistoricalRetriever is a more expressive name.

Makes sense. We used to call it historical serving but for some reason it just didn't have the right ring to it. It's definitely more accurate than simply batch.


/**
* Get all features corresponding to the provided batch features request.
*
* @param request {@link ServingAPIProto.GetBatchFeaturesRequest} containing requested features
* and file containing entity columns.
* @param featureSetRequests List of {@link FeatureSetRequest} to feature references in the
* request tied to that feature set.
* @return {@link ServingAPIProto.Job} if successful, contains the location of the results, else
* contains the error to be returned to the user.
*/
ServingAPIProto.Job getBatchFeatures(
ServingAPIProto.GetBatchFeaturesRequest request, List<FeatureSetRequest> featureSetRequests);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be controversial because it might imply many new classes that are very close to proto generated ones, and a translation layer somewhere with some tedious boilerplate, but I would really like to see "api" modules working with domain model types that are not proto wire format—like the newly-defined FeatureSetRequest in this PR as part of "api" is a step in the right direction to me, but it's essentially just a wrapper of ImmutableSet<FeatureReference> so I wonder if we could go further and use that directly here instead of a wrapper (although FeatureReference is still coming from proto, but maybe it's acceptable when they're more of a fundamental data model type and not request objects, DTOs, etc.).

Concretely I'm skeptical of ServingAPIProto.GetBatchFeaturesRequest and ServingAPIProto.Job at this layer: the former is a request object that should go no deeper than the RPC controller layer of serving IMO. Could the latter be served here by feast.core.model.Job? Granted that having storage-api depend on core is not ideal, but it may be that ubiquitous models that cross these new layers need to move (perhaps datatypes-java starts to carry source code beyond only generated, or "job" is really fundamentally an Ingestion domain concept and a Job model could be in an "ingestion-api" that helps bridge us from Core to Serving doing job management…).

It's a bit odd to me that ServingAPIProto.Jobis defined where it is currently anyhow, though I guess this might also be part of the inclination to move job management. And I acknowledge it might be justified to have different contracts for what a client making a historical query cares about status, versus more internal concerns of ingestion job management. But it's like a serving-specific View Model of the core domain model then. I digress, this is a tangent from this already rambling comment, but a good one to carry on elsewhere.

It feels redundant that this interface method has both the GetBatchFeaturesRequest and List<FeatureSetRequest> as arguments—there's substantial overlap isn't there? Seems like incidental complexity for the caller to provide both, all the information should be in the feature references. If we boil this down to essentials, forgetting any existing types that happen to be defined, I think the method requires these things:

  • Set/list of (qualified) feature references
  • Set of entities, paired with a datetime

(Do we also handle/support a query-supplied max age per feature set, for batch? Proto schemas allow it but Python API doesn't seem to).

As an API interface, what would it look like to simplify this to domain model and basic collection types?

Sorry for such a long and meandering comment…

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
/*
* SPDX-License-Identifier: Apache-2.0
* Copyright 2018-2019 The Feast Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package feast.storage.api.retrieval;

import com.google.auto.value.AutoValue;
import com.google.common.collect.ImmutableSet;
import feast.core.FeatureSetProto.FeatureSetSpec;
import feast.serving.ServingAPIProto.FeatureReference;
import java.util.List;

@AutoValue
public abstract class FeatureSetRequest {
public abstract FeatureSetSpec getSpec();

public abstract ImmutableSet<FeatureReference> getFeatureReferences();

public static Builder newBuilder() {
return new AutoValue_FeatureSetRequest.Builder();
}

@AutoValue.Builder
public abstract static class Builder {
public abstract Builder setSpec(FeatureSetSpec spec);

abstract ImmutableSet.Builder<FeatureReference> featureReferencesBuilder();

public Builder addAllFeatureReferences(List<FeatureReference> featureReferenceList) {
featureReferencesBuilder().addAll(featureReferenceList);
return this;
}

public Builder addFeatureReference(FeatureReference featureReference) {
featureReferencesBuilder().add(featureReference);
return this;
}

public abstract FeatureSetRequest build();
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
/*
* SPDX-License-Identifier: Apache-2.0
* Copyright 2018-2020 The Feast Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package feast.storage.api.retrieval;

import feast.serving.ServingAPIProto.GetOnlineFeaturesRequest.EntityRow;
import feast.types.FeatureRowProto.FeatureRow;
import java.util.List;

/** Interface for implementing user defined retrieval functionality from Online stores. */
public interface OnlineRetriever {

/**
* Get all values corresponding to the request.
*
* @param entityRows list of entity rows in the feature request
* @param featureSetRequests List of {@link FeatureSetRequest} to feature references in the
* request tied to that feature set.
* @return list of {@link OnlineRetrieverResponse} for each entity row
*/
List<List<FeatureRow>> getOnlineFeatures(
List<EntityRow> entityRows, List<FeatureSetRequest> featureSetRequests);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the types used in the signature here, in contrast to getBatchFeatures, lend support to my arguments there.

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
/*
* SPDX-License-Identifier: Apache-2.0
* Copyright 2018-2020 The Feast Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package feast.storage.api.retrieval;

import feast.types.FeatureRowProto;

/** Response from an online store. */
public interface OnlineRetrieverResponse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my earlier remark about FeatureSetRequest, I'm questioning the value at this layer of a wrapper over a simple collection of FeatureRows. They carry the information of what feature set they're associated with.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I think this is an artifact of a previous iteration that i forgot to remove.


/**
* Checks whether the response is empty, i.e. feature does not exist in the store
*
* @return boolean
*/
boolean isEmpty();

/**
* Get the featureset associated with this response.
*
* @return String featureset reference in format featureSet:version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've alluded to the same in TestUtil, but the references dearly need to become first-class types instead of strings, IMO. But again, something we should address outside this PR (but quite possibly before storage-refactor merges to mainline, their absence is bearing on these fundamental interfaces we're designing).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! It will be added in @mrzzy's PR #548, in proto format, at least.

*/
String getFeatureSet();

/**
* Parse response to FeatureRow
*
* @return {@link FeatureRowProto.FeatureRow}
*/
FeatureRowProto.FeatureRow toFeatureRow();
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
/*
* SPDX-License-Identifier: Apache-2.0
* Copyright 2018-2020 The Feast Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package feast.storage.api.write;

import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

/** Interface for for implementing user defined deadletter sinks to write failed elements to. */
public interface DeadletterSink {

/**
* Set up the deadletter sink for writes. This method will be called once during pipeline
* initialisation.
*/
void prepareWrite();

/**
* Get a {@link PTransform} that writes a collection of FailedElements to the deadletter sink.
*
* @return {@link PTransform}
*/
PTransform<PCollection<FailedElement>, PDone> write();
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
/*
* SPDX-License-Identifier: Apache-2.0
* Copyright 2018-2019 The Feast Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package feast.storage.api.write;

import com.google.auto.value.AutoValue;
import javax.annotation.Nullable;
import org.apache.beam.sdk.schemas.AutoValueSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.joda.time.Instant;

@AutoValue
// Use DefaultSchema annotation so this AutoValue class can be serialized by Beam
// https://issues.apache.org/jira/browse/BEAM-1891
// https://github.com/apache/beam/pull/7334
@DefaultSchema(AutoValueSchema.class)
public abstract class FailedElement {
public abstract Instant getTimestamp();

@Nullable
public abstract String getJobName();

@Nullable
public abstract String getProjectName();

@Nullable
public abstract String getFeatureSetName();

@Nullable
public abstract String getFeatureSetVersion();

@Nullable
public abstract String getTransformName();

@Nullable
public abstract String getPayload();

@Nullable
public abstract String getErrorMessage();

@Nullable
public abstract String getStackTrace();

public static Builder newBuilder() {
return new AutoValue_FailedElement.Builder().setTimestamp(Instant.now());
}

@AutoValue.Builder
public abstract static class Builder {
public abstract Builder setTimestamp(Instant timestamp);

public abstract Builder setProjectName(String projectName);

public abstract Builder setFeatureSetName(String featureSetName);

public abstract Builder setFeatureSetVersion(String featureSetVersion);

public abstract Builder setJobName(String jobName);

public abstract Builder setTransformName(String transformName);

public abstract Builder setPayload(String payload);

public abstract Builder setErrorMessage(String errorMessage);

public abstract Builder setStackTrace(String stackTrace);

public abstract FailedElement build();
}
}
Loading