Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add storage interfaces, basic file structure #529
Add storage interfaces, basic file structure #529
Changes from all commits
2bf2fa3
966a14b
0411608
bef13ab
1bd20c5
0153cce
6db5e3e
d3861f6
3359ced
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can omit the JUnit version from all of these, 4.12 is the one applied by our
<dependencyManagement>
via the imported one from Spring.Apache commons-lang3 is also there, at version 3.7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: "Interface for implementing" is redundant, it is (statically) an☺️
interface
and those are always for implementingPersonally I find it helpful, as a reader and writer, to read/think in terms of the abstraction that's represented, as a noun—what an instance would be/do. Some of the first sentences of the
java.nio.channels.Channel
interface Javadoc, for instance:This is harder when you get into things that are architectural glue more than domain concepts. I try to defer defining those for as long as possible… we're necessarily near the edge of it here.
I'll take a riff:
Thinking about it, I wonder if
HistoricalRetriever
is a more expressive name. It's really what conceptually—and technically—distinguishes it from online retrieval in Feast. "Offline" and "batch" are more overloaded, fuzzier terms. Something like Druid or various streaming SQL engines could make online historical usage feasible. Whether that has a place in Feast is beyond the subject here, but hopefully my point stands regardless.Honestly I liked your first iteration with
getFeatures
naming and I agree with your instinct that single implementations likeclass SQLLiteRetriever implements BatchRetriever, OnlineRetriever
are going to be the exception much more than the norm. And they would still be possible with the overloads, even elegant. It's subjective taste at that point though.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. We used to call it
historical serving
but for some reason it just didn't have the right ring to it. It's definitely more accurate than simplybatch
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be controversial because it might imply many new classes that are very close to proto generated ones, and a translation layer somewhere with some tedious boilerplate, but I would really like to see "api" modules working with domain model types that are not proto wire format—like the newly-defined
FeatureSetRequest
in this PR as part of "api" is a step in the right direction to me, but it's essentially just a wrapper ofImmutableSet<FeatureReference>
so I wonder if we could go further and use that directly here instead of a wrapper (althoughFeatureReference
is still coming from proto, but maybe it's acceptable when they're more of a fundamental data model type and not request objects, DTOs, etc.).Concretely I'm skeptical of
ServingAPIProto.GetBatchFeaturesRequest
andServingAPIProto.Job
at this layer: the former is a request object that should go no deeper than the RPC controller layer ofserving
IMO. Could the latter be served here byfeast.core.model.Job
? Granted that havingstorage-api
depend oncore
is not ideal, but it may be that ubiquitous models that cross these new layers need to move (perhapsdatatypes-java
starts to carry source code beyond only generated, or "job" is really fundamentally an Ingestion domain concept and aJob
model could be in an "ingestion-api" that helps bridge us from Core to Serving doing job management…).It's a bit odd to me that
ServingAPIProto.Job
is defined where it is currently anyhow, though I guess this might also be part of the inclination to move job management. And I acknowledge it might be justified to have different contracts for what a client making a historical query cares about status, versus more internal concerns of ingestion job management. But it's like a serving-specific View Model of the core domain model then. I digress, this is a tangent from this already rambling comment, but a good one to carry on elsewhere.It feels redundant that this interface method has both the
GetBatchFeaturesRequest
andList<FeatureSetRequest>
as arguments—there's substantial overlap isn't there? Seems like incidental complexity for the caller to provide both, all the information should be in the feature references. If we boil this down to essentials, forgetting any existing types that happen to be defined, I think the method requires these things:(Do we also handle/support a query-supplied max age per feature set, for batch? Proto schemas allow it but Python API doesn't seem to).
As an API interface, what would it look like to simplify this to domain model and basic collection types?
Sorry for such a long and meandering comment…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the types used in the signature here, in contrast to
getBatchFeatures
, lend support to my arguments there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my earlier remark about
FeatureSetRequest
, I'm questioning the value at this layer of a wrapper over a simple collection ofFeatureRow
s. They carry the information of what feature set they're associated with.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I think this is an artifact of a previous iteration that i forgot to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've alluded to the same in
TestUtil
, but the references dearly need to become first-class types instead of strings, IMO. But again, something we should address outside this PR (but quite possibly beforestorage-refactor
merges to mainline, their absence is bearing on these fundamental interfaces we're designing).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! It will be added in @mrzzy's PR #548, in proto format, at least.