direct ingestion into the Online Store #1863
Comments
@vas28r13 Thanks for writing this up. Overall it makes sense to me.
How would you ensure that streaming processes/jobs stay in sync with feature views in the feature store? It seems like there are broadly two approaches: (1) adds less responsibility to Feast, but also makes it harder to keep jobs in sync; in (2), Feast would be able to launch new consuming jobs, take down old ones, or update the schema of events that the job processes.
Correct, we should have logic that ensures we aren't overwriting newer data with older data.
@woop great point! Although I like the flexibility of approach (1), keeping the streaming processes/jobs on our end in sync with iterations on FeatureViews in the feature store would be an issue.
Note that this has been implemented, but for online stores other than Redis there is still extra wiring work needed to check event timestamps. That work sort of exists in adchia#1, but I paused it because it seemed like it would slow down materialization. One natural way around this would be to support multiple versions of the same data and pull the latest version at serving time. That would only make sense, though, if we have some TTL logic like #1988.
Problem
We'd like to be able to ingest data into the Online store from streaming sources. We think direct ingestion into the Online store would be a lightweight way to support streaming sources in newer Feast versions.
In Feast v0.12, materialization from the Offline store is the way to get data into the Online store.
We believe we can manage the streaming processes on our end but would love to have a way to ingest data into the Online Store directly.
Use Case
A model process/service receives a stream of data and needs to keep the new/updated feature values current in the Online feature store.
Code example
```python
feature_store_object.ingest_df("feature_view_name", dataframe_object)
```
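To make the proposed call concrete, here is a minimal sketch of how a streaming consumer might use it. Note that `ingest_df` is the API proposed in this issue (not an existing Feast method), and the `driver_stats` feature view, column names, and values are all hypothetical:

```python
import pandas as pd
from datetime import datetime, timezone

# A streaming consumer builds a DataFrame of freshly computed feature
# values, including an event_timestamp column so the online store can
# reason about data freshness.
df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],              # entity keys (hypothetical)
        "conv_rate": [0.45, 0.31],              # feature values (hypothetical)
        "event_timestamp": [datetime.now(timezone.utc)] * 2,
    }
)

# feature_store_object would be a feast.FeatureStore instance;
# "driver_stats" is a placeholder feature view name.
# feature_store_object.ingest_df("driver_stats", df)
```

The consumer then calls the proposed `ingest_df` with the feature view name and the DataFrame, bypassing the Offline store entirely.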
Process 1
Process 2 (job)
Potential Issue
Since there are now two ways to ingest data into the online store (materialization and direct ingestion), `online_write_batch` in the online store implementation needs logic that compares `event_timestamp` values, since event times can be delayed. For example, `Process 2` happens after `Process 1`, but the event timestamps of the features ingested by `Process 1` are actually more recent, so the features from `Process 2` should not overwrite what is in the Online store even though `Process 2` runs after `Process 1`.
Let us know what you think. We implemented this locally for our use case but wonder if this is something that could be useful in the official Feast version as well.