feat: Adding support to return additional features from vector retrieval for Milvus db #4971

Merged: 5 commits, Jan 28, 2025


@franciscojavierarceo (Member) commented Jan 27, 2025

What this PR does / why we need it:

This PR adds support for returning multiple fields when doing vector similarity search with Milvus.

The behavior is now as simple as:

# Define your FeatureView
document_embeddings = FeatureView(
    name="embedded_documents",
    entities=[item, author],
    schema=[
        # Notice that the vector index and search metric are defined at the field level
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,
            vector_search_metric="COSINE",
        ),
        Field(name="item_id", dtype=Int64),
        Field(name="author_id", dtype=String),
        Field(name="sentence_chunks", dtype=String),
        Field(name="created_timestamp", dtype=UnixTimestamp),
        Field(name="event_timestamp", dtype=UnixTimestamp),
    ],
    source=rag_documents_source,
    ttl=timedelta(hours=24),
)

# Get the documents
result = store.retrieve_online_documents_v2(
    features=[
        "embedded_documents:vector",
        "embedded_documents:item_id",
        "embedded_documents:author_id",
        "embedded_documents:sentence_chunks",
    ],
    query=query_embedding,
    top_k=3,
).to_dict()
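
A minimal sketch of consuming the result, assuming `to_dict()` returns a mapping from column name to a list of values with one entry per retrieved document. The column names and values below are made up for illustration and are not actual output from this PR; `rows_from_columns` is a hypothetical helper, not part of Feast.

```python
# Hypothetical result shape: one list per requested feature, aligned by position.
example_result = {
    "item_id": [1, 2, 3],
    "author_id": ["a1", "a2", "a3"],
    "sentence_chunks": ["first chunk", "second chunk", "third chunk"],
}


def rows_from_columns(columns):
    """Turn a column-oriented dict into a list of per-document dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


documents = rows_from_columns(example_result)
for doc in documents:
    print(doc["item_id"], doc["author_id"], doc["sentence_chunks"])
```

Row-oriented records like these tend to be easier to pass to a prompt template or a re-ranker than parallel lists.
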
  1. Feature Store Enhancements

    • Added retrieve_online_documents_v2 method to FeatureStore for improved document retrieval using embeddings.
    • Modified retrieve_online_documents to enhance performance and support additional features.
    • Introduced _retrieve_from_online_store_v2 to support the new document retrieval method internally.
    • Added _get_feature_view_vector_field_metadata to fetch metadata for vector fields within a FeatureView.
  2. Utility Functions

    • Added _get_unique_entities_from_values to obtain unique composite entities and their indexes.
    • Introduced _populate_response_from_feature_data_v2 to populate responses with feature data.
    • Enhanced _extract_proto_values_to_dict to handle serialization to string and vector columns.
  3. Provider and Online Store Updates

    • Updated PassthroughProvider and Provider interfaces to include retrieve_online_documents_v2.
    • Enhanced MilvusOnlineStore and OnlineStore with support for the new document retrieval method and vector field handling.
    • Modified MilvusOnlineStoreConfig to default to COSINE metric type for vector searches.
  4. Testing and Example Repositories

    • Updated test cases in test_online_retrieval.py to include scenarios for the new document retrieval method.
    • Enhanced example repositories to demonstrate the use of vector fields and the new retrieval method.
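
To illustrate what "unique composite entities and their indexes" means in the utility functions above, here is a rough sketch in plain Python. It is not the Feast implementation (which presumably operates on protobuf values); the function name `unique_entities_with_indexes` and its input shape are assumptions made for this example.

```python
from collections import defaultdict


def unique_entities_with_indexes(entity_columns):
    """Group row positions by their composite entity key.

    entity_columns maps each join key name to a list of values (one per row).
    Returns (unique_keys, indexes): unique_keys[i] is a tuple of key values and
    indexes[i] lists the row positions where that tuple occurs.
    """
    positions = defaultdict(list)
    for row_index, key in enumerate(zip(*entity_columns.values())):
        positions[key].append(row_index)
    unique_keys = list(positions)  # dicts preserve first-seen order
    indexes = [positions[key] for key in unique_keys]
    return unique_keys, indexes


keys, idxs = unique_entities_with_indexes(
    {"item_id": [1, 2, 1], "author_id": ["a", "b", "a"]}
)
```

Deduplicating this way lets a store be queried once per distinct entity, with the recorded indexes used to copy each result back to every row that asked for it.
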

Key Modifications

  • sdk/python/feast/feature_store.py

    • Added Field import.
    • Introduced retrieve_online_documents_v2 method for enhanced document retrieval.
    • Updated _retrieve_from_online_store to handle vector field metadata.
    • Added _retrieve_from_online_store_v2 for internal support of the new retrieval method.
  • sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py

    • Added support for deserializing entity keys.
    • Modified _get_collection to handle vector fields.
    • Enhanced online_write_batch to serialize vector fields and handle entity keys.
    • Introduced retrieve_online_documents_v2 for Milvus online store.
  • sdk/python/feast/infra/online_stores/online_store.py

    • Added retrieve_online_documents_v2 method for online document retrieval using embeddings.
  • sdk/python/feast/infra/passthrough_provider.py

    • Added retrieve_online_documents_v2 method to passthrough provider.
  • sdk/python/feast/infra/provider.py

    • Added retrieve_online_documents_v2 abstract method to provider interface.
  • sdk/python/feast/utils.py

    • Introduced _get_unique_entities_from_values utility function.
    • Added _populate_response_from_feature_data_v2 for populating response with feature data.
    • Enhanced _extract_proto_values_to_dict to handle vector columns and serialization.
  • sdk/python/tests/example_repos/example_rag_feature_repo.py

    • Added author_id entity and updated schema to include vector fields.
  • sdk/python/tests/foo_provider.py

    • Added retrieve_online_documents_v2 method to the provider.
  • sdk/python/tests/unit/online_store/test_online_retrieval.py

    • Updated test cases to include scenarios for retrieve_online_documents_v2.
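
The file list above follows a layered pattern: an abstract method on the provider interface, a passthrough provider that forwards the call, and an online store that does the actual search. The sketch below shows that shape with simplified stand-in classes. All class names are hypothetical, the signatures are reduced to three arguments, and the `NotImplementedError` default on the base store is an assumption rather than something this PR states.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchOnlineStore:
    """Base store: back ends without vector search keep this default (assumed)."""

    def retrieve_online_documents_v2(self, features, query, top_k):
        raise NotImplementedError("vector retrieval is not supported by this store")


class InMemoryVectorStore(SketchOnlineStore):
    """Toy stand-in for a vector database; ranks rows by cosine similarity."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows

    def retrieve_online_documents_v2(self, features, query, top_k):
        def cosine(a, b):
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return sum(x * y for x, y in zip(a, b)) / norm

        ranked = sorted(self.rows, key=lambda r: cosine(r["vector"], query), reverse=True)
        # Return only the requested fields for the top_k closest rows.
        return [{name: row[name] for name in features} for row in ranked[:top_k]]


class SketchProvider(ABC):
    @abstractmethod
    def retrieve_online_documents_v2(self, features, query, top_k):
        ...


class SketchPassthroughProvider(SketchProvider):
    """Forwards the call unchanged to whichever online store is configured."""

    def __init__(self, online_store: SketchOnlineStore):
        self.online_store = online_store

    def retrieve_online_documents_v2(self, features, query, top_k):
        return self.online_store.retrieve_online_documents_v2(features, query, top_k)


provider = SketchPassthroughProvider(
    InMemoryVectorStore(
        [
            {"item_id": 1, "vector": [1.0, 0.0]},
            {"item_id": 2, "vector": [0.0, 1.0]},
            {"item_id": 3, "vector": [0.7, 0.7]},
        ]
    )
)
top_two = provider.retrieve_online_documents_v2(["item_id"], [1.0, 0.1], top_k=2)
```
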

Which issue(s) this PR fixes:

Another one for #4364

Misc

N/A

…val of features from vector similarity search

Signed-off-by: Francisco Javier Arceo <[email protected]>

@franciscojavierarceo (Member, Author):

@HaoXuAI take a look at this PR, I've moved the retrieval implementation to retrieve_online_documents_v2 instead of using retrieve_online_documents.

My recommendation is we eventually wipe both of these methods and embed retrieve_online_documents_v2 in get_online_features and have retrieve_online_documents call get_online_features.

Given the Milvus documentation, and since we now allow more features to be returned, I think it makes sense to just incorporate this into get_online_features.

@@ -89,7 +92,7 @@ class MilvusOnlineStoreConfig(FeastConfigBaseModel, VectorStoreConfig):
     host: Optional[StrictStr] = "localhost"
     port: Optional[int] = 19530
     index_type: Optional[str] = "FLAT"
-    metric_type: Optional[str] = "L2"
+    metric_type: Optional[str] = "COSINE"

Collaborator:

Might be better to make it an enum now since there are multiple values

Member Author:

Yeah sounds good, mind if I do that in a follow up PR?

Collaborator:

yep, for sure
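
For reference, the enum suggested in this thread could look roughly like the sketch below. This is not the follow-up PR's code: `MetricType` and `parse_metric_type` are hypothetical names, and the member list (L2 and COSINE from the diff above, plus IP for inner product) is an assumption about which metrics would be exposed.

```python
from enum import Enum


class MetricType(str, Enum):
    """Hypothetical enum of vector search metrics; the member list is illustrative."""

    L2 = "L2"
    IP = "IP"
    COSINE = "COSINE"


def parse_metric_type(value: str) -> MetricType:
    """Validate a configured metric name, accepting any letter case."""
    try:
        return MetricType(value.upper())
    except ValueError:
        allowed = ", ".join(member.value for member in MetricType)
        raise ValueError(
            f"unsupported metric_type {value!r}; expected one of: {allowed}"
        ) from None


default_metric = parse_metric_type("cosine")
```

Subclassing `str` keeps the members comparable to the plain strings already used in configs, so a switch like this would not necessarily break existing YAML values.
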

@shuchu (Collaborator) left a comment:

lgtm

@franciscojavierarceo franciscojavierarceo merged commit 6ce08d3 into master Jan 28, 2025
26 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Feb 4, 2025
# [0.44.0](v0.43.0...v0.44.0) (2025-02-04)

### Bug Fixes

* Adding periodic check to fix the sporadic failures of the operator e2e tests.  ([#4952](#4952)) ([1d086be](1d086be))
* Adding the feast-operator/bin to the .gitignore directory. Somehow it… ([#5005](#5005)) ([1a027ee](1a027ee))
* Changed Env Vars for e2e tests ([#4975](#4975)) ([fa0084f](fa0084f))
* Fix GitHub Actions to pass authentication ([#4963](#4963)) ([22b9138](22b9138)), closes [#4937](#4937) [#4939](#4939) [#4941](#4941) [#4940](#4940) [#4943](#4943) [#4944](#4944) [#4945](#4945) [#4946](#4946) [#4947](#4947) [#4948](#4948) [#4951](#4951) [#4954](#4954) [#4957](#4957) [#4958](#4958) [#4959](#4959) [#4960](#4960) [#4962](#4962)
* Fix showing selected navigation item in UI sidebar ([#4969](#4969)) ([8ac6a85](8ac6a85))
* Invalid column names in get_historical_features when there are field mappings on join keys ([#4886](#4886)) ([c9aca2d](c9aca2d))
* Read project data from the 'projects' key while loading the registry state in the Feast UI ([#4772](#4772)) ([cb81939](cb81939))
* Remove grpcurl dependency from Operator ([#4972](#4972)) ([439e0b9](439e0b9))
* Removed the dry-run flag to test and we will add it back later. ([#5007](#5007)) ([d112b52](d112b52))
* Render UI navigation items as links instead of buttons ([#4970](#4970)) ([1267703](1267703))
* Resolve Operator CRD bloat due to long field descriptions ([#4985](#4985)) ([7593bb3](7593bb3))
* Update manifest to add feature server image for odh ([#4973](#4973)) ([6a1c102](6a1c102))
* Updating release workflows to refer to yml instead of yaml ([#4935](#4935)) ([02b0a68](02b0a68))
* Use locally built feast-ui package in dev feature-server image ([#4998](#4998)) ([0145e55](0145e55))

### Features

* Added OWNERS file for OpenshiftCI ([#4991](#4991)) ([86a2ee8](86a2ee8))
* Adding Milvus demo to examples ([#4910](#4910)) ([2daf852](2daf852))
* Adding retrieve_online_documents endpoint ([#5002](#5002)) ([6607d3d](6607d3d))
* Adding support to return additional features from vector retrieval for Milvus db ([#4971](#4971)) ([6ce08d3](6ce08d3))
* Creating/updating the stable branch after the release. ([#5003](#5003)) ([e9b53cc](e9b53cc))
* Implementing online_read for MilvusOnlineStore ([#4996](#4996)) ([92dde13](92dde13))
* Improve exception message for unsupported Snowflake data types ([#4779](#4779)) ([5992364](5992364))
* Operator add feast ui deployment ([#4930](#4930)) ([b026d0c](b026d0c))
* Updating documents to highlight v2 api for Vector Similarity Se… ([#5000](#5000)) ([32b82a4](32b82a4))