
Add data generation script for nested field #1388

Merged


heemin32
Collaborator

Description

Add a data generation script for nested fields. The script is similar to add-filters-to-dataset.py, but it adds a parent ID as an attribute to existing vector data so that the value can be used to construct documents with a nested field.
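A minimal sketch of the idea in Python, assuming consecutive vectors are grouped under the same parent in fixed-size groups. The grouping rule, the function names, and the document layout (`nested_field`, `vector`) are hypothetical illustrations, not taken from the script:

```python
from typing import Dict, List


def assign_parent_ids(num_vectors: int, children_per_parent: int) -> List[int]:
    """Hypothetical scheme: consecutive vectors share the same parent ID."""
    if children_per_parent <= 0:
        raise ValueError("children_per_parent must be positive")
    return [i // children_per_parent for i in range(num_vectors)]


def build_nested_documents(
    vectors: List[List[float]], parent_ids: List[int]
) -> Dict[int, dict]:
    """Group vectors by parent ID into one document per parent.

    The field names "nested_field" and "vector" are illustrative only.
    """
    if len(vectors) != len(parent_ids):
        raise ValueError("vectors and parent_ids must have the same length")
    docs: Dict[int, dict] = {}
    for vec, pid in zip(vectors, parent_ids):
        # Create the parent document on first sight, then append the child.
        doc = docs.setdefault(pid, {"nested_field": []})
        doc["nested_field"].append({"vector": vec})
    return docs


if __name__ == "__main__":
    vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
    parent_ids = assign_parent_ids(len(vectors), children_per_parent=2)
    print(parent_ids)  # [0, 0, 1, 1, 2]
    docs = build_nested_documents(vectors, parent_ids)
    print(len(docs))  # 3
```

Storing only the parent ID alongside each vector keeps the dataset flat; the nested documents can then be assembled at indexing time by grouping on that attribute.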

Issues Resolved

N/A

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@junqiu-lei
Member

The test CI is failing.

@heemin32
Collaborator Author

> The test CI is failing.

Rebased.


codecov bot commented Jan 16, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (0abed23) 85.20% compared to head (510ac1a) 84.91%.
Report is 6 commits behind head on feature/multi-vector.

❗ Current head 510ac1a differs from pull request most recent head 3328f92. Consider uploading reports for the commit 3328f92 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| ...ec/KNN990Codec/KNN990PerFieldKnnVectorsFormat.java | 80.00% | 1 Missing ⚠️ |
| ...java/org/opensearch/knn/index/query/KNNWeight.java | 93.33% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@                    Coverage Diff                     @@
##             feature/multi-vector    #1388      +/-   ##
==========================================================
- Coverage                   85.20%   84.91%   -0.29%     
- Complexity                   1260     1261       +1     
==========================================================
  Files                         163      165       +2     
  Lines                        5115     5143      +28     
  Branches                      479      480       +1     
==========================================================
+ Hits                         4358     4367       +9     
- Misses                        552      571      +19     
  Partials                      205      205              
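The percentages in the coverage diff follow from the line counts: each tracked line is a hit, a miss, or a partial, and Codecov reports coverage as hits / (hits + misses + partials). A quick check using the base and head numbers above:

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Codecov-style line coverage: hits / (hits + misses + partials)."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


base = coverage_percent(hits=4358, misses=552, partials=205)  # 5115 lines
head = coverage_percent(hits=4367, misses=571, partials=205)  # 5143 lines
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# 85.20% -> 84.91% (-0.29%)
```

Partially covered lines count against coverage, which is why the 9 new hits do not offset the 19 new misses.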


Member

martin-gaievski left a comment


Can we please add a few lines to a README on how to use the script?

heemin32 merged commit 1ab9305 into opensearch-project:feature/multi-vector Jan 16, 2024
44 of 47 checks passed
heemin32 deleted the perf-nested branch January 16, 2024 23:58
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
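The deduplication described in the Faiss HNSW commit above can be illustrated with a small post-processing sketch in Python. The actual change applies the parentId filter inside the Faiss HNSW search; this sketch only shows the intended outcome on an already-scored candidate list. The `(doc_id, parent_id, score)` tuple layout, the function name, and higher-is-better scoring are assumptions:

```python
from typing import Dict, List, Tuple

# (child_doc_id, parent_id, score); higher score = more similar (assumption).
Candidate = Tuple[int, int, float]


def top_k_by_parent(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Keep the best-scoring child per parent, then return the top k parents."""
    best: Dict[int, Candidate] = {}
    for cand in candidates:
        _, parent_id, score = cand
        current = best.get(parent_id)
        # Replace the stored child only if this one scores strictly higher.
        if current is None or score > current[2]:
            best[parent_id] = cand
    ranked = sorted(best.values(), key=lambda c: c[2], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    hits = [(0, 10, 0.9), (1, 10, 0.8), (2, 11, 0.7), (3, 12, 0.95), (4, 11, 0.6)]
    print(top_k_by_parent(hits, k=2))  # [(3, 12, 0.95), (0, 10, 0.9)]
```

Without deduplication, two children of the same parent could occupy the top k slots; grouping by parent guarantees k distinct parent documents whenever at least k parents are present among the candidates.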
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (opensearch-project#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (opensearch-project#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (opensearch-project#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (opensearch-project#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (opensearch-project#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (opensearch-project#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)