Add perf test for nested field #1394

Merged

Conversation

heemin32
Collaborator

@heemin32 heemin32 commented Jan 18, 2024

Description

Add perf test for nested field

Issues Resolved

N/A

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Test

index.json

{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 24,
      "number_of_replicas": 1,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "nested_field": {
        "type": "nested",
        "properties": {
          "target_field": {
            "type": "knn_vector",
            "dimension": 16,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "faiss",
              "parameters": {
                "ef_construction": 256,
                "m": 16
              }
            }
          }
        }
      }
    }
  }
}
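For reference, a search against this mapping addresses the vector by its full path, `nested_field.target_field`, inside a `nested` clause. The sketch below builds such a request body with the standard library only. The helper name `build_nested_knn_query` is hypothetical and not part of the perf tool; the body shape follows the OpenSearch nested k-NN query DSL, and the field names and dimension are copied from index.json above.

```python
# Field names and dimension copied from index.json above.
NESTED_PATH = "nested_field"
VECTOR_FIELD = "target_field"
DIMENSION = 16


def build_nested_knn_query(vector, k):
    """Build a nested k-NN search body (hypothetical helper, for illustration only)."""
    if len(vector) != DIMENSION:
        raise ValueError(f"expected {DIMENSION} dimensions, got {len(vector)}")
    return {
        "size": k,
        "query": {
            "nested": {
                "path": NESTED_PATH,
                "query": {
                    "knn": {
                        # The vector field is referenced by its full nested path.
                        f"{NESTED_PATH}.{VECTOR_FIELD}": {
                            "vector": list(vector),
                            "k": k,
                        }
                    }
                },
            }
        },
    }
```

Calling `build_nested_knn_query([0.0] * 16, 100)` would give a body matching the `k: 100` setting used in the query step of test.yml.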

test.yml

endpoint: localhost
port: 9200
test_name: "Faiss HNSW Nested Field Test"
test_id: "Faiss HNSW Nested Field Test"
num_runs: 1
show_runs: false
steps:
  - name: delete_index
    index_name: target_index
  - name: create_index
    index_name: target_index
    index_spec: release-configs/faiss-hnsw/nested/simple/index.json
  - name: ingest_nested_field
    index_name: target_index
    field_name: target_field
    dataset_format: hdf5
    dataset_path: dataset/sift-128-euclidean-nested.hdf5
    attributes_dataset_name: attributes
    attribute_spec: [ { name: 'color', type: 'str' }, { name: 'taste', type: 'str' }, { name: 'age', type: 'int' }, { name: 'parent_id', type: 'int'} ]
  - name: refresh_index
    index_name: target_index
  - name: force_merge
    index_name: target_index
    max_num_segments: 1
  - name: warmup_operation
    index_name: target_index
  - name: query_nested_field
    k: 100
    r: 1
    calculate_recall: true
    index_name: target_index
    field_name: target_field
    dataset_format: hdf5
    dataset_path: dataset/sift-128-euclidean-nested.hdf5
    neighbors_format: hdf5
    neighbors_path: dataset/sift-128-euclidean-nested.hdf5
    neighbors_dataset: neighbour_nested
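The `parent_id` attribute in `attribute_spec` suggests that several vectors in the dataset belong to one parent document. The sketch below shows one way such rows could be grouped into nested documents and turned into bulk actions. It is an assumption about the general approach, not the actual `IngestNestedFieldStep` code; the function names are hypothetical, and the other attributes (`color`, `taste`, `age`) are left out for brevity.

```python
def group_by_parent(vectors, attributes):
    """Group vectors that share a parent_id into one nested document each.

    vectors: list of vectors (lists of floats)
    attributes: list of dicts, one per vector, each holding a 'parent_id'
    """
    docs = {}  # dicts keep insertion order, so parents stay in dataset order
    for vector, attrs in zip(vectors, attributes):
        doc = docs.setdefault(attrs["parent_id"], {"nested_field": []})
        doc["nested_field"].append({"target_field": vector})
    return docs


def to_bulk_actions(index_name, docs):
    """Flatten documents into the action/source pairs the bulk API expects."""
    actions = []
    for parent_id, doc in docs.items():
        actions.append({"index": {"_index": index_name, "_id": parent_id}})
        actions.append(doc)
    return actions
```

With this layout each parent becomes a single OpenSearch document whose `nested_field` array holds all of its vectors.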

Result

{
  "metadata": {
    "test_name": "Faiss HNSW Nested Field Test",
    "test_id": "Faiss HNSW Nested Field Test",
    "date": "01/18/2024 14:29:17",
    "python_version": "3.9.18 (main, Sep 11 2023, 08:38:23) \n[Clang 14.0.6 ]",
    "os_version": "macOS-10.16-x86_64-i386-64bit",
    "processor": "i386, 10 cores",
    "memory": "2393055232 (used) / 1749037056 (available) / 34359738368 (total)"
  },
  "results": {
    "test_took": 6275.669917,
    "delete_index_took_total": 461.2337919999998,
    "create_index_took_total": 2765.4128749999995,
    "ingest_nested_field_took_total": 591.4032919999999,
    "refresh_index_store_kb_total": 340.2109375,
    "refresh_index_took_total": 649.8910840000001,
    "force_merge_took_total": 100.17316600000026,
    "warmup_operation_took_total": 68.55570799999988,
    "query_nested_field_took_total": 1639.0,
    "query_nested_field_took_p50": 12.0,
    "query_nested_field_took_p90": 28.0,
    "query_nested_field_took_p99": 158.0,
    "query_nested_field_took_p100": 158.0,
    "query_nested_field_client_time_total": 2656.0,
    "query_nested_field_client_time_p50": 24.0,
    "query_nested_field_client_time_p90": 43.0,
    "query_nested_field_client_time_p99": 179.0,
    "query_nested_field_client_time_p100": 179.0,
    "query_nested_field_memory_kb_total": 127.0,
    "query_nested_field_recall@K_total": 1.0,
    "query_nested_field_recall@1_total": 1.0
  }
}
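Both recall figures above are 1.0. As a rough illustration of what such a number means, the sketch below computes recall as the share of ground-truth neighbours that appear in the returned results, summed over all queries. This is a simplified, hypothetical helper and may differ from how the perf tool computes `recall@K` and `recall@1`.

```python
def recall_at_k(results, ground_truth, k):
    """Share of the true top-k neighbours found in the returned top-k.

    results: list of returned ID lists, one per query
    ground_truth: list of true neighbour ID lists, one per query
    """
    hits = 0
    expected = 0
    for returned, truth in zip(results, ground_truth):
        truth_set = set(truth[:k])
        # Use a set on the returned side so a repeated ID is not counted twice.
        hits += len(set(returned[:k]) & truth_set)
        expected += len(truth_set)
    return hits / expected if expected else 0.0
```

For nested fields the IDs compared here would be parent document IDs, since the search returns one hit per parent.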

codecov bot commented Jan 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1ab9305) 84.85% compared to head (2d975c9) 84.95%.

Additional details and impacted files
@@                    Coverage Diff                     @@
##             feature/multi-vector    #1394      +/-   ##
==========================================================
+ Coverage                   84.85%   84.95%   +0.09%     
  Complexity                   1261     1261              
==========================================================
  Files                         165      165              
  Lines                        5143     5143              
  Branches                      480      480              
==========================================================
+ Hits                         4364     4369       +5     
+ Misses                        574      569       -5     
  Partials                      205      205              


Member

@martin-gaievski martin-gaievski left a comment


Can you please add to the PR description a test config with the new step, and the results of running that config locally? It can be a very simple dataset and cluster, nothing fancy.

@@ -429,6 +429,118 @@ def bulk_transform_with_attributes(self, partition: np.ndarray, partition_attr,
        return actions


class IngestNestedFieldStep(BaseIngestStep):

@heemin32 heemin32 merged commit 03b65ec into opensearch-project:feature/multi-vector Jan 18, 2024
48 checks passed
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
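The commit message for #1371 says results are deduplicated by parentId so that k parent documents come back. The sketch below illustrates only that outcome in plain Python: given child-level hits, keep the closest hit per parent and return the k nearest parents. It is not the Faiss patch itself, which applies the filter inside the HNSW search; the function name and input shape are assumptions made for illustration.

```python
import heapq


def top_k_parents(child_hits, k):
    """Return the k closest distinct parents as (parent_id, distance) pairs.

    child_hits: iterable of (parent_id, distance) pairs, one per matching
    nested vector; a smaller distance means a closer match (l2 space).
    """
    best = {}
    for parent_id, distance in child_hits:
        # Keep only the closest child seen so far for each parent.
        if parent_id not in best or distance < best[parent_id]:
            best[parent_id] = distance
    return heapq.nsmallest(k, best.items(), key=lambda item: item[1])
```

Without this step, a parent with many close vectors could take up several of the k slots, and fewer than k distinct documents would be returned.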
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (opensearch-project#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (opensearch-project#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (opensearch-project#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (opensearch-project#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (opensearch-project#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (opensearch-project#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
@heemin32 heemin32 deleted the perf-nested branch July 18, 2024 17:18