This repository has been archived by the owner on Sep 1, 2021. It is now read-only.

implement calc_distance() #250

Merged
merged 2 commits into from
Jul 13, 2021
50 changes: 50 additions & 0 deletions pymilvus_orm/utility.py
@@ -260,3 +260,53 @@ def list_collections(timeout=None, using="default") -> list:
        >>> utility.list_collections()
    """
    return _get_connection(using).list_collections()


def calc_distance(vectors_left, vectors_right, params=None, timeout=None, using="default"):
    """
    Calculate the distances between two vector arrays.

    :param vectors_left: The vectors on the left of the operator.
    :type vectors_left: dict
        `{"ids": [1, 2, 3, .... n], "collection": "c_1", "partition": "p_1", "field": "v_1"}`
        or
        `{"float_vectors": [[1.0, 2.0], [3.0, 4.0], ... [9.0, 10.0]]}`
        or
        `{"bin_vectors": [b'\x94', b'N', ... b'\xca']}`

    :param vectors_right: The vectors on the right of the operator.
    :type vectors_right: dict
        `{"ids": [1, 2, 3, .... n], "collection": "col_1", "partition": "p_1", "field": "v_1"}`
        or
        `{"float_vectors": [[1.0, 2.0], [3.0, 4.0], ... [9.0, 10.0]]}`
        or
        `{"bin_vectors": [b'\x94', b'N', ... b'\xca']}`

    :param params: Calculation parameters. Currently only "metric_type" is supported; the default value is "L2".
        Extra parameter for "L2" distance: "sqrt", true or false; the default is false.
        Extra parameter for "HAMMING" and "TANIMOTO": "dim". Set this value if the dimension is not
        a multiple of 8; otherwise the dimension is calculated from the list length.
    :type params: dict
        Examples of supported metric types:
        `{"metric": "L2"}`
        `{"metric": "IP"}`
        `{"metric": "HAMMING"}`
        `{"metric": "TANIMOTO"}`
        Note: "L2", "IP", "HAMMING", and "TANIMOTO" are case-insensitive.

    :return: 2-d array distances
    :rtype: list[list[int]] for "HAMMING" or list[list[float]] for others
        Assume the vectors_left are: L_1, L_2, L_3
        Assume the vectors_right are: R_a, R_b
        The distance between L_n and R_m is called "D_n_m".
        The returned distances are arranged like this:
        [D_1_a, D_1_b, D_2_a, D_2_b, D_3_a, D_3_b]

    :example:
        >>> import random
        >>> vectors_l = [[random.random() for _ in range(64)] for _ in range(5)]
        >>> vectors_r = [[random.random() for _ in range(64)] for _ in range(10)]
        >>> op_l = {"float_vectors": vectors_l}
        >>> op_r = {"float_vectors": vectors_r}
        >>> params = {"metric": "L2", "sqrt": True}
        >>> results = utility.calc_distance(vectors_left=op_l, vectors_right=op_r, params=params)
    """
    return _get_connection(using).calc_distance(vectors_left, vectors_right, params, timeout)
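
The docstring lists the returned distances in left-major order (`D_1_a, D_1_b, D_2_a, ...`). The sketch below is a local, server-free illustration of that ordering for the "L2" metric; it is not part of this PR or of pymilvus_orm. The helper names `l2_distances_flat` and `group_by_left` are hypothetical, and the sketch assumes that `"sqrt": False` means the squared L2 distance is returned, which the docstring does not state explicitly.

```python
import math


def l2_distances_flat(vectors_left, vectors_right, sqrt=False):
    """Hypothetical reference helper: pairwise L2 distances in left-major order.

    Assumption: with sqrt=False the squared distance is kept as-is.
    """
    flat = []
    for left in vectors_left:
        for right in vectors_right:
            squared = sum((a - b) ** 2 for a, b in zip(left, right))
            flat.append(math.sqrt(squared) if sqrt else squared)
    return flat


def group_by_left(flat, n_right):
    """Hypothetical helper: regroup a left-major flat list into one row per left vector."""
    return [flat[i:i + n_right] for i in range(0, len(flat), n_right)]


left = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]   # L_1, L_2, L_3
right = [[0.0, 0.0], [3.0, 4.0]]              # R_a, R_b

flat = l2_distances_flat(left, right, sqrt=True)
rows = group_by_left(flat, len(right))
# rows[n][m] holds the distance between left[n] and right[m]
```

Regrouping by `len(right)` is only needed if the result really arrives as a flat list, as the arrangement in the docstring suggests; if the call already returns one row per left vector, `group_by_left` is unnecessary.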