Full text search with limit seems to be giving incorrect answer in this sample case #3264

westonpace · 2024-12-17T15:17:53Z

The example below runs a full text search (FTS) over two strings. With no limit, or with the limit set to 2, the shorter string has the higher score (presumably because the matched words make up a higher percentage of the document?). With the limit set to 1, however, the query returns the document with the lower score.

import lance
import pyarrow as pa
import shutil

# Start from a clean dataset directory so the example is repeatable
shutil.rmtree("/tmp/foo.lance", ignore_errors=True)

tab = pa.table({
    "text": ["this is some text", "this is some other text"]
})
ds = lance.write_dataset(tab, "/tmp/foo.lance")
# Build the inverted index that backs full text search
ds.create_scalar_index("text", index_type="INVERTED")

# Expected: the higher-scoring row. Actual: the lower-scoring row.
print("Results with limit(1)")
print("---------------------")
print(ds.to_table(full_text_query="some text", limit=1))

# Both rows come back, and the shorter string has the higher score
print("Results with limit(2)")
print("---------------------")
print(ds.to_table(full_text_query="some text", limit=2))
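To make the expectation concrete, here is a minimal sketch of why the shorter string would plausibly rank first and what limit=1 should return. It does not use Lance at all. It assumes BM25-style scoring with whitespace tokenization and the common defaults k1=1.2 and b=0.75; the function names `bm25_scores` and `top_k` are hypothetical helpers for illustration, and the real scorer and tokenizer may differ.

```python
import math

docs = ["this is some text", "this is some other text"]
query = "some text"


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula.

    Assumption: whitespace tokenization and default k1/b values. This is an
    illustration, not the scoring code of any particular library.
    """
    tokenized = [d.split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in query.split():
            # Number of documents that contain the term
            df = sum(1 for t in tokenized if term in t)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = tokens.count(term)
            # Length normalization: longer documents are penalized
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(docs, scores, k):
    """Return the k highest-scoring documents, best first."""
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


scores = bm25_scores(docs, query)
for doc, score in zip(docs, scores):
    print(f"{score:.4f}  {doc}")

print("top 1:", top_k(docs, scores, 1))
print("top 2:", top_k(docs, scores, 2))
```

Under these assumptions both documents contain both query terms once, so only the length normalization separates them and the shorter string scores higher. The property I would expect from the search is that the limit=1 result is always the first row of the limit=2 result, which is what fails in the reproduction above.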

wjones127 added this to the Lance Papercuts milestone Dec 20, 2024

BubbleCal mentioned this issue Dec 23, 2024

fix: full text search with limit may return an incorrect results #3284

Merged

wjones127 closed this as completed in #3284 Dec 23, 2024

wjones127 closed this as completed in efdea24 Dec 23, 2024
