Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(search_family): Process wrong field types in indexes for the FT.SEARCH and FT.AGGREGATE commands #4070

Merged

Conversation

BagritsevichStepan
Copy link
Contributor

fixes #3986

Copy link
Collaborator

@romange romange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will wait for @dranikpg to review , meanwhile some minor comments so I could understand the PR better

src/core/search/base.cc Outdated Show resolved Hide resolved
src/core/search/indices.cc Outdated Show resolved Hide resolved
src/core/search/indices.cc Outdated Show resolved Hide resolved
src/core/search/search.h Outdated Show resolved Hide resolved
src/core/search/sort_indices.cc Outdated Show resolved Hide resolved
src/core/search/sort_indices.cc Outdated Show resolved Hide resolved
src/core/search/sort_indices.cc Outdated Show resolved Hide resolved
src/server/search/doc_accessors.cc Show resolved Hide resolved
@romange romange removed the request for review from adiholden November 5, 2024 17:50
@BagritsevichStepan BagritsevichStepan force-pushed the df.search/fix-wrong-field-type branch from 1b4f480 to 1b4625a Compare November 6, 2024 13:32
@BagritsevichStepan
Copy link
Contributor Author

@romange I've added more tests. After adding support for nullable values, a more tests will be added. Please take a look at the PR.

Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents:

  1. Please write next time what you changed in the PR description, so I can dive into the changes quicker
  2. Maybe look into unifying Matches() and Add(), this would simplify code
  3. Please check vectors or add a TODO

If we hurry for a release I can approve 🙂

src/core/search/base.h Outdated Show resolved Hide resolved
src/core/search/indices.cc Outdated Show resolved Hide resolved
src/core/search/search.cc Outdated Show resolved Hide resolved
src/server/search/doc_index.cc Outdated Show resolved Hide resolved
src/core/search/sort_indices.h Outdated Show resolved Hide resolved
src/core/search/indices.cc Outdated Show resolved Hide resolved
@BagritsevichStepan BagritsevichStepan force-pushed the df.search/fix-wrong-field-type branch from 6a98c85 to c6bbca9 Compare November 8, 2024 16:45
@BagritsevichStepan
Copy link
Contributor Author

BagritsevichStepan commented Nov 8, 2024

I found several new bugs and fixed them in the latest commit:

  1. Bug:
JSON.SET j1 . '{"data":1}'
JSON.SET j2 . '{"data":"1"}'
FT.CREATE i1 ON JSON SCHEMA $.data AS data NUMERIC
FT.CREATE i2 ON JSON SCHEMA $.data AS data TEXT

FT.SEARCH i1 "*" and FT.SEARCH i2 "*" both return j1 and j2. However, it should return j1 for i1 and j2 for i2. This happens because we are parsing string and numeric fields in the same way in the DocumentAccessor.

  1. Bug related to how we parse FtVector in the GetVector function in DocumentAccessor.
  2. Bug was crash during JsonAccessor::GetVector when the field type is not an array

I’ve also added more tests

@BagritsevichStepan BagritsevichStepan force-pushed the df.search/fix-wrong-field-type branch 2 times, most recently from 3ca4592 to d8dd8cb Compare November 9, 2024 11:25
@BagritsevichStepan
Copy link
Contributor Author

Found and fixed another bug:

We should support queries like this:

JSON.SET j1 . '{"data":["first value", "second_value"]}'
FT.CREATE index ON JSON SCHEMA $.data AS data TEXT

FT.SEARCH index "*" should return j1

src/core/search/base.h Outdated Show resolved Hide resolved
src/core/search/base.h Outdated Show resolved Hide resolved
dranikpg
dranikpg previously approved these changes Nov 9, 2024
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job fixing bugs around vector parsing, it wasn't safe before.

Still, once we merge this for 1.25, I'd suggest to unite IsValidFieldType and Add, because we have safe/unsafe parsing functions, optionals that are dereferenced "by implicit contract" and double effort indexing documents (we parse it two times)

src/core/search/base.h Outdated Show resolved Hide resolved
@romange romange merged commit 503bb4e into dragonflydb:main Nov 10, 2024
9 checks passed
romange pushed a commit that referenced this pull request Nov 10, 2024
…EARCH and FT.AGGREGATE commands (#4070)

* fix(search_family): Process wrong field types in indexes for the FT.SEARCH and FT.AGGREGATE commands

fixes  #3986
---------

Signed-off-by: Stepan Bagritsevich <[email protected]>
@BagritsevichStepan BagritsevichStepan deleted the df.search/fix-wrong-field-type branch November 10, 2024 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong field types in indexes
3 participants