Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for phrase slop in query language #1393

Merged
merged 6 commits into from
Jun 28, 2022

Conversation

saroh
Copy link
Contributor

@saroh saroh commented Jun 19, 2022

Breaking changes to the existing query language:

  • ~ is now allowed unescaped in the field name

The first commit still allows ~ in words, might be a bit flimsy.
Second one exposes distance for all leaves. Do we need to explicitly fail for leaves that do not handle slop/distance + implement it for those who do ? (Term maybe, I haven't checked)

Not realy sure on the name, slop is hard to understand, distance is maybe a little to vague.

closes #1390

@codecov-commenter
Copy link

codecov-commenter commented Jun 19, 2022

Codecov Report

Merging #1393 (19f695a) into main (83d0c13) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1393      +/-   ##
==========================================
+ Coverage   94.29%   94.33%   +0.03%     
==========================================
  Files         236      236              
  Lines       43418    43643     +225     
==========================================
+ Hits        40942    41170     +228     
+ Misses       2476     2473       -3     
Impacted Files Coverage Δ
query-grammar/src/query_grammar.rs 99.67% <100.00%> (+0.01%) ⬆️
query-grammar/src/user_input_ast.rs 97.87% <100.00%> (+0.09%) ⬆️
src/query/phrase_query/phrase_query.rs 91.54% <100.00%> (+0.37%) ⬆️
src/query/query_parser/logical_ast.rs 88.37% <100.00%> (+1.19%) ⬆️
src/query/query_parser/query_parser.rs 94.98% <100.00%> (+0.11%) ⬆️
src/aggregation/agg_result.rs 73.46% <0.00%> (-19.79%) ⬇️
src/schema/facet.rs 89.88% <0.00%> (-0.06%) ⬇️
src/indexer/segment_serializer.rs 98.07% <0.00%> (-0.04%) ⬇️
src/fastfield/mod.rs 94.50% <0.00%> (-0.02%) ⬇️
src/indexer/segment_writer.rs 96.40% <0.00%> (-0.01%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83d0c13...19f695a. Read the comment docs.

@fulmicoton fulmicoton requested a review from PSeitz June 21, 2022 01:28
@PSeitz
Copy link
Contributor

PSeitz commented Jun 21, 2022

For naming I think matching distance is much better than slop

@saroh saroh requested a review from PSeitz June 22, 2022 22:03
@saroh saroh force-pushed the 1390-phrase-query-slop branch from a51d3eb to 1008bd5 Compare June 22, 2022 22:16
@saroh
Copy link
Contributor Author

saroh commented Jun 22, 2022

I gave a quick look, I should be able to plug in the fuzzy query for bare words with the ~ operator soon.

@fulmicoton
Copy link
Collaborator

@PSeitz @saroh I recommend we stick to slop.

It has the merit of

  • being unambiguous with fuzzy matching.
  • being in line with Lucene's vocabulary

@saroh
Copy link
Contributor Author

saroh commented Jun 23, 2022

@fulmicoton @PSeitz so we keep slop in the phrase_query mod but it's ok to use matching distance for the grammar ?

Can we define "it" as being an u8 in the query grammar as it is the type that was chosen for the fuzzy query distance and 255 seems to be good enough of a max distance whatever the Query impl for the time being ?

@saroh saroh force-pushed the 1390-phrase-query-slop branch from 132ef50 to 09b5c47 Compare June 23, 2022 19:30
@saroh saroh force-pushed the 1390-phrase-query-slop branch from 09b5c47 to df475b1 Compare June 23, 2022 20:56
@fulmicoton
Copy link
Collaborator

Right now let's not handle fuzzy query in the grammar, it would break quickwit.
For the naming, I'd stick to slop in the grammar for the moment.

@saroh saroh force-pushed the 1390-phrase-query-slop branch from 5060251 to b2e5dc8 Compare June 24, 2022 18:00
@saroh saroh force-pushed the 1390-phrase-query-slop branch from b2e5dc8 to 53fcc33 Compare June 24, 2022 18:19
@saroh saroh changed the title add support for matching distance in phrase query add support for phrase slop in query language Jun 24, 2022
@saroh saroh requested a review from PSeitz June 24, 2022 19:02
@saroh
Copy link
Contributor Author

saroh commented Jun 25, 2022

I've got this PR https://github.com/saroh/tantivy/pull/1/files which I can push in here or open later. Adds support for "foo"~1, no changes to the query language.

@fulmicoton fulmicoton merged commit 437cd35 into quickwit-oss:main Jun 28, 2022
@saroh saroh deleted the 1390-phrase-query-slop branch June 28, 2022 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add slop phrase query to the query grammar.
4 participants