extend FuzzyTermQuery to support json field #2173
@PSeitz can you review?
src/query/fuzzy_query.rs
    ))

    if let Some(json_path_bytes) = term_value.as_json_path_bytes() {
        return Ok(AutomatonWeight::new_for_json_path(
nit: drop `return` and semicolon
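A minimal sketch of what this nit amounts to, assuming the `if let` is the last statement of the function. The `describe` function and its strings are hypothetical and not taken from the PR; the point is only that each branch can be a tail expression, so neither `return` nor a trailing semicolon is needed.

```rust
// Hypothetical function, for illustration only (not from the PR).
// Both branches are tail expressions: no `return`, no trailing semicolon.
fn describe(path: Option<&str>) -> Result<String, String> {
    if let Some(p) = path {
        // Previously this could have been written as `return Ok(...);`
        Ok(format!("json path: {}", p))
    } else {
        Ok("no json path".to_string())
    }
}
```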
Codecov Report

@@           Coverage Diff            @@
##             main    #2173    +/-   ##
========================================
  Coverage   94.42%   94.42%
========================================
  Files         322      322
  Lines       63104    63207   +103
========================================
+ Hits        59583    59682    +99
- Misses       3521     3525     +4
src/schema/term.rs
    /// Returns the json path bytes (including the JSON_END_OF_PATH byte)
    ///
    /// Returns `None` if the value is not JSON.
    pub(crate) fn as_json_path_bytes(&self) -> Option<&[u8]> {
I don't think we need an extra method; `as_json` already covers the path. `as_json` seems unused currently, so you can edit it to include `JSON_END_OF_PATH` if needed.
The `json_path_bytes` are used here:

    fn automaton_stream<'a>(
        &'a self,
        term_dict: &'a TermDictionary,
    ) -> io::Result<TermStreamer<'a, &'a A>> {
        let automaton: &A = &self.automaton;
        let mut term_stream_builder = term_dict.search(automaton);
        if let Some(json_path_bytes) = &self.json_path_bytes {
            term_stream_builder = term_stream_builder.ge(json_path_bytes);
            if let Some(end) = prefix_end(json_path_bytes) {
                term_stream_builder = term_stream_builder.lt(&end);
            }
        }
        term_stream_builder.into_stream()
    }

Two reasons for this new method:
- We need to include the `JSON_END_OF_PATH` byte. Otherwise, the `automaton_stream` will include `pathaa` while we're only interested in `patha`.
- The `StreamBuilder.ge` / `lt` methods require byte references, not a string for the path.
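
To illustrate the first point, here is a minimal sketch that uses a sorted set of byte keys in place of the term dictionary. The `END_OF_PATH` value, the `key` and `scan` helpers, and the key layout (path, terminator, value) are assumptions made for illustration and may not match tantivy's actual term encoding.

```rust
use std::collections::BTreeSet;

// Assumed terminator byte for this sketch; the real constant may differ.
const END_OF_PATH: u8 = 0;

// Hypothetical key layout: path bytes, terminator byte, then value bytes.
fn key(path: &str, value: &str) -> Vec<u8> {
    let mut k = path.as_bytes().to_vec();
    k.push(END_OF_PATH);
    k.extend_from_slice(value.as_bytes());
    k
}

// Returns every key `k` with `lower <= k < upper` in lexicographic byte
// order, which mimics a `ge(lower)` / `lt(upper)` range restriction.
fn scan(dict: &BTreeSet<Vec<u8>>, lower: &[u8], upper: &[u8]) -> Vec<Vec<u8>> {
    dict.iter()
        .filter(|k| k.as_slice() >= lower && k.as_slice() < upper)
        .cloned()
        .collect()
}
```

With keys for the paths `patha`, `pathaa`, and `pathb`, scanning from `b"patha"` up to `b"pathb"` also returns the `pathaa` key, while scanning from `b"patha\x00"` up to `b"patha\x01"` returns only the `patha` key.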
> `as_json` seems unused currently, you can edit it to include `JSON_END_OF_PATH` if needed

FYI it is used in `debug_value_bytes`.
    @@ -6,7 +6,7 @@ pub use phrase_prefix_query::PhrasePrefixQuery;
     pub use phrase_prefix_scorer::PhrasePrefixScorer;
     pub use phrase_prefix_weight::PhrasePrefixWeight;

    -fn prefix_end(prefix_start: &[u8]) -> Option<Vec<u8>> {
    +pub(crate) fn prefix_end(prefix_start: &[u8]) -> Option<Vec<u8>> {
I don't understand the `u8::MAX` logic here.
I referred to the `PhrasePrefixQuery` implementation, which also filters terms based on the term value prefix.
I believe it attempts to find the next larger prefix. Typically, this involves incrementing the last u8 value by 1. However, there is an edge case to consider when the last u8 value is u8::MAX.
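
The behavior described above can be sketched as follows. This is an illustrative version under the assumption that trailing `u8::MAX` bytes are dropped before incrementing; the name `prefix_end_sketch` is hypothetical and the code is not claimed to be the crate's exact implementation.

```rust
// Sketch: computes the smallest byte string that is strictly greater than
// every string starting with `prefix_start`, or `None` if no such bound
// exists (empty prefix, or a prefix made only of 0xFF bytes).
fn prefix_end_sketch(prefix_start: &[u8]) -> Option<Vec<u8>> {
    let mut res = prefix_start.to_vec();
    while let Some(&last) = res.last() {
        if last == u8::MAX {
            // Cannot increment 0xFF without overflow: drop it and
            // try to increment the preceding byte instead.
            res.pop();
        } else {
            // Common case: bump the last byte by one.
            *res.last_mut().unwrap() = last + 1;
            return Some(res);
        }
    }
    None
}
```

For example, `b"abc"` maps to `b"abd"`, `[b'a', 0xFF]` maps to `[b'b']`, and `[0xFF, 0xFF]` has no upper bound, so the range stays open-ended.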
LGTM. Thanks!
Summary
Discord discussion here: https://discord.com/channels/908281611840282624/908286403086024724/1148718777651970126
Test Plan