List of terms for field #2567

Closed
guilload opened this issue Dec 9, 2022 · 3 comments · Fixed by #2740
Labels
enhancement New feature or request

Comments

@guilload
Member

guilload commented Dec 9, 2022

Given a field f, a prefix p, and a maximum number of returned terms n, return at most n unique terms of f that start with p, in lexicographic order. Return all the terms of f if p is the empty string and n is None.
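
To make the expected semantics concrete, here is a minimal sketch. It assumes the term dictionary of one field can be modeled as an in-memory `BTreeSet<String>`; the name `list_terms` and this representation are hypothetical and are not Quickwit's or tantivy's API.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Returns at most `limit` terms starting with `prefix`, in lexicographic order.
/// `terms` is a hypothetical stand-in for the term dictionary of a single field.
fn list_terms(terms: &BTreeSet<String>, prefix: &str, limit: Option<usize>) -> Vec<String> {
    // Terms sharing a prefix are contiguous in sorted order, so we can seek
    // to the prefix and stop at the first term that no longer matches.
    let bounds: (Bound<&str>, Bound<&str>) = (Bound::Included(prefix), Bound::Unbounded);
    terms
        .range::<str, _>(bounds)
        .take_while(|term| term.starts_with(prefix))
        // `None` means "no limit".
        .take(limit.unwrap_or(usize::MAX))
        .cloned()
        .collect()
}
```

With the terms `cart`, `checkout`, `frontend`, and `payment`, calling `list_terms(&terms, "c", None)` yields `cart` and `checkout`, and an empty prefix with `None` yields all four terms.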

Use case: for the Jaeger integration, we want to efficiently retrieve the list of services for which traces have been ingested in the last 24h. Since the term dictionary will be small for this field, we will likely want to store it in the hotcache.

@guilload guilload added the enhancement New feature or request label Dec 9, 2022
@fulmicoton fulmicoton changed the title from "Completion query" to "List of terms for field" Dec 12, 2022
@fulmicoton
Contributor

fulmicoton commented Dec 12, 2022

I took the liberty of changing the name of the ticket.

This ticket is related to #2266.

I think we want this to work for any field and, in a separate ticket, do some extra work specific to the service field to get it into the hotcache. This might mean some work in the doc mapper.

Anyway, in this ticket, we will want some proper warmup logic to load the one or two term dictionary blocks necessary to answer this query. That warmup routine can be reused in #2266.

@fulmicoton
Contributor

We might need to land quickwit-oss/tantivy#1734 before this.

@fulmicoton
Contributor

More information:
First important realization:
A prefix query is also a range query.

For instance, if the prefix being searched is blabla @ last_byte, then the prefix query is equivalent to searching for the range
[blabla @ last_byte, blabla @ (last_byte + 1))

I leave it to you to sort out how to deal with last_byte == 255.
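
One possible way to handle the conversion, including a trailing 255 byte, is sketched below. The function names `prefix_end` and `prefix_to_range` are hypothetical. The idea is to drop trailing 255 bytes, since they cannot be incremented, then increment the last remaining byte. If nothing remains, the range has no upper bound.

```rust
use std::ops::Bound;

/// Exclusive upper bound of the byte range holding every key that starts
/// with `prefix`. Hypothetical helper, not part of tantivy.
fn prefix_end(prefix: &[u8]) -> Bound<Vec<u8>> {
    let mut end = prefix.to_vec();
    // Pop bytes from the right until one of them can be incremented.
    while let Some(last) = end.pop() {
        if last < u8::MAX {
            end.push(last + 1);
            return Bound::Excluded(end);
        }
        // `last` was 255: drop it and try the previous byte.
    }
    // Empty prefix, or a prefix made only of 255 bytes: no upper bound.
    Bound::Unbounded
}

/// Converts a prefix query into the equivalent range query.
fn prefix_to_range(prefix: &[u8]) -> (Bound<Vec<u8>>, Bound<Vec<u8>>) {
    (Bound::Included(prefix.to_vec()), prefix_end(prefix))
}
```

For the prefix `blabla` this gives the range from `blabla` (included) to `blablb` (excluded). For a prefix such as `a` followed by a 255 byte, the upper bound becomes `b`.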

We already have code to identify the file slice that needs to be warmed up to go through a range:
Dictionary::file_slice_for_range

If needed, we can make the fit a bit tighter by adapting it to cover only enough blocks to make sure we cover n terms.
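
As a rough illustration of that tighter fit, the sketch below selects the blocks to warm up from a simplified block index. `BlockMeta` and `bytes_to_warm` are hypothetical and do not reproduce tantivy's actual index layout or the signature of `Dictionary::file_slice_for_range`. It assumes each block records its last key, its byte range, and its number of terms.

```rust
use std::ops::Range;

/// Hypothetical block metadata; not tantivy's actual index layout.
struct BlockMeta {
    last_key: Vec<u8>,   // greatest key stored in the block
    bytes: Range<usize>, // position of the block in the file
    num_terms: usize,    // number of terms stored in the block
}

/// Byte range to warm up in order to read at most `limit` terms in
/// `[start, end)`. `end == None` means the range has no upper bound.
/// `blocks` must be sorted by `last_key`.
fn bytes_to_warm(
    blocks: &[BlockMeta],
    start: &[u8],
    end: Option<&[u8]>,
    limit: Option<usize>,
) -> Option<Range<usize>> {
    // First block that can contain a key >= start.
    let first = blocks.partition_point(|b| b.last_key.as_slice() < start);
    if first == blocks.len() {
        return None;
    }
    let mut last = first;
    // Without reading it, the first block is only guaranteed to hold one
    // key >= start: its last key.
    let mut guaranteed = 1;
    loop {
        let covers_end = end.map_or(false, |e| blocks[last].last_key.as_slice() >= e);
        let enough = limit.map_or(false, |n| guaranteed >= n);
        if covers_end || enough || last + 1 == blocks.len() {
            break;
        }
        last += 1;
        guaranteed += blocks[last].num_terms;
    }
    Some(blocks[first].bytes.start..blocks[last].bytes.end)
}
```

The count is deliberately conservative: the range may start in the middle of the first block, so only its last key is counted toward n.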
