Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Updated the building queries with Query objects section #254

Merged
merged 1 commit into from
Apr 25, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 85 additions & 1 deletion docs/tutorials.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,9 @@ writer.wait_merging_threads()
Note that `wait_merging_threads()` must come at the end, because
the `writer` object will not be usable after this call.

## Building and Executing Queries
## Building and Executing Queries with the Query Parser

With the Query Parser, you can easily build simple queries for your index.

First you need to get a searcher for the index

Expand All @@ -89,8 +91,90 @@ best_doc = searcher.doc(best_doc_address)
assert best_doc["title"] == ["The Old Man and the Sea"]
```

The `parse_query` method takes in a query string (visit [reference](reference.md#valid-query-formats) for more details on the syntax) and create a `Query` object that can be used to search the index.

In Tantivy, hit documents during search will return a `DocAddress` object that can be used to retrieve the document from the searcher, rather than returning the document directly.

## Building and Executing Queries with Query Objects

> *This is an advanced topic. Only consider this if you need very fine-grained control over your queries, or existing query parsers do not meet your needs.*

If you have a Lucene / ElasticSearch background, you might be more comfortable building nested queries programmatically. Also, some queries (e.g. ConstQuery, DisjunctionMaxQuery) are not supported by the query parser due to their complexity in expression.

Consider the following query in ElasticSearch:

```json
{
"query": {
"bool": {
"must": [
{
"dis_max": {
"queries": [
{
"match": {
"title": {
"query": "fish",
"boost": 2
}
}
},
{
"match": {
"body": {
"query": "eighty-four days",
"boost": 1.5
}
}
}
],
"tie_breaker": 0.3
}
}
]
}
}
}
```

It is impossible to express this query using the query parser. Instead, you can build the query programmatically mixing with the query parser:

```python
from tantivy import Query, Occur, Index

...

complex_query = Query.boolean_query(
[
(
Occur.Must,
Query.disjunction_max_query(
[
Query.boost_query(
# by default, only the query parser will analyze
# your query string
index.parse_query("fish", ["title"]),
2.0
),
Query.boost_query(
index.parse_query("eighty-four days", ["body"]),
1.5
),
],
0.3,
),
)
]
)

```

<!--TODO: Update the reference link to the query parser docs when available.-->

## Using the snippet generator

Let's revisit the query `"fish days"` in our [example](#building-and-executing-queries-with-the-query-parser):

```python
hit_text = best_doc["body"][0]
print(f"{hit_text=}")
Expand Down
Loading