Adding boost/weight to fields #50

Closed
AliFlux opened this issue Jun 29, 2022 · 4 comments · Fixed by #202
Labels
feature-parity Feature parity with upstream tantivy good first issue Good for newcomers help wanted Extra attention is needed

Comments

@AliFlux

AliFlux commented Jun 29, 2022

Hi,

Is there a way to add a custom weight to a specific field in tantivy-py? For example, if our schema has both title and description fields and we want items that match on title to rank higher in the results, we could add a weight to title.

In whoosh, we can achieve this via field_boost:

title = whoosh.fields.TEXT(stored=True, field_boost=5.0)
description = whoosh.fields.TEXT(stored=True)

Tantivy natively supports boosting:
https://github.com/quickwit-oss/tantivy/blob/db1836691ef9b4f963070bfd9ef13c6d44d2a074/src/query/query_parser/query_parser.rs#L164
https://docs.rs/tantivy/0.16.0/tantivy/query/struct.QueryParser.html#method.set_field_boost
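
To make the request concrete, here is a minimal pure-Python sketch of what a field boost does to ranking. It is not how Whoosh or tantivy score documents (both use BM25-style scoring); it assumes a toy score equal to term frequency multiplied by a per-field weight. The names `FIELD_BOOSTS` and `score` are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hypothetical per-field weights; fields not listed default to 1.0.
FIELD_BOOSTS = {"title": 5.0}


def score(doc, term, boosts=FIELD_BOOSTS):
    """Toy score: sum over fields of (term frequency in field) * (field boost)."""
    total = 0.0
    for field, text in doc.items():
        tf = Counter(text.lower().split())[term.lower()]
        total += tf * boosts.get(field, 1.0)
    return total


docs = [
    {"title": "Cooking basics", "description": "rust rust rust removal tips"},
    {"title": "Rust handbook", "description": "a systems language"},
]

# With the title boost, one title match outweighs three description matches.
ranked = sorted(docs, key=lambda d: score(d, "rust"), reverse=True)
```

Without the boost, the first document would rank higher because it has more raw matches; with `title` weighted at 5.0, the title match wins.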

@VsevolodZakharov

At least the query parser accepts boosts:
index.parse_query("title:t^2 OR content:i^3")
gives the repr:
Query(BooleanQuery { subqueries: [(Should, Boost(query=TermQuery(Term(field=1,bytes=[116])), boost=2)), (Should, Boost(query=TermQuery(Term(field=2,bytes=[105])), boost=3))] })

But constructing queries from Query classes is missing in tantivy-py. That works fine in Whoosh. Otherwise, parse_query should support fuzzy search, wildcards, etc.
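
As a workaround along the lines of the `^` syntax shown above, the boosted query string can be assembled in plain Python before it is handed to `parse_query`. This is only a sketch: `boosted_query` is a hypothetical helper, not part of tantivy-py, and it assumes the terms contain no characters that need escaping in the query syntax.

```python
def boosted_query(terms_by_field, boosts):
    """Build a query string such as 'title:t^2 OR content:i^3'.

    terms_by_field maps a field name to a single search term;
    boosts maps a field name to its weight (fields without one get no suffix).
    """
    clauses = []
    for field, term in terms_by_field.items():
        clause = f"{field}:{term}"
        boost = boosts.get(field)
        if boost is not None:
            # ':g' drops a trailing '.0', so 2.0 renders as '2'.
            clause += f"^{boost:g}"
        clauses.append(clause)
    return " OR ".join(clauses)


query_text = boosted_query(
    {"title": "t", "content": "i"},
    {"title": 2, "content": 3},
)
```

The resulting `query_text` matches the string used in the comment above and could be passed to `index.parse_query(query_text)` in the same way.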

@AliFlux
Author

AliFlux commented Jul 11, 2022

@VsevolodZakharov thanks for the reply.
Is document-based boosting available in tantivy-py? Something like _boost in Whoosh.

@VsevolodZakharov

@AliFlux AFAIK, no, it is not available.

@fulmicoton
Contributor

fulmicoton commented Jul 11, 2022

It exists in tantivy:
https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.set_field_boost

Someone just needs to plug it into tantivy-py.

@fulmicoton fulmicoton added the good first issue Good for newcomers label Jul 13, 2022
Sidhant29 pushed a commit to Sidhant29/tantivy-py that referenced this issue Apr 17, 2023
…oss#50)

Sidhant29 added a commit to Sidhant29/tantivy-py that referenced this issue Apr 17, 2023
* Tantivy 0.19.2 (quickwit-oss#67)

@cjrh cjrh added the help wanted Extra attention is needed label Jan 30, 2024
@cjrh cjrh added the feature-parity Feature parity with upstream tantivy label Jan 30, 2024
@cjrh cjrh closed this as completed in #202 Feb 5, 2024

4 participants