Tantivy-0.19.2 updates (quickwit-oss#56)
* Tantivy 0.19.2 (quickwit-oss#67)

* Adding __init__.py file to the tantivy folder to make maturin happy

Add Cargo.lock to the repo

Set the git-fetch-with-cli cargo flag so that we can override fetch settings

Renaming .cargo/config to .cargo/config.toml

Adding github-quiq-sh cargo registry

Point dependencies at our github-quiq-sh registry

Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry

SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py

SER-21487: Use uname rather than UNAME in the Makefile

SER-21487: Fix document date handling

SER-23013: Upgrade Tantivy and other dependencies

* Upgrade to Tantivy 0.19.1

* Apply rustfmt and fix bug when fast option = None

* Upgrade to tantivy-0.19.2

* Standardize around using 'cargo fmt' rather than 'rustfmt'

* Reverting to old style dependencies

* Linting with clippy

* Switching out hashmap for defining tokenizers for an array, and adding test for Spanish indexing

* Use cargo fmt instead of rustfmt on the Lint ci step

* Add python release build

* workflow dispatch

* simple

* add release

* fix publish pipeline

* update maturin args

* test

* maturin config

* build

* maturin

* build(deps): bump step-security/harden-runner from 1.4.4 to 2.0.0

Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 1.4.4 to 2.0.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](step-security/harden-runner@74b568e...ebacdc2)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump actions/checkout

Bumps [actions/checkout](https://github.com/actions/checkout) from d171c3b028d844f2bf14e9fdec0c58114451e4bf to 61b9e3751b92087fd0b06925ba6dd6314e06f089.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@d171c3b...61b9e37)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump alexellis/upload-assets from 0.2.2 to 0.4.0

Bumps [alexellis/upload-assets](https://github.com/alexellis/upload-assets) from 0.2.2 to 0.4.0.
- [Release notes](https://github.com/alexellis/upload-assets/releases)
- [Commits](alexellis/upload-assets@eaab147...259de51)

---
updated-dependencies:
- dependency-name: alexellis/upload-assets
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump messense/maturin-action from 1.28.3 to 1.34.0

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.28.3 to 1.34.0.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@20111a7...7208c29)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump actions-rs/toolchain

Bumps [actions-rs/toolchain](https://github.com/actions-rs/toolchain) from 63eb9591781c46a70274cb3ebdf190fce92702e8 to 16499b5e05bf2e26879000db0c1d13f7e13fa3af.
- [Release notes](https://github.com/actions-rs/toolchain/releases)
- [Changelog](https://github.com/actions-rs/toolchain/blob/master/CHANGELOG.md)
- [Commits](actions-rs/toolchain@63eb959...16499b5)

---
updated-dependencies:
- dependency-name: actions-rs/toolchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* testing

* Update publish.yaml

* Update publish.yaml

* Update publish.yaml

* build(deps): bump actions/upload-artifact from 3.1.0 to 3.1.2

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.0 to 3.1.2.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@3cea537...0b7f8ab)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump Swatinem/rust-cache from 2.0.0 to 2.2.0

Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.0 to 2.2.0.
- [Release notes](https://github.com/Swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](Swatinem/rust-cache@6720f05...359a70e)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump messense/maturin-action from 1.34.0 to 1.35.0

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.34.0 to 1.35.0.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@7208c29...ac0a1ec)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump actions/setup-python from 4.2.0 to 4.5.0

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4.2.0 to 4.5.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4.2.0...d27e3f3)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump step-security/harden-runner from 2.0.0 to 2.1.0

Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.0.0 to 2.1.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](step-security/harden-runner@ebacdc2...18bf8ad)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Added float fields to schema with tests (quickwit-oss#36)

* Added float fields to schema with tests

* Fixed typo

* [StepSecurity] Apply security best practices (quickwit-oss#38)

Signed-off-by: StepSecurity Bot <[email protected]>

Signed-off-by: StepSecurity Bot <[email protected]>

* Harden CI (quickwit-oss#39)

* Harden the egress and add dependabot cargo

* delete file

* harden codeql

* build(deps): bump messense/maturin-action from 1.35.0 to 1.35.2 (quickwit-oss#40)

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.35.0 to 1.35.2.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@ac0a1ec...7559b9d)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github/codeql-action from 2.1.39 to 2.2.1 (quickwit-oss#41)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.39 to 2.2.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@a34ca99...3ebbd71)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: create custom kapiche tokenizer (quickwit-oss#42)

* Added a custom Kapiche tokenizer that is inline with the current Tokenizer in Kapiche.

* Lint fixes

* build(deps): bump messense/maturin-action from 1.35.2 to 1.36.0 (quickwit-oss#47)

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.35.2 to 1.36.0.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@7559b9d...7c85798)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Tantivy_0.19.1_upgrade (quickwit-oss#48)

* Adding __init__.py file to the tantivy folder to make maturin happy

Add Cargo.lock to the repo

Set the git-fetch-with-cli cargo flag so that we can override fetch settings

Renaming .cargo/config to .cargo/config.toml

Adding github-quiq-sh cargo registry

Point dependencies at our github-quiq-sh registry

Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry

SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py

SER-21487: Use uname rather than UNAME in the Makefile

SER-21487: Fix document date handling

SER-23013: Upgrade Tantivy and other dependencies

* Upgrade to Tantivy 0.19.1

* Added changes and fixed issues

* Formatting fixes

---------

Co-authored-by: Phill Mell-Davies <[email protected]>

* build(deps): bump pyo3-build-config from 0.18.0 to 0.18.1 (quickwit-oss#49)

Bumps [pyo3-build-config](https://github.com/pyo3/pyo3) from 0.18.0 to 0.18.1.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.18.0...v0.18.1)

---
updated-dependencies:
- dependency-name: pyo3-build-config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump tantivy from 0.19.1 to 0.19.2 (quickwit-oss#53)

Bumps [tantivy](https://github.com/quickwit-oss/tantivy) from 0.19.1 to 0.19.2.
- [Release notes](https://github.com/quickwit-oss/tantivy/releases)
- [Changelog](https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md)
- [Commits](quickwit-oss/tantivy@0.19.1...0.19.2)

---
updated-dependencies:
- dependency-name: tantivy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github/codeql-action from 2.2.1 to 2.2.4 (quickwit-oss#50)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.1 to 2.2.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@3ebbd71...17573ee)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump serde_json from 1.0.92 to 1.0.93 (quickwit-oss#51)

Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.92 to 1.0.93.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](serde-rs/json@v1.0.92...v1.0.93)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Cargo.lock

* Update Makefile

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: StepSecurity Bot <[email protected]>
Co-authored-by: Phill Mell-Davies <[email protected]>
Co-authored-by: Cam Parry <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Cameron <[email protected]>
Co-authored-by: StepSecurity Bot <[email protected]>
Co-authored-by: Phill Mell-Davies <[email protected]>
7 people authored Feb 15, 2023
1 parent 36edf27 commit 3e9d00c
Showing 11 changed files with 168 additions and 101 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -31,7 +31,7 @@ jobs:
components: rustfmt

- name: Check Formatting
-run: rustfmt --check src/*.rs
+run: cargo fmt --check

Test:
strategy:
45 changes: 23 additions & 22 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "tantivy"
-version = "0.19.1"
+version = "0.19.2"
readme = "README.md"
authors = ["Damir Jelić <[email protected]>"]
edition = "2018"
@@ -22,4 +22,4 @@ serde_json = { version = "1.0.93" }

[dependencies.pyo3]
version = "0.18.0"
-features = ["extension-module"]
+features = ["extension-module"]
7 changes: 5 additions & 2 deletions Makefile
@@ -6,15 +6,18 @@ endif

source_files := $(wildcard src/*.rs)

-all: build
+all: format lint build test

PHONY: test format

+lint:
+cargo clippy
+
test: tantivy/tantivy.$(EXT)
python3 -m pytest

format:
-rustfmt src/*.rs
+cargo fmt

build:
maturin build --interpreter python3.7 python3.8 python3.9 python3.10 python3.11
2 changes: 1 addition & 1 deletion rustfmt.toml
@@ -1 +1 @@
-max_width = 80
+max_width = 80
55 changes: 29 additions & 26 deletions src/document.rs
@@ -56,23 +56,26 @@ fn value_to_py(py: Python, value: &Value) -> PyResult<PyObject> {
// TODO implement me
unimplemented!();
}
-Value::Date(d) => PyDateTime::new(
-py,
-d.into_utc().year(),
-d.into_utc().month() as u8,
-d.into_utc().day() as u8,
-d.into_utc().hour() as u8,
-d.into_utc().minute() as u8,
-d.into_utc().second() as u8,
-d.into_utc().microsecond() as u32,
-None,
-)?
-.into_py(py),
+Value::Date(d) => {
+let utc = d.into_utc();
+PyDateTime::new(
+py,
+utc.year(),
+utc.month() as u8,
+utc.day(),
+utc.hour(),
+utc.minute(),
+utc.second(),
+utc.microsecond(),
+None,
+)?
+.into_py(py)
+}
Value::Facet(f) => Facet { inner: f.clone() }.into_py(py),
Value::JsonObject(json_object) => {
let inner: HashMap<_, _> = json_object
.iter()
-.map(|(k, v)| (k, value_to_object(&v, py)))
+.map(|(k, v)| (k, value_to_object(v, py)))
.collect();
inner.to_object(py)
}
@@ -84,11 +87,11 @@ fn value_to_py(py: Python, value: &Value) -> PyResult<PyObject> {
fn value_to_string(value: &Value) -> String {
match value {
Value::Str(text) => text.clone(),
-Value::U64(num) => format!("{}", num),
-Value::I64(num) => format!("{}", num),
-Value::F64(num) => format!("{}", num),
-Value::Bytes(bytes) => format!("{:?}", bytes),
-Value::Date(d) => format!("{:?}", d),
+Value::U64(num) => format!("{num}"),
+Value::I64(num) => format!("{num}"),
+Value::F64(num) => format!("{num}"),
+Value::Bytes(bytes) => format!("{bytes:?}"),
+Value::Date(d) => format!("{d:?}"),
Value::Facet(facet) => facet.to_string(),
Value::PreTokStr(_pretok) => {
// TODO implement me
@@ -97,7 +100,7 @@ fn value_to_string(value: &Value) -> String {
Value::JsonObject(json_object) => {
serde_json::to_string(&json_object).unwrap()
}
-Value::Bool(b) => format!("{}", b),
+Value::Bool(b) => format!("{b}"),
Value::IpAddr(i) => format!("{}", *i),
}
}
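
The `format!` rewrites in this hunk use inline format arguments (captured identifiers), which have been stable since Rust 1.58. The sketch below is only an illustration: the helper names are hypothetical and not part of tantivy-py. It shows that the two spellings produce the same string, and hints at why `format!("{}", *i)` was left alone (only plain identifiers can be captured, not expressions).

```rust
/// Positional style, as used before this commit.
/// (Hypothetical helper, for illustration only.)
fn describe_positional(field_name: &str, num: u64) -> String {
    format!("{}=[{}]", field_name, num)
}

/// Inline (captured identifier) style, as used after this commit.
/// Only plain identifiers can be captured; an expression such as `*i`
/// still needs a positional argument.
fn describe_inline(field_name: &str, num: u64) -> String {
    format!("{field_name}=[{num}]")
}

/// Debug formatting works the same way with the `:?` suffix.
fn debug_inline(bytes: &[u8]) -> String {
    format!("{bytes:?}")
}
```

Calling `describe_positional("title", 7)` and `describe_inline("title", 7)` should yield the same `String`, so the change is purely stylistic.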
@@ -145,10 +148,10 @@ impl fmt::Debug for Document {
.chars()
.take(10)
.collect();
-format!("{}=[{}]", field_name, values_str)
+format!("{field_name}=[{values_str}]")
})
.join(",");
-write!(f, "Document({})", doc_str)
+write!(f, "Document({doc_str})")
}
}

@@ -189,9 +192,9 @@ pub(crate) fn extract_value(any: &PyAny) -> PyResult<Value> {
)));
}
if let Ok(facet) = any.extract::<Facet>() {
-return Ok(Value::Facet(facet.inner.clone()));
+return Ok(Value::Facet(facet.inner));
}
-Err(to_pyerr(format!("Value unsupported {:?}", any)))
+Err(to_pyerr(format!("Value unsupported {any:?}")))
}

fn extract_value_single_or_list(any: &PyAny) -> PyResult<Vec<Value>> {
@@ -404,13 +407,13 @@ impl Document {
}

fn __getitem__(&self, field_name: &str) -> PyResult<Vec<PyObject>> {
-return Python::with_gil(|py| -> PyResult<Vec<PyObject>> {
+Python::with_gil(|py| -> PyResult<Vec<PyObject>> {
self.get_all(py, field_name)
-});
+})
}

fn __repr__(&self) -> PyResult<String> {
-Ok(format!("{:?}", self))
+Ok(format!("{self:?}"))
}
}

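Two of the `src/document.rs` changes are cleanups of the kind clippy tends to suggest: dropping a `.clone()` on a field of a value that is already owned, and replacing a trailing `return ...;` with a tail expression. A minimal sketch, assuming hypothetical `Wrapper` and `Inner` types that merely stand in for the binding's `Facet` wrapper and its inner value:

```rust
/// Hypothetical inner value, standing in for the wrapped tantivy type.
#[derive(Clone, Debug, PartialEq)]
struct Inner(String);

/// Hypothetical wrapper, standing in for the binding's `Facet` type.
struct Wrapper {
    inner: Inner,
}

/// Before: clones the field even though `wrapper` is owned and dropped
/// immediately afterwards, and uses an explicit `return`.
fn unwrap_with_clone(wrapper: Wrapper) -> Inner {
    return wrapper.inner.clone();
}

/// After: moves the field out of the owned wrapper and uses a tail
/// expression, avoiding the extra allocation.
fn unwrap_by_move(wrapper: Wrapper) -> Inner {
    wrapper.inner
}
```

Both functions return an equal value; the second simply avoids copying the inner data.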
52 changes: 24 additions & 28 deletions src/index.rs
@@ -133,20 +133,17 @@ impl IndexWriter {
Value::Facet(facet) => Term::from_facet(field, &facet),
Value::Bytes(_) => {
return Err(exceptions::PyValueError::new_err(format!(
-"Field `{}` is bytes type not deletable.",
-field_name
+"Field `{field_name}` is bytes type not deletable."
)))
}
Value::PreTokStr(_pretok) => {
return Err(exceptions::PyValueError::new_err(format!(
-"Field `{}` is pretokenized. This is not authorized for delete.",
-field_name
+"Field `{field_name}` is pretokenized. This is not authorized for delete."
)))
}
Value::JsonObject(_) => {
return Err(exceptions::PyValueError::new_err(format!(
-"Field `{}` is json object type not deletable.",
-field_name
+"Field `{field_name}` is json object type not deletable."
)))
},
Value::Bool(b) => Term::from_field_bool(field, b),
@@ -365,16 +362,14 @@ impl Index {
if !field_entry.is_indexed() {
return Err(exceptions::PyValueError::new_err(
format!(
-"Field `{}` is not set as indexed in the schema.",
-default_field_name
+"Field `{default_field_name}` is not set as indexed in the schema."
),
));
}
default_fields.push(field);
} else {
return Err(exceptions::PyValueError::new_err(format!(
-"Field `{}` is not defined in the schema.",
-default_field_name
+"Field `{default_field_name}` is not defined in the schema."
)));
}
}
@@ -395,24 +390,25 @@ impl Index {

impl Index {
fn register_custom_text_analyzers(index: &tv::Index) {
-let mut analyzers = HashMap::new();
-analyzers.insert("ar_stem", Language::Arabic);
-analyzers.insert("da_stem", Language::Danish);
-analyzers.insert("nl_stem", Language::Dutch);
-analyzers.insert("fi_stem", Language::Finnish);
-analyzers.insert("fr_stem", Language::French);
-analyzers.insert("de_stem", Language::German);
-analyzers.insert("el_stem", Language::Greek);
-analyzers.insert("hu_stem", Language::Hungarian);
-analyzers.insert("it_stem", Language::Italian);
-analyzers.insert("no_stem", Language::Norwegian);
-analyzers.insert("pt_stem", Language::Portuguese);
-analyzers.insert("ro_stem", Language::Romanian);
-analyzers.insert("ru_stem", Language::Russian);
-analyzers.insert("es_stem", Language::Spanish);
-analyzers.insert("sv_stem", Language::Swedish);
-analyzers.insert("ta_stem", Language::Tamil);
-analyzers.insert("tr_stem", Language::Turkish);
+let analyzers = [
+("ar_stem", Language::Arabic),
+("da_stem", Language::Danish),
+("nl_stem", Language::Dutch),
+("fi_stem", Language::Finnish),
+("fr_stem", Language::French),
+("de_stem", Language::German),
+("el_stem", Language::Greek),
+("hu_stem", Language::Hungarian),
+("it_stem", Language::Italian),
+("no_stem", Language::Norwegian),
+("pt_stem", Language::Portuguese),
+("ro_stem", Language::Romanian),
+("ru_stem", Language::Russian),
+("es_stem", Language::Spanish),
+("sv_stem", Language::Swedish),
+("ta_stem", Language::Tamil),
+("tr_stem", Language::Turkish),
+];

for (name, lang) in &analyzers {
let an = TextAnalyzer::from(SimpleTokenizer)
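
The switch from a `HashMap` to an array of `(name, language)` pairs suits a table that is only ever iterated: there is no hashing or heap allocation, and iteration follows declaration order. The sketch below is an assumption-laden illustration, not the binding's code: `Language` here is a hypothetical three-variant stand-in for `tantivy::tokenizer::Language`, the table is declared as a `const` rather than a local, and the helper functions are invented for the example.

```rust
/// Hypothetical stand-in for `tantivy::tokenizer::Language`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Language {
    French,
    German,
    Spanish,
}

/// A fixed-size array of pairs: no hashing, no heap allocation,
/// and iteration follows declaration order.
const ANALYZERS: [(&str, Language); 3] = [
    ("fr_stem", Language::French),
    ("de_stem", Language::German),
    ("es_stem", Language::Spanish),
];

/// Collects the names in the order a registration loop would visit them.
fn analyzer_names() -> Vec<&'static str> {
    let mut names = Vec::new();
    for (name, _lang) in &ANALYZERS {
        names.push(*name);
    }
    names
}

/// A linear scan is adequate for a table this small.
fn language_for(name: &str) -> Option<Language> {
    ANALYZERS
        .iter()
        .find(|(candidate, _)| *candidate == name)
        .map(|(_, lang)| *lang)
}
```

If keyed lookup over a much larger table were ever needed, a map would become the better fit again; for a registration loop the array is the simpler choice.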
5 changes: 2 additions & 3 deletions src/lib.rs
@@ -1,6 +1,6 @@
use pyo3::{exceptions, prelude::*};
#[rustfmt::skip]
use ::tantivy as tv;
use pyo3::{exceptions, prelude::*};

mod document;
mod facet;
@@ -90,8 +90,7 @@ pub(crate) fn get_field(
) -> PyResult<tv::schema::Field> {
let field = schema.get_field(field_name).ok_or_else(|| {
exceptions::PyValueError::new_err(format!(
-"Field `{}` is not defined in the schema.",
-field_name
+"Field `{field_name}` is not defined in the schema."
))
})?;

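The `get_field` helper turns a missing schema entry into an error with `ok_or_else`, so the message is only formatted on the failure path. A minimal std-only sketch of the same pattern, assuming a hypothetical schema modelled as a `HashMap<String, u32>` and a plain `String` error in place of `PyValueError`:

```rust
use std::collections::HashMap;

/// Looks up a field id by name.
/// The schema type and the `String` error are hypothetical stand-ins for
/// tantivy's `Schema` and PyO3's `PyValueError`.
fn get_field(
    schema: &HashMap<String, u32>,
    field_name: &str,
) -> Result<u32, String> {
    schema.get(field_name).copied().ok_or_else(|| {
        // The closure runs only when the lookup fails, so the message
        // is not allocated on the success path.
        format!("Field `{field_name}` is not defined in the schema.")
    })
}
```

With a schema containing only `title`, `get_field(&schema, "title")` returns `Ok` with its id, while `get_field(&schema, "body")` returns the formatted error.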