Panic from indexWriter.commit() call #2193
Can you provide a stack trace or, ideally, something to reproduce?
Unfortunately we do not have a reliable repro, but we've captured this stack trace in production:
Can you share your schema, tokenizer, and hardware info?
Our schema just consists of two string fields and one JSON field. The JSON field is constructed as follows:
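A minimal sketch of a schema with that shape, assuming tantivy's schema builder and default text options (the field names are illustrative, not the actual ones):

// Sketch of a schema with two string fields and one JSON field, assuming
// default text options; field names here are placeholders.
use tantivy::schema::{Schema, STORED, TEXT};

fn build_schema() -> Schema {
    let mut builder = Schema::builder();
    builder.add_text_field("title", TEXT | STORED);
    builder.add_text_field("body", TEXT | STORED);
    // JSON fields accept text options the same way text fields do.
    builder.add_json_field("metadata", TEXT | STORED);
    builder.build()
}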
We're running on an x86 EC2 VM. Let me know if there is any specific hardware info that would be useful.
I'm looking for anything that would suggest you've run into some border case, e.g. tokens longer than … Can you share an example document? A Term is not supposed to be smaller than 5 bytes, so either it's passed incorrectly or it's read incorrectly. Can you apply this patch and see if the error occurs in this line?
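For context on that 5-byte lower bound: a serialized term carries a 4-byte field id and a 1-byte type code ahead of the value bytes. A small illustrative sketch of that layout, not tantivy's actual serializer:

// Illustrative layout: 4-byte field id + 1-byte type code + value, so even
// an empty value yields a 5-byte term. Not tantivy's actual code.
fn serialize_term(field_id: u32, type_code: u8, value: &[u8]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(5 + value.len());
    bytes.extend_from_slice(&field_id.to_be_bytes()); // 4 bytes: field id
    bytes.push(type_code);                            // 1 byte: value type
    bytes.extend_from_slice(value);                   // value payload
    bytes
}

fn main() {
    let term = serialize_term(1, b's', b"keyboard");
    assert!(term.len() >= 5); // anything shorter indicates corruption
    println!("term is {} bytes", term.len());
}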
I'm getting this error as well. In my case, I'm passing a … It seems to work in trivial unit tests, but I'm seeing the error above with more complex data. I'm trying to narrow down the cause, but would love any pointers on where to look.
@neilyio Do you have the same stack trace? Can you apply the patch I posted? It would help narrow down whether the Term is constructed incorrectly or read incorrectly; neither should happen, and I don't have clear pointers currently. Can you share your schema?
@PSeitz I have a minimal reproduction for you.

#[cfg(test)]
mod tests {
use std::io::Cursor;
use tantivy::{schema::Schema, Document, Index, IndexSettings, IndexSortByField, Order};
use tantivy_common::BinarySerializable;
#[test]
fn test_writer_commit() {
let serialized_schema = r#"
[{"name":"category","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":true,"fast":false}},{"name":"description","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":true,"fast":false}},{"name":"rating","type":"i64","options":{"indexed":true,"fieldnorms":false,"fast":true,"stored":true}},{"name":"in_stock","type":"bool","options":{"indexed":true,"fieldnorms":false,"fast":true,"stored":true}},{"name":"metadata","type":"json_object","options":{"stored":true,"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"fast":false,"expand_dots_enabled":true}},{"name":"id","type":"i64","options":{"indexed":true,"fieldnorms":true,"fast":true,"stored":true}},{"name":"ctid","type":"u64","options":{"indexed":true,"fieldnorms":true,"fast":true,"stored":true}}]
"#;
let schema: Schema = serde_json::from_str(&serialized_schema).unwrap();
let settings = IndexSettings {
sort_by_field: Some(IndexSortByField {
field: "id".into(),
order: Order::Asc,
}),
..Default::default()
};
let temp_dir = tempfile::Builder::new().tempdir().unwrap();
let index = Index::builder()
.schema(schema)
.settings(settings)
.create_in_dir(temp_dir.path())
.unwrap();
let mut writer = index.writer(500_000_000).unwrap();
// This is a string representation of the document bytes that I am sending through IPC.
let document_bytes: Vec<u8> = serde_json::from_str("[135,5,0,0,0,2,1,0,0,0,0,0,0,0,1,0,0,0,0,152,69,114,103,111,110,111,109,105,99,32,109,101,116,97,108,32,107,101,121,98,111,97,114,100,2,0,0,0,2,4,0,0,0,0,0,0,0,0,0,0,0,0,139,69,108,101,99,116,114,111,110,105,99,115,3,0,0,0,9,1,4,0,0,0,8,123,34,99,111,108,111,114,34,58,34,83,105,108,118,101,114,34,44,34,108,111,99,97,116,105,111,110,34,58,34,85,110,105,116,101,100,32,83,116,97,116,101,115,34,125,5,0,0,0,1,1,0,0,0,0,0,0,0]").unwrap();
let document_from_bytes: Document =
BinarySerializable::deserialize(&mut Cursor::new(document_bytes)).unwrap();
// This is a json representation of the above that I'm including here for readability.
// This was generated with `println!("{}", serde_json::to_string(&document_from_bytes).unwrap())`.
let document_json = r#"
{"field_values":[{"field":5,"value":1},{"field":1,"value":"Ergonomic metal keyboard"},{"field":2,"value":4},{"field":0,"value":"Electronics"},{"field":3,"value":true},{"field":4,"value":{"color":"Silver","location":"United States"}},{"field":5,"value":1}]}
"#;
// To prove that the document_json and the document_from_bytes represent the same Document,
// we assert their equality here. This is expected to pass.
assert_eq!(
document_json.trim(),
serde_json::to_string(&document_from_bytes).unwrap().trim()
);
writer.add_document(document_from_bytes).unwrap();
// We expect an error here on commit: ErrorInThread("Any { .. }")
writer.commit().unwrap();
}
}
I'd like to note that my … Here's a debug print of the Document:

Document {
field_values: [
FieldValue { field: Field(5), value: I64(1) },
FieldValue { field: Field(1), value: Str("Ergonomic metal keyboard") },
FieldValue { field: Field(2), value: I64(4) },
FieldValue { field: Field(0), value: Str("Electronics") },
FieldValue { field: Field(3), value: Bool(true) },
FieldValue { field: Field(4), value: JsonObject({"color": String("Silver"), "location": String("United States")}) },
FieldValue { field: Field(5), value: U64(1) }
]
}
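One thing worth noting about the equality assertion in the repro: a JSON string comparison cannot catch the I64-versus-U64 mismatch visible in this debug print for field 5, because serde_json renders both values as the bare number 1. A small sketch using serde_json only:

// Sketch: the JSON text comparison in the repro above cannot distinguish
// I64(1) from U64(1), since both render as the bare number 1. A signedness
// mismatch introduced during binary (de)serialization is therefore invisible
// to an assert_eq! on JSON strings.
fn main() {
    let signed = serde_json::json!(1i64);
    let unsigned = serde_json::json!(1u64);
    assert_eq!(
        serde_json::to_string(&signed).unwrap(),
        serde_json::to_string(&unsigned).unwrap()
    ); // both are "1"
}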
Also, for versions, I have:

tantivy = "0.21.1"
tantivy-common = "0.6.0"
Do you mean this error?
This error:

Running the minimal example I posted above consistently produces it.
Can you provide a repo? I get …
@JaydenNavarro-at @PingXia-at |
Yes, you're both right, I'm sorry for the distraction. I had some colleagues test the same code and they're seeing the same error as you. It's an issue with my serialization, and my specific test setup seems to be suppressing all but the …
Describe the bug
Calling indexWriter.commit() caused the following panic:

which ultimately resulted in this error here:

An error occurred in a thread: 'Any { .. }'

Expected behavior: no panic.
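As an aside, the opaque Any { .. } text is what the Debug impl of a panic payload (Box<dyn Any + Send>) prints when a joined thread has panicked. A generic sketch of that mechanism, not tantivy's actual code:

// Generic sketch: a panic in a worker thread comes back from join() as a
// Box<dyn Any + Send>, whose Debug form is exactly "Any { .. }".
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        panic!("boom"); // stands in for the panic during commit
    });
    if let Err(payload) = handle.join() {
        println!("An error occurred in a thread: '{:?}'", payload);
        // prints: An error occurred in a thread: 'Any { .. }'
    }
}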
Which version of tantivy are you using?
v0.20.2 and this cherry-picked commit
To Reproduce
I don't have minimal code to reproduce it.
It only happens for certain customers, not all of them.
I'm adding a stack trace to the panic and will share it once I have it.