Panic from indexWriter.commit() call #2193

Open
PingXia-at opened this issue Sep 28, 2023 · 15 comments

@PingXia-at
Contributor

Describe the bug

  • What did you do?
    call indexWriter.commit()
  • What happened?
    The following panic occurred:
Panic from rust code! range end index 4 out of range for slice of length 0
Panic in thread from file .../src/schema/term.rs line 246

which ultimately resulted in An error occurred in a thread: 'Any { .. }' here

  • What was expected?
    Expect no panic
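
For context, `Any { .. }` is what Rust prints when a thread's panic payload (a `Box<dyn Any + Send>`) is formatted with `{:?}`, which hides the original message. Below is a minimal sketch, using only the standard library, of recovering the message by downcasting the payload. `panic_message` and `failing_worker` are hypothetical names for illustration and are not part of tantivy.

```rust
use std::any::Any;
use std::thread;

/// Hypothetical helper: best-effort extraction of the message from a panic payload.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(msg) = payload.downcast_ref::<&str>() {
        // `panic!("literal")` produces a `&'static str` payload.
        (*msg).to_string()
    } else if let Some(msg) = payload.downcast_ref::<String>() {
        // `panic!("formatted {x}")` produces a `String` payload.
        msg.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Hypothetical stand-in for a worker thread body that panics on short input.
fn failing_worker(len: usize) {
    if len < 4 {
        panic!("term too short: {len} bytes");
    }
}

fn main() {
    // `join` returns Err(payload) when the thread panicked.
    let payload = thread::spawn(|| failing_worker(3)).join().unwrap_err();
    println!("{:?}", payload); // prints: Any { .. }
    println!("{}", panic_message(&*payload)); // prints: term too short: 3 bytes
}
```

Whether the calling code can reach the raw payload depends on how the library wraps the thread error, so this may only apply to threads you join yourself.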

Which version of tantivy are you using?
v0.20.2 and this cherry-picked commit

To Reproduce
I don't have a minimal reproduction.

It only happens for certain customers, not all customers.

I'm adding a stack trace to the panic and will share it once I have it.
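
One way to capture a stack trace for a panic that happens on a background thread is a process-wide panic hook. The following is a minimal sketch using only the standard library. `install_recording_hook` and `check_term_len` are hypothetical names, and collecting reports in a shared `Vec` is an assumption made for illustration; a real service would more likely write them to its logger.

```rust
use std::backtrace::Backtrace;
use std::panic;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: installs a global hook that records every panic's
/// message, location and backtrace, and returns the shared list of reports.
fn install_recording_hook() -> Arc<Mutex<Vec<String>>> {
    let reports = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&reports);
    panic::set_hook(Box::new(move |info| {
        // Capture a backtrace regardless of the RUST_BACKTRACE setting.
        let backtrace = Backtrace::force_capture();
        // The Display output of `info` includes the panic message and location.
        let report = format!("{info}\n{backtrace}");
        // Never panic inside the hook: skip the report if the lock is poisoned.
        if let Ok(mut guard) = sink.lock() {
            guard.push(report);
        }
    }));
    reports
}

/// Hypothetical stand-in for code that panics on a worker thread.
fn check_term_len(bytes: &[u8]) {
    assert!(bytes.len() >= 4, "term too short: {} bytes", bytes.len());
}

fn main() {
    let reports = install_recording_hook();
    let result = thread::spawn(|| check_term_len(&[0, 0, 0])).join();
    assert!(result.is_err());
    for report in reports.lock().unwrap().iter() {
        eprintln!("{report}");
    }
}
```

The hook runs on the panicking thread before unwinding starts, so the recorded backtrace points at the original failure site rather than at the `join` call.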

@PSeitz
Contributor

PSeitz commented Oct 12, 2023

Can you provide a stack trace or ideally something to reproduce?

@JaydenNavarro-at

Unfortunately we do not have a reliable repro, but we've captured this stack trace in production:

Panic from rust code! range end index 4 out of range for slice of length 3
Panic in thread from file cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/schema/term.rs line 246
Panic stack trace
   0: search_tantivy::main::{{closure}}
             at ./server_shared/rust/search_tantivy/src/lib.rs:40:25
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/alloc/src/boxed.rs:1999:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:709:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:597:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys_common/backtrace.rs:151:18
   5: rust_begin_unwind
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:593:5
   6: core::panicking::panic_fmt
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:67:14
   7: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:76:5
   8: core::slice::index::slice_end_index_len_fail
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:68:9
   9: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:408:13
  10: <core::ops::range::RangeTo<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:455:9
  11: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:18:15
  12: tantivy::schema::term::Term<B>::field
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/schema/term.rs:246:41
  13: tantivy::postings::postings_writer::make_field_partition::{{closure}}
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/postings/postings_writer.rs:22:26
  14: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:305:13
  15: core::option::Option<T>::map
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/option.rs:1075:29
  16: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/iter/adapters/map.rs:103:26
  17: <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/iter/adapters/enumerate.rs:47:17
  18: tantivy::postings::postings_writer::make_field_partition
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/postings/postings_writer.rs:27:28
  19: tantivy::postings::postings_writer::serialize_postings
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/postings/postings_writer.rs:60:25
  20: tantivy::indexer::segment_writer::remap_and_write
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/segment_writer.rs:395:5
  21: tantivy::indexer::segment_writer::SegmentWriter::finalize
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/segment_writer.rs:141:9
  22: tantivy::indexer::index_writer::index_documents
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/index_writer.rs:198:38
  23: tantivy::indexer::index_writer::IndexWriter::add_indexing_worker::{{closure}}
             at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/index_writer.rs:427:21
  24: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys_common/backtrace.rs:135:18
  25: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/thread/mod.rs:529:17
  26: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panic/unwind_safe.rs:271:9
  27: std::panicking::try::do_call
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:500:40
  28: std::panicking::try
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:464:19
  29: std::panic::catch_unwind
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs:142:14
  30: std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/thread/mod.rs:528:30
  31: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:250:5
  32: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/alloc/src/boxed.rs:1985:9
  33: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/alloc/src/boxed.rs:1985:9
  34: std::sys::unix::thread::Thread::new::thread_start
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys/unix/thread.rs:108:17
  35: start_thread
  36: clone

@PSeitz
Contributor

PSeitz commented Jan 9, 2024

Can you share your schema, tokenizer and hardware info?

@JaydenNavarro-at

Our schema just consists of two string fields and one JSON field. The JSON field is constructed as follows:

let text_field_indexing = TextFieldIndexing::default()
    .set_tokenizer(DEFAULT_TOKENIZER_NAME)
    .set_index_option(IndexRecordOption::WithFreqsAndPositions);
let text_options = TextOptions::default().set_indexing_options(text_field_indexing);
let cell_values_json_field =
    schema_builder.add_json_field(CELL_VALUES_BY_COLUMN_ID_FIELD, text_options);

...

let analyzer = TextAnalyzer::builder(SimpleTokenizer::default())
    .filter(LowerCaser)
    .filter(AsciiFoldingFilter)
    .build();
index.tokenizers().register(DEFAULT_TOKENIZER_NAME, analyzer);

We're running on an x86 EC2 VM. Let me know if there is any specific hardware info that would be useful.

@PSeitz
Contributor

PSeitz commented Jan 11, 2024

I'm looking for anything that would suggest you are running into some border case, e.g. tokens longer than u16::MAX or a very large memory_budget_in_bytes.
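
A quick way to check input text for the oversized-token border case is to measure the longest token before indexing. This is only a sketch: splitting on non-alphanumeric characters is an assumption that roughly approximates a simple tokenizer and is not tantivy's own code, and `longest_token_len` and `exceeds_u16_max` are hypothetical names.

```rust
/// Hypothetical helper: byte length of the longest token, where tokens are
/// runs of alphanumeric characters (an approximation of a simple tokenizer).
fn longest_token_len(text: &str) -> usize {
    text.split(|c: char| !c.is_alphanumeric())
        .map(str::len)
        .max()
        .unwrap_or(0)
}

/// Hypothetical helper: true if any token is longer than u16::MAX bytes.
fn exceeds_u16_max(text: &str) -> bool {
    longest_token_len(text) > u16::MAX as usize
}

fn main() {
    let normal = "Ergonomic metal keyboard";
    let huge = "a".repeat(70_000);
    println!("{}", longest_token_len(normal)); // prints: 9
    println!("{}", exceeds_u16_max(normal)); // prints: false
    println!("{}", exceeds_u16_max(&huge)); // prints: true
}
```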

Can you share an example document?

Term is not supposed to be smaller than 5 bytes, so either it's passed incorrectly or it's read incorrectly. Can you apply this patch and see if the error occurs in this line?

diff --git a/src/postings/postings_writer.rs b/src/postings/postings_writer.rs
index d3c26be13..9a5456edf 100644
--- a/src/postings/postings_writer.rs
+++ b/src/postings/postings_writer.rs
@@ -181,7 +181,7 @@ impl<Rec: Recorder> SpecializedPostingsWriter<Rec> {
 impl<Rec: Recorder> PostingsWriter for SpecializedPostingsWriter<Rec> {
     #[inline]
     fn subscribe(&mut self, doc: DocId, position: u32, term: &Term, ctx: &mut IndexingContext) {
-        debug_assert!(term.serialized_term().len() >= 4);
+        assert!(term.serialized_term().len() >= 4);
         self.total_num_tokens += 1;
         let (term_index, arena) = (&mut ctx.term_index, &mut ctx.arena);
         term_index.mutate_or_create(term.serialized_term(), |opt_recorder: Option<Rec>| {
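
The panic in the stack trace above ("range end index 4 out of range for slice of length 3") is the kind of failure produced when a byte slice shorter than four bytes is indexed with `[..4]`. The sketch below shows a checked alternative that returns `None` instead of panicking. `read_field_id` is a hypothetical helper and not tantivy's implementation, and the big-endian `u32` layout of the first four bytes is an assumption made for illustration.

```rust
/// Hypothetical helper: reads a u32 field id from the first 4 bytes of a
/// serialized term, assuming a big-endian layout. Returns None if the input
/// is shorter than 4 bytes, where `&serialized_term[..4]` would panic.
fn read_field_id(serialized_term: &[u8]) -> Option<u32> {
    // `get(..4)` is the non-panicking counterpart of `[..4]`.
    let header: [u8; 4] = serialized_term.get(..4)?.try_into().ok()?;
    Some(u32::from_be_bytes(header))
}

fn main() {
    // A 3-byte buffer, like the one reported in the stack trace.
    println!("{:?}", read_field_id(&[0, 0, 0])); // prints: None
    // A 5-byte buffer: 4 bytes of field id followed by one more byte.
    println!("{:?}", read_field_id(&[0, 0, 0, 5, b's'])); // prints: Some(5)
}
```

A checked read like this turns a corrupted or truncated term into a recoverable condition, at the cost of having to decide what the caller should do with `None`.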

@neilyio

neilyio commented Jan 29, 2024

I'm getting this error as well. In my case, I'm passing a tantivy::Document in between processes, serialized with tantivy_common::BinarySerializable.

It seems to be working in trivial unit tests, but I'm seeing the error above with more complex data. I'm trying to narrow down the cause, but would love any pointers on where to look.

@PSeitz
Contributor

PSeitz commented Jan 29, 2024

@neilyio Do you have the same stack trace? Can you apply the patch I posted? It would narrow down whether the Term is constructed incorrectly or read incorrectly. Neither should happen, and I don't have clear pointers currently.

Can you share your schema?
Can you share anything to reproduce?

@neilyio

neilyio commented Jan 29, 2024

@PSeitz I have a minimal reproduction for you.

mod tests {
    use std::io::Cursor;
    use tantivy::{schema::Schema, Document, Index, IndexSettings, IndexSortByField, Order};
    use tantivy_common::BinarySerializable;

    #[test]
    fn test_writer_commit() {
        let serialized_schema = r#"
      [{"name":"category","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":true,"fast":false}},{"name":"description","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":true,"fast":false}},{"name":"rating","type":"i64","options":{"indexed":true,"fieldnorms":false,"fast":true,"stored":true}},{"name":"in_stock","type":"bool","options":{"indexed":true,"fieldnorms":false,"fast":true,"stored":true}},{"name":"metadata","type":"json_object","options":{"stored":true,"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"fast":false,"expand_dots_enabled":true}},{"name":"id","type":"i64","options":{"indexed":true,"fieldnorms":true,"fast":true,"stored":true}},{"name":"ctid","type":"u64","options":{"indexed":true,"fieldnorms":true,"fast":true,"stored":true}}]
    "#;

        let schema: Schema = serde_json::from_str(&serialized_schema).unwrap();
        let settings = IndexSettings {
            sort_by_field: Some(IndexSortByField {
                field: "id".into(),
                order: Order::Asc,
            }),
            ..Default::default()
        };

        let temp_dir = tempfile::Builder::new().tempdir().unwrap();

        let index = Index::builder()
            .schema(schema)
            .settings(settings)
            .create_in_dir(&temp_dir.path())
            .unwrap();

        let mut writer = index.writer(500_000_000).unwrap();

        // This is a string representation of the document bytes that I am sending through IPC.
        let document_bytes: Vec<u8> = serde_json::from_str("[135,5,0,0,0,2,1,0,0,0,0,0,0,0,1,0,0,0,0,152,69,114,103,111,110,111,109,105,99,32,109,101,116,97,108,32,107,101,121,98,111,97,114,100,2,0,0,0,2,4,0,0,0,0,0,0,0,0,0,0,0,0,139,69,108,101,99,116,114,111,110,105,99,115,3,0,0,0,9,1,4,0,0,0,8,123,34,99,111,108,111,114,34,58,34,83,105,108,118,101,114,34,44,34,108,111,99,97,116,105,111,110,34,58,34,85,110,105,116,101,100,32,83,116,97,116,101,115,34,125,5,0,0,0,1,1,0,0,0,0,0,0,0]").unwrap();

        let document_from_bytes: Document =
            BinarySerializable::deserialize(&mut Cursor::new(document_bytes)).unwrap();

        // This is a json representation of the above that I'm including here for readability.
        // This was generated with `println!("{}", serde_json::to_string(&document_from_bytes).unwrap())`.
        let document_json = r#"
            {"field_values":[{"field":5,"value":1},{"field":1,"value":"Ergonomic metal keyboard"},{"field":2,"value":4},{"field":0,"value":"Electronics"},{"field":3,"value":true},{"field":4,"value":{"color":"Silver","location":"United States"}},{"field":5,"value":1}]}
        "#;

        // To prove that the document_json and the document_from_bytes represent the same Document,
        // we assert their equality here. This is expected to pass.
        assert_eq!(
            document_json.trim(),
            serde_json::to_string(&document_from_bytes).unwrap().trim()
        );

        writer.add_document(document_from_bytes).unwrap();

        // We expect an error here on commit: ErrorInThread("Any { .. }")
        writer.commit().unwrap();
    }
}

@neilyio

neilyio commented Jan 29, 2024

I'd like to note that my Document here contains a JsonObject value, which has given me some trouble with serialization. That's what pushed me to use BinarySerializable in the first place.

Here's a println!("{document_from_bytes:?}") if it's helpful:

Document {
  field_values: [
    FieldValue { field: Field(5), value: I64(1) },
    FieldValue { field: Field(1), value: Str("Ergonomic metal keyboard") },
    FieldValue { field: Field(2), value: I64(4) },
    FieldValue { field: Field(0), value: Str("Electronics") },
    FieldValue { field: Field(3), value: Bool(true) },
    FieldValue { field: Field(4), value: JsonObject({"color": String("Silver"), "location": String("United States")}) },
    FieldValue { field: Field(5), value: U64(1) }
  ]
}

@neilyio

neilyio commented Jan 29, 2024

Also, for versions, I have:

tantivy = "0.21.1"
tantivy-common = "0.6.0"

@PSeitz
Contributor

PSeitz commented Jan 30, 2024

Do you mean this error?

thread 'thrd-tantivy-index2' panicked at columnar/src/columnar/writer/column_writers.rs:192:17:
assertion `left == right` failed: Input type forbidden. This column has been forced to type U64, received I64(1)
  left: I64
 right: U64

@neilyio

neilyio commented Jan 30, 2024

This error:

called `Result::unwrap()` on an `Err` value: ErrorInThread("Any { .. }")

Running the minimal example I posted above consistently produces that.

@PSeitz
Contributor

PSeitz commented Jan 30, 2024

Can you provide a repo? I get


---- tests::test_writer_commit stdout ----
thread 'thrd-tantivy-index0' panicked at /home/pascal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tantivy-columnar-0.2.0/src/columnar/writer/column_writers.rs:192:17:
assertion `left == right` failed: Input type forbidden. This column has been forced to type I64, received U64(1)
  left: U64
 right: I64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tests::test_writer_commit' panicked at src/main.rs:57:25:
called `Result::unwrap()` on an `Err` value: ErrorInThread("Any { .. }")


failures:
    tests::test_writer_commit

@JaydenNavarro-at @PingXia-at
Do you also use index sort?

@fulmicoton
Collaborator

@neilyio I am getting the same error as @PSeitz here.

Your document contains a u64 where it should have been an i64. Commit returns an explicit error that tells you the problem.
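
A mismatch like this can be caught before the document reaches the writer by comparing each value's type with the type the schema declares for its field. The sketch below uses hypothetical stand-in types (`ColumnType`, `Value`, `type_mismatches`) rather than tantivy's own schema and document types, and it mirrors the field layout of the reproduction above, where field 5 is declared as i64 but one of its values is a u64.

```rust
/// Hypothetical stand-in for the types a schema can declare.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType { Str, I64, U64, Bool, Json }

/// Hypothetical stand-in for a document value.
#[derive(Debug, Clone, PartialEq)]
enum Value { Str(String), I64(i64), U64(u64), Bool(bool), Json(String) }

impl Value {
    fn column_type(&self) -> ColumnType {
        match self {
            Value::Str(_) => ColumnType::Str,
            Value::I64(_) => ColumnType::I64,
            Value::U64(_) => ColumnType::U64,
            Value::Bool(_) => ColumnType::Bool,
            Value::Json(_) => ColumnType::Json,
        }
    }
}

/// Returns (field id, expected type, actual type) for every value whose type
/// differs from the schema. Field ids missing from the schema are skipped.
fn type_mismatches(
    schema: &[ColumnType],
    doc: &[(usize, Value)],
) -> Vec<(usize, ColumnType, ColumnType)> {
    doc.iter()
        .filter_map(|(field, value)| {
            let expected = *schema.get(*field)?;
            let actual = value.column_type();
            (expected != actual).then_some((*field, expected, actual))
        })
        .collect()
}

fn main() {
    use ColumnType::*;
    // Field order follows the reproduction: category, description, rating,
    // in_stock, metadata, id, ctid.
    let schema = [Str, Str, I64, Bool, Json, I64, U64];
    let doc = vec![
        (5, Value::I64(1)),
        (1, Value::Str("Ergonomic metal keyboard".into())),
        (2, Value::I64(4)),
        (0, Value::Str("Electronics".into())),
        (3, Value::Bool(true)),
        (4, Value::Json(r#"{"color":"Silver"}"#.into())),
        (5, Value::U64(1)),
    ];
    for (field, expected, actual) in type_mismatches(&schema, &doc) {
        // prints: field 5: schema expects I64, document has U64
        println!("field {field}: schema expects {expected:?}, document has {actual:?}");
    }
}
```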
I do not see any issue.

@neilyio

neilyio commented Jan 31, 2024

Yes, you're both right, I'm sorry for the distraction. I had some colleagues test the same code and they're seeing the same error as you. It's an issue with my serialization, and my specific test setup seems to be suppressing all but the An error occurred in a thread: 'Any { .. }' message. Thank you both for investigating.

PSeitz added a commit that referenced this issue Apr 5, 2024
@PSeitz PSeitz closed this as completed in b644d78 Apr 5, 2024
@PSeitz PSeitz reopened this Apr 5, 2024