Embedders in the LLM xpack now have a get_embedding_dimension method that returns the number of dimensions used by the chosen embedder.
pathway.stdlib.indexing.nearest_neighbors, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on k-NN via LSH (implemented in Pathway) and on k-NN provided by the USearch library.
pathway.stdlib.indexing.vector_document_index, with a few predefined instances of pathway.stdlib.indexing.data_index.DataIndex.
pathway.stdlib.indexing.bm25, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on the BM25 index provided by Tantivy.
pathway.stdlib.indexing.full_text_document_index, with a predefined instance of pathway.stdlib.indexing.data_index.DataIndex.
Introduced the reranker module under llm.xpacks. It includes a few re-ranking strategies and utility functions for RAG applications.
Changed
BREAKING: windowby generates IDs of produced rows differently than in the previous version.
BREAKING: pw.io.csv.write prints printable non-ascii characters as regular text, not \u{xxxx}.
BREAKING: Connector methods pw.io.elasticsearch.read, pw.io.debezium.read, pw.io.fs.read, pw.io.jsonlines.read, pw.io.kafka.read, pw.io.python.read, pw.io.redpanda.read, pw.io.s3.read now check the type of the input data. Previously, the type was not checked when the provided format was "json"/"jsonlines". If the data is inconsistent with the provided schema, the row is skipped and an error message is emitted.
BREAKING: query and query_as_of_now methods of pathway.stdlib.indexing.data_index.DataIndex now return pathway.JoinResult, to allow resolving column name conflicts (between columns in the table with queries and the table with index data).
BREAKING: DataIndex methods query and query_as_of_now now return score in a column named _pw_index_reply_score (defined as _SCORE variable in pathway.stdlib.indexing.colnames.py).
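The skip-and-report behavior described for the connector type checks above can be pictured with a minimal plain-Python sketch. It does not use Pathway and is not Pathway's implementation; the SCHEMA mapping and the row_matches_schema and filter_rows helpers are hypothetical names introduced only for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: column name -> expected Python type (illustration only).
SCHEMA: Dict[str, type] = {"id": int, "name": str}


def row_matches_schema(row: Dict[str, Any], schema: Dict[str, type]) -> bool:
    """Return True if every schema column is present with the expected type."""
    for column, expected in schema.items():
        if column not in row:
            return False
        value = row[column]
        # bool is a subclass of int in Python, so reject it for int columns.
        if expected is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected):
            return False
    return True


def filter_rows(
    rows: List[Dict[str, Any]], schema: Dict[str, type]
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Keep consistent rows; collect one error message per skipped row."""
    kept, errors = [], []
    for index, row in enumerate(rows):
        if row_matches_schema(row, schema):
            kept.append(row)
        else:
            errors.append(f"row {index} does not match schema: {row!r}")
    return kept, errors


rows = [
    {"id": 1, "name": "alice"},
    {"id": "2", "name": "bob"},  # "id" is a string, not an int: skipped
    {"id": 3, "name": "carol"},
]
kept, errors = filter_rows(rows, SCHEMA)
```

Code that previously relied on loosely typed "json"/"jsonlines" input may see fewer rows after upgrading, so it is worth checking the emitted error messages for skipped records.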
Removed
BREAKING: pathway.stdlib.indexing.data_index.VectorDocumentIndex class. Some predefined instances are now meant to be obtained via methods provided in pathway.stdlib.indexing.vector_document_index.
BREAKING: with_distances parameter of query and query_as_of_now methods in pathway.stdlib.indexing.data_index.DataIndex. Instead of 'distance', we now operate with the more general term 'score' (higher = better). For distance-based indices, the score is usually defined as the negative distance. The score is now always included in the answer, as long as the underlying index returns something that indicates the quality of a match.
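The "score = negative distance" convention can be illustrated with a small plain-Python sketch. It does not call Pathway; the euclidean and top_k_by_score helpers and the sample points are assumptions made for this example. The point is only that sorting by score in descending order returns the nearest items first.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_by_score(query, points, k):
    """Rank points by score, defined here as negative distance (higher = better)."""
    scored = [(name, -euclidean(query, vec)) for name, vec in points.items()]
    # Highest score first, which is the same order as smallest distance first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical sample data.
points = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 0.0)}
best = top_k_by_score((0.0, 0.0), points, k=2)
```

Code that used to read a distance column would, under this convention, negate the score to recover the distance for distance-based indices.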