Releases: lancedb/lance
v0.4.1 Support Append in Vector Search
The vector search in Lance now supports live updates. Previously, adding new vectors to a dataset required rebuilding the index. Now the existing index is "inherited", and vector search results combine an ANN search over the indexed data with an exact KNN search over the newly appended data. This adds a small amount of latency, while recall should be the same or better.
This provides a smooth performance curve until you have inserted enough new data that re-indexing is warranted.
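The merge of indexed and appended results can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Lance's implementation: the ANN hits over the indexed rows are passed in precomputed as (distance, row_id) pairs, the appended rows are scanned exhaustively with L2 distance, and both candidate lists are merged into one top-k. The names flat_knn and combined_search are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Hit = Tuple[float, int]  # (distance, row_id)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_knn(query: Sequence[float],
             rows: List[Tuple[int, Sequence[float]]], k: int) -> List[Hit]:
    """Exact brute-force KNN over rows not yet covered by the index."""
    return heapq.nsmallest(k, ((l2(query, vec), row_id) for row_id, vec in rows))


def combined_search(ann_hits: List[Hit], query: Sequence[float],
                    appended: List[Tuple[int, Sequence[float]]], k: int) -> List[Hit]:
    """Hypothetical: merge ANN hits on indexed data with exact KNN on appended rows.

    ann_hits stands in for the result of searching the existing index.
    """
    flat_hits = flat_knn(query, appended, k)
    # Both lists hold (distance, row_id) tuples; keep the k closest overall.
    return heapq.nsmallest(k, ann_hits + flat_hits)


# Usage: two hits from the index, two newly appended rows.
ann_hits = [(1.0, 1), (2.0, 2)]
appended = [(10, [0.0, 0.0]), (11, [3.0, 4.0])]
print(combined_search(ann_hits, [0.0, 0.0], appended, k=2))
# prints [(0.0, 10), (1.0, 1)]
```

Because the appended rows are searched exactly, they cannot be missed the way an approximate index can miss rows, while the cost of the flat scan grows with the number of unindexed rows.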
What's Changed
- Adding secret to publish task by @gsilvestrin in #742
- [Rust] make distance function to take slice instead of Float32Array by @eddyxu in #748
- Vector search should support appending new rows by @changhiskhan in #593
- windows lapack support by @gsilvestrin in #743
- Fix LanceDataset.to_batches by @changhiskhan in #751
Full Changelog: v0.4.0...v0.4.1
v0.4.0 Windows support
A warm welcome to @gsajko! Thanks for making our tutorial notebook easier to use and understand!
Note: OPQ is disabled for the vector index on Windows. This will be addressed once LAPACK support is added.
What's Changed
- small fixes by @gsajko in #725
- Windows support by @gsilvestrin in #724
New Contributors
Full Changelog: v0.3.19...v0.4.0
v0.3.19 Bug fix for filter predicates on large-utf8 type
Also fixes publishing to crates.io.
What's Changed
- Make contract clear for KNN nodes by @eddyxu in #729
- Refactor Scan I/O plan by @eddyxu in #731
- [Rust] Use forked sqlparser to unblock Rust crate release by @eddyxu in #732
- [Rust] Fix filter on large UTF8 columns by @eddyxu in #733
Full Changelog: v0.3.18...v0.3.19
v0.3.18 Bug fix release for binary offsets
Fixes incorrect offsets for string and variable-length list columns, as reported in a comment on #720.
Thanks @lucazanna for the feedback!
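The notes do not describe the root cause, but as background: variable-length columns such as strings are stored as one contiguous value buffer plus an offsets array, where row i spans values[offsets[i]:offsets[i + 1]]. When several arrays are written together, each array's offsets must be shifted by the amount of data already written. The sketch below shows that rebasing step in plain Python; from_strings, concat_binary, and row are hypothetical helpers, not Lance's encoder.

```python
from typing import List, Tuple

# A variable-length binary column: (offsets, values). Row i is
# values[offsets[i]:offsets[i + 1]], so len(offsets) == num_rows + 1.
BinaryArray = Tuple[List[int], bytes]


def from_strings(strings: List[str]) -> BinaryArray:
    """Build an offsets/values pair from Python strings (UTF-8 encoded)."""
    offsets: List[int] = [0]
    chunks: List[bytes] = []
    pos = 0
    for s in strings:
        data = s.encode("utf-8")
        chunks.append(data)
        pos += len(data)
        offsets.append(pos)
    return offsets, b"".join(chunks)


def concat_binary(arrays: List[BinaryArray]) -> BinaryArray:
    """Hypothetical helper: concatenate arrays, rebasing each array's offsets."""
    out_offsets: List[int] = [0]
    out_values = bytearray()
    for offsets, values in arrays:
        start, end = offsets[0], offsets[-1]
        base = len(out_values)
        # Shift every offset so it points into the combined value buffer.
        out_offsets.extend(base + (o - start) for o in offsets[1:])
        out_values.extend(values[start:end])
    return out_offsets, bytes(out_values)


def row(array: BinaryArray, i: int) -> str:
    """Decode row i of a binary array back into a string."""
    offsets, values = array
    return values[offsets[i]:offsets[i + 1]].decode("utf-8")


combined = concat_binary([from_strings(["ab", "c"]), from_strings(["", "def"])])
print(combined)  # ([0, 2, 3, 3, 6], b'abcdef')
```

Appending the second array's offsets ([0, 0, 3]) unchanged would make its rows point back into the first array's bytes.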
What's Changed
- Train OPQ and write rotation matrix to index file by @eddyxu in #713
- removing warnings by @gsilvestrin in #721
- [Bug] Fix IVF merge sort when refine factor is present. by @eddyxu in #722
- Add input / output schema contract to Global Take by @eddyxu in #728
- Fix offsets for Binary/Lists/LargeLists by @gsilvestrin in #727
Full Changelog: v0.3.17...v0.3.18
v0.3.17 Support for nested dict columns
A warm welcome to @haoxins, a new contributor who has helped improve the Lance documentation.
This release adds support for list-of-dict columns (thanks @lucazanna for reporting the bug in #715).
Also included in this release are various vector index scalability improvements and more progress toward the OPQ implementation.
What's Changed
- docs: fix the links by @haoxins in #701
- repair macos build for duckdb extension by @changhiskhan in #705
- filter evaluation with flat search by @changhiskhan in #704
- fix flaky test by @changhiskhan in #706
- [Bug] Fix transpose in MatrixView.data() by @eddyxu in #711
- Refactored variable length encoders by @gsilvestrin in #710
- add notebook for q&a bot by @changhiskhan in #707
- Allow iteratively train PQ by @eddyxu in #712
- Use relative eq and fix a compiling warning by @eddyxu in #714
- docs: fix the mod path by @haoxins in #718
- Composable vector search pipeline by @eddyxu in #716
- Fix CI failure by increasing epsilon for test_train_pq_iteratively by @eddyxu in #719
- Implement support for list of Dictionaries by @gsilvestrin in #664
New Contributors
Full Changelog: v0.3.16...v0.3.17
v0.3.16 Filter pushdown improvements
Welcome @wangfenjin to the Lance contributors! Thanks for submitting a bug fix for the Lance DuckDB extension 🔥
This release contains two workarounds for Arrow limitations:
- Lance datasets now accept filters passed in as strings, such as <field> LIKE '%' and <field> IN (<values>). Generic SQL syntax supported by DataFusion is now accepted. This departs from standard pyarrow Dataset behavior, which only accepts an Arrow compute Expression; that type is not available in Rust, and it cannot be introspected in Python, which prevents developers from building custom adapters.
- When Arrow dictionary arrays are concatenated, the dictionary values are duplicated. There are currently no concrete plans to change this behavior in Arrow, so Lance fixes it at write time instead.
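The write-time fix for duplicated dictionary values amounts to unifying dictionaries: build one deduplicated value list and remap each array's indices into it. The sketch below shows that idea in plain Python using (indices, dictionary) pairs; unify_dictionaries is a hypothetical helper, not Lance's writer code.

```python
from typing import Dict, List, Tuple

# A dictionary-encoded array: (indices, dictionary); row i is dictionary[indices[i]].
DictArray = Tuple[List[int], List[str]]


def unify_dictionaries(arrays: List[DictArray]) -> DictArray:
    """Concatenate dictionary arrays while keeping each value only once."""
    unified: List[str] = []
    position: Dict[str, int] = {}  # value -> slot in the unified dictionary
    out_indices: List[int] = []
    for indices, dictionary in arrays:
        # Map this array's local dictionary slots to unified slots.
        remap: List[int] = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            remap.append(position[value])
        out_indices.extend(remap[i] for i in indices)
    return out_indices, unified


a = ([0, 1, 0], ["red", "green"])  # rows: red, green, red
b = ([1, 0], ["green", "blue"])    # rows: blue, green
print(unify_dictionaries([a, b]))  # ([0, 1, 0, 2, 1], ['red', 'green', 'blue'])
```

A naive concatenation would instead produce the dictionary ['red', 'green', 'green', 'blue'], with "green" stored twice.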
What's Changed
- Changed encoders to handle multiple Arrays by @gsilvestrin in #681
- Train kmeans iteratively by @eddyxu in #688
- Changed writers to handle multiple Arrays by @gsilvestrin in #691
- Streaming PQ by @eddyxu in #689
- [Bug] PQ training generates empty centroids by @eddyxu in #693
- Allow append mode even if dataset doesn't already exist by @ananis25 in #690
- Support "LIKE" and "IN" in filters by @eddyxu in #696
- fix typo by @wangfenjin in #697
- Improve indexing performance by @eddyxu in #699
- Compute PQ distortion. by @eddyxu in #695
- Bugfix for BinaryEncoder positions by @gsilvestrin in #698
New Contributors
- @wangfenjin made their first contribution in #697
Full Changelog: v0.3.15...v0.3.16
v0.3.15 Bug fix for combining vector search and filter predicate
Thanks to @cemoody for the bug report!
What's Changed
- Missing column when both nearest and filter are applied by @changhiskhan in #686
Full Changelog: v0.3.14...v0.3.15
v0.3.14 Timestamp support
This is a patch release that adds support for the Arrow Timestamp type. Thanks @kesavkolla for the bug report!
Thanks to @Renkai, we also have an optimized Take for Boolean arrays.
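Arrow stores Boolean arrays bit-packed, eight values per byte with the least-significant bit first, so Take (gathering rows by index) must read and write individual bits rather than whole elements. The sketch below illustrates the operation in plain Python; it is not the optimized implementation from #676, and pack_bits, get_bit, and take_bool are hypothetical names.

```python
from typing import List


def pack_bits(values: List[bool]) -> bytearray:
    """Pack booleans LSB-first, eight per byte, as in the Arrow layout."""
    buf = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            buf[i // 8] |= 1 << (i % 8)
    return buf


def get_bit(buf: bytes, i: int) -> bool:
    """Read logical value i from a packed buffer."""
    return bool((buf[i // 8] >> (i % 8)) & 1)


def take_bool(buf: bytes, indices: List[int]) -> bytearray:
    """Hypothetical Take: gather bits at `indices` into a new packed buffer."""
    out = bytearray((len(indices) + 7) // 8)
    for j, i in enumerate(indices):
        if get_bit(buf, i):
            out[j // 8] |= 1 << (j % 8)
    return out


data = pack_bits([True, False, False, True, True, False, False, False, True])
picked = take_bool(data, [8, 1, 3])
print([get_bit(picked, j) for j in range(3)])  # [True, False, True]
```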
What's Changed
- OPQ rotation matrix training by @eddyxu in #669
- Optimize boolean by @Renkai in #676
- Support timestamp type by @eddyxu in #684
Full Changelog: v0.3.13...v0.3.14
v0.3.13 Support fast Take for variable length list
What's Changed
- update arrow-rs version in duckdb-ext for lance as well by @changhiskhan in #670
- Support take operation on List by @eddyxu in #671
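A variable-length list column is stored as a flat child array plus an offsets array, so Take on it copies each selected row's slice of child values and builds fresh offsets for the output. The sketch below shows that in plain Python; take_list is a hypothetical helper and not the implementation from #671.

```python
from typing import List, Tuple

# List column: (offsets, values); row i is values[offsets[i]:offsets[i + 1]].
ListArray = Tuple[List[int], List[int]]


def take_list(array: ListArray, indices: List[int]) -> ListArray:
    """Gather rows by index, rebuilding offsets for the compacted output."""
    offsets, values = array
    out_offsets: List[int] = [0]
    out_values: List[int] = []
    for i in indices:
        start, end = offsets[i], offsets[i + 1]
        out_values.extend(values[start:end])
        out_offsets.append(len(out_values))
    return out_offsets, out_values


# Rows: [1, 2], [], [3, 4, 5]
col = ([0, 2, 2, 5], [1, 2, 3, 4, 5])
print(take_list(col, [2, 0, 1]))  # ([0, 3, 5, 5], [3, 4, 5, 1, 2])
```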
Full Changelog: v0.3.12...v0.3.13
v0.3.12 Upgrade arrow-rs and bug fixes
- Upgraded the arrow-rs dependency to 33.0 (waiting on DataFusion for the 34.0 upgrade).
- Nested Dictionary fields are now parsed and written correctly.
- More progress toward the OPQ implementation.
What's Changed
- Matrix mul and transpose by @eddyxu in #661
- Recursively set dictionaries in struct fields by @gsilvestrin in #662
- Upgrading arrow version to 33.0 by @gsilvestrin in #665
- [Rust] sampling over matrix. by @eddyxu in #666
- Sorting dataset versions by @gsilvestrin in #668
Full Changelog: v0.3.11...v0.3.12