
Releases: lancedb/lance

v0.1.5: Pandas Extension Type, Jupyter Notebook and Document Improvements

28 Oct 16:52

What's Changed

Full Changelog: v0.1.4...v0.1.5

v0.1.4: Var-length binary decoder performance improvements, Open Discord server for community.

16 Oct 17:18

What's Changed

  • CLI to inspect lance dataset by @eddyxu in #231
  • Generate primary key for Oxford Pet dataset by @eddyxu in #233
  • Fix datagen test by @eddyxu in #234
  • Add discord link and fix typo in README by @eddyxu in #236
  • Improve VarBinaryDecoder::Take performance by accumulating small batches by @eddyxu in #239

Full Changelog: v0.1.3...v0.1.4

Document improvements and bug fixes

09 Oct 02:34

What's Changed

Full Changelog: v0.1.2...v0.1.3

v0.1.2

04 Oct 18:27
  1. Lance now supports projection of nested columns (e.g., "annotations.name")
  2. CountRows now has a fast path that gets the record count by looking at metadata
  3. Lance now supports writing optional key-value metadata (pa.Table.schema.metadata)

What's Changed

Full Changelog: v0.1.1...v0.1.2

v0.1.1

29 Sep 05:13

Fix up the Mac wheel to enable extension types on macOS

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.1.1

v0.1.0

27 Sep 23:07

Highlights

  1. Documentation is now live and a Quickstart Notebook is available
  2. Lance is now integrated with PyTorch and supports multiple workers.
  3. Vision-specific extension types: Box2d provides vectorized IoU, and Image types make it easy to perform I/O and convert between bytes, PIL, numpy, and tensors.
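
For readers unfamiliar with IoU (intersection over union), the following is a minimal plain-Python sketch of the computation for axis-aligned boxes in [x0, y0, x1, y1] form. It is not the Box2d implementation from Lance, which is described above as vectorized. The function names `iou` and `pairwise_iou` are hypothetical and used only for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: [x0, y0, x1, y1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs give a union of zero; report no overlap.
    return inter / union if union > 0 else 0.0


def pairwise_iou(boxes_a: Sequence[Box], boxes_b: Sequence[Box]) -> List[float]:
    """Element-wise IoU over two equally long sequences of boxes."""
    return [iou(a, b) for a, b in zip(boxes_a, boxes_b)]


scores = pairwise_iou([[0, 0, 2, 2], [0, 0, 1, 1]], [[1, 1, 3, 3], [2, 2, 3, 3]])
print(scores)
```

A vectorized version would apply the same arithmetic to whole arrays of coordinates at once instead of looping in Python.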

What's Changed

  • Setting BatchSize via ScanBuilder by @eddyxu in #135
  • Move Expression based schema project to Schema class by @eddyxu in #137
  • Refactor I/O exec nodes by @eddyxu in #136
  • Simplify RecordBatchReader to use Project.next() by @eddyxu in #139
  • Convert bdd100k dataset in python benchmarks by @eddyxu in #131
  • Fix the condition of Scan advancing batch id by @eddyxu in #143
  • Initial PyTorch Dataset support by @eddyxu in #134
  • Example training code over oxford pet dataset by @eddyxu in #144
  • Test writing fixed size list and fixed size binary via WriteTable by @eddyxu in #151
  • Fix fixed size length calculation by @eddyxu in #152
  • Provide binary to profiling scans by @eddyxu in #149
  • Multi-worker support in Pytorch Dataset by @eddyxu in #147
  • Vision specific extension types by @changhiskhan in #146
  • lance dataset that overrides Dataset.scanner and Dataset.head by @changhiskhan in #158
  • Pickle Image by @changhiskhan in #160
  • Only load manifest once within the dataset and share Manifest among the readers by @eddyxu in #155
  • Improve ergonomic of the Pytorch dataset and Generate embeddings for oxford pet by @eddyxu in #157
  • Fix PlainEncoder to read empty page by @eddyxu in #164
  • Convert coco annotations from the list of structs to struct of lists by @eddyxu in #166
  • Convert coco bounding box format to [x0,y0,x1,y1] format. by @eddyxu in #169
  • Image Array by @changhiskhan in #168
  • Fix writing and reading extension type by @eddyxu in #172
  • Coco improvements by @changhiskhan in #174
  • Support partitioning and group size control in coco dataset generation. by @eddyxu in #175
  • Extension type improvements to support 3d types by @changhiskhan in #173
  • Support converting PIL from Image in pytorch Dataset by @eddyxu in #176
  • Minor fix for 3d extension types by @changhiskhan in #177
  • MS coco dataset training by @eddyxu in #163
  • Change version import to relative import by @eddyxu in #181
  • [python] Mix of minor improvements by @changhiskhan in #182
  • Automatically build document and publish to Github Pages by @eddyxu in #180
  • [benchmarks] simplify the datagen code and remove partitioning for now by @changhiskhan in #183
  • Fix PlainDecoder handle empty filtered array by @eddyxu in #187
  • [python] minor improvements by @changhiskhan in #190
  • Fix bug that attempts to partition on columns which do not exist in the file. by @eddyxu in #189
  • Pass filter indices via Limit and Return empty array in GetListArray by @eddyxu in #191
  • Exclude filter columns from projection by @eddyxu in #194
  • action to bump version for new release by @changhiskhan in #199
  • [C++] [BUG] Adjust offset when the batch size is set for reading by @eddyxu in #201
  • GH action to upload wheels and also make reusable yml by @changhiskhan in #200
  • [Python] Test projection in Python Torch Dataset by @eddyxu in #202
  • Fix typo of calculating offsets for slicing index by @eddyxu in #206
  • Changhiskhan/tutorial by @changhiskhan in #167
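
One of the changes listed above converts COCO bounding boxes to the [x0,y0,x1,y1] format. As a rough illustration, assuming the input uses the standard COCO annotation layout of [x, y, width, height], the conversion can be sketched as below. The helper name `coco_xywh_to_xyxy` is hypothetical and is not taken from the Lance code base.

```python
from typing import List, Sequence


def coco_xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert [x, y, width, height] to corner form [x0, y0, x1, y1]."""
    x, y, w, h = box
    # The top-left corner is unchanged; the bottom-right corner is offset by the size.
    return [x, y, x + w, y + h]


converted = coco_xywh_to_xyxy([10, 20, 30, 40])
print(converted)
```

Corner form is convenient for overlap computations such as IoU, since the intersection can be read directly from the min and max of the coordinates.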

Full Changelog: v0.0.5...v0.1.0

Support extension types, fixed size list and fixed size binary

30 Aug 17:30

What's Changed

  • Update python dataset generation to use boolean and dictionary encoding. by @eddyxu in #124
  • Basic python extension types for image, point2d, and box2d by @changhiskhan in #102
  • Fix offset calculation in VarBinaryEncoder for slices by @eddyxu in #128
  • Support pyarrow write_dataset with partitions by @eddyxu in #125
  • Add encoding for FixedSizeBinary and FixedSizeList by @eddyxu in #129
  • Support parsing fixed size list in schema by @eddyxu in #130

Full Changelog: v0.0.4...v0.0.5

Benchmarks, bug fixes, and writer improvements

23 Aug 19:32

  • Add benchmarks vs. Parquet and raw JSON data
  • Fix #112, which caused Lance datasets to be written with duplicated Arrow buffers
  • Add support for large binary, boolean, and temporal types

What's Changed

Full Changelog: v0.0.3...v0.0.4