feat: add compaction support to balanced datasets #3088

Merged
merged 3 commits into from
Nov 5, 2024

westonpace
Contributor

@westonpace westonpace commented Nov 5, 2024

This allows compaction to succeed on the default storage.

Running compaction on the sibling storage can be added in a future PR.

This PR also adds a number of test cases to ensure that each operation on a balanced dataset either works as expected or returns a clear "not yet supported" error message.

The PR also reworks the dataset-offset-based take (e.g. LanceDataset::take) to reuse the id-based and address-based take paths (e.g. TakeBuilder).
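To illustrate how an offset-based take can be routed through an address-based path, here is a minimal sketch. It is not the code in this PR: `offsets_to_addresses` and the `(fragment_id, num_rows)` fragment description are hypothetical. It assumes a row address packs the fragment id into the upper 32 bits and the row offset within the fragment into the lower 32 bits, and it ignores deleted rows.

```rust
/// Hypothetical helper (not part of Lance): translate dataset-wide row offsets
/// into row addresses, given the fragments in dataset order as
/// `(fragment_id, num_rows)` pairs.
///
/// Assumption: address = (fragment_id << 32) | offset_within_fragment.
/// Deletions are ignored for simplicity.
/// Returns `None` if any offset is past the end of the dataset.
fn offsets_to_addresses(fragments: &[(u32, u32)], offsets: &[u64]) -> Option<Vec<u64>> {
    offsets
        .iter()
        .map(|&offset| {
            let mut remaining = offset;
            // Walk the fragments until the offset falls inside one of them.
            for &(fragment_id, num_rows) in fragments {
                let num_rows = u64::from(num_rows);
                if remaining < num_rows {
                    return Some((u64::from(fragment_id) << 32) | remaining);
                }
                remaining -= num_rows;
            }
            // The offset is beyond the last fragment.
            None
        })
        .collect()
}
```

Once offsets are expressed as addresses, a single address-based take routine can serve both entry points, which is the kind of reuse described above.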

It also fixes a bug in the TakeBuilder path where duplicate IDs were not being handled.
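One common way to handle duplicate IDs in a take is to fetch each distinct ID once and then re-expand the result to the requested order. The sketch below shows that general technique only; it is not necessarily how TakeBuilder does it. `take_with_duplicates` and the `fetch_unique` closure (a stand-in for the storage read) are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of Lance): take rows for `ids`, which may
/// contain duplicates and may be unsorted.
///
/// `fetch_unique` stands in for the storage read. It receives sorted, distinct
/// ids and must return exactly one value per id, in the same order.
fn take_with_duplicates<T: Clone>(
    ids: &[u64],
    fetch_unique: impl Fn(&[u64]) -> Vec<T>,
) -> Vec<T> {
    // Build the sorted list of distinct ids so each row is read only once.
    let mut unique: Vec<u64> = ids.to_vec();
    unique.sort_unstable();
    unique.dedup();

    let fetched = fetch_unique(&unique);
    assert_eq!(fetched.len(), unique.len(), "fetch must return one value per id");

    // Map each distinct id to its position in the fetched values.
    let position: HashMap<u64, usize> = unique
        .iter()
        .enumerate()
        .map(|(index, id)| (*id, index))
        .collect();

    // Re-expand to the caller's order, repeating values for duplicate ids.
    ids.iter().map(|id| fetched[position[id]].clone()).collect()
}
```

The output has one entry per requested ID, in request order, so callers see duplicates preserved even though the underlying read was deduplicated.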

These latter changes are not strictly needed, but they prepare for an eventual revamp of the take operation to address #2977.

@github-actions github-actions bot added the enhancement (New feature or request) and python labels Nov 5, 2024

codecov-commenter commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 67.36111% with 47 lines in your changes missing coverage. Please review.

Project coverage is 77.12%. Comparing base (83439ef) to head (c67fdfc).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/take.rs | 73.95% | 19 Missing and 6 partials ⚠️ |
| rust/lance/src/dataset/scanner.rs | 50.00% | 9 Missing and 5 partials ⚠️ |
| rust/lance/src/dataset/fragment.rs | 50.00% | 4 Missing ⚠️ |
| rust/lance/src/dataset/write/update.rs | 66.66% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3088      +/-   ##
==========================================
- Coverage   77.17%   77.12%   -0.06%     
==========================================
  Files         240      240              
  Lines       79671    79703      +32     
  Branches    79671    79703      +32     
==========================================
- Hits        61488    61472      -16     
- Misses      15019    15054      +35     
- Partials     3164     3177      +13     
| Flag | Coverage Δ |
|---|---|
| unittests | 77.12% <67.36%> (-0.06%) ⬇️ |

Flags with carried forward coverage won't be shown.

@westonpace westonpace marked this pull request as draft November 5, 2024 17:10
@westonpace
Contributor Author

Leaving in draft until #3079 merges

@westonpace westonpace force-pushed the feat/compact-blob-col branch from 1b59fc3 to 74a6ba3 November 5, 2024 18:50
@westonpace westonpace marked this pull request as ready for review November 5, 2024 18:50
@westonpace westonpace merged commit 2d3dd67 into lancedb:main Nov 5, 2024
24 checks passed