Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): adjust scan range to avoid unnecessary warnings #3248

Merged
merged 1 commit into from
Dec 18, 2024

Conversation

takaebato
Copy link
Contributor

@takaebato takaebato commented Dec 15, 2024

Fixes #3086

While adding test cases, I noticed that the following error would occur in the original code. But with this update, this is fixed too.

Error detail

Test case:

When the offset is specified as larger than the number of rows without a limit. (This test case is added in this change)
test_limit_offset in python/tests/test_dataset.py

assert dataset.to_table(offset=101) == table.slice(100, 0)

Error message:

The error occurs here when attempting to execute 100 - 101 as a u64.

let mut rows_to_take = offsets.end - offsets.start;

attempt to subtract with overflow
thread 'dataset::scanner::test::test_limit::data_storage_version_1_LanceFileVersion__Stable' panicked at rust/lance/src/io/exec/scan.rs:154:36:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:145:5
   3: lance::io::exec::scan::LanceStream::try_new_v2
             at ./src/io/exec/scan.rs:154:36
   4: lance::io::exec::scan::LanceStream::try_new
             at ./src/io/exec/scan.rs:104:13
   5: <lance::io::exec::scan::LanceScanExec as datafusion_physical_plan::execution_plan::ExecutionPlan>::execute
             at ./src/io/exec/scan.rs:529:21
   6: lance_datafusion::exec::execute_plan
             at /Users/taka/Documents/GitHub/lance/rust/lance-datafusion/src/exec.rs:255:8
   7: lance::dataset::scanner::Scanner::try_into_stream::{{closure}}::{{closure}}
             at ./src/dataset/scanner.rs:948:42
   8: lance::dataset::scanner::Scanner::try_into_stream::{{closure}}
             at ./src/dataset/scanner.rs:945:5
   9: lance::dataset::scanner::Scanner::try_into_batch::{{closure}}
             at ./src/dataset/scanner.rs:963:45
  10: lance::dataset::scanner::test::test_limit::{{closure}}
             at ./src/dataset/scanner.rs:2524:14
  11: lance::dataset::scanner::test::test_limit::data_storage_version_1_LanceFileVersion__Stable::{{closure}}
             at ./src/dataset/scanner.rs:2509:5
  12: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  13: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  14: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:673:57
  15: tokio::runtime::coop::with_budget
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/coop.rs:107:5
  16: tokio::runtime::coop::budget
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/coop.rs:73:5
  17: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:673:25
  18: tokio::runtime::scheduler::current_thread::Context::enter
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:412:19
  19: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:672:36
  20: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:751:68
  21: tokio::runtime::context::scoped::Scoped<T>::set
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/context/scoped.rs:40:9
  22: tokio::runtime::context::set_scheduler::{{closure}}
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/context.rs:180:26
  23: std::thread::local::LocalKey<T>::try_with
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:284:16
  24: std::thread::local::LocalKey<T>::with
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:260:9
  25: tokio::runtime::context::set_scheduler
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/context.rs:180:9
  26: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:751:27
  27: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:660:19
  28: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:180:28
  29: tokio::runtime::context::runtime::enter_runtime
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/context/runtime.rs:65:16
  30: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/scheduler/current_thread/mod.rs:168:9
  31: tokio::runtime::runtime::Runtime::block_on_inner
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/runtime.rs:361:47
  32: tokio::runtime::runtime::Runtime::block_on
             at /Users/taka/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/runtime.rs:335:13
  33: lance::dataset::scanner::test::test_limit::data_storage_version_1_LanceFileVersion__Stable
             at ./src/dataset/scanner.rs:2509:5
  34: lance::dataset::scanner::test::test_limit::data_storage_version_1_LanceFileVersion__Stable::{{closure}}
             at ./src/dataset/scanner.rs:2514:10
  35: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  36: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


Process finished with exit code 101

Copy link

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@takaebato takaebato marked this pull request as draft December 15, 2024 08:45
@takaebato takaebato changed the title [WIP] Adjust the scan range considering the number of rows fix(rust): adjust scan range to avoid unnecessary warnings Dec 15, 2024
@github-actions github-actions bot added the bug Something isn't working label Dec 15, 2024
@takaebato takaebato marked this pull request as ready for review December 15, 2024 13:59
@takaebato takaebato marked this pull request as draft December 15, 2024 15:35
@takaebato takaebato marked this pull request as ready for review December 16, 2024 11:56
@takaebato takaebato force-pushed the fix/avoid-unnecessary-warning branch from afdf5e6 to 537d4e1 Compare December 16, 2024 12:23
@takaebato
Copy link
Contributor Author

takaebato commented Dec 16, 2024

I noticed something else—does the following case behave as expected?

  • limit 0 without offset returns all rows.
  • limit 0 with some offset returns an empty set of rows.

This behavior is due to the code implemented in this PR: #436.
Specifically, the line of code here:

if use_limit_node && (self.limit.unwrap_or(0) > 0 || self.offset.is_some()) {

Maybe the original code self.limit.is_some() was actually fine as it was?

@wjones127
Copy link
Contributor

does the following case behave as expected?

Yes, this is relied on upstream in LanceDB.

@wjones127 wjones127 merged commit a07717a into lancedb:main Dec 18, 2024
37 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warning issued if limit is larger than num rows
2 participants