Add convenience function for limit/offset #141

Closed
changhiskhan opened this issue Sep 5, 2022 · 0 comments · Fixed by #158
changhiskhan commented Sep 5, 2022

Right now, to take advantage of Lance's performance, we need to explicitly call lance.scanner(..., limit=20, offset=10). It would be nice to be able to write something like lance.dataset(uri).limit(n=20, offset=10) instead.

Note that pyarrow's Dataset already has a head method, which creates a scanner under the hood. However, it does not use Lance's scanner or support its limit/offset options. In addition, pyarrow's Dataset and Scanner are Cython cdef classes, so they cannot be monkey-patched.
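To illustrate the monkey-patching constraint without depending on pyarrow, here is a minimal sketch that uses the built-in list type as a stand-in for a Cython cdef class. This is an analogy, not pyarrow code: both are extension types defined in C, and CPython refuses attribute assignment on such types. The helper name head and the PurePython class are hypothetical.

```python
def head(self, n):
    """Hypothetical helper we would like to attach to an existing type."""
    return self[:n]


# A class defined in Python can be patched after the fact.
class PurePython(list):
    pass


PurePython.head = head
patched = PurePython([1, 2, 3]).head(2)

# A type defined in C (built-in here, standing in for a cdef class)
# rejects the same assignment with a TypeError.
try:
    list.head = head
    patch_error = None
except TypeError as exc:
    patch_error = exc

print(patched)
print(type(patch_error).__name__)
```

This is why both options below rely on subclassing or wrapping rather than patching Dataset.head directly.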

Options:

  1. Create Lance subclasses of FileSystemDatasetFactory and FileSystemDataset (in both C++ and Cython) that return the right scanner and support more ScanOptions. We would then monkey-patch pyarrow.dataset._filesystem_dataset to use the new factory and dataset.
  2. Create a Cython-only LanceFileSystemDataset with a factory method that builds an instance from a FileSystemDataset. lance.dataset would then call this factory method, so that it returns a Dataset supporting Dataset.head(n=20, offset=10), which calls lance.scanner under the hood to support Lance-specific ScanOptions.
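The delegation idea behind option 2 can be sketched in pure Python. This is a rough illustration only, not the Cython class proposed above: LimitOffsetDataset, scanner_factory, and fake_scanner are hypothetical names, and a plain list stands in for the dataset so the snippet runs without lance or pyarrow. In real use, the factory would presumably be lance.scanner, whose limit and offset keywords are the ones mentioned in this issue.

```python
class LimitOffsetDataset:
    """Hypothetical wrapper; not part of lance or pyarrow."""

    def __init__(self, dataset, scanner_factory):
        self._dataset = dataset
        # Assumed to accept (dataset, limit=..., offset=...) keywords.
        self._scanner_factory = scanner_factory

    def head(self, n, offset=0, **kwargs):
        # Route through the injected scanner so limit/offset are honored.
        return self._scanner_factory(
            self._dataset, limit=n, offset=offset, **kwargs
        )

    def __getattr__(self, name):
        # Anything not defined here falls through to the wrapped dataset.
        return getattr(self._dataset, name)


def fake_scanner(rows, limit=None, offset=0):
    """Stand-in scanner: slices a list instead of reading Lance files."""
    end = None if limit is None else offset + limit
    return rows[offset:end]


ds = LimitOffsetDataset(list(range(100)), fake_scanner)
page = ds.head(n=20, offset=10)
print(page[0], page[-1], len(page))
```

Composition sidesteps the cdef restriction because nothing on the wrapped object is modified. The trade-off is that the wrapper is not an instance of the wrapped type, which is one reason a real Dataset subclass may be preferable.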
changhiskhan changed the title from "Monkey patch dataset.head?" to "Add convenience function for limit/offset" on Sep 12, 2022
changhiskhan self-assigned this on Sep 12, 2022