Right now, to take advantage of Lance's performance, we need to call lance.scanner(..., limit=20, offset=10) explicitly. It would be nice to be able to write something like lance.dataset(uri).limit(n=20, offset=10) instead.
Note that pyarrow's Dataset already has a head method, which creates a scanner under the hood. However, it does not use Lance's scanner and does not support the limit/offset options. In addition, pyarrow's Dataset and Scanner are Cython cdef classes, so they cannot be monkey-patched.
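To illustrate the monkey-patching limitation, here is a minimal sketch that uses the built-in list type as a stand-in for a Cython cdef class. Both are extension types implemented in C, and such types reject attribute assignment on the class. The helper try_monkey_patch and the class PurePythonDataset are hypothetical names made up for this example; nothing here touches pyarrow or Lance.

```python
def try_monkey_patch(cls):
    """Try to attach a head(n, offset) method to cls; return True on success."""
    def head(self, n, offset=0):
        # Naive limit/offset over any iterable container.
        return list(self)[offset:offset + n]

    try:
        cls.head = head
    except TypeError:
        # Extension types (built-ins, Cython cdef classes) refuse new attributes.
        return False
    return True


class PurePythonDataset(list):
    """An ordinary Python class, which accepts new attributes."""


# An ordinary Python subclass can be patched.
patched_subclass = try_monkey_patch(PurePythonDataset)

# The C-implemented type itself cannot.
patched_builtin = try_monkey_patch(list)

print(patched_subclass, patched_builtin)
```

This is why both options below rely on a subclass or wrapper rather than on patching pyarrow's classes directly.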
Options:
1. Create Lance subclasses of FileSystemDatasetFactory and FileSystemDataset (in both C++ and Cython) that return the right scanner and support more ScanOptions. We would then monkey-patch pyarrow.dataset._filesystem_dataset to use the new factory and dataset.
2. Create a Cython-only LanceFileSystemDataset with a factory method that builds an instance from a FileSystemDataset. lance.dataset would then call this factory method, so that it returns a Dataset supporting Dataset.head(n=20, offset=10), which calls lance.scanner under the hood to support Lance-specific ScanOptions.
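As a rough, pure-Python sketch of the idea behind the second option (not the proposed Cython implementation), a wrapper can forward everything to the underlying dataset and override only head so that it goes through a limit/offset-aware scanner. The names LimitOffsetDataset, scanner_factory, _FakeScanner, and fake_scanner are hypothetical and exist only for this example; the fake scanner stands in for lance.scanner so the sketch runs without Lance or pyarrow installed.

```python
class LimitOffsetDataset:
    """Hypothetical wrapper: delegates to a dataset, but routes head() through
    a scanner factory that understands limit and offset."""

    def __init__(self, dataset, scanner_factory):
        self._dataset = dataset
        self._scanner_factory = scanner_factory

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself,
        # so everything except head() falls through to the wrapped dataset.
        return getattr(self._dataset, name)

    def head(self, n, offset=0, **scan_options):
        scanner = self._scanner_factory(
            self._dataset, limit=n, offset=offset, **scan_options
        )
        return scanner.to_table()


class _FakeScanner:
    """Stand-in scanner (assumption): slices a plain list of rows."""

    def __init__(self, rows, limit, offset):
        self._rows, self._limit, self._offset = rows, limit, offset

    def to_table(self):
        return self._rows[self._offset:self._offset + self._limit]


def fake_scanner(dataset, limit, offset):
    return _FakeScanner(dataset, limit, offset)


# Usage: a list of 100 "rows" plays the role of the dataset.
wrapped = LimitOffsetDataset(list(range(100)), fake_scanner)
page = wrapped.head(n=20, offset=10)
print(page[0], page[-1], len(page))
```

In a real implementation, scanner_factory would presumably be lance.scanner, assuming it accepts a dataset plus limit and offset keywords as the call quoted in this issue suggests. A plain wrapper like this is not a pyarrow Dataset subclass, so isinstance checks against Dataset would fail, which may be one reason to prefer the Cython subclass described above.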
changhiskhan changed the title from "Monkey patch dataset.head?" to "Add convenience function for limit/offset" on Sep 12, 2022