[Python] Implement Scanner::from_fragments() to initialize a scanner from a subset of fragments #860

Closed
eddyxu opened this issue May 13, 2023 · 3 comments · Fixed by #869
Labels
arrow Apache Arrow related issues benchmark good first issue Good for newcomers rust Rust related tasks

Comments

@eddyxu
Contributor

eddyxu commented May 13, 2023

Problem Statement

This would be useful for fine-grained distributed scans: a master node decides on a plan that assigns fragments to each distributed node, and each worker then scans only its assigned fraction of the dataset.
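As an illustration of the master-side planning step, here is a minimal sketch that spreads fragment IDs across workers round-robin. The helper name assign_fragments and the round-robin policy are assumptions for illustration only; a real planner might instead balance by fragment row count or byte size.

```python
from typing import Dict, List, Sequence


def assign_fragments(fragment_ids: Sequence[int], num_workers: int) -> Dict[int, List[int]]:
    """Hypothetical helper: round-robin fragment IDs over worker ranks."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # One (possibly empty) list of fragment IDs per worker rank.
    plan: Dict[int, List[int]] = {rank: [] for rank in range(num_workers)}
    for i, frag_id in enumerate(fragment_ids):
        plan[i % num_workers].append(frag_id)
    return plan


# The master computes the plan; each worker would then build a scanner
# over only the fragment IDs listed under its own rank.
plan = assign_fragments([0, 1, 2, 3, 4], num_workers=2)
# plan == {0: [0, 2, 4], 1: [1, 3]}
```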

@GallagherCommaJack

Ideally, the API for this would accept an iterator of fragments (rather than a list).
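A minimal sketch of what accepting any iterable (rather than requiring a list) could look like. Fragment, Scanner, and from_fragments below are hypothetical stand-ins, not the project's actual classes; the point is that a generator expression filtering the full fragment set works just as well as a list.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Fragment:
    # Hypothetical stand-in for a dataset fragment: an ID plus its rows.
    id: int
    rows: Tuple[int, ...]


class Scanner:
    # Hypothetical scanner restricted to a subset of fragments.
    def __init__(self, fragments: List[Fragment]):
        self._fragments = fragments

    @classmethod
    def from_fragments(cls, fragments: Iterable[Fragment]) -> "Scanner":
        # Accept any iterable (list, tuple, generator) and materialize it
        # once, so the resulting scanner can be scanned more than once.
        return cls(list(fragments))

    def scan(self) -> Iterator[int]:
        for frag in self._fragments:
            yield from frag.rows


all_fragments = [Fragment(0, (1, 2)), Fragment(1, (3,)), Fragment(2, (4, 5))]
mine = {0, 2}  # IDs this worker was assigned
scanner = Scanner.from_fragments(f for f in all_fragments if f.id in mine)
```

Materializing the iterable once keeps the scanner re-scannable; an implementation that stays fully lazy would save memory for very large fragment sets but could only be consumed a single time.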

@changhiskhan
Contributor

Does this mean adding a fragment_ids setter (or something similar) to Scanner and modifying the Scan I/O node to take this list as a parameter?
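The setter-based alternative can be sketched as follows, assuming a hypothetical set_fragment_ids method on the scanner that is forwarded to a hypothetical ScanNode. Fragments are modeled as a plain dict of ID to rows; when no IDs are set, the node scans every fragment.

```python
from typing import Dict, Iterator, List, Optional, Sequence


class ScanNode:
    """Hypothetical scan I/O node that reads only the requested fragments."""

    def __init__(self, fragments: Dict[int, List[str]],
                 fragment_ids: Optional[Sequence[int]] = None):
        self._fragments = fragments
        # None means "scan every fragment", in ascending ID order.
        self._fragment_ids = (list(fragment_ids) if fragment_ids is not None
                              else sorted(fragments))

    def execute(self) -> Iterator[str]:
        for frag_id in self._fragment_ids:
            if frag_id not in self._fragments:
                raise KeyError(f"unknown fragment id {frag_id}")
            yield from self._fragments[frag_id]


class Scanner:
    """Hypothetical scanner exposing a fragment_ids setter."""

    def __init__(self, fragments: Dict[int, List[str]]):
        self._fragments = fragments
        self._fragment_ids: Optional[List[int]] = None

    def set_fragment_ids(self, fragment_ids: Sequence[int]) -> "Scanner":
        self._fragment_ids = list(fragment_ids)
        return self  # allow chaining

    def to_list(self) -> List[str]:
        # The selected IDs are passed down to the scan node as a parameter.
        return list(ScanNode(self._fragments, self._fragment_ids).execute())


data = {0: ["a", "b"], 1: ["c"], 2: ["d"]}
subset = Scanner(data).set_fragment_ids([2, 0]).to_list()
everything = Scanner(data).to_list()
```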

@eddyxu
Contributor Author

eddyxu commented May 13, 2023

Yes, this could simply be a Scanner constructor that takes in the fragments and builds a scanner on a single worker.

It could follow the same pattern as pyarrow's Scanner.from_fragment: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html#pyarrow.dataset.Scanner.from_fragment

@eddyxu eddyxu added good first issue Good for newcomers arrow Apache Arrow related issues benchmark rust Rust related tasks labels May 13, 2023
wjones127 added a commit that referenced this issue May 15, 2023
* chore: add venv to the gitignore

* feat: create a scanner from an iterable of fragments
4 participants