[CHORE] [ScanOperator-Follow-Ons-3] Integrate GlobScanOperator with new scan node builder #1564

Merged

@jaychia jaychia commented Nov 3, 2023

No description provided.

@github-actions github-actions bot added the chore label Nov 3, 2023
@jaychia jaychia merged commit 2740a54 into clark/scan-operator-integration Nov 3, 2023
2 checks passed
@jaychia jaychia deleted the jay/scan-operator-integration-glob branch November 3, 2023 01:28
jaychia added a commit that referenced this pull request Nov 7, 2023
This PR adds an e2e integration for the new `ScanOperator` for reading
from external sources, integrating with logical plan building, logical
-> physical plan translation, physical plan scheduling, physical task
execution, and the actual `MicroPartition`-based reading.

## TODOs (possibly before merging)

- [ ] Implement Python I/O backend at `MicroPartition` level.
- [ ] Implement reads for non-Parquet formats at `MicroPartition` level.
- [x] Consolidate filter/limit pushdowns to use the same `Pushdown`
struct.
- [x] Look to reinstate non-optional `TableMetadata` at the
`MicroPartition` level. (#1563)
- [x] Look to reinstate non-optional `TableStatistics` when data is
unloaded at the `MicroPartition` level. (#1563)
- [x] Integrate with globbing `ScanOperator` implementation. (#1564)
- [ ] Support different row group selection per Parquet file (currently
applies a single row group selection to all files in a scan task
bundle).
- [ ] Misc. cleanup.
- [ ] (?) Add basic validation that `ScanTask` configurations are
compatible when merging into a `ScanTaskBatch` bundle.
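
As a rough illustration of the last TODO item and of the consolidated `Pushdown` struct, here is a minimal Python sketch of a compatibility check that could run when merging `ScanTask`s into a `ScanTaskBatch`. Everything here is an assumption made for illustration: the field names (`path`, `file_format`, `columns`, `pushdowns`), the `Pushdowns` dataclass, and the `merge_scan_tasks` helper are hypothetical and do not reflect the actual implementation in this PR.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Pushdowns:
    """Hypothetical single container for filter and limit pushdowns."""
    filters: Optional[str] = None  # placeholder for a filter expression
    limit: Optional[int] = None    # maximum number of rows to read


@dataclass(frozen=True)
class ScanTask:
    """Hypothetical, simplified stand-in for a scan over one file."""
    path: str
    file_format: str
    columns: Tuple[str, ...]
    pushdowns: Pushdowns = field(default_factory=Pushdowns)


@dataclass(frozen=True)
class ScanTaskBatch:
    """Hypothetical bundle of scan tasks that share one configuration."""
    tasks: Tuple[ScanTask, ...]


def merge_scan_tasks(tasks: Sequence[ScanTask]) -> ScanTaskBatch:
    """Bundle tasks, rejecting any whose configuration differs from the first."""
    if not tasks:
        raise ValueError("cannot build a ScanTaskBatch from zero tasks")
    first = tasks[0]
    for task in tasks[1:]:
        # Each attribute below must match across the whole bundle.
        for attr in ("file_format", "columns", "pushdowns"):
            if getattr(task, attr) != getattr(first, attr):
                raise ValueError(
                    f"incompatible {attr} for {task.path!r} vs {first.path!r}"
                )
    return ScanTaskBatch(tasks=tuple(tasks))
```

Under these assumptions, two Parquet tasks with the same columns and pushdowns merge cleanly, while mixing a Parquet task with a CSV task raises a `ValueError`. Comparing against the first task keeps the check linear in the number of tasks.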

---------

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
Co-authored-by: Jay Chia <[email protected]>