feat: provide direct TableProvider integration in datafusion-python #3012

Merged
3 commits merged into delta-io:main from feat/datafusion-table-provider
Nov 22, 2024

Conversation

timsaucer
Contributor

Description

This change adds a single method to the Delta Table object that exposes a PyCapsule providing a DataFusion FFI Table Provider. With it, you can register a Delta table in datafusion-python without first exporting a pyarrow dataset. Registering the table directly enables full filter pushdown, which can greatly improve performance in some cases.
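As a rough illustration of the registration path described above, here is a minimal sketch. It assumes the new method follows the DataFusion FFI convention and is named `__datafusion_table_provider__`, and that the datafusion-python session context offers `register_table_provider` and `register_dataset`; the PR description does not spell out these names, so check them against the versions you have installed. The helper `register_delta_table` and its return values are hypothetical and exist only for this example.

```python
def register_delta_table(ctx, name, table):
    """Register a Delta table with a DataFusion session context.

    `ctx` is expected to behave like datafusion.SessionContext and
    `table` like deltalake.DeltaTable (both are assumptions of this sketch).
    Returns a string naming the registration path that was taken.
    """
    if hasattr(table, "__datafusion_table_provider__"):
        # Direct path: hand over the table object itself. The session
        # context is assumed to call the dunder method and read the
        # FFI table provider out of the returned PyCapsule.
        ctx.register_table_provider(name, table)
        return "table_provider"

    # Fallback for builds without the new method: export a pyarrow
    # dataset first, which is the older path the description refers to.
    ctx.register_dataset(name, table.to_pyarrow_dataset())
    return "pyarrow_dataset"


# Hypothetical usage with the real libraries installed:
#
#   from datafusion import SessionContext
#   from deltalake import DeltaTable
#
#   ctx = SessionContext()
#   register_delta_table(ctx, "events", DeltaTable("path/to/table"))
#   ctx.sql("SELECT * FROM events WHERE id > 10").show()
```

With the direct path, a filter such as `id > 10` can be handed to the Delta table provider instead of being applied after the scan, which is where the performance benefit mentioned above would come from.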

Related Issue(s)

Closes #2536

Documentation

This is a follow-on to apache/datafusion#12920 and apache/datafusion-python#823.

@github-actions github-actions bot added the binding/python Issues for the Python package label Nov 21, 2024

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@timsaucer
Contributor Author

FYI @ion-elgreco

@timsaucer timsaucer changed the title Feat/datafusion table provider feat: Provide direct TableProvider integration in datafusion-python Nov 21, 2024
@timsaucer timsaucer changed the title feat: Provide direct TableProvider integration in datafusion-python feat: provide direct TableProvider integration in datafusion-python Nov 21, 2024
@timsaucer timsaucer force-pushed the feat/datafusion-table-provider branch from f88f089 to e08a27c Compare November 21, 2024 13:02
@ion-elgreco
Collaborator

@timsaucer super excited about this change!

@ion-elgreco
Collaborator

Can you add the attribute to the type stubs here as well:

https://github.com/delta-io/delta-rs/blob/main/python/deltalake/_internal.pyi


codecov bot commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Project coverage is 72.43%. Comparing base (c091a82) to head (2d7fdb8).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/src/lib.rs | 0.00% | 10 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3012      +/-   ##
==========================================
- Coverage   72.47%   72.43%   -0.04%     
==========================================
  Files         128      128              
  Lines       40831    40841      +10     
  Branches    40831    40841      +10     
==========================================
- Hits        29592    29584       -8     
- Misses       9356     9366      +10     
- Partials     1883     1891       +8     


@rtyler rtyler self-assigned this Nov 21, 2024
@rtyler rtyler added the enhancement New feature or request label Nov 21, 2024
@rtyler rtyler added this to the v0.22 milestone Nov 21, 2024
@rtyler rtyler force-pushed the feat/datafusion-table-provider branch from e08a27c to 4a54b9c Compare November 21, 2024 20:01
rtyler
rtyler previously approved these changes Nov 21, 2024
@rtyler rtyler enabled auto-merge November 21, 2024 20:02
@timsaucer
Copy link
Contributor Author

I'll try to address that CI failure tomorrow morning.

auto-merge was automatically disabled November 22, 2024 11:46

Head branch was pushed to by a user without write access

@ion-elgreco ion-elgreco force-pushed the feat/datafusion-table-provider branch from aac8510 to 2d7fdb8 Compare November 22, 2024 11:48
@ion-elgreco ion-elgreco added this pull request to the merge queue Nov 22, 2024
Merged via the queue into delta-io:main with commit 0c4344f Nov 22, 2024
24 checks passed
@adriangb
Contributor

Amazing!

Labels
binding/python Issues for the Python package enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a direct integration with Datafusion
4 participants