Ability to pass empty Arrow tables/datasets to write_deltalake with rust engine #2686
Comments
Can you share a full code example for the pyarrow dataset? |
@sherlockbeard yes, here it is:
import pyarrow as pa
import pyarrow.dataset as ds  # import the submodule explicitly rather than relying on pa.dataset being loaded
import pyarrow.parquet as pq
from deltalake import write_deltalake
arrow_table = pa.Table.from_pydict(
    {"foo": [1, 2], "bar": [True, False]}
)
# Build an empty table with the same schema and persist it as a Parquet file
empty_arrow_table = arrow_table.schema.empty_table()
pq.write_table(empty_arrow_table, "my_empty_parquet_file.parquet")
empty_arrow_dataset = ds.dataset("my_empty_parquet_file.parquet")
write_deltalake("my_delta_table", arrow_table, mode="append", engine="rust") # this creates the Delta table
write_deltalake("my_delta_table", empty_arrow_dataset, mode="append", engine="rust") # this errors on both `pyarrow` and `rust` engine |
I have been trying to write an empty pandas/polars/pyarrow dataset to delta lake into a Fabric lakehouse. It works well if I run the code written in Python inside a Fabric notebook, but the error |
This still seems to be an issue with deltalake==0.24.0 |
My deltalake version is 0.22.3 |
It's very simple to check manually if your incoming data is empty or not. Either way, once we have streamed writing support we won't error out when incoming data is empty |
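A minimal sketch of such a manual check, not part of delta-rs. The helper names `is_empty` and `write_if_not_empty` are hypothetical; the sketch assumes the incoming object exposes either `num_rows` (as `pyarrow.Table` does) or `count_rows()` (as `pyarrow.dataset.Dataset` does), and takes the writer as a parameter so that it runs without `deltalake` installed:

```python
from typing import Any, Callable


def is_empty(data: Any) -> bool:
    """Return True when `data` holds zero rows (hypothetical helper).

    Duck-typed on purpose: pyarrow.Table exposes `num_rows`,
    pyarrow.dataset.Dataset exposes `count_rows()`.
    """
    if hasattr(data, "num_rows"):
        return data.num_rows == 0
    if hasattr(data, "count_rows"):
        return data.count_rows() == 0
    raise TypeError(f"cannot determine row count of {type(data).__name__}")


def write_if_not_empty(
    write: Callable[..., Any], table_uri: str, data: Any, **kwargs: Any
) -> bool:
    """Call `write(table_uri, data, **kwargs)` unless `data` is empty.

    Returns True if the write was attempted, False if it was skipped.
    """
    if is_empty(data):
        return False
    write(table_uri, data, **kwargs)
    return True
```

With the example above, this would be called as `write_if_not_empty(write_deltalake, "my_delta_table", empty_arrow_dataset, mode="append", engine="rust")`, which skips the write instead of raising.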
Cool! When is the streaming support expected to be released?
…On Mon, Jan 20, 2025 at 02:23 Ion Koutsouris ***@***.***> wrote:
This still seems to be an issue with deltalake==0.24.0
It's very simple to check manually if your incoming data is empty or not.
Either way, once we have streamed writing support we won't error out when
incoming data is empty
|
Description
It's currently not possible to pass empty Arrow tables or datasets to write_deltalake when using the rust engine.
Write empty Arrow table:
Error:
Write empty dataset:
Error (when using rust engine, pyarrow throws different error):
Use Case
I now use this flow to handle an empty table:
And this flow to handle an empty dataset:
The dataset case is particularly unpleasant, because you need to eagerly materialize the dataset to a table in memory just to check if it's empty.
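One way to avoid materializing everything might be to peek at the stream of record batches and stop at the first batch that contains rows. The sketch below is an illustration under assumptions, not the flow used in this issue: `peek_non_empty` is a hypothetical helper, and it only assumes that each batch exposes `num_rows`, as `pyarrow.RecordBatch` does.

```python
from itertools import chain
from typing import Any, Iterable, Iterator, List, Tuple


def peek_non_empty(batches: Iterable[Any]) -> Tuple[bool, Iterator[Any]]:
    """Report whether any batch has rows, without draining the stream.

    Returns (has_rows, replay), where `replay` yields every batch of the
    original stream in order, including the ones consumed while peeking.
    """
    iterator = iter(batches)
    consumed: List[Any] = []
    for batch in iterator:
        consumed.append(batch)
        if batch.num_rows > 0:
            # Stop at the first batch with data; the rest stays lazy.
            return True, chain(consumed, iterator)
    # Stream exhausted: only empty batches (or none at all) were seen.
    return False, iter(consumed)
```

With pyarrow, the input could be `dataset.to_batches()`, and the replayed iterator could be wrapped with `pa.RecordBatchReader.from_batches(dataset.schema, replay)` before being handed to the writer. Only the leading empty batches and the first non-empty one are held in memory while peeking.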
It would be nice if we could simply use
and leave handling of the "empty case" to delta-rs.
Related Issue(s)