Failed import when running deltalake==0.14.0 #65

Closed
avriiil opened this issue Dec 13, 2023 · 4 comments · Fixed by #68
Labels
bug Something isn't working

Comments

@avriiil
Contributor

avriiil commented Dec 13, 2023

To reproduce:

  • create new mamba/conda env
  • mamba install python==3.11 pip
  • pip install deltalake==0.14
  • pip install dask-deltatable
  • run import dask_deltatable
ImportError: cannot import name '_write_new_deltalake' from 'deltalake.writer' (/Users/rpelgrim/miniforge3/envs/deltalake-0130/lib/python3.11/site-packages/deltalake/writer.py)

To solve:

  • pip install deltalake==0.13
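
Until a fix is released, a downstream project could check the installed version before importing dask_deltatable. This is a minimal sketch, not part of either library: the helper names are hypothetical, and treating every release from 0.14 onward as affected is an assumption based only on the two versions tested above (0.14 fails, 0.13 works).

```python
import re
from importlib import metadata

# First deltalake release reported broken in this issue. Assumption: later
# releases are affected too, since only 0.13 and 0.14 were tested here.
FIRST_BROKEN = (0, 14)


def major_minor(version):
    """Return (major, minor) as ints from a version string such as '0.14.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version):
    """True if this deltalake version is at or above the first broken release."""
    return major_minor(version) >= FIRST_BROKEN


def installed_deltalake_version():
    """Return the installed deltalake version string, or None if not installed."""
    try:
        return metadata.version("deltalake")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_deltalake_version()
    if version is None:
        print("deltalake is not installed")
    elif is_affected(version):
        print(f"deltalake {version} may hit this ImportError; try deltalake==0.13")
    else:
        print(f"deltalake {version} predates the reported breakage")
```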
@jacobtomlinson jacobtomlinson added the bug Something isn't working label Dec 19, 2023
@mrocklin
Contributor

In [1]: from pipeline.reduce import *
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 from pipeline.reduce import *

File ~/workspace/etl-tpch/pipeline/reduce.py:6
      4 import coiled
      5 import dask_expr as dd
----> 6 import dask_deltatable
      7 from dask.distributed import LocalCluster
      8 from prefect import flow, task

File ~/mambaforge/envs/etl-tpch/lib/python3.11/site-packages/dask_deltatable/__init__.py:9
      3 __all__ = [
      4     "read_deltalake",
      5     "to_deltalake",
      6 ]
      8 from .core import read_deltalake as read_deltalake
----> 9 from .write import to_deltalake as to_deltalake

File ~/mambaforge/envs/etl-tpch/lib/python3.11/site-packages/dask_deltatable/write.py:18
     16 from dask.highlevelgraph import HighLevelGraph
     17 from deltalake import DeltaTable
---> 18 from deltalake.writer import (
     19     MAX_SUPPORTED_WRITER_VERSION,
     20     PYARROW_MAJOR_VERSION,
     21     AddAction,
     22     DeltaJSONEncoder,
     23     DeltaProtocolError,
     24     DeltaStorageHandler,
     25     __enforce_append_only,
     26     _write_new_deltalake,
     27     get_file_stats_from_metadata,
     28     get_partitions_from_path,
     29     try_get_table_and_table_uri,
     30 )
     31 from toolz.itertoolz import pluck
     33 from ._schema import pyarrow_to_deltalake, validate_compatible

ImportError: cannot import name '_write_new_deltalake' from 'deltalake.writer' (/Users/mrocklin/mambaforge/envs/etl-tpch/lib/python3.11/site-packages/deltalake/writer.py)

@jrbourbeau
Member

Looks like that method got moved to

from deltalake._internal import write_new_deltalake

Though still a private method, which isn't ideal
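
One way to tolerate the move is to try both locations mentioned in this thread and use whichever exists. The sketch below is only an illustration: `import_first_available` is a hypothetical helper, and it assumes the old and new functions are interchangeable, which this thread does not confirm (the signature may have changed along with the name).

```python
import importlib


def import_first_available(candidates):
    """Return the first attribute found among (module_name, attribute) pairs.

    Modules that fail to import, or that lack the attribute, are skipped.
    """
    tried = []
    for module_name, attribute in candidates:
        tried.append(f"{module_name}.{attribute}")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError("none of these names could be imported: " + ", ".join(tried))


# The two locations named in this thread, old location first.
WRITE_NEW_DELTALAKE_CANDIDATES = [
    ("deltalake.writer", "_write_new_deltalake"),
    ("deltalake._internal", "write_new_deltalake"),
]
```

With deltalake installed, `import_first_available(WRITE_NEW_DELTALAKE_CANDIDATES)` would return whichever function the installed version provides. Both names are private, so this can break again in any release.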

@jrbourbeau
Member

Ah, I think we want this public method https://delta-io.github.io/delta-rs/api/delta_writer/#deltalake.write_deltalake (at least from a quick glance, it looks like we could use that instead)

@fjetter
Contributor

fjetter commented Feb 13, 2024

> Ah, I think we want this public method

No. This would create a commit for every partition, which is exactly what we don't want. Especially without a distributed lock in place, this would likely cause data loss whenever transactions conflict.
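
To make the difference concrete, here is a toy model of the two strategies. It is a sketch only: `ToyLog` and both writer functions are hypothetical stand-ins, not deltalake or dask-deltatable APIs, and the model counts commits without simulating conflicts or locking.

```python
class ToyLog:
    """In-memory stand-in for a table's transaction log (not a real Delta log)."""

    def __init__(self):
        self.commits = []

    def commit(self, actions):
        """Record one commit holding the given actions; return its version number."""
        self.commits.append(list(actions))
        return len(self.commits) - 1


def write_per_partition(log, partitions):
    """Commit once for every partition, as calling a public writer per partition would."""
    for partition in partitions:
        log.commit([f"add {partition}"])


def write_single_commit(log, partitions):
    """Collect one add action per partition, then commit them all together."""
    actions = [f"add {partition}" for partition in partitions]
    log.commit(actions)


partitions = ["part-0", "part-1", "part-2"]

per_partition_log = ToyLog()
write_per_partition(per_partition_log, partitions)

single_commit_log = ToyLog()
write_single_commit(single_commit_log, partitions)

print(len(per_partition_log.commits), "commits when committing per partition")
print(len(single_commit_log.commits), "commit when collecting actions first")
```

In the per-partition variant, every partition is a separate transaction that can race with other writers. The single-commit variant exposes only one transaction for the whole write.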

5 participants