[C++] Unrecognized filesystem type in URI: abfss:// #32912

asfimport · 2022-09-10T08:02:53Z

I am running the commands below in Databricks.

I am trying to read a file stored in ADLS using pandas:

pip install adlfs
import pandas as pd
data = pd.read_parquet("abfss://data.parquet", storage_options={})

This raises the following error:

File "/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py", line 310, in read_parquet
return impl.read(path, columns=columns, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py", line 125, in read
path, columns=columns, **kwargs
File "/databricks/python/lib/python3.7/site-packages/pyarrow/parquet.py", line 1573, in read_table
ignore_prefixes=ignore_prefixes,
File "/databricks/python/lib/python3.7/site-packages/pyarrow/parquet.py", line 1434, in __init__
ignore_prefixes=ignore_prefixes)
File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", line 667, in dataset
return _filesystem_dataset(source, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", line 424, in _filesystem_dataset
fs, paths_or_selector = _ensure_single_source(source, filesystem)
File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", line 371, in _ensure_single_source
filesystem, path = FileSystem.from_uri(path)
File "pyarrow/_fs.pyx", line 347, in pyarrow._fs.FileSystem.from_uri
File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Unrecognized filesystem type in URI: abfss://data.parquet

Reporter: Prakhar Sandhu

Note: This issue was originally created as ARROW-17672. Please see the migration documentation for further details.

asfimport · 2022-09-11T21:50:59Z

Kouhei Sutou / @kou:
We need to implement a filesystem module for Azure Data Lake Storage in C++ like ARROW-2034 to support this case.

asfimport · 2022-09-13T11:03:08Z

Joris Van den Bossche / @jorisvandenbossche:
You are using adlfs, which is an fsspec-compatible filesystem, so normally I would expect the pandas read_parquet call to convert the "abfss://data.parquet" URI to an fsspec filesystem and pass that to the underlying pyarrow function. We do have support for fsspec filesystems, and in that way we can support filesystems that don't have native support inside Arrow C++, such as Azure at the moment.

So something is going wrong here. To start, can you indicate which versions of pyarrow, pandas, fsspec and adlfs you are using (e.g. the output of pip list or conda list)?
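
In a notebook, where the full pip list output can be long, the same information can be collected from Python. The following is a minimal sketch, not part of the original report: the helper name report_versions is hypothetical, and on Python 3.7 it assumes the importlib_metadata backport is installed, since importlib.metadata only joined the standard library in Python 3.8.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    import importlib_metadata as metadata


def report_versions(packages=("pyarrow", "pandas", "fsspec", "adlfs")):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package in a pip-freeze-like format.
    for name, version in report_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Pasting the printed lines into the issue gives the same information as a filtered pip list.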

asfimport · 2022-09-13T17:00:41Z

Prakhar Sandhu:
Please find the versions used below:

pandas==1.3.5
pyarrow==4.0.0
python==3.7.6
adlfs==2022.2.0
fsspec==2022.8.2

Tom-Newton · 2023-12-20T12:32:44Z

If you really want to use adlfs, this issue is definitely solvable with changes to the user code alone. However, I think it will also be solved by #39317, which will connect up the new C++ AzureFileSystem on the Python side and provide much better performance and reliability compared to adlfs.
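
One possible shape for such a user-code change is to build the adlfs filesystem explicitly and hand it to pyarrow, so that pyarrow never has to resolve the abfss:// scheme itself. This is a sketch under stated assumptions, not a confirmed fix for the versions in this report: the helper names split_abfss_uri and read_parquet_via_adlfs are hypothetical, the URI is assumed to follow the usual abfss://<container>@<account>.dfs.core.windows.net/<path> layout, and the credentials (for example account_key) are placeholders to be supplied by the caller.

```python
from urllib.parse import urlsplit


def split_abfss_uri(uri):
    """Split abfss://<container>@<account>.dfs.core.windows.net/<path>
    into (account_name, "<container>/<path>")."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs(s) URI: {uri!r}")
    container, _, host = parts.netloc.partition("@")
    if not container or not host:
        raise ValueError(f"expected <container>@<account> in URI: {uri!r}")
    account_name = host.split(".", 1)[0]
    return account_name, f"{container}/{parts.path.lstrip('/')}"


def read_parquet_via_adlfs(uri, **credentials):
    """Read a Parquet file through an explicitly constructed adlfs filesystem.

    Assumption: `credentials` holds keyword arguments accepted by
    adlfs.AzureBlobFileSystem, such as account_key.
    """
    # Imported lazily so the URI helper above works without these packages.
    import adlfs
    import pyarrow.parquet as pq

    account_name, path = split_abfss_uri(uri)
    fs = adlfs.AzureBlobFileSystem(account_name=account_name, **credentials)
    # Passing the fsspec filesystem object bypasses pyarrow's own URI parsing.
    return pq.read_table(path, filesystem=fs).to_pandas()
```

A call would look like read_parquet_via_adlfs("abfss://mycontainer@myaccount.dfs.core.windows.net/dir/data.parquet", account_key="..."), where the container, account and key are made-up placeholders. The URI as posted in the report, "abfss://data.parquet", names neither a container nor an account (possibly because it was shortened for posting), so the helper rejects it with a ValueError.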

jorisvandenbossche · 2023-12-20T13:17:48Z

Let's close this issue in favor of #39317

jorisvandenbossche closed this as completed Dec 20, 2023
