Error loading wikitext data raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.") #6352

Closed
Ahmed-Roushdy opened this issue Oct 25, 2023 · 13 comments

@Ahmed-Roushdy

I was trying to load the wikitext dataset, but I got this error:

traindata = load_dataset('wikitext', 'wikitext-2-raw-v1', split='train')

File "/home/aelkordy/.conda/envs/prune_llm/lib/python3.9/site-packages/datasets/load.py", line 1804, in load_dataset
ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
File "/home/aelkordy/.conda/envs/prune_llm/lib/python3.9/site-packages/datasets/builder.py", line 1108, in as_dataset
raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

@zachsmith1

zachsmith1 commented Oct 26, 2023

+1

Found cached dataset csv (file:///home/ubuntu/.cache/huggingface/datasets/theSquarePond___csv/theSquarePond--XXXXX-bbf0a8365d693d2c/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[14], line 4
      1 get_ipython().system('pip install -U datasets')
      3 # Load dataset from the hub
----> 4 dataset = load_dataset(dataset_name)

File ~/anaconda3/envs/python38-env/lib/python3.8/site-packages/datasets/load.py:1810, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   1806 # Build dataset for splits
   1807 keep_in_memory = (
   1808     keep_in_memory if keep_in_memory is not None else is_small_dataset(builder_instance.info.dataset_size)
   1809 )
-> 1810 ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
   1811 # Rename and cast features to match task schema
   1812 if task is not None:

File ~/anaconda3/envs/python38-env/lib/python3.8/site-packages/datasets/builder.py:1128, in DatasetBuilder.as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
   1126 is_local = not is_remote_filesystem(self._fs)
   1127 if not is_local:
-> 1128     raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
   1129 if not os.path.exists(self._output_dir):
   1130     raise FileNotFoundError(
   1131         f"Dataset {self.name}: could not find data in {self._output_dir}. Please make sure to call "
   1132         "builder.download_and_prepare(), or use "
   1133         "datasets.load_dataset() before trying to access the Dataset object."
   1134     )

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

@Englader

+1

Found cached dataset csv (file://C:/Users/Shady/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[38], line 3
      1 huggingface_dataset_name = "knkarthick/dialogsum"
----> 3 dataset = load_dataset(huggingface_dataset_name)

File D:\Desktop\Workspace\GenAI\genai\lib\site-packages\datasets\load.py:1804, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   1800 # Build dataset for splits
   1801 keep_in_memory = (
   1802     keep_in_memory if keep_in_memory is not None else is_small_dataset(builder_instance.info.dataset_size)
   1803 )
-> 1804 ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
   1805 # Rename and cast features to match task schema
   1806 if task is not None:

File D:\Desktop\Workspace\GenAI\genai\lib\site-packages\datasets\builder.py:1108, in DatasetBuilder.as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
   1106 is_local = not is_remote_filesystem(self._fs)
   1107 if not is_local:
-> 1108     raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
   1109 if not os.path.exists(self._output_dir):
   1110     raise FileNotFoundError(
   1111         f"Dataset {self.name}: could not find data in {self._output_dir}. Please make sure to call "
   1112         "builder.download_and_prepare(), or use "
   1113         "datasets.load_dataset() before trying to access the Dataset object."
   1114     )

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

@mariosasko
Collaborator

This error stems from a breaking change in fsspec. It has been fixed in the latest datasets release (2.14.6). Updating the installation with pip install -U datasets should fix the issue.
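For anyone unsure whether the upgrade actually took effect in the environment that runs the notebook or script, here is a minimal sketch of a version check. The helper names (`parse_release`, `is_at_least`, `installed_version`) are made up for illustration and are not part of `datasets`; the comparison only looks at the leading numeric parts of the version string, which is a simplification.

```python
from importlib.metadata import PackageNotFoundError, version

# Release named in this thread as containing the fix.
MIN_DATASETS = "2.14.6"


def parse_release(text):
    """Turn '2.14.6' into (2, 14, 6); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two dotted version strings by their numeric parts only."""
    return parse_release(installed) >= parse_release(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("datasets")
    if found is None:
        print("datasets is not installed in this environment")
    elif is_at_least(found, MIN_DATASETS):
        print(f"datasets {found} is at or above {MIN_DATASETS}")
    else:
        print(f"datasets {found} is older than {MIN_DATASETS}; try: pip install -U datasets")
```

In a notebook, the kernel usually needs a restart after `pip install -U datasets` before the new version is imported. Passing this check does not guarantee the error is gone, since later comments in this thread report it on newer releases as well.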

@yueool

yueool commented Oct 26, 2023

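This error stems from a breaking change in fsspec. It has been fixed in the latest datasets release (2.14.6). Updating the installation with pip install -U datasets should fix the issue.

Thanks, that's great! It solved exactly my problem. Even GPT couldn't fix it, and you finally sorted it out.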

@RubTalha

RubTalha commented Nov 7, 2023

@albertvillanova
Member

Fixed by:

The fix was released in datasets-2.14.6.

@dhruv-anand-aintech

This was fixed in 2.15.0, but it is broken again in 2.17.0. Can someone verify?

@jgontrum

I'm on 2.17.1 and can confirm it's broken again. Downgrading to 2.16 helped.

@Mariemomezzine

2.14.6

I updated the version, but the error still exists.

@JPonsa

JPonsa commented Mar 16, 2024

The issue seems to persist in 2.18.0

@GoGoZeppeli-towa

Same problem in 2.18.0.

@lhoestq
Member

lhoestq commented Mar 19, 2024

Which version of fsspec and which OS are you using?
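A quick way to gather that information for a reply is sketched below. `environment_report` is a hypothetical helper written for this thread, not something shipped with `datasets` or `fsspec`; it relies only on the standard library.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("datasets", "fsspec")):
    """Collect OS, Python, and package versions as printable lines."""
    lines = [
        f"OS: {platform.system()} {platform.release()}",
        f"Python: {platform.python_version()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {version(name)}")
        except PackageNotFoundError:
            # Report missing packages instead of raising.
            lines.append(f"{name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```

Run it with the same interpreter or notebook kernel that raises the error, since another environment on the same machine may have different versions installed.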

@GoGoZeppeli-towa

Which version of fsspec and which OS are you using?

fsspec 2023.10.0 on Windows 10. I guess my fsspec version is too old...
