
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper #1986

Closed
emorfaq opened this issue Jul 4, 2024 · 2 comments
Labels
bug Something isn't working

Comments

emorfaq commented Jul 4, 2024

While executing the step poetry run python scripts/setup, I'm running into the error below on a Windows 10 machine.

private-gpt> poetry run python scripts/setup
08:22:46.800 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
Downloading embedding BAAI/bge-small-en-v1.5
Fetching 14 files: 100%|██████████████████████████████████████████████████████| 14/14 [00:03<00:00, 4.56it/s]
Embedding model downloaded!
Downloading LLM mistral-7b-instruct-v0.2.Q4_K_M.gguf
LLM model downloaded!
Downloading tokenizer mistralai/Mistral-7B-Instruct-v0.2
Traceback (most recent call last):
File "C:\Users\mxxx\projects\AI\privategpt\private-gpt\scripts\setup", line 43, in <module>
AutoTokenizer.from_pretrained(
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 825, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\tokenization_utils_base.py", line 2048, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\tokenization_utils_base.py", line 2287, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\models\llama\tokenization_llama_fast.py", line 133, in __init__
super().__init__(
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\tokenization_utils_fast.py", line 111, in __init__
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3

OS Name: Microsoft Windows 10 Enterprise
OS Version: 10.0.19045
CPU Name: 13th Gen Intel(R) Core(TM) i5-1345U
python3 --version
Python 3.11.9
poetry --version
Poetry (version 1.8.3)
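For context, the exception text is serde-style deserialization wording from the Rust tokenizers crate: an "untagged enum" is parsed by trying each known variant shape in turn, and parsing fails when the model's tokenizer.json contains a field or type newer than the installed library understands. A purely illustrative Python sketch of that matching logic (the variant names and field sets below are simplified assumptions, not the real schema):

```python
# Illustrative only: mimics how an untagged-enum deserializer tries each known
# variant shape and rejects input that fits none of them. An old `tokenizers`
# build that does not know a newer pre-tokenizer field fails the same way.
KNOWN_VARIANTS = {
    "Whitespace": set(),
    "Metaspace": {"replacement", "add_prefix_space"},  # simplified, older-style schema
}

def matches_any_variant(obj):
    """Return True if the pre_tokenizer dict fits one of the known shapes."""
    fields = set(obj) - {"type"}
    allowed = KNOWN_VARIANTS.get(obj.get("type"))
    return allowed is not None and fields <= allowed

old_style = {"type": "Metaspace", "replacement": "_", "add_prefix_space": True}
new_style = {"type": "Metaspace", "replacement": "_", "prepend_scheme": "first"}
print(matches_any_variant(old_style))  # True: every field is a known variant field
print(matches_any_variant(new_style))  # False: "did not match any variant"
```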

jaluma added the bug (Something isn't working) label on Jul 8, 2024
jaluma (Collaborator) commented Jul 8, 2024

This should be fixed in the latest version (it was fixed in #1987). Can you try again? I just tried the latest version and it worked fine :)
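This exception class typically comes from a tokenizers build that is too old for the model's tokenizer.json, so after upgrading it is worth confirming which versions the Poetry virtualenv actually resolved. A stdlib-only sketch (run it with poetry run python so it inspects the project's virtualenv, not the system interpreter):

```python
from importlib import metadata

def installed_versions(pkgs=("transformers", "tokenizers")):
    """Return {package: version string, or 'not installed'} for the given names."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = "not installed"
    return out

print(installed_versions())
```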

jaluma closed this as completed on Jul 8, 2024
emorfaq (Author) commented Jul 8, 2024

Tried the latest version and it worked fine :)
