
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper #1986

Closed
emorfaq opened this issue Jul 4, 2024 · 2 comments
Labels
bug Something isn't working

Comments

emorfaq commented Jul 4, 2024

While executing the step poetry run python scripts/setup, I'm running into the error below on a Windows 10 machine.

private-gpt> poetry run python scripts/setup
08:22:46.800 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
Downloading embedding BAAI/bge-small-en-v1.5
Fetching 14 files: 100%|██████████████████████████████████████████████████████| 14/14 [00:03<00:00, 4.56it/s]
Embedding model downloaded!
Downloading LLM mistral-7b-instruct-v0.2.Q4_K_M.gguf
LLM model downloaded!
Downloading tokenizer mistralai/Mistral-7B-Instruct-v0.2
Traceback (most recent call last):
File "C:\Users\mxxx\projects\AI\privategpt\private-gpt\scripts\setup", line 43, in <module>
AutoTokenizer.from_pretrained(
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 825, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\tokenization_utils_base.py", line 2048, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\tokenization_utils_base.py", line 2287, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\models\llama\tokenization_llama_fast.py", line 133, in __init__
super().__init__(
File "C:\Users\mxxx\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-jsTJl_Mz-py3.11\Lib\site-packages\transformers\tokenization_utils_fast.py", line 111, in __init__
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3

OS Name: Microsoft Windows 10 Enterprise
OS Version: 10.0.19045
CPU Name: 13th Gen Intel(R) Core(TM) i5-1345U
python3 --version
Python 3.11.9
poetry --version
Poetry (version 1.8.3)
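For context, the exception text is serde-style deserialization wording from the Rust tokenizers crate: an "untagged enum" is parsed by trying each known variant shape in turn, and parsing fails when the model's tokenizer.json contains a field or type newer than the installed library understands. A purely illustrative Python sketch of that matching logic (the variant names and field sets below are simplified assumptions, not the real schema):

```python
# Illustrative only: mimics how an untagged-enum deserializer tries each known
# variant shape and rejects input that fits none of them. An old `tokenizers`
# build that does not know a newer pre-tokenizer field fails the same way.
KNOWN_VARIANTS = {
    "Whitespace": set(),
    "Metaspace": {"replacement", "add_prefix_space"},  # simplified, older-style schema
}

def matches_any_variant(obj):
    """Return True if the pre_tokenizer dict fits one of the known shapes."""
    fields = set(obj) - {"type"}
    allowed = KNOWN_VARIANTS.get(obj.get("type"))
    return allowed is not None and fields <= allowed

old_style = {"type": "Metaspace", "replacement": "_", "add_prefix_space": True}
new_style = {"type": "Metaspace", "replacement": "_", "prepend_scheme": "first"}
print(matches_any_variant(old_style))  # True: every field is a known variant field
print(matches_any_variant(new_style))  # False: "did not match any variant"
```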

jaluma added the bug (Something isn't working) label on Jul 8, 2024
jaluma (Collaborator) commented Jul 8, 2024

This should be fixed in the latest version (it was fixed in #1987). Can you try again? I just tried the latest version and it worked fine :)
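This exception class typically comes from a tokenizers build that is too old for the model's tokenizer.json, so after upgrading it is worth confirming which versions the Poetry virtualenv actually resolved. A stdlib-only sketch (run it with poetry run python so it inspects the project's virtualenv, not the system interpreter):

```python
from importlib import metadata

def installed_versions(pkgs=("transformers", "tokenizers")):
    """Return {package: version string, or 'not installed'} for the given names."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = "not installed"
    return out

print(installed_versions())
```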

jaluma closed this as completed on Jul 8, 2024
emorfaq (Author) commented Jul 8, 2024

Tried the latest version and it worked fine :)
