RuntimeError: Cannot convert token with models.Exllamav2 #1319

Open
Qasimk555 opened this issue Dec 4, 2024 · 0 comments

Describe the issue as clearly as possible:

I get a RuntimeError whenever I call generate.json with an ExLlamaV2 model (loaded via models.exl2). The error is as follows:

RuntimeError: Cannot convert token (127815) to bytes: �

Steps/code to reproduce the bug:

from outlines import models, generate, samplers
#import exllamav2
from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    last_name: str
    id: int

@outlines.prompt
def chat_template(messages, bos_token="<|begin_of_text|>", custom_tools=None, tools_in_user_message=True, 
                 date_string=None, strftime_now=None, tools=None, add_generation_prompt=False):
    """{{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- if strftime_now is defined %}
        {%- set date_string = strftime_now("%d %b %Y") %}
    {%- else %}
        {%- set date_string = "26 Jul 2024" %}
    {%- endif %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "" %}
{%- endif %}

{#- System message #}
{{- "<|start_header_id|>system<|end_header_id|>\\n\\n" }}
{%- if tools is not none %}
    {{- "Environment: ipython\\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\\n" }}
{{- "Today Date: " + date_string + "\\n\\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\\n\\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\\n\\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\\n\\n' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\\n\\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\\n\\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\\n\\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}}
        {{- '{"name": "' + tool_call.name + '", ' }}
        {{- '"parameters": ' }}
        {{- tool_call.arguments | tojson }}
        {{- "}" }}
        {{- "<|eot_id|>" }}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\\n\\n" }}
        {%- if message.content is mapping or message.content is iterable %}
            {{- message.content | tojson }}
        {%- else %}
            {{- message.content }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}
{%- endif %}"""

# Example usage
example_messages = [
    {"role": "system", "content": "You are a helpful assistant"},

    #{"role": "user", "content": "Hello."}
    {"role": "user", "content": "Create a user profile with the fields name, last_name and id."}
]

prompt = chat_template(
    messages=example_messages,
    date_string="04 Dec 2024",
    add_generation_prompt=True,
)
print(prompt)

model = models.exl2(model_path="./Llama-3.2-3B-Instruct-exl2",max_seq_len=2048)

sampler = samplers.multinomial(temperature=0.1)

# generator = generate.text(model, sampler)
generator = generate.json(model, User)
# Note: "stop_conditions": ["<|eot_id|>"] does not seem to take effect.
kwargs = {"stop_conditions": ["<|eot_id|>"], "max_new_tokens": 512, "completion_only": True}
result = generator(
    prompt,
    **kwargs
)
print(result)

Expected result:

{"name": <some name>,
"last_name": <some name>,
"id": <some id>}

Error message:

Traceback (most recent call last):
  File "C:\Users\legio\Desktop\llm_sm\outlines_exlv2.py", line 130, in <module>
    generator = generate.json(model,User)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines\generate\json.py", line 54, in json
    generator = regex(model, regex_str, sampler)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines\generate\regex.py", line 34, in regex
    logits_processor = RegexLogitsProcessor(regex_str, tokenizer=model.tokenizer)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines\processors\structured.py", line 152, in __init__
    guide = RegexGuide.from_regex(regex_string, tokenizer)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines\fsm\guide.py", line 92, in from_regex
    return super().from_regex(
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines_core\fsm\guide.py", line 212, in from_regex
    ) = _create_states_mapping(
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines\fsm\guide.py", line 76, in cached_create_states_mapping
    return uncached_create_states_mapping(regex_string, tokenizer, *args, **kwargs)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines_core\fsm\guide.py", line 141, in create_states_mapping
    return create_states_mapping_from_fsm(regex_fsm, tokenizer, frozen_tokens)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines_core\fsm\guide.py", line 178, in create_states_mapping_from_fsm
    states_to_token_maps, empty_token_ids = create_fsm_index_tokenizer(
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines_core\fsm\regex.py", line 471, in create_fsm_index_tokenizer
    tokens_to_token_ids, empty_token_ids = reduced_vocabulary(tokenizer)
  File "C:\Users\legio\anaconda3\envs\llm_sm\lib\site-packages\outlines_core\fsm\regex.py", line 424, in reduced_vocabulary
    raise RuntimeError(
RuntimeError: Cannot convert token `` (127815) to bytes:  �
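
The replacement character (�) in the message suggests that the text returned for token 127815 is not valid UTF-8 on its own, for example a fragment of a multi-byte character. That is only a guess from the message; nothing in this report confirms what token 127815 actually contains. A minimal sketch of the effect in plain Python, with no Outlines or ExLlamaV2 calls (the emoji, the byte split, and the helper names `decode_piece` and `is_lossy` are all illustrative assumptions):

```python
# Illustrative only: one multi-byte UTF-8 character split into two byte pieces,
# as can happen when a byte-level vocabulary stores partial characters.
emoji_bytes = "🙂".encode("utf-8")  # four bytes
first_piece, second_piece = emoji_bytes[:2], emoji_bytes[2:]


def decode_piece(piece: bytes) -> str:
    """Decode a piece in isolation, substituting U+FFFD for invalid bytes."""
    return piece.decode("utf-8", errors="replace")


def is_lossy(piece: bytes) -> bool:
    """Return True when decoding the piece alone cannot be reversed to the same bytes."""
    return decode_piece(piece).encode("utf-8") != piece


# The isolated fragment decodes to the replacement character, so its bytes are lost.
print(repr(decode_piece(first_piece)))
print("fragment lossy:", is_lossy(first_piece))
print("whole character lossy:", is_lossy(emoji_bytes))
```

If this is what is happening, the text for such a token cannot be mapped back to its original bytes, which would match the "Cannot convert token ... to bytes" wording.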

Outlines/Python version information:

Version information

```
0.1.7

Python 3.10.15 | packaged by Anaconda, Inc. | (main, Oct 3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)]

aiohttpx==0.0.12
airportsdata==20241001
annotated-types==0.7.0
anyio==4.6.2.post1
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
async-lru==2.0.4
async_openai==0.0.52
attrs==24.2.0
backoff==2.2.1
certifi==2024.8.30
charset-normalizer==3.4.0
cloudpickle==3.1.0
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1666700638685/work
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
cramjam==2.9.0
debugpy @ file:///C:/b/abs_c0y1fjipt2/croot/debugpy_1690906864587/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
diskcache==5.6.3
distro==1.9.0
einops==0.8.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
exllamav2 @ git+https://github.com/lapp0/exllamav2@ce08f16674a67dac9ea6a770650eb02248b8364a
fastparquet==2024.11.0
filelock==3.16.1
flash_attn @ file:///C:/Users/legio/Desktop/llm_sm/flash_attn-2.6.3%2Bcu122torch2.4.1cxx11abiFALSE-cp310-cp310-win_amd64.whl#sha256=0eea9204c7b67d3e5829f10fcce05e11d14d7f264c28c39f24a9357ea76e5601
frozendict==2.4.6
fsspec==2024.10.0
h11==0.14.0
httpcore==1.0.7
httpx==0.27.2
huggingface-hub==0.26.3
idna==3.10
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work
interegular==0.3.3
ipykernel @ file:///D:/bld/ipykernel_1719845595208/work
ipython @ file:///D:/bld/ipython_1729866374643/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1731317204262/work
Jinja2==3.1.4
jiter==0.7.1
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1726610684920/work
jupyter_core @ file:///D:/bld/jupyter_core_1710257272359/work
lark==1.2.2
lazyops==0.2.84
loguru==0.7.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mdurl==0.1.2
MooreLLM==0.1.7
mpmath==1.3.0
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.4.2
ninja==1.11.1.2
numpy==2.1.3
openai==1.54.5
outlines==0.1.7
outlines_core==0.1.17
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1731802491770/work
pandas==2.2.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1726613481435/work
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work
psutil @ file:///C:/Windows/Temp/abs_b2c2fd7f-9fd5-4756-95ea-8aed74d0039flsd9qufz/croots/recipe/psutil_1656431277748/work
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
pycountry==24.6.1
pydantic==2.9.2
pydantic-settings==2.6.1
pydantic_core==2.23.4
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1731919281354/work
python-dotenv==1.0.1
pytz==2024.2
pywin32==305.1
PyYAML==6.0.2
pyzmq @ file:///D:/bld/pyzmq_1666828590571/work
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.22.0
safetensors==0.4.5
sentencepiece==0.2.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
sniffio==1.3.1
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
sympy==1.13.1
tiktoken==0.8.0
tokenizers==0.20.3
torch==2.4.1+cu121
tornado @ file:///D:/bld/tornado_1666788744359/work
tqdm==4.67.0
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
transformers==4.46.3
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.2
urllib3==2.2.3
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
websockets==14.1
win32-setctime==1.1.0
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1731262100163/work

```


Context for the issue:

_No response_
Qasimk555 added the bug label Dec 4, 2024