
Tokenizer WPM fixes for bert-bge and jina-v2-en #7500

Merged — 5 commits merged into ggml-org:master on May 28, 2024

Conversation

jaime-m-p (Collaborator)

Modifications to make the WPM tokenizer match AutoTokenizer.

Tested with the vocabs from the bert-bge and jina-v2-en models.

@github-actions bot added the testing (Everything test related) and python (python script changes) labels on May 23, 2024
@mofosyne added the Review Complexity : Medium label on May 23, 2024
@jaime-m-p (Collaborator, Author)

bert-bge is failing with a simple "Hello world".

text = 'Hello world'

TokenIDs: [101, 7592, 11108, 102]  <-- llama.cpp
Expected: [101, 7592, 2088, 102]   <-- AutoTokenizer

["[CLS]", "hello", " world", "[SEP]"]  <-- llama.cpp
["[CLS]", "hello", "world",  "[SEP]"]  <-- AutoTokenizer

It seems that WPM does not follow this assumption:
https://github.com/ggerganov/llama.cpp/blob/74f33adf5f8b20b08fc5a6aa17ce081abe86ef2f/llama.cpp#L4685-L4688

vocab.special_tokens_cache ends up with 7104 tokens, including " world", so the function tokenizer_st_partition() directly matches " world" as token id 11108.
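As a rough sketch (not the actual tokenizer_st_partition() code), this is the effect once ordinary vocab entries like " world" end up in the special tokens cache:

```python
# Conceptual sketch only: if the cache contains ordinary vocab entries such as
# " world", the partition step matches them verbatim before WPM ever runs, so
# " world" is emitted directly (id 11108) instead of "world" (id 2088).
def partition_on_specials(text, specials):
    parts = [text]
    for s in specials:
        next_parts = []
        for p in parts:
            while s in p:
                before, p = p.split(s, 1)
                if before:
                    next_parts.append(before)
                next_parts.append(s)  # the "special" token becomes an atomic segment
            if p:
                next_parts.append(p)
        parts = next_parts
    return parts

print(partition_on_specials("Hello world", {" world"}))  # ['Hello', ' world']
```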

@ggerganov, can't we just trust vocab.id_to_token[id].type, instead of using it only as a final check?
https://github.com/ggerganov/llama.cpp/blob/74f33adf5f8b20b08fc5a6aa17ce081abe86ef2f/llama.cpp#L4704-L4706

There are a few more modifications to make, but this is the main problem.

@teleprint-me (Contributor) commented May 23, 2024

@jaime-m-p Can you try out applying the GPT-2 pre-tokenizer instead of the "default" value that's used on llama.cpp#L12360? I think that might be part of the issue. It's a symptom I'm working on at the moment.

@iamlemec (Collaborator) commented May 23, 2024

The usual value for parse_special is False; that's what gets used in the embedding example, and in that case the special tokens cache isn't used in tokenization. I actually think the special tokens cache approach may not be desirable for the WPM tokenizer, considering we're picking up ~7000 of them for most embedding models (versus 5 actual special tokens). This is also the reason the server embeddings don't seem to work correctly.

@jaime-m-p (Collaborator, Author)

> @jaime-m-p Can you try out applying the GPT-2 pre-tokenizer instead of the "default" value that's used on llama.cpp#L12360? I think that might be part of the issue. It's a symptom I'm working on at the moment.

Do you mean applying the GPT-2 regex split, then the WPM processing? Actually, WPM is not using any explicit regex split.
Or do you want general BPE testing, replacing the default regex with the GPT-2 regex?

I'm looking closely at your PR.
If I'm not missing your point, are you planning to store all tokenizer "flags" in the GGUF file?
If so, I think I can help you with the other half (modifying the C++ tokenizers to use these "flags").

@teleprint-me (Contributor) commented May 24, 2024

> Do you mean applying the GPT-2 regex split, then the WPM processing? Actually, WPM is not using any explicit regex split.

Jina uses the BERT Pre-tokenizer. The BERT Pre-tokenizer inherits from the ByteLevel Pre-tokenizer. The ByteLevel Pre-tokenizer defaults to the GPT-2 Pre-tokenizer. The Pre-tokenizer is responsible for partitioning its normalized substrings. If the substrings are split the right way, it should resolve the encoding issue. I haven't tested this myself yet; it's just a hypothesis based on my research so far.
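One quick way to check the hypothesis is to inspect the pre-tokenizer directly with the tokenizers library (a sketch; the model id is an assumption based on the jina-v2-en name in the PR title):

```python
# Sketch: print which pre-tokenizer the HF tokenizer uses and how it splits text.
# The model id is assumed from "jina-v2-en"; it is not taken from this PR.
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")
print(tok.pre_tokenizer)
print(tok.pre_tokenizer.pre_tokenize_str("Hello world"))
```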

> Or do you want general BPE testing, replacing the default regex with the GPT-2 regex?

This sounds good. Whatever you think is best.

> If I'm not missing your point, are you planning to store all tokenizer "flags" in the GGUF file?

Yes, I am. I'm still working on this part, though.

> If so, I think I can help you with the other half (modifying the C++ tokenizers to use these "flags").

Awesome! I really appreciate that. I'm not as familiar with C++ as I am with C and Python.

@jaime-m-p (Collaborator, Author) commented May 25, 2024

@teleprint-me I tested the default regex vs GPT-2 regex.

The models jina-* are already using GPT-2 regex.
https://github.com/ggerganov/llama.cpp/blob/902184dd3a9d6685e752b19027a48423742531db/llama.cpp#L4571-L4577

I forced it to use the default regex for comparison.

LLAMA_VOCAB_PRE_TYPE_GPT2
  [0] "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)",

default:
  [1] "[\\p{P}\\$\\+<=>\\^~\\|]+",
  [2] "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)",
  [3] "\\p{N}+",
  [4] "[0-9][0-9][0-9]",

[2] is equal to [0].

[3] is executed after [2], splitting off the optional preceding space from the numbers matched by [2]. (DISCREPANCY 1)

[4] is executed after [3], so numbers are split again into groups of 3 digits. (DISCREPANCY 2)

[1] is NOT included in [2].
[2] "[^\\s\\p{L}\\p{N}]+" matches all the [1] symbols (and more). (DISCREPANCY 3)
[1] "\p{P}" matches the "'" and takes precedence over [2] "'s|'t|'re|'ve|'m|'ll|'d", so those alternatives are never reached.
[1] splits "'s" into two tokens, while [0] generates a single token. (DISCREPANCY 4)

@teleprint-me (Contributor) commented May 25, 2024

Hm. Interesting. 🤔

The original expression used for GPT-2 is the following:

"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+"

https://github.com/openai/gpt-2/blob/master/src/encoder.py#L53

The |\s+ might seem pointless, but this part of the regex decides whether or not to include spaces within segments, based on the context.
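A small illustration of that (a sketch using the third-party Python regex module):

```python
# Without the trailing "|\s+", whitespace that cannot attach to the following
# token via the " ?" prefix (e.g. a lone "\n") is silently dropped.
import regex

core = r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)"
print(regex.findall(core, "hello\nworld"))            # ['hello', 'world']  -- "\n" lost
print(regex.findall(core + r"|\s+", "hello\nworld"))  # ['hello', '\n', 'world']
```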

The other patterns should be omitted completely from the pre-tokenizer for testing purposes. That way we at least have a control group that matches tokenizers.

https://github.com/huggingface/tokenizers/blob/main/tokenizers/src/pre_tokenizers/byte_level.rs#L41

@jaime-m-p force-pushed the tokenizer-wpm-fixes branch from e92c3f8 to f3f6c0a on May 27, 2024 18:24

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 545 iterations 🚀

Details (for performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8639.8ms p(95)=20564.24ms fails=, finish reason: stop=491 truncated=54
  • Prompt processing (pp): avg=101.46tk/s p(95)=402.23tk/s
  • Token generation (tg): avg=32.27tk/s p(95)=45.81tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=tokenizer-wpm-fixes commit=f3f6c0a930c155f43ca7a3ce8ebb428dfc4116ed

[Benchmark charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing]

@ggerganov (Member) left a comment


> @ggerganov, can't we just trust vocab.id_to_token[id].type?

I think so

@jaime-m-p merged commit 02c1eca into ggml-org:master on May 28, 2024
71 checks passed
Labels: python (python script changes), Review Complexity : Medium, testing (Everything test related)
5 participants