Tokenizer WPM fixes for bert-bge and jina-v2-en #7500
Conversation
bert-bge is failing with a simple "Hello world".
It seems that WPM does not follow this assumption:
@ggerganov, Can't we just trust `vocab.id_to_token[id].type`? There are a few more modifications to make, but this is the main problem.
@jaime-m-p Can you try out applying the GPT-2 pre-tokenizer instead of the "default" value that's used on llama.cpp#L12360? I think that might be part of the issue. It's a symptom I'm working on at the moment.
The usual value for
Do you mean apply the GPT-2 regex split first, then the WPM processing? Actually, WPM is not using any explicit regex split. I'm taking a close look at your PR.
Jina uses the BERT Pre-tokenizer. The BERT Pre-tokenizer inherits from the ByteLevel Pre-tokenizer. The ByteLevel Pre-tokenizer defaults to the GPT-2 Pre-tokenizer. The Pre-tokenizer is responsible for partitioning its normalized substrings. If the substrings are split the right way, it should resolve the encoding issue. I haven't tested this myself yet; it's just a hypothesis based on my research so far.
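As a quick way to check this hypothesis, here is a minimal sketch that prints the substrings the fast tokenizer's pre-tokenizer produces, i.e. the partition a matching WPM implementation would have to reproduce. It assumes the Hugging Face `tokenizers` package and `jinaai/jina-embeddings-v2-base-en` as the jina-v2-en checkpoint; both are my assumptions, not something pinned down in this thread.

```python
# Minimal sketch: inspect the pre-tokenizer split of a HF fast tokenizer.
# Assumption: jina-v2-en corresponds to this HF repo id.
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")

text = "Hello world"
# pre_tokenize_str() returns (substring, offsets) pairs produced before the
# WordPiece model runs; this is the split WPM has to reproduce.
print(tok.pre_tokenizer.pre_tokenize_str(text))
# e.g. [('Hello', (0, 5)), ('world', (6, 11))]
```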
This sounds good. Whatever you think is best.
Yes, I am. I'm still working on this part, though.
Awesome! I really appreciate that. I'm not as familiar with C++ as I am with C and Python.
@teleprint-me I tested the default regex vs the GPT-2 regex. The jina-* models are already using the GPT-2 regex. I forced the default regex for comparison.
Hm. Interesting. 🤔 The original expression used for GPT-2 is the following: `'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+` (https://github.com/openai/gpt-2/blob/master/src/encoder.py#L53). The others should be omitted completely from pre-tok for testing purposes. This way, we at least have a control group.
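For a quick control on the regex side, here is a minimal sketch that applies the GPT-2 pattern quoted above to a few inputs. It uses the third-party `regex` module, since the stdlib `re` does not support `\p{L}`/`\p{N}`; the sample strings are just illustrative.

```python
# Minimal sketch: split text with the original GPT-2 pre-tokenization pattern.
import regex

GPT2_PAT = regex.compile(
    r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+"""
)

for text in ["Hello world", "Hello  world!", "año 2024\n"]:
    # findall() returns the matched pieces in order, i.e. the pre-token split.
    print(repr(text), "->", GPT2_PAT.findall(text))
# 'Hello world' -> ['Hello', ' world']
```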
Fix unicode edge case combinations. Split by whitespace in the same pass.
e92c3f8 to f3f6c0a
@ggerganov, Can't we just trust `vocab.id_to_token[id].type`?
I think so
Modifications to make WPM tokenizer match AutoTokenizer.
Tested with vocabs from models bert-bge and jina-v2-en.
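For reference, a minimal sketch of how the target outputs could be generated, assuming the `transformers` package; the repo ids below are my guess at which checkpoints "bert-bge" and "jina-v2-en" refer to.

```python
# Minimal sketch: produce AutoTokenizer reference ids that the llama.cpp WPM
# tokenizer is expected to match. The HF repo ids are assumptions.
from transformers import AutoTokenizer

models = {
    "bert-bge": "BAAI/bge-small-en-v1.5",
    "jina-v2-en": "jinaai/jina-embeddings-v2-base-en",
}

tests = ["Hello world", " Hello  World!!", "\t\n hola señor 3.14"]

for name, repo in models.items():
    tok = AutoTokenizer.from_pretrained(repo)
    for text in tests:
        print(name, repr(text), tok.encode(text, add_special_tokens=False))
```

The llama.cpp tokenizer output for the same strings and converted vocabs would then be compared against these ids.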