Add BPE pre-tokenization for DBRX. #7132

Merged 4 commits from dranger003:bpe-dbrx into ggml-org:master on May 8, 2024

Conversation

@dranger003 (Contributor) commented May 7, 2024

Closes #7074.

The pre-tokenization regex is identical to llama-3's, so I re-used the same split.

https://huggingface.co/databricks/dbrx-instruct/blob/main/tokenizer.json
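
For illustration only (not part of this PR): a minimal sketch of how this llama-3 / cl100k-style split behaves, using Python's third-party `regex` module for the `\p{...}` classes. The pattern is transcribed here for convenience and should be treated as an approximation; the authoritative version is the one in tokenizer.json and in llama.cpp's BPE pre-tokenizer.

```python
# Illustrative sketch, not the PR's code: the llama-3 style split pattern
# that DBRX reuses, applied with the third-party `regex` module.
import regex

SPLIT_PAT = (
    r"(?i:'s|'t|'re|'ve|'m|'ll|'d)"     # common English contractions
    r"|[^\r\n\p{L}\p{N}]?\p{L}+"        # optional leading symbol + letters
    r"|\p{N}{1,3}"                      # digits in groups of at most three
    r"| ?[^\s\p{L}\p{N}]+[\r\n]*"       # punctuation runs
    r"|\s*[\r\n]+"                      # newlines
    r"|\s+(?!\S)"                       # trailing whitespace
    r"|\s+"                             # any other whitespace
)

print(regex.findall(SPLIT_PAT, "Write an essay about AI in 2024!"))
# expected roughly: ['Write', ' an', ' essay', ' about', ' AI', ' in', ' ', '202', '4', '!']
```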

./build/bin/test-tokenizer-0 models/ggml-vocab-dbrx.gguf
...
Tests passed
Output:
$ ./build/bin/main -ngl 41 -c 4096 -s 0 -e -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWrite an essay about AI.<|im_end|>\n<|im_start|>assistant\n" -m /md0/models/databricks/ggml-dbrx-instruct-iq3_s.gguf
Log start
main: build = 2803 (b6aa6702)
main: built with cc (GCC) 13.2.1 20240417 for x86_64-pc-linux-gnu
main: seed  = 0
llama_model_loader: loaded meta data with 25 key-value pairs and 323 tensors from /md0/models/databricks/ggml-dbrx-instruct-iq3_s.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = dbrx
llama_model_loader: - kv   1:                               general.name str              = dbrx
llama_model_loader: - kv   2:                           dbrx.block_count u32              = 40
llama_model_loader: - kv   3:                        dbrx.context_length u32              = 32768
llama_model_loader: - kv   4:                      dbrx.embedding_length u32              = 6144
llama_model_loader: - kv   5:                   dbrx.feed_forward_length u32              = 10752
llama_model_loader: - kv   6:                  dbrx.attention.head_count u32              = 48
llama_model_loader: - kv   7:               dbrx.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                        dbrx.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:                   dbrx.attention.clamp_kqv f32              = 8.000000
llama_model_loader: - kv  10:                          general.file_type u32              = 26
llama_model_loader: - kv  11:                          dbrx.expert_count u32              = 16
llama_model_loader: - kv  12:                     dbrx.expert_used_count u32              = 4
llama_model_loader: - kv  13:          dbrx.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = dbrx
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,100352]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,100352]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,100000]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 100257
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 100257
llama_model_loader: - kv  21:            tokenizer.ggml.unknown_token_id u32              = 100257
llama_model_loader: - kv  22:            tokenizer.ggml.padding_token_id u32              = 100277
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type  f16:   40 tensors
llama_model_loader: - type q6_K:    1 tensors
llama_model_loader: - type iq3_s:  201 tensors
llm_load_vocab: special tokens definition check successful ( 96/100352 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = dbrx
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 100352
llm_load_print_meta: n_merges         = 100000
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 6144
llm_load_print_meta: n_head           = 48
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 40
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 6
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 1.0e-05
llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
llm_load_print_meta: f_clamp_kqv      = 8.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 10752
llm_load_print_meta: n_expert         = 16
llm_load_print_meta: n_expert_used    = 4
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 16x12B
llm_load_print_meta: model ftype      = IQ3_S - 3.4375 bpw
llm_load_print_meta: model params     = 131.60 B
llm_load_print_meta: model size       = 52.89 GiB (3.45 BPW)
llm_load_print_meta: general.name     = dbrx
llm_load_print_meta: BOS token        = 100257 '<|endoftext|>'
llm_load_print_meta: EOS token        = 100257 '<|endoftext|>'
llm_load_print_meta: UNK token        = 100257 '<|endoftext|>'
llm_load_print_meta: PAD token        = 100277 '<|pad|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 100279 '<|im_end|>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 3 CUDA devices:
  Device 0: NVIDIA RTX 5000 Ada Generation, compute capability 8.9, VMM: yes
  Device 1: NVIDIA RTX 5000 Ada Generation, compute capability 8.9, VMM: yes
  Device 2: NVIDIA RTX 5000 Ada Generation, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size =    0.68 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors:        CPU buffer size =   252.66 MiB
llm_load_tensors:      CUDA0 buffer size = 18699.84 MiB
llm_load_tensors:      CUDA1 buffer size = 18699.84 MiB
llm_load_tensors:      CUDA2 buffer size = 16510.80 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =   224.00 MiB
llama_kv_cache_init:      CUDA1 KV buffer size =   224.00 MiB
llama_kv_cache_init:      CUDA2 KV buffer size =   192.00 MiB
llama_new_context_with_model: KV self size  =  640.00 MiB, K (f16):  320.00 MiB, V (f16):  320.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.38 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model:      CUDA0 compute buffer size =   516.01 MiB
llama_new_context_with_model:      CUDA1 compute buffer size =   516.01 MiB
llama_new_context_with_model:      CUDA2 compute buffer size =   516.02 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    44.02 MiB
llama_new_context_with_model: graph nodes  = 2246
llama_new_context_with_model: graph splits = 4

system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 0


<|endoftext|><|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Write an essay about AI.<|im_end|>
<|im_start|>assistant
Artificial intelligence, or AI, has been a topic of great interest and debate in recent years. The potential for AI to revolutionize a way we live, work, and communicate is enormous. However, it is important to consider both the advantages and disadvantages of AI before embracing it wholeheartedly.

On the plus side, AI has the potential to greatly improve efficiency and productivity in a variety of fields. For instance, in a factory setting, AI can streamline a production line, ensuring that each component is produced and assembled as efficiently as possible. As a result, a company can produce more widgets a any given hour than it otherwise might. Furthermore, AI can also bring about a dramatic reduction in a errors and a resulting increase in a quality. In a service industry, AI can use data to anticipate and meet a customer's needs before they even realize what those needs are.

On a minus side, though, there are a few potential drawbacks to AI that we should at least consider. First, AI can lead to a reduction in a human employment. As AI grow more sophisticated, they may be able to perform a task previously done by a human worker. This can lead to a displacement of a worker, as a machine take over a job. Second, AI can also lead to a reduction in a privacy. Since AI can analyze a great deal of data, they may be able to make a prediction about a person's behavior, intent, or feeling. This can feel a invasive and even a violation of a privacy.

In conclusion, AI has both a potential to improve and a challenge our lives. However, it is important to balance a advantages and disadvantages before putting AI to work for us. By considering a potential impact of AI, we can help to guide a responsible and a beneficial development and implementation of a technology. Thank you.<|im_end|> [end of text]

llama_print_timings:        load time =   17066.24 ms
llama_print_timings:      sample time =      15.70 ms /   368 runs   (    0.04 ms per token, 23440.98 tokens per second)
llama_print_timings: prompt eval time =     444.10 ms /    26 tokens (   17.08 ms per token,    58.55 tokens per second)
llama_print_timings:        eval time =   13533.10 ms /   367 runs   (   36.87 ms per token,    27.12 tokens per second)
llama_print_timings:       total time =   14215.50 ms /   393 tokens
Log end

github-actions bot commented May 8, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 555 iterations 🚀

Details (performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8430.39ms p(95)=21123.17ms fails=, finish reason: stop=494 truncated=61
  • Prompt processing (pp): avg=93.73tk/s p(95)=382.7tk/s
  • Token generation (tg): avg=33.14tk/s p(95)=47.76tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=bpe-dbrx commit=6c90dda02170cf42f5d5c154536634fd897e5284

[Benchmark charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, and requests_processing, each plotted over the 10-minute, 555-iteration run on Standard_NC4as_T4_v3.]

@ggerganov (Member) commented May 8, 2024

Any idea why the conversion fails for me:

Nvm - I had to pull the repo
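
(Editor's note, hedged: the conversion error above is most likely the converter not yet recognizing DBRX's tokenizer. convert-hf-to-gguf.py identifies the pre-tokenizer by hashing the token ids produced for a fixed check string and mapping that digest to a name such as "dbrx"; this PR's branch adds that mapping, so an older checkout fails. A rough Python sketch of the idea follows; the function name and digest below are placeholders, not the real values — see get_vocab_base_pre in convert-hf-to-gguf.py for the actual logic.)

```python
# Rough sketch of the pre-tokenizer detection idea used by convert-hf-to-gguf.py;
# the digest below is a placeholder, not a real value.
from hashlib import sha256

def guess_pre_tokenizer(tokenizer, check_text: str) -> str:
    # Hash the token ids produced for a fixed check string.
    chkhsh = sha256(str(tokenizer.encode(check_text)).encode()).hexdigest()
    known = {
        "<placeholder-digest>": "dbrx",   # mapping added for DBRX
        # ... entries for llama-3, gpt-2, command-r, etc. ...
    }
    if chkhsh not in known:
        raise NotImplementedError("BPE pre-tokenizer not recognized - update the convert script")
    return known[chkhsh]
```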

@@ -84,6 +84,7 @@ llama_test(test-tokenizer-0 NAME test-tokenizer-0-starcoder ARGS ${CMAKE
 llama_test(test-tokenizer-0 NAME test-tokenizer-0-gpt-2 ARGS ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-gpt-2.gguf)
 llama_test(test-tokenizer-0 NAME test-tokenizer-0-refact ARGS ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-refact.gguf)
 llama_test(test-tokenizer-0 NAME test-tokenizer-0-command-r ARGS ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-command-r.gguf)
+llama_test(test-tokenizer-0 NAME test-tokenizer-0-dbrx ARGS ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-dbrx.gguf)
Review comment from ggerganov (Member) on the added test entry:

Let's add tests only for new types of pre-tokenizer in order to keep the binary data in the repo small. Remove the models/ggml-vocab-dbrx.* and let's merge

@ggerganov merged commit 4cd621c into ggml-org:master on May 8, 2024
56 of 61 checks passed
@dranger003 deleted the bpe-dbrx branch on January 3, 2025
Successfully merging this pull request may close the following issue:

DBRX GGUF conversion no longer working (#7074)
2 participants