
support op clamp in backend metal #6660

Closed
phymbert opened this issue Apr 13, 2024 · 3 comments · Fixed by #6662
Labels
enhancement New feature or request

Comments

@phymbert
Collaborator

Motivation

Since the addition of DBRX support in #6515, the ggml Metal backend has lacked support for the op GGML_OP_CLAMP.

@phymbert phymbert added the enhancement New feature or request label Apr 13, 2024
@dave-fl
Contributor

dave-fl commented Apr 13, 2024

@phymbert I've taken a stab at clamp.

Details

llama_new_context_with_model:      Metal compute buffer size =   208.00 MiB
llama_new_context_with_model:        CPU compute buffer size =    13.01 MiB
llama_new_context_with_model: graph nodes  = 2886
llama_new_context_with_model: graph splits = 2

system_info: n_threads = 12 / 16 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 
sampling: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 0


I believe the meaning of life is to find a purpose that you can dedicate your life to, which will then give your life meaning. I think life is about the journey you take, and the experiences and relationships you make along the way. Life is about learning, growing, and becoming the best version of yourself that you can possibly be. It's about finding happiness and fulfillment in the things that you do, and in the relationships that you have. Ultimately, I believe that the meaning of life is to find your own unique path and to follow it with passion and purpose. [end of text]

llama_print_timings:        load time =   93196.69 ms
llama_print_timings:      sample time =       3.54 ms /   109 runs   (    0.03 ms per token, 30817.08 tokens per second)
llama_print_timings: prompt eval time =     366.42 ms /     7 tokens (   52.35 ms per token,    19.10 tokens per second)
llama_print_timings:        eval time =    7247.30 ms /   108 runs   (   67.10 ms per token,    14.90 tokens per second)
llama_print_timings:       total time =    7644.95 ms /   115 tokens
ggml_metal_free: deallocating
Log end

@phymbert
Collaborator Author

phymbert commented Apr 13, 2024

So you just removed it, and it works? Maybe we can test whether the op is supported or not then?

Ah, you implemented it. Great.

@dave-fl
Contributor

dave-fl commented Apr 13, 2024

So you just removed it, and it works? Maybe we can test whether the op is supported or not then?

Ah, you implemented it. Great.

It's unclear to me whether I should be using clamp from metal_stdlib or my own. I went ahead and used my own, since that is what the CUDA backend does as well.
