
support op clamp in backend metal #6660

Closed
phymbert opened this issue Apr 13, 2024 · 3 comments · Fixed by #6662
Labels
enhancement New feature or request

Comments

@phymbert
Collaborator

Motivation

Since the addition of DBRX support in #6515, the ggml Metal backend has lacked support for the op GGML_OP_CLAMP.

@phymbert phymbert added the enhancement New feature or request label Apr 13, 2024
@dave-fl
Contributor

dave-fl commented Apr 13, 2024

@phymbert I've taken a stab at clamp.

Details

llama_new_context_with_model:      Metal compute buffer size =   208.00 MiB
llama_new_context_with_model:        CPU compute buffer size =    13.01 MiB
llama_new_context_with_model: graph nodes  = 2886
llama_new_context_with_model: graph splits = 2

system_info: n_threads = 12 / 16 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 
sampling: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 0


I believe the meaning of life is to find a purpose that you can dedicate your life to, which will then give your life meaning. I think life is about the journey you take, and the experiences and relationships you make along the way. Life is about learning, growing, and becoming the best version of yourself that you can possibly be. It's about finding happiness and fulfillment in the things that you do, and in the relationships that you have. Ultimately, I believe that the meaning of life is to find your own unique path and to follow it with passion and purpose. [end of text]

llama_print_timings:        load time =   93196.69 ms
llama_print_timings:      sample time =       3.54 ms /   109 runs   (    0.03 ms per token, 30817.08 tokens per second)
llama_print_timings: prompt eval time =     366.42 ms /     7 tokens (   52.35 ms per token,    19.10 tokens per second)
llama_print_timings:        eval time =    7247.30 ms /   108 runs   (   67.10 ms per token,    14.90 tokens per second)
llama_print_timings:       total time =    7644.95 ms /   115 tokens
ggml_metal_free: deallocating
Log end

@phymbert
Collaborator Author

phymbert commented Apr 13, 2024

So you just removed it, and it works? Maybe we can test whether the op is supported or not then?

Ah, you implemented it. Great.

@dave-fl
Contributor

dave-fl commented Apr 13, 2024

So you just removed it, and it works? Maybe we can test whether the op is supported or not then?

Ah, you implemented it. Great.

It's unclear to me whether I should be using clamp from metal_stdlib or my own. I went ahead and used my own, since that is what the CUDA backend does as well.
