🚀 Feature

Add an option for models to use Classifier-Free Guidance (CFG) during inference. CFG uses a negative prompt to push inference toward following the system prompt more closely.

Support for this has also been requested at huggingface/transformers#24536 and ggml-org/llama.cpp#2083. The paper describing the technique is "Stay on topic with Classifier-Free Guidance" (arXiv:2306.17806); Section 3.4 shows their evaluation using CFG to improve chatbot responses.
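To make the technique concrete, here is a minimal sketch of the logit arithmetic the paper describes: run the model once on the guided (system-prompted) context and once on the negative or unconditional context, then blend the two next-token distributions in log-probability space. The `model` callable, the variable names, and the default `guidance_scale` below are illustrative assumptions, not an existing MLC-LLM API:

```python
import numpy as np

def log_softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the vocabulary dimension.
    z = x - x.max()
    return z - np.log(np.exp(z).sum())

def cfg_combine(logits_cond: np.ndarray,
                logits_uncond: np.ndarray,
                guidance_scale: float = 1.5) -> np.ndarray:
    """Blend next-token distributions: uncond + scale * (cond - uncond).

    A scale of 1.0 reproduces ordinary (unguided) decoding; larger values
    push the output toward the conditioned prompt and away from the
    negative prompt.
    """
    lp_cond = log_softmax(logits_cond)
    lp_uncond = log_softmax(logits_uncond)
    return lp_uncond + guidance_scale * (lp_cond - lp_uncond)

# Hypothetical usage: `model` maps a list of token ids to next-token logits.
def greedy_step(model, cond_ids, uncond_ids, guidance_scale=1.5):
    scores = cfg_combine(model(cond_ids), model(uncond_ids), guidance_scale)
    return int(np.argmax(scores))  # greedy for simplicity; sampling also works
```

Note the cost: every generated token requires two forward passes, one per context.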
Motivation

MLC-LLM is memory-constrained on mobile devices. In the paper's evaluations, the response quality of LLMs using CFG was comparable on average to that of standard models roughly twice their size; for example, LLaMA-7B with CFG outperformed LLaMA-13B on the LAMBADA benchmark. This makes CFG attractive where memory, not compute, is the binding constraint, since the gain comes at the expense of inference time.
Thanks @wronkiew for raising this!
The example code by @Vermeille is very intuitive, and the idea of contrasting logits w/ and w/o prompts looks interesting.
I suppose it needs some effort on the MLC-LLM side because we need to support multiple KV caches; I'm glad to help prototype it.
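To make the multiple-KV-cache point concrete, here is a rough sketch under an assumed interface — `new_kv_cache()` and `forward(tokens, cache)` are hypothetical stand-ins, not MLC-LLM's actual API. The key invariant is that every sampled token is appended to both caches so the two streams stay position-aligned:

```python
from typing import List

class CFGDecoder:
    """Sketch of CFG decoding with two KV caches (hypothetical API).

    One stream is prefilled with the full system prompt, the other with
    the negative (or empty) prompt. Each sampled token is then fed to
    BOTH streams so their caches stay aligned position-for-position.
    """

    def __init__(self, model, guidance_scale: float = 1.5):
        self.model = model
        self.scale = guidance_scale
        self.kv_pos = model.new_kv_cache()  # assumed cache constructor
        self.kv_neg = model.new_kv_cache()

    def prefill(self, pos_ids: List[int], neg_ids: List[int]) -> None:
        # `forward` is assumed to append to the given cache and return
        # next-token logits as a 1-D array over the vocabulary.
        self.logits_pos = self.model.forward(pos_ids, self.kv_pos)
        self.logits_neg = self.model.forward(neg_ids, self.kv_neg)

    def step(self) -> int:
        # Blend raw logits for brevity; a log-softmax variant as in the
        # sketch above would match the paper's formulation more closely.
        blended = self.logits_neg + self.scale * (self.logits_pos - self.logits_neg)
        tok = int(blended.argmax())  # greedy; real sampling would go here
        self.logits_pos = self.model.forward([tok], self.kv_pos)
        self.logits_neg = self.model.forward([tok], self.kv_neg)
        return tok
```

This design doubles KV-cache memory and per-token compute, which matches the inference-time trade-off noted in the issue.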