[Feature Request] Classifier-Free Guidance sampling #499

Open
wronkiew opened this issue Jul 3, 2023 · 2 comments
Labels
feature request New feature or request

Comments


wronkiew commented Jul 3, 2023

🚀 Feature

Add an option for models to use Classifier-Free Guidance (CFG) during inference. CFG contrasts the logits conditioned on the prompt with those conditioned on a negative prompt, which pushes generation to follow the system prompt more closely.

Support for this has also been requested at huggingface/transformers#24536 and ggml-org/llama.cpp#2083. The paper describing the technique is here. Section 3.4 shows their evaluation using CFG to improve chatbot responses.
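
For readers unfamiliar with the technique, here is a minimal sketch of the per-token logit combination as I understand it from the paper: the guided log-probabilities start from the negative-prompt distribution and move toward the prompt-conditioned one by a guidance scale. The function names (`log_softmax`, `cfg_logits`, `greedy_token`) and the toy three-token vocabulary are made up for illustration and are not part of MLC-LLM or any other library.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cfg_logits(cond_logits, neg_logits, guidance_scale):
    """Combine prompt-conditioned and negative-prompt logits.

    guided = neg + guidance_scale * (cond - neg), computed on log-probabilities.
    A guidance_scale of 1.0 gives back the ordinary conditional distribution.
    """
    cond = log_softmax(cond_logits)
    neg = log_softmax(neg_logits)
    return [n + guidance_scale * (c - n) for c, n in zip(cond, neg)]


def greedy_token(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy example: token 0 is likely under both prompts, while token 1 is
# noticeably more likely with the prompt than with the negative prompt.
cond = [2.0, 1.0, 0.0]
neg = [2.5, 0.0, 0.0]
plain_choice = greedy_token(cfg_logits(cond, neg, 1.0))
guided_choice = greedy_token(cfg_logits(cond, neg, 3.0))
```

In this toy case the unguided choice is token 0, while a guidance scale of 3.0 shifts the choice to token 1, the token the prompt favors most relative to the negative prompt.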

Motivation

MLC-LLM is memory-constrained on mobile devices. According to the paper, models using CFG performed on average about as well as standard models twice their size. For example, LLaMA-7B with CFG outperformed LLaMA-13B on the LAMBADA text-generation benchmark. The cost is inference time, since each generated token needs a forward pass for both the prompt and the negative prompt.

wronkiew added the "feature request" label Jul 3, 2023
@Vermeille

Author here. Let me know if help is needed.

Member

yzh119 commented Jul 4, 2023

Thanks @wronkiew for raising this!
The example code by @Vermeille is very intuitive, and the idea of contrasting logits w/ and w/o prompts looks interesting.

I suppose it needs some effort on the MLC-LLM side because we need to support multiple KV caches. I'm glad to help prototype it.
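
To illustrate why more than one KV cache is involved, here is a rough sketch of a CFG decoding loop, not MLC-LLM code. The prompt stream and the negative-prompt stream see different histories, so each keeps its own cache, and every chosen token is appended to both. `ToyStream`, its scoring rule, and `generate_with_cfg` are all hypothetical stand-ins; a real cache would hold key/value tensors rather than token ids.

```python
import math

VOCAB_SIZE = 4  # toy vocabulary, assumed for illustration


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


class ToyStream:
    """Stand-in for one model context together with its own KV cache."""

    def __init__(self, prompt_tokens):
        # Token ids stand in for the cached keys/values of this stream only.
        self.kv_cache = list(prompt_tokens)

    def next_logits(self):
        # Toy rule: a token scores higher the more often it is in the cache.
        return [float(self.kv_cache.count(t)) for t in range(VOCAB_SIZE)]

    def append(self, token):
        self.kv_cache.append(token)


def generate_with_cfg(prompt, negative_prompt, guidance_scale, max_new_tokens):
    cond = ToyStream(prompt)           # cache for the prompt-conditioned pass
    neg = ToyStream(negative_prompt)   # separate cache for the negative prompt
    output = []
    for _ in range(max_new_tokens):
        c = log_softmax(cond.next_logits())
        n = log_softmax(neg.next_logits())
        guided = [nv + guidance_scale * (cv - nv) for cv, nv in zip(c, n)]
        token = max(range(VOCAB_SIZE), key=lambda i: guided[i])  # greedy pick
        output.append(token)
        # The same token extends both streams, so both caches grow each step.
        cond.append(token)
        neg.append(token)
    return output, cond.kv_cache, neg.kv_cache


prompt = [1, 1, 2]
negative_prompt = [2, 2, 3]
tokens, cond_cache, neg_cache = generate_with_cfg(prompt, negative_prompt, 1.5, 3)
```

The two caches end up with different prefixes but an identical generated suffix, which is the bookkeeping a runtime would need to handle per request.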
