
Huggingface quantisation #409

Closed
wants to merge 1 commit

Conversation

Archit-Kohli
Contributor

  • Documentation Update

  • Why was this update required?: It is difficult to load a model with this many parameters, so it needs to be loaded via bitsandbytes; a minimal loading sketch follows below

  • Other information: Added link to Hugging Face's official bitsandbytes notebook for reference
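
For context, a minimal sketch of the quick 4-bit loading path the docs point to, assuming transformers with bitsandbytes and accelerate installed; the model id is a placeholder, not one from the original docs:

from transformers import AutoModelForCausalLM

# Placeholder model id; any large causal LM from the Hub works the same way.
# load_in_4bit requires the bitsandbytes package; device_map="auto" requires accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "huggingface/large-model",
    load_in_4bit=True,
    device_map="auto",
)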

Added link to Hugging Face's official bitsandbytes notebook for reference
@vercel

vercel bot commented Oct 4, 2023

The latest updates on your projects:

Name           | Status  | Updated (UTC)
docs-gpt       | ✅ Ready | Oct 4, 2023 1:56pm
nextra-docsgpt | ✅ Ready | Oct 4, 2023 1:56pm

@dartpain
Contributor

dartpain commented Oct 4, 2023

Unfortunately, users will need to change some code in application/llm/huggingface.py.
What you could do is create a different class, or add a parameter (say, q) that users can pass to enable quantisation explicitly. It would only need a few lines of code here.

Basically, add

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load weights in 4-bit precision
    bnb_4bit_use_double_quant=True,         # nested quantisation for extra memory savings
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bfloat16
)

and pass it to AutoModelForCausalLM.from_pretrained (as quantization_config), with a conditional import.

this would be much appreciated
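
A rough sketch of how that could be wired up in application/llm/huggingface.py (the class name, constructor signature, and default model id here are illustrative guesses, not the repo's actual code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class HuggingFaceLLM:
    def __init__(self, llm_name="huggingface/large-model", q=False):
        if q:
            # Conditional import: the bitsandbytes-backed config is only
            # needed when quantisation is explicitly requested.
            from transformers import BitsAndBytesConfig
            bnb_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.bfloat16,
            )
            self.model = AutoModelForCausalLM.from_pretrained(
                llm_name,
                quantization_config=bnb_config,
                device_map="auto",
            )
        else:
            self.model = AutoModelForCausalLM.from_pretrained(llm_name)
        self.tokenizer = AutoTokenizer.from_pretrained(llm_name)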


@dartpain dartpain changed the title Update README.md Huggingface quantisation Oct 4, 2023
@Archit-Kohli
Contributor Author

Updated in a new pull request (#425), so you can close this one.

@dartpain dartpain closed this Oct 5, 2023