
Please add model quantization #18

Closed
Minami-su opened this issue Aug 27, 2023 · 6 comments

Comments

@Minami-su

As the title says.

@Minami-su
Author

The quantization could be applied to just the LLM part.

@simonJJJ
Contributor

We are working on it.

@Minami-su
Author

> We are working on it.

Thank you for your work.


77h2l commented Aug 28, 2023

A10 (22 GB): out of memory. @simonJJJ


77h2l commented Aug 29, 2023

@simonJJJ Hello, and thanks for your work. May I ask what the minimum memory requirement is to deploy a Qwen-VL model? I have found that a single A10 cannot run it. Is there a way to offload the memory? Thanks.
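For example, would something like transformers' Accelerate device mapping work here? A rough sketch of what I mean (the memory caps are illustrative values, not tuned recommendations):

```python
# Sketch: let Accelerate split the model across GPU and CPU RAM.
# The max_memory caps below are illustrative, not recommendations.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat",
    device_map="auto",                        # auto-place layers on available devices
    max_memory={0: "20GiB", "cpu": "48GiB"},  # spill the remainder to CPU RAM
    trust_remote_code=True,
).eval()
```

(I understand offloaded layers run on CPU, so inference would be much slower.)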

@ShuaiBai623
Collaborator

The Int4-quantized model for Qwen-VL-Chat, Qwen-VL-Chat-Int4, is now available. It needs about 12 GB of GPU memory.
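A minimal loading sketch, assuming the usual Qwen-VL transformers usage (the `chat()` and `from_list_format()` helpers come from the model's remote code, and the image URL is a placeholder):

```python
# Minimal sketch: load Qwen-VL-Chat-Int4 and run one multimodal query.
# chat() and from_list_format() are provided by the model's remote code
# (trust_remote_code=True), per the Qwen-VL README.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen-VL-Chat-Int4", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat-Int4",
    device_map="cuda",        # fits in about 12 GB of GPU memory
    trust_remote_code=True,
).eval()

# Build a query from an image plus a text prompt.
query = tokenizer.from_list_format([
    {"image": "https://example.com/demo.jpeg"},  # placeholder image URL
    {"text": "Describe this image."},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```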
