docs: add instructions for Langchain #1162
Conversation
Hi @mspronesti, does this LangChain-VLLM integration support quantized models? vllm-project already supports quantized models (AWQ format), as shown in #1032. However, when I do the same and just pass `quantization='awq'` to LangChain-VLLM, it does not seem to work and just shows an OOM error. The model I load is `model_path = "/home/quadrep/toan/projects/LLMs/weights/vicuna-33B-AWQ"`.
Hi @pvtoan, you need to pass the quantization option through `vllm_kwargs`:

```python
model = VLLM(
    model=model_path,
    tensor_parallel_size=2,
    trust_remote_code=True,
    max_new_tokens=512,
    vllm_kwargs={"quantization": "awq"},
)
```

Also, notice that
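For reference, a minimal self-contained sketch of the same setup with the import and a test call added; the model path is the one from the comment above, and everything else mirrors the snippet rather than being a confirmed recommendation:

```python
# Minimal sketch: loading an AWQ-quantized model via langchain's VLLM wrapper.
# The quantization flag is forwarded to the vLLM engine through vllm_kwargs.
from langchain.llms import VLLM

model_path = "/home/quadrep/toan/projects/LLMs/weights/vicuna-33B-AWQ"

llm = VLLM(
    model=model_path,
    tensor_parallel_size=2,        # shard the model across two GPUs
    trust_remote_code=True,        # required by some Hub models with custom code
    max_new_tokens=512,
    vllm_kwargs={"quantization": "awq"},
)

print(llm("What is the capital of France?"))
```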
Btw @WoosukKwon, I have also resolved the conflicts in this PR, in case you are interested in merging it.
Hi @mspronesti, yes, it works now and it indeed uses 2 GPUs. Thank you so much for your help! Besides, I'd like to ask whether anything is wrong when I load the model vicuna-33B-AWQ (about 18 GB in total, in 2 shards) plus one embedding model (about 1.3 GB) onto 2 RTX 4090 GPUs (24 GB each), because the loaded models occupy almost all 48 GB. In particular, both GPUs fill up simultaneously from 0 GB to around 9 GB, then grow to 14 GB, and finally to 23.9 GB. My DRAM also increases from 6 GB to around 28 GB. My OS is Ubuntu 22.04. Do you think this memory usage, both VRAM and DRAM, is normal?
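On the memory question: vLLM pre-allocates a large fraction of each GPU's memory for the KV cache (controlled by its `gpu_memory_utilization` engine argument, roughly 0.9 by default), so near-full VRAM usage is expected even when the weights themselves are smaller. A minimal sketch of capping it, assuming the LangChain wrapper forwards this argument through `vllm_kwargs` the same way it forwards `quantization`:

```python
from langchain.llms import VLLM

# Sketch (assumption): lower gpu_memory_utilization so vLLM reserves less VRAM
# for its KV cache, leaving headroom for the separately loaded embedding model.
llm = VLLM(
    model="/home/quadrep/toan/projects/LLMs/weights/vicuna-33B-AWQ",
    tensor_parallel_size=2,
    trust_remote_code=True,
    max_new_tokens=512,
    vllm_kwargs={
        "quantization": "awq",
        "gpu_memory_utilization": 0.7,  # default is ~0.9; tune to your headroom
    },
)
```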
Thank you for your contribution!
Hi! A month ago I made a few contributions to LangChain to support vLLM and its OpenAI-compatible server.
This PR updates vLLM's documentation to showcase how to use vLLM with LangChain and to point readers to the tutorial I wrote there for further details. I hope you find this contribution meaningful.
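For readers landing here, a rough sketch of the two integration paths this documentation covers: running the vLLM engine in-process through LangChain's `VLLM` class, or talking to a running vLLM OpenAI-compatible server through `VLLMOpenAI`. The model name and server URL below are placeholders for illustration, not part of this PR:

```python
from langchain.llms import VLLM, VLLMOpenAI

# Path 1: run the vLLM engine in-process.
local_llm = VLLM(model="mosaicml/mpt-7b", max_new_tokens=128)

# Path 2: point at a vLLM OpenAI-compatible server, started separately with e.g.
#   python -m vllm.entrypoints.openai.api_server --model mosaicml/mpt-7b
server_llm = VLLMOpenAI(
    openai_api_key="EMPTY",                      # the local server does not check keys
    openai_api_base="http://localhost:8000/v1",  # placeholder URL
    model_name="mosaicml/mpt-7b",
)

print(local_llm("What is the capital of France?"))
```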