Does vLLM support LLM inference on CPU? And with quantized models?
What's the difference between vLLM and GGML?
Does vLLM support LLM inference on CPU?
No, but CPU support is on the roadmap: #2681
And with quantized models?
Yes: AWQ, GPTQ, and SqueezeLLM quantization are supported.
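For reference, a minimal sketch of running an AWQ-quantized model with vLLM's offline `LLM` API; the checkpoint name is only an example, and the `quantization` argument also accepts `"gptq"` and `"squeezellm"`:

```python
from vllm import LLM, SamplingParams

# Load an AWQ-quantized checkpoint (example model name; substitute your own).
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is quantization?"], params)
print(outputs[0].outputs[0].text)
```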
vLLM is a serving engine; GGML is a tensor library for CPUs.