[Feature][WIP] Prototype of vLLM execution on Intel GPU devices via SYCL. #2378
Conversation
setup.py (Outdated)
BUILD_CPU_ONLY = os.getenv('VLLM_BUILD_CPU_ONLY', "0") == "1"
BUILD_XPU_OPS = os.getenv('VLLM_BUILD_XPU_OPS', "0") == "1"
if BUILD_XPU_OPS:
    from xpu_extension.xpu_cpp_extension import DPCPPExtension, DpcppBuildExtension
You can utilize the extensions that ipex already provides:
https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features/DPC%2B%2B_Extension.html
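For reference, a minimal sketch of what that suggestion could look like in setup.py, assuming the DPCPPExtension / DpcppBuildExtension helpers are imported from ipex's cpp_extension module as described in the linked tutorial; the extension name and source file paths below are illustrative and not taken from this PR:

```python
# Hypothetical setup.py fragment: reuse the DPC++ extension helpers shipped
# with intel_extension_for_pytorch instead of a vendored xpu_extension copy.
import os
from setuptools import setup

BUILD_XPU_OPS = os.getenv('VLLM_BUILD_XPU_OPS', "0") == "1"

ext_modules = []
cmdclass = {}
if BUILD_XPU_OPS:
    # Module path as documented in the IPEX DPC++ extension tutorial.
    from intel_extension_for_pytorch.xpu.cpp_extension import (
        DPCPPExtension, DpcppBuildExtension)
    # Extension name and source list are placeholders for illustration only.
    ext_modules.append(DPCPPExtension('vllm._xpu_C', ['csrc/xpu/ops.cpp']))
    cmdclass['build_ext'] = DpcppBuildExtension

setup(
    name='vllm-xpu-demo',
    ext_modules=ext_modules,
    cmdclass=cmdclass,
)
```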
Thanks for your comments. Already updated; please check the latest change.
Force-pushed from 49c3c7e to 8cdfae2
When I try to install the package via
What might be the issue?
Oh sorry, I didn't verify the latest code; there were some refactors from another developer. Can you check out commit id 8cdfae2 and try compiling with the prefix
Does it support tensor parallelism via multiple GPUs and oneCCL?
I am working on another branch that can run tensor parallel on PVC; Arc does not work yet. Will submit another PR to support it once this one gets merged. https://github.com/jikunshang/vllm/tree/tp
Do you observe a performance boost when the 7B model is executed on 2 GPUs?
Actually, performance drops by about 1x on llama-2-7b and llama-2-13b; we are still investigating the root cause.
When I run
My setup has a GPU Max 1100. I think this error occurs because a CUDA dependency still exists at runtime even though the CUDA libraries are not installed. In the CPU PR (#1028), this was solved, i.e., CPU-only installation and runtime were possible. Maybe apply the same thing here too?
Emmm, I think it's not necessary. Please try to add
@jikunshang Thanks, it worked.
Thanks for the feature! Is this the way to run it, as per the doc?

docker build -f Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
docker run -it \
    --rm \
    --network=host \
    --device /dev/dri \
    -v /dev/dri/by-path:/dev/dri/by-path \
    vllm-xpu-env

Or is it something different?
SYCL version support is deprecated. Please follow the latest IPEX-based solution. Thanks.
Thank you, and apologies for the delay in getting back! May I ask why the SYCL version is deprecated? It's not that I have any great experience with it, nor would I advocate for it, but if you could share the background for that decision, it would help me understand things better!
The SYCL version is hard to maintain and its performance is not optimal. The IPEX team has experts to maintain these kernels and provide a stable API, so we chose to use IPEX as the backend.
Makes sense, thank you!
This is a follow-up to PR #1028.
Will refactor and separate this into several smaller PRs later. It mainly contains the items below:
- prepare env
  - make sure hardware and GPU driver are ready
  - install oneAPI Base Toolkit
- how to build
- how to run UT
- how to run E2E test (a minimal sketch follows below)
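As a concrete illustration of the last item, an end-to-end smoke test could look roughly like the following. This is a minimal sketch assuming the XPU-enabled build is installed and the Intel GPU is visible to the process; the model name and sampling settings are placeholders, not values taken from this PR:

```python
from vllm import LLM, SamplingParams

# Hypothetical E2E smoke test: load a small model and generate one completion.
# Assumes vLLM was built with XPU support and the GPU driver is set up.
llm = LLM(model="facebook/opt-125m")  # placeholder model
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

outputs = llm.generate(["Hello, my name is"], sampling)
for output in outputs:
    print(output.outputs[0].text)
```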