Excessively slow prompt processing time with 70B partially offloaded in SYCL #5272
Comments
Hi @Jacoby1218, could you provide some reference data to show the size of the gap? For example, performance on an RTX-4070ti (16 GB), or running entirely on the iGPU/CPU?
I don't have any other GPU to test with, but I can provide results from my CPU and other backends.
I think this may be due to missing optimization for multi-batch processing. It has been recorded in #5277, so please stay tuned!
This issue is stale because it has been open for 30 days with no activity.
I think this has been improved with #6217. Please give it a try.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prompt processing is extremely slow with a 70B model partially offloaded.
llama-bench.exe -ngl 20 -m "D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf"
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
build: a28c5ef (2045)
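One way to gather the reference data requested in the comments would be to run the same benchmark at several offload levels and compare the reported prompt-processing rates. The sketch below is only an illustration: the helper names `build_bench_commands` and `run_all` are hypothetical, and it assumes `llama-bench` is on the PATH and that `-ngl 0` keeps all layers on the CPU. It uses only the `-m` and `-ngl` flags shown in the report above.

```python
import subprocess


def build_bench_commands(model_path, ngl_values, exe="llama-bench"):
    """Return one llama-bench argument list per -ngl (offloaded layers) value.

    Hypothetical helper: it only assembles the command lines, so it can be
    inspected without a GPU or a model file being present.
    """
    return [[exe, "-m", model_path, "-ngl", str(ngl)] for ngl in ngl_values]


def run_all(commands):
    """Run each benchmark in turn, echoing the command before it starts."""
    for cmd in commands:
        print(" ".join(cmd))
        # check=True stops the sweep if llama-bench exits with an error.
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_bench_commands(r"D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf", [0, 20]))` would run a CPU-only pass followed by the 20-layer configuration from the report, so the two results can be compared side by side.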