Excessively slow prompt processing time with 70B partially offloaded in SYCL #5272

Jacoby1218 · 2024-02-02T04:38:28Z

Prompt processing is extremely slow with a 70B model partially offloaded.
llama-bench.exe -ngl 20 -m "D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf"
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device

model	size	params	backend	ngl	test	t/s
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	SYCL	20	pp 512	2.14 ± 0.28
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	SYCL	20	tg 128	1.03 ± 0.01

build: a28c5ef (2045)


airMeng · 2024-02-02T06:08:39Z

Hi @Jacoby1218, could you provide some reference data to show the magnitude of the gap? For example, performance on RTX-4070ti (16 GB), or entirely on iGPU/CPU?

Jacoby1218 · 2024-02-02T07:15:49Z

I don't have any other GPU to test, but I can provide results from my CPU and other backends.

model	size	params	backend	threads	test	t/s
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	BLAS	6	pp 512	1.93 ± 0.06
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	BLAS	6	tg 128	0.81 ± 0.02

model	size	params	backend	ngl	test	t/s
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	Vulkan	20	pp 512	7.02 ± 0.25
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	Vulkan	20	tg 128	0.97 ± 0.04
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	OpenCL	20	pp 512	8.81 ± 1.10
llama 70B Q4_K - Medium	38.58 GiB	68.98 B	OpenCL	20	tg 128	0.82 ± 0.02
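To put a number on the gap, here is a minimal sketch that divides each backend's pp 512 throughput by the SYCL figure. The values are copied from the tables in this thread; the helper name `speedup_over` is made up for illustration and is not part of llama.cpp. It ignores the reported ± variance, and the BLAS row is a CPU-only run with 6 threads rather than an `-ngl 20` run, so treat the ratios as a rough comparison only.

```python
# pp 512 throughput in tokens/s, copied from the benchmark tables in this thread.
pp512 = {
    "SYCL": 2.14,
    "BLAS": 1.93,    # CPU only, 6 threads
    "Vulkan": 7.02,
    "OpenCL": 8.81,
}


def speedup_over(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """Return each backend's throughput divided by the baseline backend's."""
    base = results[baseline]
    return {name: round(tps / base, 2) for name, tps in results.items()}


ratios = speedup_over("SYCL", pp512)

# Print backends from slowest to fastest relative to SYCL.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:7s} {ratio:.2f}x SYCL")
```

On these numbers, Vulkan and OpenCL prompt processing come out at roughly 3x and 4x the SYCL rate, while SYCL is only slightly ahead of the CPU-only BLAS run.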

airMeng · 2024-02-02T08:21:06Z

I think this may be due to a lack of optimization for multi-batch processing. It has been recorded in #5277, please stay tuned!

github-actions · 2024-03-18T01:32:21Z

This issue is stale because it has been open for 30 days with no activity.

airMeng · 2024-03-24T13:08:13Z

I think this has been improved with #6217, please give it a try.

github-actions · 2024-05-09T01:06:27Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

Jacoby1218 added the bug-unconfirmed label Feb 2, 2024

NeoZhangJianyu added the Intel GPU label Feb 2, 2024

NeoZhangJianyu mentioned this issue Feb 2, 2024

SYCL backend support Multi-card #5282

Closed

5 tasks

github-actions bot added the stale label Mar 18, 2024

github-actions bot removed the stale label Mar 25, 2024

github-actions bot added the stale label Apr 24, 2024

github-actions bot closed this as completed May 9, 2024
