Huge performance degradation using latest branch on Intel Core Ultra 7 155H #8328
Comments
This CPU has only 6 performance cores - how is the speed using |
@ggerganov sure, here are the results with llama-bench:
build: be55134 (2568)
build: 51d2ebad (3303) |
@aahouzi With the older release you have context length 512 and with the latest release you have context length 4096. At some point llama.cpp started using the longest possible context length by default, so simply set it to a lower value and see if this restores performance. |
@fairydreaming Thanks for the suggestion. I tried a context length of 512, but unfortunately this doesn't improve performance. I think the cause might be deeper than that. Any guesses? |
@aahouzi Can you attach main.log from both releases? I'd like to see if there are any other differences. |
@aahouzi By the way I tried the older release you mentioned on my machine and the current master. With the older release I got:
With the current one:
This is LLaMa-3 70B Q8_0 with 512 context on Epyc 9374F, the current master is clearly faster - at least on my workstation. |
Again, your tests are on a CPU workstation, so it's different from Core Ultra in many ways ^^'
|
@aahouzi I don't see any obvious problems, I guess the only thing left to do is to use bisection to find the release that introduced the performance degradation for your system. Alternatively you can try testing releases introducing changes in sgemm. Possible candidates are for example b2715 and b2816. |
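For anyone following along, the bisection suggested above can be sketched roughly as below. This is a minimal illustration, not a script used in this thread: `first_bad_build` and `is_bad` are hypothetical names, the candidate list is simply every integer between the two build numbers (real release tags are sparser), and the predicate is a stand-in for actually running a benchmark on each build and comparing tokens per second against a threshold. It also assumes the slowdown, once introduced, persists in every later build.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Return the earliest build for which is_bad() is True.

    Assumes `builds` is sorted ascending, the first entry is known good,
    the last entry is known bad, and every build after the first bad one
    is also bad.
    """
    assert len(builds) >= 2, "need at least one good and one bad build"
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi]


if __name__ == "__main__":
    # Hypothetical stand-in: a real is_bad() would build or download the
    # release, run the benchmark, and compare throughput to a baseline.
    candidates = list(range(2568, 3304))
    print(first_bad_build(candidates, lambda build: build >= 2715))
```

Each step halves the remaining range, so the span between b2568 and b3303 needs only about ten benchmark runs rather than one per release.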
@fairydreaming After running a bisection between b2568 and b3303, it seems the regression was introduced in b2715, more specifically in the commit resulting from #6796. @jart, tagging you here since this was your PR, in case you have any suggestions for removing this performance penalty :)
@ggerganov I was wondering: when new changes are introduced in gemm functions, for instance, does your CI measure performance across multiple client hardware platforms or only on Mac? |
We measure the performance manually - there is no CI for that |
@aahouzi Hello, can you share some performance comparison between CPU and SYCL on Intel Core Ultra 7 155H? |
@aahouzi Let me check it. |
@aahouzi Old version (b2568): Latest version: There are several PRs that noticeably increase performance on Arc 770. I know the MTL GPU driver will impact performance more. |
@NeoZhangJianyu Please read the details of the issue; this is unrelated to what I'm reporting. The problem is a CPU performance degradation, not a GPU one. See the numbers I shared as well. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Type of issue
Name and Version
./llama-cli.exe release b3317 vs ./main.exe release b2568
What operating system are you seeing the problem on?
Windows 11
Relevant log output
See issue description