fix get_num_physical_cores() (#1436)
* fix get_num_physical_cores()

  had been broken on complex topologies because "cpu cores" in /proc/cpuinfo is per-"physical id"

* Add spaces to maintain consistent formatting

---------

Co-authored-by: slaren <[email protected]>
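To illustrate why the per-"physical id" value is misleading: on a multi-socket machine, every entry in /proc/cpuinfo repeats the "cpu cores" figure for its own package, so reading it once undercounts the whole system. The sketch below shows one possible way to count cores from text in /proc/cpuinfo format, by collecting distinct ("physical id", "core id") pairs. It is not necessarily what this commit does; `count_physical_cores` and `trim` are hypothetical names, and the approach assumes the kernel reports both fields (some architectures do not, in which case the function returns 0 and the caller needs a fallback).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: strip leading and trailing whitespace.
static std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    const size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper (not the function from this commit): counts physical
// cores in text that follows the /proc/cpuinfo layout, where each logical
// processor is a block of "key : value" lines separated by a blank line.
// Hyperthread siblings share the same (physical id, core id) pair, so the
// number of distinct pairs is the number of physical cores.
int count_physical_cores(const std::string & cpuinfo) {
    std::set<std::pair<std::string, std::string>> cores;
    std::string physical_id;
    std::string core_id;

    // Record the current block (if it named a core) and reset for the next.
    auto flush = [&]() {
        if (!core_id.empty()) {
            cores.insert({physical_id, core_id});
        }
        physical_id.clear();
        core_id.clear();
    };

    std::istringstream in(cpuinfo);
    std::string line;
    while (std::getline(in, line)) {
        if (trim(line).empty()) {
            flush(); // a blank line ends one logical processor's block
            continue;
        }
        const size_t colon = line.find(':');
        if (colon == std::string::npos) {
            continue;
        }
        const std::string key   = trim(line.substr(0, colon));
        const std::string value = trim(line.substr(colon + 1));
        if (key == "physical id") {
            physical_id = value;
        } else if (key == "core id") {
            core_id = value;
        }
    }
    flush(); // the last block may not be followed by a blank line
    return static_cast<int>(cores.size());
}
```

On a real system the input would be the contents of /proc/cpuinfo read into a string; a return value of 0 means the fields were absent and some other source (for example the logical CPU count) has to be used instead.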
Showing 1 changed file with 15 additions and 14 deletions.
63d2046
Sorry if this is an ignorant post, but I'm wondering what the point of this code is, at least on Apple Silicon. My Mac M2 Pro (the standard one, not the upgraded one) reports 10 cores, but if I specify -t 10, llama.cpp runs much slower than if I specify only the number of 'performance cores', which is 6.
This has been discussed elsewhere in an issue: on Apple Silicon, the key to the fastest performance is specifying the number of 'performance cores', not the total number of cores, because using the 'efficiency cores' actually makes llama.cpp run SLOWER than it would without them.
For about a month I was running much slower than necessary because I misread the line 'system_info: n_threads = 10 / 10' as a recommendation, and I would have been better off without it. Maybe it's still useful, but shouldn't there be an explanation in the README somewhere? And is this only relevant for Apple Silicon, or does it apply to other architectures as well?
If there were a script that could break that count out into efficiency vs. performance cores, then it would make more sense IMO, but maybe I'm just not getting the point of this feature. Again, please ignore this if it is not relevant in the context of this feature...
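A minimal sketch of the kind of breakdown asked for above, assuming a recent macOS on Apple Silicon where the `hw.perflevel0.physicalcpu` sysctl reports the number of performance cores. The function names `query_performance_cores` and `pick_thread_count` are hypothetical and are not part of llama.cpp; on other platforms the query simply reports that the information is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper: returns the number of performance cores, or -1 when
// the platform does not expose that information through this sysctl.
int query_performance_cores() {
#if defined(__APPLE__)
    int32_t n = 0;
    size_t len = sizeof(n);
    // Assumption: perflevel0 is the highest-performance core class.
    if (sysctlbyname("hw.perflevel0.physicalcpu", &n, &len, nullptr, 0) == 0 && n > 0) {
        return n;
    }
#endif
    return -1;
}

// Hypothetical policy: prefer the performance-core count when it is known,
// otherwise fall back to the total core count.
int pick_thread_count(int performance_cores, int total_cores) {
    assert(total_cores >= 1);
    return performance_cores > 0 ? performance_cores : total_cores;
}
```

With the numbers from the comment above (6 performance cores out of 10 total), `pick_thread_count(6, 10)` yields 6, which matches the -t value the commenter found fastest. Whether the same policy helps on other hybrid architectures is an open question here.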