Skip to content

Commit

Permalink
fix get_num_physical_cores() (#1436)
Browse files Browse the repository at this point in the history
* fix get_num_physical_cores()
had been broken on complex topologies because "cpu cores" in /proc/cpuinfo is per-"physical id"

* Add spaces to maintain consistent formatting

---------

Co-authored-by: slaren <[email protected]>
  • Loading branch information
zrm and slaren authored May 15, 2023
1 parent b5c9295 commit 63d2046
Showing 1 changed file with 15 additions and 14 deletions.
29 changes: 15 additions & 14 deletions examples/common.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
#include <iterator>
#include <algorithm>
#include <sstream>
#include <unordered_set>

#if defined(__APPLE__) && defined(__MACH__)
#include <sys/types.h>
Expand All @@ -28,21 +29,21 @@

int32_t get_num_physical_cores() {
#ifdef __linux__
std::ifstream cpuinfo("/proc/cpuinfo");
std::string line;
while (std::getline(cpuinfo, line)) {
std::size_t pos = line.find("cpu cores");
if (pos != std::string::npos) {
pos = line.find(": ", pos);
if (pos != std::string::npos) {
try {
// Extract the number and return it
return static_cast<int32_t>(std::stoul(line.substr(pos + 2)));
} catch (const std::invalid_argument &) {
// Ignore if we could not parse
}
}
// enumerate the set of thread siblings, num entries is num cores
std::unordered_set<std::string> siblings;
for (uint32_t cpu=0; cpu < UINT32_MAX; ++cpu) {
std::ifstream thread_siblings("/sys/devices/system/cpu"
+ std::to_string(cpu) + "/topology/thread_siblings");
if (!thread_siblings.is_open()) {
break; // no more cpus
}
std::string line;
if (std::getline(thread_siblings, line)) {
siblings.insert(line);
}
}
if (siblings.size() > 0) {
return static_cast<int32_t>(siblings.size());
}
#elif defined(__APPLE__) && defined(__MACH__)
int32_t num_physical_cores;
Expand Down

1 comment on commit 63d2046

@dogjamboree
Copy link

@dogjamboree dogjamboree commented on 63d2046 May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this is an ignorant post, but I'm wondering what the point of this code is, on Apple Silicon at least. If I run something that tells me my Mac M2 Pro (standard, not the upgraded one) has 10 cores, and I specify -t 10, llama.cpp will run much slower than if I specify just the number of 'performance cores', which is 6.
This has been discussed elsewhere in an issue, but on Apple Silicon, for fastest performance, the key is specifying the number of 'performance cores', not total number, as utilizing those 'efficiency cores' will actually cause llama.cpp to run SLOWER than it would without them.

For about a month I was running much slower than necessary because of my misreading of the utility of that bit of text 'system_info: n_threads = 10 / 10' when I would have been better off without it. Maybe it's still useful but there should be an explanation in the README somewhere? And maybe this is only relevant for Apple Silicon or does it apply to other architectures as well?

If there were a script that could break that out into efficiency vs performance cores than it would make more sense IMO, but maybe I'm just not getting the point of this feature. Again, please ignore if this is not relevant in the context of this feature...

Please sign in to comment.