-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(python): Specify tune-cpu & add more features #17615
Conversation
@orlp can you take a look here? |
I'd prefer to keep using a manual list of features, on an as-needed basis. This also makes it easy to keep the CPU check module up-to-date. Another example of a pitfall of |
Sorry, I did not notice that.
I want to argue that the compiler can implicitly make use of arch features. An "as-needed basis" is sub-optimal for the compiler. For example, if Another point, as I've mentioned, is the tunings. Specifying a manual list of features does not turn on the tunings for the target CPU - tunings for generic CPU is used. Tunings in Despite the aforementioned benefits of using microarch levels, if a manual list of features is still preferred, IMO we can at least complete the list with all features that the target CPU supports, and manually specify a |
@ruihe774 I would be open to adding more features if it means we don't exclude any CPUs we support now, and adding a reasonable tuning target. I'm not very open to relaxing the CPU check. It's not that expensive and it means users get a proper explained error rather than a segfault. Also please consider that there are 'CPUs' which support weird combinations of feature sets, such as certain emulators like Rosetta. |
OK. I'm going to work on this. BTW what is the "reasonable tuning target" in your opinion? IMO I would choose
Rosetta does not support AVX at all1; it can only run polars-lts-cpu. BTW, I'm still wondering what the purpose of the macOS branch is: elif [[ "$IS_MACOS" = true ]]; then
FEATURES=+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx,+fma,+pclmulqdq I cannot find a cpu model that support fma but not avx2. Footnotes |
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #17615 +/- ##
==========================================
+ Coverage 80.64% 80.66% +0.01%
==========================================
Files 1484 1490 +6
Lines 195509 195931 +422
Branches 2780 2789 +9
==========================================
+ Hits 157675 158043 +368
- Misses 37324 37375 +51
- Partials 510 513 +3 ☔ View full report in Codecov by Sentry. |
@ruihe774 We talked internally and it seems no one can remember the reason why that MacOS check was there. So let's just get rid of it and we'll see if it turns out to be a regression. If so we can always fix it again and place a comment why it's so then :) |
This PR makes use of standard x86 microarhitecture levels1 for
target-cpu
instead of manually specifiedtarget-features
when building the release wheels. This brings the following benefits:target-cpu
impliestune-cpu
, while settingtarget-features
does not. e.g.,target-cpu=x86-84-v3
enables tuning for those v3 cpu models5.This PR uses:
x86-64-v3
for normal release; it includes sse, avx, avx2, etc.x86-64-v2
for lts-cpu release; it includes sse, etc.Supported CPU models (i.e. compatibility) of either of them is not changed.
The special configuration for macOS is removed. I cannot find a cpu model that support fma but not avx2. Please correct me if I'm wrong.
This PR does not affect
_cpu_check.py
. A list of features is still given to it.XXX:
tune-cpu
?x86-64-v3
is tuned for a kind of old cpu model (Haswell). We may settarget-cpu
andtune-cpu
to different values, e.g.-C target-cpu=x86-64-v3 -Z tune-cpu=skylake
._cpu_check.py
. e.g. a cpu that has avx also has sse. We can simplify its logic and only check a sentinel feature.Footnotes
https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels ↩
https://news.opensuse.org/2023/03/02/tw-gains-optional-optimizations ↩
https://wiki.cachyos.org/cachyos_repositories/what_are_the_cachyos_repo/#performance-and-optimizations ↩
https://github.com/pola-rs/polars/pull/14746 ↩
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86.td ↩