Note
This issue is not for discussion, but to keep the various steps organized. To create new subtasks, create new issues and link to them from here by editing this comment.
Platform Support
We should probably add a CI step to make sure we're actually testing the tier we think we are. Maybe just ./python -c 'import sysconfig; assert sysconfig.get_config_var("PY_SUPPORT_TIER") == "${{ matrix.support_tier }}"'
This has no equivalent on Windows, so I've just added sys._support_tier for now and am using that instead.
The current CI for this should be cleaned up.
Discover installed LLVM tools in a good, cross-platform way.
Some runners have newer LLVMs installed, and are using those instead.
Release builds should build with PGO/LTO.
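One possible starting point for the LLVM discovery item above, as a rough sketch only. It assumes the tools are installed under version-suffixed names such as clang-16, which is common on Linux package managers but not guaranteed on Windows or macOS. The names find_llvm_tool and SUPPORTED_LLVM_VERSIONS are hypothetical and do not exist on the branch; only the version numbers come from this issue.

```python
import shutil

# LLVM versions this issue says the stencils are generated with (newest first).
SUPPORTED_LLVM_VERSIONS = (16, 15, 14)


def find_llvm_tool(tool, versions=SUPPORTED_LLVM_VERSIONS, which=shutil.which):
    """Return the path of the first supported, version-suffixed LLVM tool.

    Assumption: tools are installed as "<tool>-<major>", e.g. "clang-16".
    Newer, unsupported versions (say, "clang-17") are ignored on purpose,
    so a runner with a newer LLVM does not silently get used instead.
    Returns None when nothing suitable is on PATH.
    """
    for version in versions:
        path = which(f"{tool}-{version}")
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    for name in ("clang", "llvm-readobj", "llvm-objdump"):
        print(name, "->", find_llvm_tool(name))
```

The which parameter is only there so the lookup can be exercised without any LLVM installed. Platforms that install unsuffixed binaries would still need a separate check of the tool's reported version.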
Tier One
Done. These are currently being tested in CI on copy-and-patch branches. The matrix includes both debug and release builds with stencils being generated by LLVM 14, 15, and 16.
i686-pc-windows-msvc/msvc
x86_64-pc-windows-msvc/msvc
x86_64-apple-darwin/clang
x86_64-unknown-linux-gnu/gcc
Tier Two
I consider this done. Since wide platform support is one of the selling points of copy-and-patch, seeing how the build system extends to these platforms is a good idea.
aarch64-apple-darwin/clang
No CI resources available, but all tests pass locally for the full matrix.
aarch64-unknown-linux-gnu/gcc
Using hardware emulation in CI. test_cmd_line, test_concurrent_futures, test_eintr, test_faulthandler, test_os, test_perf_profiler, test_posix, test_signal, test_socket, test_subprocess, and test_tools are being skipped since they fail under emulation (even on CPython main). I've verified that all tests pass locally for the full matrix on native hardware.
aarch64-unknown-linux-gnu/clang
See above.
powerpc64le-unknown-linux-gnu/gcc
musttail produces an internal clang error (see "Upstream LLVM work" below). Support for this platform probably won't happen until that issue is fixed.
x86_64-unknown-linux-gnu/clang
Tier Three
Interesting, but not planned at this time. Could be good projects for external contributors once the build steps have stabilized for tier two. The wasm32 builds sound... "fun".
aarch64-pc-windows-msvc/msvc
armv7l-unknown-linux-gnueabihf/gcc
powerpc64le-unknown-linux-gnu/clang
s390x-unknown-linux-gnu/gcc
wasm32-unknown-emscripten/clang
wasm32-unknown-wasi/clang
x86_64-unknown-freebsd/clang
Benchmarking
Install LLVM (any of 14, 15, or 16) on at least one of our benchmarking machines.
Get comparisons/stats vs current main. It doesn't have to be faster yet, but it would help to know where we stand (nice, only 1.75% slower, even with a naive tracing implementation and no speed tricks).
Upstream LLVM Work
musttail + ghccc + aarch64 produces an internal clang error.
musttail + powerpc64le produces an internal clang error.
We have to compile with -fomit-frame-pointer, since the GHC calling convention uses %rbp as an argument-passing register. This feels like a bug.
It would be nice if clang supported __attribute__((ghccc)).
--elf-output-style=JSON isn't supported for COFF and Mach-O, but basically works (it prints slightly broken JSON that can be recovered using string replacement). It would be really nice if it were properly supported:
Mach-O
COFF
3.13 Integration
Start by rebasing current work on main to use the new optimizer/executor model, rather than the current specialization of JUMP_BACKWARD.
We currently pass lots of extra flags when compiling. It would be nice if we didn't have to:
-fno-asynchronous-unwind-tables
-fno-pic
-fno-stack-protector
-fomit-frame-pointer (see "Upstream LLVM work" above)
-g0
-mcmodel=large
Begin handling side exits and explore trace-tree management.
We still need a notion of relocation "types" when patching.
Maybe dump comments with a human-readable disassembly in Python/jit_stencils.h?
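To make the point about relocation "types" concrete, here is a toy model of the patching step, not the real implementation. It assumes a stencil is a byte string plus a list of holes, each tagged with a kind that decides how the patched value is computed and encoded. The names Hole, ABS_64, REL_32, and copy_and_patch are all hypothetical, and the two kinds only loosely mirror absolute 64-bit and PC-relative 32-bit relocations.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    offset: int      # byte offset of the hole inside the stencil body
    kind: str        # hypothetical relocation type, see _ENCODERS below
    symbol: str      # name of the value to patch in
    addend: int = 0  # constant added to the symbol's value


# Each relocation type knows how to turn (value, location) into bytes.
_ENCODERS = {
    # Absolute 64-bit address, little-endian.
    "ABS_64": lambda value, location: struct.pack("<Q", value % 2**64),
    # Signed 32-bit displacement relative to the patched location.
    "REL_32": lambda value, location: struct.pack("<i", value - location),
}


def copy_and_patch(body, holes, symbols, base):
    """Copy a stencil body and fill in its holes.

    `base` is the address the copied code will live at; `symbols` maps
    symbol names to addresses. Raises KeyError for an unknown kind or
    symbol, and struct.error if a REL_32 displacement does not fit.
    """
    code = bytearray(body)
    for hole in holes:
        value = symbols[hole.symbol] + hole.addend
        patch = _ENCODERS[hole.kind](value, base + hole.offset)
        code[hole.offset:hole.offset + len(patch)] = patch
    return bytes(code)
```

Without the kind field, the patcher would have no way to tell an 8-byte absolute hole from a 4-byte relative one, which is the reason a notion of relocation types is still needed.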
Other Interesting Ideas
Basic TOS caching for several items. This is hard and inefficient to get right, since every stack shrink/grow invalidates most of the cached values. It (sort of) works, but it doesn't appear to be a big win in its current form (so it's been disabled for now).
Benchmark what we have anyways.
Try caching the bottom values on the stack. This requires compiling several stencil variants for different stack sizes and choosing the right one, but requires much less invalidation logic (since the mapping of registers to stack slots never changes).
Our stencils don't benefit from PGO/LTO, so we should either explore how difficult it is to get this to work, or manually add likely/unlikely attributes to the template scaffolding.
Maybe get cross-builds working? The emulated tier 2 platforms are super slow...