METABUG: Copy-and-patch #588

mdboom · 2023-05-24T19:49:02Z

Note
This issue is not for discussion, but to keep the various steps organized. To create new subtasks, create new issues and link to them from here by editing this comment.

Platform Support

We should probably add a CI step to be sure we're indeed testing the tier we think we are. Maybe just ./python -c 'import sysconfig; assert sysconfig.get_config_var("PY_SUPPORT_TIER") == "${{ matrix.support_tier }}"'
- This has no equivalent on Windows, so I've just added sys._support_tier for now and am using that instead.
The current CI for this should be cleaned up.
Discover installed LLVM tools in a good, cross-platform way.
Some runners have newer LLVMs installed, and are using those instead.
Release builds should build with PGO/LTO.
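
For the tool-discovery item above, here is a minimal sketch of one way it could work. It is not the script used on the branch. The names `SUPPORTED_LLVM_VERSIONS`, `parse_llvm_major`, and `find_llvm_tool` are made up for illustration, and the sketch assumes that each tool prints a line containing "version N.x" when run with `--version`. Version-suffixed binaries such as `clang-16` are tried first, and any tool reporting an unsupported (for example, newer) major version is rejected rather than silently used.

```python
import re
import shutil
import subprocess

# Versions named in this issue; adjust as support changes.
SUPPORTED_LLVM_VERSIONS = (16, 15, 14)


def parse_llvm_major(version_output):
    """Return the major version found in `<tool> --version` output, or None."""
    match = re.search(r"version\s+(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def _run_version(path):
    """Run `path --version` and return its standard output."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout


def find_llvm_tool(name, which=shutil.which, run=_run_version):
    """Return (path, major) for the first supported `name` on PATH, or None.

    `which` and `run` are injectable so the lookup can be tested without
    LLVM installed.
    """
    # Try version-suffixed names first (e.g. clang-16), then the bare name.
    candidates = [f"{name}-{v}" for v in SUPPORTED_LLVM_VERSIONS] + [name]
    for candidate in candidates:
        path = which(candidate)
        if path is None:
            continue
        major = parse_llvm_major(run(path))
        # Skip tools that exist but report an unsupported version.
        if major in SUPPORTED_LLVM_VERSIONS:
            return path, major
    return None
```

A build script could call `find_llvm_tool("clang")` and `find_llvm_tool("llvm-readobj")` and fail with a clear message when either returns None.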

Tier One

Done. These are currently being tested in CI on copy-and-patch branches. The matrix includes both debug and release builds with stencils being generated by LLVM 14, 15, and 16.

i686-pc-windows-msvc/msvc
x86_64-pc-windows-msvc/msvc
x86_64-apple-darwin/clang
x86_64-unknown-linux-gnu/gcc

Tier Two

I consider this done. Since wide platform support is one of the selling points of copy-and-patch, seeing how the build system extends to these platforms is a good idea.

aarch64-apple-darwin/clang
- No CI resources available, but all tests pass locally for the full matrix.
aarch64-unknown-linux-gnu/gcc
- Using hardware emulation in CI. test_cmd_line, test_concurrent_futures, test_eintr, test_faulthandler, test_os, test_perf_profiler, test_posix, test_signal, test_socket, test_subprocess, and test_tools are being skipped since they fail under emulation (even on CPython main). I've verified that all tests pass locally for the full matrix on native hardware.
aarch64-unknown-linux-gnu/clang
- See above.
powerpc64le-unknown-linux-gnu/gcc
- musttail produces an internal clang error (see "Upstream LLVM work" below). This probably won't happen until that issue is fixed.
x86_64-unknown-linux-gnu/clang

Tier Three

Interesting, but not planned at this time. Could be good projects for external contributors once the build steps have stabilized for tier two. The wasm32 builds sound... "fun".

aarch64-pc-windows-msvc/msvc
armv7l-unknown-linux-gnueabihf/gcc
powerpc64le-unknown-linux-gnu/clang
s390x-unknown-linux-gnu/gcc
wasm32-unknown-emscripten/clang
wasm32-unknown-wasi/clang
x86_64-unknown-freebsd/clang

Benchmarking

Install LLVM (any of 14, 15, or 16) on at least one of our benchmarking machines.
Get comparisons/stats vs current main. It doesn't have to be faster yet, but it would help to know where we stand (nice, only 1.75% slower, even with a naive tracing implementation and no speed tricks).

Upstream LLVM Work

musttail + ghccc + aarch64 produces an internal clang error.
musttail + powerpc64le produces an internal clang error.
We have to compile with -fomit-frame-pointer, since the GHC calling convention uses %rbp as an argument-passing register. This feels like a bug.
It would be nice if clang supported __attribute__((ghccc)).
--elf-output-style=JSON isn't supported for COFF and Mach-O, but basically works (it prints slightly broken JSON that can be recovered using string replacement). It would be really nice if it were properly supported:
- Mach-O
- COFF
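
The "recovered using string replacement" workaround above could be wrapped roughly like this. This is only a sketch: the issue does not say exactly how the output is broken, so the helper name `recover_json` and the replacement pairs are hypothetical, and the real pairs would have to be worked out from actual tool output.

```python
import json


def recover_json(text, replacements):
    """Parse `text` as JSON, applying string replacements only if needed.

    `replacements` is a sequence of (old, new) pairs. The pairs are
    assumptions supplied by the caller, not known quirks of any tool.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    for old, new in replacements:
        text = text.replace(old, new)
    # Still raises json.JSONDecodeError if the repairs were not enough.
    return json.loads(text)


# Made-up broken input (a trailing comma), purely for illustration.
broken = '[{"Name": "a",}]'
sections = recover_json(broken, [(",}", "}")])
```

Trying a plain parse first means the helper keeps working unchanged if the output ever becomes valid JSON.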

3.13 Integration

Other Interesting Ideas

Basic TOS caching for several items. This is hard and inefficient to get right, since every stack shrink/grow invalidates most of the cached values. It (sort of) works, but it doesn't appear to be a big win in its current form (so it's been disabled for now).
- Benchmark what we have anyways.
Try caching the bottom values on the stack. This requires compiling several stencil variants for different stack sizes and choosing the right one, but requires much less invalidation logic (since the mapping of registers to stack slots never changes).
Our stencils don't benefit from PGO/LTO, so we should either explore how difficult it is to get this to work, or manually add likely/unlikely attributes to the template scaffolding.
Maybe get cross-builds working? The emulated tier 2 platforms are super slow...
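
To make the difference between the two caching ideas above concrete, here is a toy model. It is not the branch's implementation. `NUM_REGISTERS` and the three function names are made up for illustration, and the top-of-stack scheme shown (register 0 holds the deepest cached item) is only one possible layout. The model counts which registers end up mapped to a different stack slot after a single push.

```python
NUM_REGISTERS = 3  # hypothetical number of registers used for caching


def top_cached_slots(depth):
    """Map register -> stack slot when the top of the stack is cached."""
    first = max(depth - NUM_REGISTERS, 0)
    return {reg: slot for reg, slot in enumerate(range(first, depth))}


def bottom_cached_slots(depth):
    """Map register -> stack slot when the bottom of the stack is cached."""
    return {reg: reg for reg in range(min(depth, NUM_REGISTERS))}


def invalidated(before, after):
    """Registers that map to a different slot after the stack changes."""
    return sorted(
        reg for reg in before if reg in after and before[reg] != after[reg]
    )


# One push, taking the stack from depth 4 to depth 5.
top_changes = invalidated(top_cached_slots(4), top_cached_slots(5))
bottom_changes = invalidated(bottom_cached_slots(4), bottom_cached_slots(5))
```

In this model a push remaps every register under top-of-stack caching, while the bottom mapping does not change. That is the reduced invalidation logic described above, traded against needing stencil variants for different stack sizes.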

brandtbucher · 2024-04-28T02:44:03Z

Closing and moving this tracking to the various new issues over on the CPython repo.

mdboom added the epic-copy-and-patch label May 24, 2023

mdboom assigned mdboom and brandtbucher May 24, 2023

brandtbucher closed this as completed Apr 28, 2024
