Experiment with LLVM BOLT binary optimizer #90536

corona10 · 2022-01-14T16:11:27Z

BPO	46378
Nosy	@corona10, @brandtbucher

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/corona10'
closed_at = <Date 2022-01-27.17:00:46.351>
created_at = <Date 2022-01-14.16:11:27.201>
labels = ['3.11', 'build', 'performance']
title = 'Experiment with LLVM BOLT binary optimizer'
updated_at = <Date 2022-01-27.17:00:46.351>
user = 'https://github.com/corona10'

bugs.python.org fields:

activity = <Date 2022-01-27.17:00:46.351>
actor = 'corona10'
assignee = 'corona10'
closed = True
closed_date = <Date 2022-01-27.17:00:46.351>
closer = 'corona10'
components = ['Build']
creation = <Date 2022-01-14.16:11:27.201>
creator = 'corona10'
dependencies = []
files = []
hgrepos = []
issue_num = 46378
keywords = []
message_count = 2.0
messages = ['410570', '411901']
nosy_count = 2.0
nosy_names = ['corona10', 'brandtbucher']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue46378'
versions = ['Python 3.11']

corona10 · 2022-01-14T16:11:27Z

Just an experiment to see how much it is worth :)

Thread: faster-cpython/ideas#224

corona10 · 2022-01-27T17:00:46Z

Only a 1% gain, so we decided not to adopt it yet.
See: faster-cpython/ideas#224 (comment)

* Add support for the BOLT post-link binary optimizer

Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt) provides a fairly large speedup without any code or functionality changes. It provides roughly a 1% speedup on pyperformance, and a 4% improvement on the Pyston web macrobenchmarks.

It is gated behind an `--enable-bolt` configure arg because not all toolchains and environments are supported. It has been tested on a Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6 sources (their binary distribution of this version did not include bolt).

Compared to [a previous attempt](faster-cpython/ideas#224), this commit uses bolt's preferred "instrumentation" approach, as well as adds some non-PIE flags which enable much better optimizations from bolt.

The effects of this change are a bit more dependent on CPU microarchitecture than other changes, since it optimizes i-cache behavior which seems to be a bit more variable between architectures. The 1%/4% numbers were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance I got a slightly lower speedup (1%/3%).

The low speedup on pyperformance is not entirely unexpected, because BOLT improves i-cache behavior, and the benchmarks in the pyperformance suite are small and tend to fit in i-cache.

This change uses the existing pgo profiling task (`python -m test --pgo`), though I was able to measure about a 1% macrobenchmark improvement by using the macrobenchmarks as the training task. I personally think that both the PGO and BOLT tasks should be updated to use macrobenchmarks, but for the sake of splitting up the work this PR uses the existing pgo task.

* Simplify the build flags
* Add a NEWS entry
* Update Makefile.pre.in

Co-authored-by: Dong-hee Na <[email protected]>

* Update configure.ac

Co-authored-by: Dong-hee Na <[email protected]>

* Add myself to ACKS
* Add docs
* Other review comments
* fix tab/space issue
* Make it more clear that --enable-bolt is experimental
* Add link to bolt's github page

Co-authored-by: Dong-hee Na <[email protected]>
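For anyone who wants to try the option described in the commit message, a minimal sketch of a build might look like the following. Only the `--enable-bolt` flag is taken from the commit message; pairing it with `--enable-optimizations`, checking for `llvm-bolt` and `merge-fdata` on `PATH`, and the helper name `bolt_tools_status` are assumptions of this sketch, not steps stated in the thread.

```shell
#!/bin/sh
# Sketch only: try an experimental BOLT build from a CPython source checkout.
# Assumption: llvm-bolt and merge-fdata must be discoverable on PATH.

# Hypothetical helper: prints "available" if both BOLT tools are found,
# otherwise prints "missing".
bolt_tools_status() {
    if command -v llvm-bolt >/dev/null 2>&1 &&
       command -v merge-fdata >/dev/null 2>&1; then
        echo "available"
    else
        echo "missing"
    fi
}

if [ -x ./configure ] && [ "$(bolt_tools_status)" = "available" ]; then
    # --enable-bolt is the opt-in flag named in the commit message.
    # Combining it with --enable-optimizations (PGO) is an assumption here.
    ./configure --enable-optimizations --enable-bolt
    make
else
    echo "skipping: need a CPython checkout plus llvm-bolt and merge-fdata on PATH"
fi
```

Because the reported gains vary by CPU (1% to 2% on pyperformance in the numbers above), it is probably worth benchmarking the resulting binary on the target machine before relying on it.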

corona10 added the 3.11, build, and performance labels Jan 14, 2022

corona10 self-assigned this Jan 14, 2022

corona10 closed this as completed Jan 27, 2022

ezio-melotti transferred this issue from another repository Apr 10, 2022

corona10 reopened this Aug 11, 2022

bedevere-bot mentioned this issue Aug 11, 2022

gh-90536: Add support for the BOLT post-link binary optimizer #95908

Merged

corona10 closed this as completed Aug 18, 2022

corona10 added a commit to corona10/cpython that referenced this issue Aug 20, 2022

gh-90536: Fix link syntax to LLVM-BOLT repository

93f68ec

bedevere-bot mentioned this issue Aug 20, 2022

gh-90536: Fix link syntax to LLVM-BOLT repository #96141

Merged

corona10 added a commit that referenced this issue Aug 20, 2022

gh-90536: Fix link syntax to LLVM-BOLT repository (gh-96141)

6ec57e7
