Binary build script: Use shared builds + LTO for Python 3.10 #1320

Merged: 1 commit into main from builds-python3.10-shared-lto on May 4, 2022

@edmorley edmorley commented May 3, 2022

Shared builds are beneficial for a number of reasons:

  • Reduces the size of the build, since it avoids the duplication between the Python binary and the static library. For example, for Python 3.10.x on Heroku-20, the output size drops from 206MB -> 90MB. This frees up slug space for use by dependencies/apps, and also reduces the download/extraction/re-archiving times for both the classic buildpack and CNB.
  • Permits use-cases that only work with the shared Python library, and not the static library (such as pycall.rb or PyO3).
  • More consistent with the official Python Docker images and other distributions.
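
To check which mode a given interpreter was built in, something like the following should work. This is a minimal sketch: the `python_link_mode` helper name is hypothetical, while `Py_ENABLE_SHARED` is a standard `sysconfig` build variable.

```shell
# Report whether a Python interpreter was built with --enable-shared.
# Prints "shared" or "static"; the interpreter defaults to python3.
# (Hypothetical helper name, not part of the buildpack.)
python_link_mode() {
  local python_bin="${1:-python3}"
  "${python_bin}" -c '
import sysconfig
# Py_ENABLE_SHARED is 1 when CPython was configured with --enable-shared.
print("shared" if sysconfig.get_config_var("Py_ENABLE_SHARED") else "static")
'
}
```

Calling `python_link_mode python3` on a build that tools such as pycall.rb or PyO3 can link against should print `shared`.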

However, shared builds are slower unless no-semantic-interposition and LTO are used:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup

It's only as of Python 3.10 that no-semantic-interposition is enabled by default, so we only use shared builds on Python 3.10+ to avoid needing to override the default compiler flags.
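
The version gate could be sketched roughly as below. The `configure_opts_for` helper is hypothetical and the actual build script may be structured differently; the three flags are the ones covered by the configure docs linked in this description.

```shell
# Print the extra ./configure options for a given Python version (e.g. "3.10.4").
# Hypothetical helper: an illustration of the version gate, not the real script.
configure_opts_for() {
  local version="$1"
  local major="${version%%.*}"
  local rest="${version#*.}"
  local minor="${rest%%.*}"

  if [ "${major}" -gt 3 ] || { [ "${major}" -eq 3 ] && [ "${minor}" -ge 10 ]; }; then
    # 3.10+: shared libpython, LTO, and no static libpython.a.
    echo "--enable-shared --with-lto --without-static-libpython"
  else
    # Older versions: no shared-build options are added.
    echo ""
  fi
}
```

For example, `configure_opts_for 3.9.12` prints an empty line, while `configure_opts_for 3.10.4` prints the three flags.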

Build size reductions:

$ diff -U0 <(cd with-system-expat/ && du -Sh) <(cd shared-lto/ && du -Sh)
--- /dev/fd/63
+++ /dev/fd/62
@@ -1 +1 @@
-24M    ./bin
+56K    ./bin
@@ -78 +78 @@
-60M    ./lib/python3.10/config-3.10-x86_64-linux-gnu
+144K   ./lib/python3.10/config-3.10-x86_64-linux-gnu
@@ -81 +81 @@
-20M    ./lib/python3.10/lib-dynload
+18M    ./lib/python3.10/lib-dynload
@@ -103 +103 @@
-59M    ./lib
+26M    ./lib

$ diff -U0 <(cd with-system-expat/ && du -sh) <(cd shared-lto/ && du -sh)
--- /dev/fd/63
+++ /dev/fd/62
@@ -1 +1 @@
-206M   .
+90M    .

Also adds a du at the end of the build scripts, to make it easier to see output sizes.

Note: This change will only take effect for future Python version releases (or future Heroku stacks) - existing Python binaries are not being recompiled.

Configure docs:
https://docs.python.org/3/using/configure.html#cmdoption-enable-shared
https://docs.python.org/3/using/configure.html#cmdoption-with-lto
https://docs.python.org/3/using/configure.html#cmdoption-without-static-libpython

See also:
https://github.com/docker-library/python/blob/1cf43e70e45843c70909a5f914c3c6d0f85fc200/Dockerfile-linux.template#L154-L159
docker-library/python#501
docker-library/python#502
docker-library/python#660
python/cpython#24418

Closes #243.
Closes #665.
Closes #1225.
GUS-W-10989125.

@edmorley edmorley requested a review from a team as a code owner May 3, 2022 12:54
@edmorley edmorley self-assigned this May 3, 2022
@edmorley edmorley force-pushed the builds-python3.10-shared-lto branch from 50b636f to d67447c Compare May 3, 2022 13:47
@edmorley edmorley force-pushed the builds-python3.10-shared-lto branch from d67447c to 854ae2b Compare May 4, 2022 06:37
@edmorley edmorley enabled auto-merge (squash) May 4, 2022 06:38
@edmorley edmorley merged commit 623da04 into main May 4, 2022
@edmorley edmorley deleted the builds-python3.10-shared-lto branch May 4, 2022 06:40
@edmorley
Member Author

For a summary of the combined Python runtime size reductions from this and related PRs, see:
#1322 (comment)

edmorley added a commit that referenced this pull request Apr 16, 2024
…#1565)

During the build, the buildpack makes the config vars set on the app
available to certain subprocesses (such as Django collectstatic) via the
`sub_env` utility function. This function filters out env vars that
might cause the subprocess to fail. (Note: This filtering only affects
app config vars, and not the env vars provided by buildpacks that run
prior to the Python buildpack.)

This change adds `LD_LIBRARY_PATH` and `PYTHONHOME` to the list of env
vars that are filtered out, to prevent errors when they are set to
invalid values.
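
As an illustration only, a `sub_env`-style filter might look like the sketch below. The `run_with_app_env` name and the one-file-per-var layout of the env directory are assumptions, and the real function filters more vars than the two shown here.

```shell
# Hypothetical sketch of a sub_env-style helper: run a command with the app's
# config vars exported, skipping any var on the blocklist.
# Assumes one file per config var in the env dir (file name = var name,
# file contents = value).
run_with_app_env() {
  local env_dir="$1"
  shift
  (
    for path in "${env_dir}"/*; do
      [ -f "${path}" ] || continue
      name="$(basename "${path}")"
      case "${name}" in
        # Vars that can break the subprocess when set to stale values.
        LD_LIBRARY_PATH | PYTHONHOME) continue ;;
      esac
      export "${name}=$(cat "${path}")"
    done
    # Run the command inside the subshell so the exports don't leak out.
    "$@"
  )
}
```

A call such as `run_with_app_env "${ENV_DIR}" python manage.py collectstatic` would then see the app's config vars, minus the filtered ones.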

In particular, very old versions of the buildpack used to set these env
vars as actual app config vars (via the `bin/release` script), to values
that no longer work:
https://github.com/heroku/heroku-buildpack-python/blob/27abdfe7d7ad104dabceb45641415251e965671c/bin/release#L11-L18

These broken app config vars have not typically caused problems since:
1. Only Python apps created in 2012 or earlier will have them (unless
   someone manually sets them on an app)
2. Static builds of Python don't rely upon `LD_LIBRARY_PATH`

However, as of Python 3.10 we switched to building in shared mode (see
#1320), and so apps with broken config vars will otherwise see errors
like the following once they upgrade Python versions:

```
python: error while loading shared libraries: libpython3.10.so.1.0: cannot open shared object file: No such file or directory
```

As seen in:
https://heroku.support/1365030

The `GIT_DIR` env var was removed from the filter list, since it's no
longer set by the build system and so no longer needs filtering out
(see #1120).

(The CNB isn't affected by this issue, and already has a test to confirm that.)

GUS-W-15519103.
edmorley added a commit that referenced this pull request Apr 18, 2024
As part of the CNB multi-architecture support work, we need to change
the Python runtime archive S3 URLs to include the architecture name.
In addition, for the CNB transition from "stacks" to "targets", it would
be helpful to switch from stack ID references (such as `heroku-22`) in
the URL scheme, to the distro name+version (eg `ubuntu` and `22.04`)
available to CNBs via the CNB targets feature. See:
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1

Rather than duplicate the Python archives on S3 under different
filenames/locations, it makes sense to migrate this buildpack to the new
archive names too, so the same S3 archives can be used by both this
buildpack and the CNB.
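
To illustrate the idea, an archive name could be assembled from the target fields along these lines. The `archive_base_name` helper and the exact ordering and separators are hypothetical; only the inputs (Python version, distro name, distro version, architecture) come from the description above.

```shell
# Build a runtime archive base name from the CNB target fields.
# Hypothetical naming scheme, shown only to illustrate the inputs involved.
archive_base_name() {
  local python_version="$1" distro_name="$2" distro_version="$3" arch="$4"
  echo "python-${python_version}-${distro_name}-${distro_version}-${arch}"
}
```

In a CNB, the last three arguments would typically come from `CNB_TARGET_DISTRO_NAME`, `CNB_TARGET_DISTRO_VERSION` and `CNB_TARGET_ARCH`; this buildpack would need to derive equivalent values itself.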

Moving to new archive names/URLs also means we can safely regenerate all
existing Python versions to pick up the changes in #1566 (and changes
made in the past, such as #1319, #1320, #1321 and #1322), since we won't
have to worry about overwriting the old archives (which is something
we've typically avoided, since it isn't compatible with the model of
being able to roll back to an older buildpack version to return to prior
behaviour).

Since we're changing the S3 URLs anyway, now is also a good time to make
another change that would otherwise cause churn in the S3 URLs again
(which affects people who pin the buildpack version): Switching archive
compression format from gzip to Zstandard (something that we've been
wanting to do for a while).

Zstandard (aka zstd) is a far superior compression format to gzip
(smaller archives and much faster decompression), and is seeing
widespread adoption across multiple ecosystems (eg APT packages,
Docker images, web browsers etc).

See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface

Our base images already have `zstd` installed (and for Rust for the CNB,
there is the [zstd](https://crates.io/crates/zstd) crate available), so it's an easy switch.

Various compression levels were tested using zstd's benchmarking feature
and in the end the highest level of compression was picked, since:
1. Unlike some other compression algorithms, zstd's decompression speed
   is generally not affected by the compression level.
2. We only have to perform the compression once (when compiling Python).
3. Even at the highest compression ratio, it only takes 20 seconds to
   compress the Python archives compared to the 10 minutes it takes to
   compile Python itself (when using PGO+LTO).

For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd
(level 22, with long window mode enabled) results in a 26% reduction in
compressed archive size.
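
For reference, compressing at that setting with the `zstd` CLI might look like the sketch below. The helper names are hypothetical and the flags are an assumption based on the description above, not copied from the build script; in particular the window size of `--long=31` is a guess, since only "long window mode" is stated.

```shell
# Archive a directory with tar and compress it with zstd.
# Defaults to level 22 with a long window; both can be overridden, which is
# handy when comparing levels. (Hypothetical helper, assumed flags.)
compress_runtime() {
  local src_dir="$1" out_file="$2"
  local level="${3:-22}" window_log="${4:-31}"
  # --ultra unlocks levels 20-22, --long enables long-distance matching with
  # the given window log, -T0 uses all available cores.
  tar -C "${src_dir}" -cf - . \
    | zstd -T0 -"${level}" --ultra --long="${window_log}" -q -f -o "${out_file}"
}

# Extract an archive created above. Window logs above 27 must also be
# allowed explicitly when decompressing.
extract_runtime() {
  local archive="$1" dest_dir="$2"
  local window_log="${3:-31}"
  mkdir -p "${dest_dir}"
  zstd -d --long="${window_log}" -q -c "${archive}" | tar -C "${dest_dir}" -xf -
}
```

With the defaults, `compress_runtime /app/.heroku/python python.tar.zst` (an example path) compresses at level 22, and `extract_runtime python.tar.zst /tmp/python` unpacks the result.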

GUS-W-15158299.
GUS-W-15505556.
edmorley added a commit to edmorley/pycall.rb that referenced this pull request Sep 7, 2024
Heroku's Python builds now use `--enable-shared` (for Python 3.10 and newer), so use of custom builds or a custom buildpack is no longer required.

See:
heroku/heroku-buildpack-python#1320
https://github.com/heroku/heroku-buildpack-python/blob/928d0c593f6b6def16d6ad8451f56a997760eace/builds/build_python_runtime.sh#L117-L133
edmorley added a commit to edmorley/pycall.rb that referenced this pull request Nov 22, 2024
Heroku's Python builds now use `--enable-shared` (for Python 3.10 and newer), so use of custom builds or a custom buildpack is no longer required.

See:
heroku/heroku-buildpack-python#1320
https://github.com/heroku/heroku-buildpack-python/blob/928d0c593f6b6def16d6ad8451f56a997760eace/builds/build_python_runtime.sh#L117-L133

In addition the Python buildpack now supports the `.python-version` file (and recommends it over `runtime.txt`):
https://devcenter.heroku.com/changelog-items/3005
mrkn pushed a commit to mrkn/pycall.rb that referenced this pull request Feb 7, 2025
Heroku's Python builds now use `--enable-shared` (for Python 3.10 and newer), so use of custom builds or a custom buildpack is no longer required.

See:
heroku/heroku-buildpack-python#1320
https://github.com/heroku/heroku-buildpack-python/blob/928d0c593f6b6def16d6ad8451f56a997760eace/builds/build_python_runtime.sh#L117-L133

In addition the Python buildpack now supports the `.python-version` file (and recommends it over `runtime.txt`):
https://devcenter.heroku.com/changelog-items/3005