bpo-40255: Implement Immortal Instances - Optimization 2 #31489

Closed


eduardo-elizondo (Contributor) commented Feb 22, 2022

Immortalizing Runtime Heap After Startup

This is an optimization on top of PR #19474.

The improvement relies on the assumption that everything alive after runtime startup will most likely stay alive for the entire lifetime of the runtime. Based on that assumption, this change immortalizes the existing heap once the runtime has finished bootstrapping, right before any bytecode is loaded and executed.

Benchmark Results

Overall: performance-neutral compared to the main branch (geometric mean 1.00x).

pyperformance results:

2to3: Mean +- std dev: [cpython_master] 432 ms +- 15 ms -> [immortal_instances_opt2] 440 ms +- 12 ms: 1.02x slower
chaos: Mean +- std dev: [cpython_master] 126 ms +- 4 ms -> [immortal_instances_opt2] 119 ms +- 3 ms: 1.06x faster
float: Mean +- std dev: [cpython_master] 128 ms +- 4 ms -> [immortal_instances_opt2] 133 ms +- 6 ms: 1.04x slower
go: Mean +- std dev: [cpython_master] 244 ms +- 10 ms -> [immortal_instances_opt2] 232 ms +- 8 ms: 1.05x faster
hexiom: Mean +- std dev: [cpython_master] 11.5 ms +- 0.6 ms -> [immortal_instances_opt2] 11.3 ms +- 0.3 ms: 1.02x faster
json_dumps: Mean +- std dev: [cpython_master] 19.2 ms +- 0.7 ms -> [immortal_instances_opt2] 19.7 ms +- 0.8 ms: 1.02x slower
logging_format: Mean +- std dev: [cpython_master] 10.4 us +- 0.3 us -> [immortal_instances_opt2] 10.8 us +- 0.3 us: 1.04x slower
logging_silent: Mean +- std dev: [cpython_master] 201 ns +- 8 ns -> [immortal_instances_opt2] 205 ns +- 7 ns: 1.02x slower
logging_simple: Mean +- std dev: [cpython_master] 9.77 us +- 0.32 us -> [immortal_instances_opt2] 9.46 us +- 0.41 us: 1.03x faster
meteor_contest: Mean +- std dev: [cpython_master] 164 ms +- 5 ms -> [immortal_instances_opt2] 161 ms +- 5 ms: 1.02x faster
nbody: Mean +- std dev: [cpython_master] 163 ms +- 6 ms -> [immortal_instances_opt2] 159 ms +- 6 ms: 1.03x faster
nqueens: Mean +- std dev: [cpython_master] 159 ms +- 5 ms -> [immortal_instances_opt2] 152 ms +- 6 ms: 1.05x faster
pathlib: Mean +- std dev: [cpython_master] 28.5 ms +- 0.7 ms -> [immortal_instances_opt2] 28.2 ms +- 0.9 ms: 1.01x faster
pickle: Mean +- std dev: [cpython_master] 16.0 us +- 0.5 us -> [immortal_instances_opt2] 15.7 us +- 0.8 us: 1.02x faster
pickle_dict: Mean +- std dev: [cpython_master] 37.3 us +- 0.6 us -> [immortal_instances_opt2] 35.2 us +- 1.4 us: 1.06x faster
pickle_list: Mean +- std dev: [cpython_master] 5.77 us +- 0.24 us -> [immortal_instances_opt2] 5.53 us +- 0.22 us: 1.04x faster
pidigits: Mean +- std dev: [cpython_master] 284 ms +- 15 ms -> [immortal_instances_opt2] 276 ms +- 7 ms: 1.03x faster
python_startup: Mean +- std dev: [cpython_master] 12.6 ms +- 0.4 ms -> [immortal_instances_opt2] 11.9 ms +- 0.5 ms: 1.05x faster
python_startup_no_site: Mean +- std dev: [cpython_master] 8.89 ms +- 0.39 ms -> [immortal_instances_opt2] 8.21 ms +- 0.34 ms: 1.08x faster
raytrace: Mean +- std dev: [cpython_master] 529 ms +- 16 ms -> [immortal_instances_opt2] 542 ms +- 15 ms: 1.03x slower
regex_compile: Mean +- std dev: [cpython_master] 233 ms +- 6 ms -> [immortal_instances_opt2] 239 ms +- 6 ms: 1.03x slower
regex_dna: Mean +- std dev: [cpython_master] 239 ms +- 6 ms -> [immortal_instances_opt2] 257 ms +- 6 ms: 1.08x slower
regex_effbot: Mean +- std dev: [cpython_master] 4.53 ms +- 0.12 ms -> [immortal_instances_opt2] 4.69 ms +- 0.17 ms: 1.04x slower
regex_v8: Mean +- std dev: [cpython_master] 33.2 ms +- 0.8 ms -> [immortal_instances_opt2] 34.4 ms +- 1.1 ms: 1.04x slower
richards: Mean +- std dev: [cpython_master] 82.8 ms +- 3.7 ms -> [immortal_instances_opt2] 85.1 ms +- 3.5 ms: 1.03x slower
scimark_fft: Mean +- std dev: [cpython_master] 571 ms +- 12 ms -> [immortal_instances_opt2] 614 ms +- 19 ms: 1.08x slower
scimark_lu: Mean +- std dev: [cpython_master] 195 ms +- 6 ms -> [immortal_instances_opt2] 207 ms +- 6 ms: 1.06x slower
scimark_monte_carlo: Mean +- std dev: [cpython_master] 116 ms +- 5 ms -> [immortal_instances_opt2] 119 ms +- 5 ms: 1.02x slower
scimark_sor: Mean +- std dev: [cpython_master] 211 ms +- 6 ms -> [immortal_instances_opt2] 222 ms +- 8 ms: 1.05x slower
scimark_sparse_mat_mult: Mean +- std dev: [cpython_master] 8.28 ms +- 0.40 ms -> [immortal_instances_opt2] 8.74 ms +- 0.36 ms: 1.06x slower
sympy_expand: Mean +- std dev: [cpython_master] 878 ms +- 34 ms -> [immortal_instances_opt2] 853 ms +- 16 ms: 1.03x faster
sympy_integrate: Mean +- std dev: [cpython_master] 35.2 ms +- 1.0 ms -> [immortal_instances_opt2] 35.7 ms +- 1.2 ms: 1.01x slower
sympy_sum: Mean +- std dev: [cpython_master] 291 ms +- 13 ms -> [immortal_instances_opt2] 296 ms +- 6 ms: 1.02x slower
sympy_str: Mean +- std dev: [cpython_master] 514 ms +- 11 ms -> [immortal_instances_opt2] 523 ms +- 14 ms: 1.02x slower
unpack_sequence: Mean +- std dev: [cpython_master] 77.9 ns +- 2.3 ns -> [immortal_instances_opt2] 70.0 ns +- 2.0 ns: 1.11x faster
unpickle: Mean +- std dev: [cpython_master] 21.5 us +- 0.7 us -> [immortal_instances_opt2] 22.1 us +- 1.1 us: 1.03x slower
unpickle_pure_python: Mean +- std dev: [cpython_master] 463 us +- 17 us -> [immortal_instances_opt2] 447 us +- 7 us: 1.04x faster
xml_etree_process: Mean +- std dev: [cpython_master] 102 ms +- 2 ms -> [immortal_instances_opt2] 98.2 ms +- 2.4 ms: 1.04x faster

Benchmark hidden because not significant (13): deltablue, django_template, fannkuch, html5lib, json_loads, pickle_pure_python, pyflate, spectral_norm, telco, unpickle_list, xml_etree_parse, xml_etree_iterparse, xml_etree_generate

Geometric mean: 1.00x slower

Implementation Details

To achieve this, pymain_main calls a newly introduced internal API, _PyGC_ImmortalizeHeap. It uses the internal gc.freeze API to move all existing objects in the heap into the permanent generation. It then iterates over the permanent generation and marks every container as immortal, along with the objects each container directly references (the first traversal layer).
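The two steps above can be roughly approximated from Python with the public gc module. This is a hypothetical analogue, not the C implementation: gc.freeze() really does move tracked objects into the permanent generation, but Python code cannot give an object an immortal reference count, so the "marking" step here only records object identities. The function name immortalize_heap_sketch is made up for illustration.

```python
import gc


def immortalize_heap_sketch():
    """Hypothetical Python-level analogue of _PyGC_ImmortalizeHeap.

    Returns the ids of every object that would be marked immortal:
    all GC-tracked containers plus the objects they reference directly.
    """
    # Snapshot the tracked containers first; after gc.freeze() they no
    # longer live in the regular generations.
    containers = gc.get_objects()
    # Step 1: move every tracked object into the permanent generation.
    gc.freeze()
    # Step 2: "mark" each container and its first traversal layer.
    # The real change sets an immortal refcount in C; here we only record ids.
    marked = set()
    for obj in containers:
        marked.add(id(obj))
        for referent in gc.get_referents(obj):
            marked.add(id(referent))
    return marked
```

Including the first traversal layer matters because many objects (for example plain object() instances, ints, and strings) are never tracked by the collector, so they are only reachable through the containers that hold them.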

Pseudo-Topological Ordered Finalization

As a result of this change, module finalization order will differ slightly, because various instances are now immortal. More specifically, the set of modules that are still available to __del__ methods at runtime shutdown will change. Note that the behavior of __del__ during runtime shutdown is an implementation detail and is clearly labeled as undefined behavior by the official documentation.

That said, to reduce the variance introduced by immortalizing the runtime heap, we introduce a pseudo-topological finalization order for modules and their globals. Unfortunately, at runtime shutdown we cannot fully determine the finalization dependency graph. Instead, the runtime makes a best-effort topological finalization using the following heuristic:

  1. Clear user-defined modules, from newest to oldest
  2. Finalize globals whose names start with '_', excluding modules
  3. Finalize the remaining globals, excluding modules and __builtins__
  4. Finalize all remaining objects, excluding __builtins__
  5. Clear standard library modules, from newest to oldest
  6. Finalize globals whose names start with '_', excluding modules
  7. Finalize the remaining globals, excluding modules and __builtins__
  8. Finalize all remaining objects, excluding __builtins__
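One possible reading of this heuristic, written as a pure function so the ordering can be inspected without shutting down an interpreter. It assumes that the per-module passes (steps 2 to 4 and 6 to 8) run for each module in turn, that globals are visited in dict insertion order, and that each module is already tagged as user-defined or standard library. finalization_order, PASSES, and the input format are hypothetical and do not correspond to any CPython API.

```python
import types

# Per-module passes, one predicate per step (2-4 for user modules, 6-8 for stdlib).
# __builtins__ is filtered out before these run, so it is never finalized.
PASSES = (
    # Steps 2/6: underscore-prefixed globals that are not modules.
    lambda name, value: name.startswith("_") and not isinstance(value, types.ModuleType),
    # Steps 3/7: any other global that is not a module.
    lambda name, value: not isinstance(value, types.ModuleType),
    # Steps 4/8: everything still left, including references to modules.
    lambda name, value: True,
)


def finalization_order(modules):
    """Return (module_name, global_name) pairs in the order they would be cleared.

    `modules` is a list of (name, globals_dict, is_stdlib) tuples in import
    order, oldest first. This is a hypothetical model, not CPython code.
    """
    order = []
    user = [m for m in modules if not m[2]]
    stdlib = [m for m in modules if m[2]]
    # Steps 1 and 5: every user module before any stdlib module, newest first.
    for group in (user, stdlib):
        for mod_name, globs, _ in reversed(group):
            pending = [k for k in globs if k != "__builtins__"]
            for selected_by in PASSES:
                batch = [k for k in pending if selected_by(k, globs[k])]
                order.extend((mod_name, k) for k in batch)
                pending = [k for k in pending if k not in batch]
    return order
```

Keeping the passes as a tuple of predicates makes the heuristic easy to extend, for example by splitting the stdlib group into C extension modules and pure-Python modules.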

This ordering guarantees that, for any user-defined module, the entire standard library is still available to custom __del__ methods at runtime shutdown. It also keeps the existing flexibility of using underscore-prefixed names to control more precisely when objects are destroyed during runtime shutdown. The heuristic can still be extended to provide a more well-defined destruction order (e.g., by splitting the standard library into C modules and Lib modules).

Test Output Checks

A large portion of the tests that check __del__ behavior use the subprocess module to run a Python program and parse its output after it exits. This change altered the behavior of a subprocess stdout PIPE: if stdout is not piped, the output is correctly printed to the terminal, but when stdout is a PIPE, the output is suppressed and never picked up by the subprocess module. To work around this, these tests check the output of __del__ on stderr instead of stdout, since stderr still behaves correctly during the pseudo-topological destruction.
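A minimal sketch of the stderr-based pattern, assuming a test that launches a child interpreter with subprocess.run and inspects what a __del__ method reports at shutdown. CHILD and run_child_and_capture_del are hypothetical names, not helpers from the CPython test suite. The child keeps a bound reference to sys.stderr.write so that its __del__ does not depend on module globals that may already have been cleared.

```python
import subprocess
import sys
import textwrap

# Hypothetical child program whose object is finalized at interpreter shutdown.
CHILD = textwrap.dedent("""
    import sys

    class Noisy:
        def __init__(self):
            # Keep a direct reference so __del__ needs no global lookups.
            self._write = sys.stderr.write

        def __del__(self):
            self._write("Noisy finalized\\n")

    obj = Noisy()
""")


def run_child_and_capture_del():
    """Run CHILD in a fresh interpreter and return (returncode, stderr text)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # the test reads __del__ output from here
        text=True,
    )
    return result.returncode, result.stderr
```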

Permanent Generation Finalization

Currently, the permanent generation is not cleaned up at runtime shutdown. This change preserves that behavior, but it could be extended to make a best-effort attempt at destroying these instances.
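The permanent generation is the one exposed by gc.freeze() and gc.unfreeze(). A small sketch, assuming a standard CPython build, shows why frozen objects are never reclaimed on their own: the cycle collector skips the permanent generation, so a reference cycle survives gc.collect() while frozen and is freed only after gc.unfreeze(). Node and permanent_generation_demo are illustrative names.

```python
import gc
import weakref


class Node:
    pass


def permanent_generation_demo():
    """Return (survived_while_frozen, freed_after_unfreeze)."""
    a, b = Node(), Node()
    a.other, b.other = b, a   # reference cycle: only the cycle collector can free it
    probe = weakref.ref(a)
    gc.freeze()               # move every tracked object into the permanent generation
    del a, b
    gc.collect()              # frozen objects are ignored by collection
    survived_while_frozen = probe() is not None
    gc.unfreeze()             # move them back into the oldest generation
    gc.collect()
    freed_after_unfreeze = probe() is None
    return survived_while_frozen, freed_after_unfreeze
```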

https://bugs.python.org/issue40255
