Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split compiler into code-gen, optimizer and assembler. #87092

Closed
markshannon opened this issue Jan 13, 2021 · 9 comments
Closed

Split compiler into code-gen, optimizer and assembler. #87092

markshannon opened this issue Jan 13, 2021 · 9 comments
Assignees

Comments

@markshannon
Copy link
Member

markshannon commented Jan 13, 2021

BPO 42926
Nosy @gvanrossum, @markshannon, @corona10, @brandtbucher, @iritkatriel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-01-13.15:03:23.987>
labels = []
title = 'Split compiler into code-gen, optimizer and assembler.'
updated_at = <Date 2022-02-03.22:11:26.525>
user = 'https://github.com/markshannon'

bugs.python.org fields:

activity = <Date 2022-02-03.22:11:26.525>
actor = 'iritkatriel'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2021-01-13.15:03:23.987>
creator = 'Mark.Shannon'
dependencies = []
files = []
hgrepos = []
issue_num = 42926
keywords = ['patch']
message_count = 2.0
messages = ['385033', '385043']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'Mark.Shannon', 'corona10', 'brandtbucher', 'iritkatriel']
pr_nums = ['31116']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42926'
versions = []

@markshannon
Copy link
Member Author

Currently the compiler operates in three main passes:

Code-gen
Optimize
Assemble

The problem is that these passes use the same basic-block based CFG, leading to unnecessary coupling and inefficiencies.
A basic block CFG is awkward and error-prone for the code-gen, but not very efficient for the optimizer and assembler.

A better design would be for the code-gen to create a single linear sequence of instructions. The optimizer would take this and produce a list of extended-blocks for the assembler to consume.

code-gen -> (list of instructions) -> optimizer
optimizer -> (list of extended blocks) -> assembler

(Extended blocks have a single entry and multiple exits, unlike basic blocks which have a single entry and single exit)

This would:

  1. Reduce memory use considerably (the size of instruction and block data structures would be about 60% of current)
  2. Be faster (Less CFG management).
  3. Produce better code (extended blocks are a better unit for optimization that basic blocks).
  4. Be easier to maintain:
    a) Code-gen wouldn't have to worry about creating a correct CFG.
    b) The optimizer wouldn't need to handle empty blocks and track which basic blocks form an extended block.

Apart from the changes to the compiler, it would help if we made all branch instructions absolute (or have a backward dual) to accommodate free reordering of blocks in the optimizer.

@gvanrossum
Copy link
Member

SGTM. But I’m not the one who has to work with it.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jul 28, 2022
…compiler's codegen stage does not work directly with basic blocks
@iritkatriel iritkatriel moved this to In Progress in Fancy CPython Board Jul 28, 2022
@iritkatriel iritkatriel self-assigned this Jul 28, 2022
markshannon pushed a commit that referenced this issue Aug 4, 2022
…er's codegen stage does not work directly with basic blocks (GH-95398)
iritkatriel added a commit that referenced this issue Aug 11, 2022
…he target pointer is only calculated just before optimization stage (GH-95655)
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Aug 24, 2022
@iritkatriel
Copy link
Member

I've been struggling to determine where to draw the line between the optimization stage and the assembly stage.

I think the solution is to add a fourth stage, between optimization and assembly, which prepares the CFG for assembly. It does all the normalisation of pseudo-stuff (instructions, targets, etc) into actual stuff (real opcodes, offsets, etc).

I don't know if there is a name for this stage in compilers parlance. We could call it "resolve instructions" or something like that.

It will include everything to do with:

(1) calculating stackdepth and except targets, replacing exception related opcodes by NOP
(2) moving cold blocks to end of block-list
(3) everything related to making sure line numbers are correct
(4) replacing pseudo-op jumps by actual jumps, including figuring out the direction and translating conditional backwards jumps into real jumps
(5) Finally, calculating the jump offsets

@gvanrossum
Copy link
Member

I don't know what that should be called either, but after this is done, are EXTENDED_ARG prefixes for jumps all set? I'm guessing, maybe make a similar list of responsibilities for the assembler? Because I'm not sure what those are.

@iritkatriel
Copy link
Member

The responsibility of the assembler is to turn a list of instructions to a code object:

(1) write the bytecode for the instructions
(2) create the lineno table
(3) create the exception table
(4) create a code object from those + metadata about the compilation unit.

Bytecode generation in stage (1) adds EXTENDED_ARG bytecodes when an instruction has a large oparg that requires it. The part that calculates jump offsets ((5) in the "resolve" list) takes into account the EXTENDED_ARGs when calculating block sizes.

The idea is that all the complex calculations would be in the new stage, and we can write tests for them. Then the assembler's job is to transform the instructions representation to what we need in the code object (but all the exception/jump targets, lineno, block order etc is already resolved before). Then we can write tests just for this translation stage.

@gvanrossum
Copy link
Member

SG. May the name can be inspired by “settle” or “tidy” or “clean”? Or maybe “resolve” is fine.

@markshannon
Copy link
Member Author

From the list above:
(1) calculating stackdepth and except targets, replacing exception related opcodes by NOP
(2) moving cold blocks to end of block-list
(3) everything related to making sure line numbers are correct
(4) replacing pseudo-op jumps by actual jumps, including figuring out the direction and translating conditional backwards jumps into real jumps
(5) Finally, calculating the jump offsets

And splitting (1) into 1x (exception targets) and 1s (stack depth calculation) and adding O (other cfg optimizations)

In a normal compiler, there are de-sugaring and semantic analysis passes after parsing, but before optimization.
(3) and (1x) belongs in there, as does computing except targets.

(2) belongs in the optimizer

(1s), (4) and (5) belong in the assembler.

Is this a good order of passes? 3, 1x, 2, O, 4, 5, 1s

@iritkatriel
Copy link
Member

I like the idea or moving some of the resolution to before optimisations. I think the order you suggest is fine, by and large. I’ll need to check whether all the line number business is safe to move up front. Some of it involves duplicating blocks,etc.

@iritkatriel
Copy link
Member

The lineno calculation needs to duplicate blocks that have no lineno but more than one predecessors. So it needs to come after mark_reachable, which is currently somewhere in the middle of the optimization stage. So that part of Mark's suggested reordering might be a problem. At least it needs to be done carefully.

iritkatriel added a commit that referenced this issue Sep 20, 2022
carljm added a commit to carljm/cpython that referenced this issue Oct 6, 2022
* main: (66 commits)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896)
  pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383)
  pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759)
  pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251)
  pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662)
  pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644)
  pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756)
  pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778)
  pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826)
  Add re.VERBOSE flag documentation example (python#97678)
  ...
iritkatriel added a commit that referenced this issue Apr 24, 2023
carljm added a commit to carljm/cpython that referenced this issue Apr 24, 2023
* main: (53 commits)
  pythongh-102498 Clean up unused variables and imports in the email module  (python#102482)
  pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244)
  pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365)
  pythongh-94300: Update datetime.strptime documentation (python#95318)
  pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778)
  pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456)
  pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775)
  Revert "Add tests for empty range equality (python#103751)" (python#103770)
  pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519)
  pythonGH-65022: Fix description of copyreg.pickle function (python#102656)
  pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324)
  pythongh-91687: modernize dataclass example typing (python#103773)
  pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747)
  pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769)
  pythongh-87452: Improve the Popen.returncode docs
  Removed unnecessary escaping of asterisks (python#103714)
  pythonGH-102973: Slim down Fedora packages in the dev container (python#103283)
  pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095)
  Add tests for empty range equality (python#103751)
  pythongh-103712: Increase the length of the type name in AttributeError messages (python#103713)
  ...
carljm added a commit to carljm/cpython that referenced this issue Apr 24, 2023
* superopt: (82 commits)
  pythongh-101517: fix line number propagation in code generated for except* (python#103550)
  pythongh-103780: Use patch instead of mock in asyncio unix events test (python#103782)
  pythongh-102498 Clean up unused variables and imports in the email module  (python#102482)
  pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244)
  pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365)
  pythongh-94300: Update datetime.strptime documentation (python#95318)
  pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778)
  pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456)
  pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775)
  Revert "Add tests for empty range equality (python#103751)" (python#103770)
  pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519)
  pythonGH-65022: Fix description of copyreg.pickle function (python#102656)
  pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324)
  pythongh-91687: modernize dataclass example typing (python#103773)
  pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747)
  pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769)
  pythongh-87452: Improve the Popen.returncode docs
  Removed unnecessary escaping of asterisks (python#103714)
  pythonGH-102973: Slim down Fedora packages in the dev container (python#103283)
  pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095)
  ...
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 27, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 27, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 27, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 28, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 29, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 29, 2023
carljm added a commit to carljm/cpython that referenced this issue May 1, 2023
* main: (26 commits)
  pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030)
  pythongh-104036: Fix direct invocation of test_typing (python#104037)
  pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761)
  pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897)
  Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021)
  pythongh-88496: Fix IDLE test hang on macOS (python#104025)
  Improve int test coverage (python#104024)
  pythongh-88773: Added teleport method to Turtle library (python#103974)
  pythongh-104015: Fix direct invocation of `test_dataclasses` (python#104017)
  pythongh-104012: Ensure test_calendar.CalendarTestCase.test_deprecation_warning consistently passes (python#104014)
  pythongh-103977: compile re expressions in platform.py only if required (python#103981)
  pythongh-98003: Inline call frames for CALL_FUNCTION_EX (pythonGH-98004)
  Replace Netlify with Read the Docs build previews (python#103843)
  Update name in acknowledgements and add mailmap (python#103696)
  pythongh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (python#103927)
  Remove non-existing tools from Sundry skiplist (python#103991)
  pythongh-103793: Defer formatting task name (python#103767)
  pythongh-87092: change assembler to use instruction sequence instead of CFG (python#103933)
  pythongh-103636: issue warning for deprecated calendar constants (python#103833)
  Various small fixes to dis docs (python#103923)
  ...
carljm added a commit to carljm/cpython that referenced this issue May 1, 2023
* main: (463 commits)
  pythongh-104057: Fix direct invocation of test_super (python#104064)
  pythongh-87092: Expose assembler to unit tests (python#103988)
  pythongh-97696: asyncio eager tasks factory (python#102853)
  pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054)
  pythongh-104057: Fix direct invocation of test_module (pythonGH-104059)
  pythongh-100458: Clarify Enum.__format__() change of mixed-in types in the whatsnew/3.11.rst (pythonGH-100387)
  pythongh-104018: disallow "z" format specifier in %-format of byte strings (pythonGH-104033)
  pythongh-104016: Fixed off by 1 error in f string tokenizer (python#104047)
  pythonGH-103629: Update Unpack's repr in compliance with PEP 692 (python#104048)
  pythongh-102799: replace sys.exc_info by sys.exception in inspect and traceback modules (python#104032)
  Fix typo in "expected" word in few source files (python#104034)
  pythongh-103824: fix use-after-free error in Parser/tokenizer.c (python#103993)
  pythongh-104035: Do not ignore user-defined `__{get,set}state__` in slotted frozen dataclasses (python#104041)
  pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030)
  pythongh-104036: Fix direct invocation of test_typing (python#104037)
  pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761)
  pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897)
  Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021)
  pythongh-88496: Fix IDLE test hang on macOS (python#104025)
  Improve int test coverage (python#104024)
  ...
carljm added a commit to carljm/cpython that referenced this issue May 2, 2023
* main: (760 commits)
  pythonGH-104102: Optimize `pathlib.Path.glob()` handling of `../` pattern segments (pythonGH-104103)
  pythonGH-104104: Optimize `pathlib.Path.glob()` by avoiding repeated calls to `os.path.normcase()` (pythonGH-104105)
  pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827)
  pythongh-65022: Fix description of tuple return value in copyreg (python#103892)
  pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526)
  pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962)
  pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744)
  pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999)
  pythonGH-103484: Fix redirected permanently URLs (python#104001)
  Improve assert_type phrasing (python#104081)
  pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998)
  pythonGH-103472: close response in HTTPConnection._tunnel (python#103473)
  pythongh-88496: IDLE - fix another test on macOS (python#104075)
  pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074)
  pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072)
  pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071)
  pythongh-104057: Fix direct invocation of test_super (python#104064)
  pythongh-87092: Expose assembler to unit tests (python#103988)
  pythongh-97696: asyncio eager tasks factory (python#102853)
  pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054)
  ...
@github-project-automation github-project-automation bot moved this from In Progress to Done in Fancy CPython Board May 3, 2023
carljm added a commit to carljm/cpython that referenced this issue May 15, 2023
* main: (29 commits)
  pythongh-101819: Fix _io clinic input for unused base class method stubs (python#104418)
  pythongh-101819: Isolate `_io` (python#101948)
  Bump mypy from 1.2.0 to 1.3.0 in /Tools/clinic (python#104501)
  pythongh-104494: Update certain Tkinter pack/place tests for Tk 8.7 errors (python#104495)
  pythongh-104050: Run mypy on `clinic.py` in CI (python#104421)
  pythongh-104490: Consistently define phony make targets (python#104491)
  pythongh-67056: document that registering/unregistering an atexit func from within an atexit func is undefined (python#104473)
  pythongh-104487: PYTHON_FOR_REGEN must be minimum Python 3.10 (python#104488)
  pythongh-101282: move BOLT config after PGO (pythongh-104493)
  pythongh-104469 Convert _testcapi/float.c to use AC (pythongh-104470)
  pythongh-104456: Fix ref leak in _ctypes.COMError (python#104457)
  pythongh-98539: Make _SSLTransportProtocol.abort() safe to call when closed (python#104474)
  pythongh-104337: Clarify random.gammavariate doc entry  (python#104410)
  Minor improvements to typing docs (python#104465)
  pythongh-87092: avoid gcc warning on uninitialized struct field in assemble.c (python#104460)
  pythonGH-71383: IDLE - Document testing subsets of modules (python#104463)
  pythongh-104454: Fix refleak in AttributeError_reduce (python#104455)
  pythongh-75710: IDLE - add docstrings and comments to editor module (python#104446)
  pythongh-91896: Revert some very noisy DeprecationWarnings for `ByteString` (python#104424)
  Add a mention of PYTHONBREAKPOINT to breakpoint() docs (python#104430)
  ...
carljm added a commit to carljm/cpython that referenced this issue May 15, 2023
* main: (204 commits)
  pythongh-101819: Fix _io clinic input for unused base class method stubs (python#104418)
  pythongh-101819: Isolate `_io` (python#101948)
  Bump mypy from 1.2.0 to 1.3.0 in /Tools/clinic (python#104501)
  pythongh-104494: Update certain Tkinter pack/place tests for Tk 8.7 errors (python#104495)
  pythongh-104050: Run mypy on `clinic.py` in CI (python#104421)
  pythongh-104490: Consistently define phony make targets (python#104491)
  pythongh-67056: document that registering/unregistering an atexit func from within an atexit func is undefined (python#104473)
  pythongh-104487: PYTHON_FOR_REGEN must be minimum Python 3.10 (python#104488)
  pythongh-101282: move BOLT config after PGO (pythongh-104493)
  pythongh-104469 Convert _testcapi/float.c to use AC (pythongh-104470)
  pythongh-104456: Fix ref leak in _ctypes.COMError (python#104457)
  pythongh-98539: Make _SSLTransportProtocol.abort() safe to call when closed (python#104474)
  pythongh-104337: Clarify random.gammavariate doc entry  (python#104410)
  Minor improvements to typing docs (python#104465)
  pythongh-87092: avoid gcc warning on uninitialized struct field in assemble.c (python#104460)
  pythonGH-71383: IDLE - Document testing subsets of modules (python#104463)
  pythongh-104454: Fix refleak in AttributeError_reduce (python#104455)
  pythongh-75710: IDLE - add docstrings and comments to editor module (python#104446)
  pythongh-91896: Revert some very noisy DeprecationWarnings for `ByteString` (python#104424)
  Add a mention of PYTHONBREAKPOINT to breakpoint() docs (python#104430)
  ...
corona10 added a commit to corona10/cpython that referenced this issue Jun 2, 2023
corona10 added a commit that referenced this issue Jun 2, 2023
gh-105243)

gh-87092: avoid gcc warning on uninitialized struct field in assemble.c (part2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants