-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split compiler into code-gen, optimizer and assembler. #87092
Comments
Currently the compiler operates in three main passes: Code-gen The problem is that these passes use the same basic-block based CFG, leading to unnecessary coupling and inefficiencies. A better design would be for the code-gen to create a single linear sequence of instructions. The optimizer would take this and produce a list of extended-blocks for the assembler to consume. code-gen -> (list of instructions) -> optimizer (Extended blocks have a single entry and multiple exits, unlike basic blocks which have a single entry and single exit) This would:
Apart from the changes to the compiler, it would help if we made all branch instructions absolute (or have a backward dual) to accommodate free reordering of blocks in the optimizer. |
SGTM. But I’m not the one who has to work with it. |
…compiler's codegen stage does not work directly with basic blocks
…er's codegen stage does not work directly with basic blocks (GH-95398)
…he target pointer is only calculated just before optimization stage (GH-95655)
I've been struggling to determine where to draw the line between the optimization stage and the assembly stage. I think the solution is to add a fourth stage, between optimization and assembly, which prepares the CFG for assembly. It does all the normalisation of pseudo-stuff (instructions, targets, etc) into actual stuff (real opcodes, offsets, etc). I don't know if there is a name for this stage in compilers parlance. We could call it "resolve instructions" or something like that. It will include everything to do with: (1) calculating stackdepth and except targets, replacing exception related opcodes by NOP |
I don't know what that should be called either, but after this is done, are EXTENDED_ARG prefixes for jumps all set? I'm guessing, maybe make a similar list of responsibilities for the assembler? Because I'm not sure what those are. |
The responsibility of the assembler is to turn a list of instructions to a code object: (1) write the bytecode for the instructions Bytecode generation in stage (1) adds EXTENDED_ARG bytecodes when an instruction has a large oparg that requires it. The part that calculates jump offsets ((5) in the "resolve" list) takes into account the EXTENDED_ARGs when calculating block sizes. The idea is that all the complex calculations would be in the new stage, and we can write tests for them. Then the assembler's job is to transform the instructions representation to what we need in the code object (but all the exception/jump targets, lineno, block order etc is already resolved before). Then we can write tests just for this translation stage. |
SG. May the name can be inspired by “settle” or “tidy” or “clean”? Or maybe “resolve” is fine. |
From the list above: And splitting (1) into 1x (exception targets) and 1s (stack depth calculation) and adding In a normal compiler, there are de-sugaring and semantic analysis passes after parsing, but before optimization. (2) belongs in the optimizer (1s), (4) and (5) belong in the assembler. Is this a good order of passes? |
I like the idea or moving some of the resolution to before optimisations. I think the order you suggest is fine, by and large. I’ll need to check whether all the line number business is safe to move up front. Some of it involves duplicating blocks,etc. |
The lineno calculation needs to duplicate blocks that have no lineno but more than one predecessors. So it needs to come after mark_reachable, which is currently somewhere in the middle of the optimization stage. So that part of Mark's suggested reordering might be a problem. At least it needs to be done carefully. |
…e the CFG optimization stage (GH-96935)
* main: (66 commits) pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879) docs(typing): add "see PEP 675" to LiteralString (python#97926) pythongh-97850: Remove all known instances of module_repr() (python#97876) I changed my surname early this year (python#96671) pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768) pythongh-91539: improve performance of get_proxies_environment (python#91566) build(deps): bump actions/stale from 5 to 6 (python#97701) pythonGH-95172 Make the same version `versionadded` oneline (python#95172) pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073) pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774) pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896) pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383) pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759) pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251) pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662) pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644) pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756) pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778) pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826) Add re.VERBOSE flag documentation example (python#97678) ...
* main: (53 commits) pythongh-102498 Clean up unused variables and imports in the email module (python#102482) pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244) pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365) pythongh-94300: Update datetime.strptime documentation (python#95318) pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778) pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456) pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775) Revert "Add tests for empty range equality (python#103751)" (python#103770) pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519) pythonGH-65022: Fix description of copyreg.pickle function (python#102656) pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324) pythongh-91687: modernize dataclass example typing (python#103773) pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747) pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769) pythongh-87452: Improve the Popen.returncode docs Removed unnecessary escaping of asterisks (python#103714) pythonGH-102973: Slim down Fedora packages in the dev container (python#103283) pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095) Add tests for empty range equality (python#103751) pythongh-103712: Increase the length of the type name in AttributeError messages (python#103713) ...
* superopt: (82 commits) pythongh-101517: fix line number propagation in code generated for except* (python#103550) pythongh-103780: Use patch instead of mock in asyncio unix events test (python#103782) pythongh-102498 Clean up unused variables and imports in the email module (python#102482) pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244) pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365) pythongh-94300: Update datetime.strptime documentation (python#95318) pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778) pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456) pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775) Revert "Add tests for empty range equality (python#103751)" (python#103770) pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519) pythonGH-65022: Fix description of copyreg.pickle function (python#102656) pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324) pythongh-91687: modernize dataclass example typing (python#103773) pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747) pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769) pythongh-87452: Improve the Popen.returncode docs Removed unnecessary escaping of asterisks (python#103714) pythonGH-102973: Slim down Fedora packages in the dev container (python#103283) pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095) ...
* main: (26 commits) pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030) pythongh-104036: Fix direct invocation of test_typing (python#104037) pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761) pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897) Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021) pythongh-88496: Fix IDLE test hang on macOS (python#104025) Improve int test coverage (python#104024) pythongh-88773: Added teleport method to Turtle library (python#103974) pythongh-104015: Fix direct invocation of `test_dataclasses` (python#104017) pythongh-104012: Ensure test_calendar.CalendarTestCase.test_deprecation_warning consistently passes (python#104014) pythongh-103977: compile re expressions in platform.py only if required (python#103981) pythongh-98003: Inline call frames for CALL_FUNCTION_EX (pythonGH-98004) Replace Netlify with Read the Docs build previews (python#103843) Update name in acknowledgements and add mailmap (python#103696) pythongh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (python#103927) Remove non-existing tools from Sundry skiplist (python#103991) pythongh-103793: Defer formatting task name (python#103767) pythongh-87092: change assembler to use instruction sequence instead of CFG (python#103933) pythongh-103636: issue warning for deprecated calendar constants (python#103833) Various small fixes to dis docs (python#103923) ...
* main: (463 commits) pythongh-104057: Fix direct invocation of test_super (python#104064) pythongh-87092: Expose assembler to unit tests (python#103988) pythongh-97696: asyncio eager tasks factory (python#102853) pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054) pythongh-104057: Fix direct invocation of test_module (pythonGH-104059) pythongh-100458: Clarify Enum.__format__() change of mixed-in types in the whatsnew/3.11.rst (pythonGH-100387) pythongh-104018: disallow "z" format specifier in %-format of byte strings (pythonGH-104033) pythongh-104016: Fixed off by 1 error in f string tokenizer (python#104047) pythonGH-103629: Update Unpack's repr in compliance with PEP 692 (python#104048) pythongh-102799: replace sys.exc_info by sys.exception in inspect and traceback modules (python#104032) Fix typo in "expected" word in few source files (python#104034) pythongh-103824: fix use-after-free error in Parser/tokenizer.c (python#103993) pythongh-104035: Do not ignore user-defined `__{get,set}state__` in slotted frozen dataclasses (python#104041) pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030) pythongh-104036: Fix direct invocation of test_typing (python#104037) pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761) pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897) Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021) pythongh-88496: Fix IDLE test hang on macOS (python#104025) Improve int test coverage (python#104024) ...
* main: (760 commits) pythonGH-104102: Optimize `pathlib.Path.glob()` handling of `../` pattern segments (pythonGH-104103) pythonGH-104104: Optimize `pathlib.Path.glob()` by avoiding repeated calls to `os.path.normcase()` (pythonGH-104105) pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827) pythongh-65022: Fix description of tuple return value in copyreg (python#103892) pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526) pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962) pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744) pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999) pythonGH-103484: Fix redirected permanently URLs (python#104001) Improve assert_type phrasing (python#104081) pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998) pythonGH-103472: close response in HTTPConnection._tunnel (python#103473) pythongh-88496: IDLE - fix another test on macOS (python#104075) pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074) pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072) pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071) pythongh-104057: Fix direct invocation of test_super (python#104064) pythongh-87092: Expose assembler to unit tests (python#103988) pythongh-97696: asyncio eager tasks factory (python#102853) pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054) ...
* main: (29 commits) pythongh-101819: Fix _io clinic input for unused base class method stubs (python#104418) pythongh-101819: Isolate `_io` (python#101948) Bump mypy from 1.2.0 to 1.3.0 in /Tools/clinic (python#104501) pythongh-104494: Update certain Tkinter pack/place tests for Tk 8.7 errors (python#104495) pythongh-104050: Run mypy on `clinic.py` in CI (python#104421) pythongh-104490: Consistently define phony make targets (python#104491) pythongh-67056: document that registering/unregistering an atexit func from within an atexit func is undefined (python#104473) pythongh-104487: PYTHON_FOR_REGEN must be minimum Python 3.10 (python#104488) pythongh-101282: move BOLT config after PGO (pythongh-104493) pythongh-104469 Convert _testcapi/float.c to use AC (pythongh-104470) pythongh-104456: Fix ref leak in _ctypes.COMError (python#104457) pythongh-98539: Make _SSLTransportProtocol.abort() safe to call when closed (python#104474) pythongh-104337: Clarify random.gammavariate doc entry (python#104410) Minor improvements to typing docs (python#104465) pythongh-87092: avoid gcc warning on uninitialized struct field in assemble.c (python#104460) pythonGH-71383: IDLE - Document testing subsets of modules (python#104463) pythongh-104454: Fix refleak in AttributeError_reduce (python#104455) pythongh-75710: IDLE - add docstrings and comments to editor module (python#104446) pythongh-91896: Revert some very noisy DeprecationWarnings for `ByteString` (python#104424) Add a mention of PYTHONBREAKPOINT to breakpoint() docs (python#104430) ...
* main: (204 commits) pythongh-101819: Fix _io clinic input for unused base class method stubs (python#104418) pythongh-101819: Isolate `_io` (python#101948) Bump mypy from 1.2.0 to 1.3.0 in /Tools/clinic (python#104501) pythongh-104494: Update certain Tkinter pack/place tests for Tk 8.7 errors (python#104495) pythongh-104050: Run mypy on `clinic.py` in CI (python#104421) pythongh-104490: Consistently define phony make targets (python#104491) pythongh-67056: document that registering/unregistering an atexit func from within an atexit func is undefined (python#104473) pythongh-104487: PYTHON_FOR_REGEN must be minimum Python 3.10 (python#104488) pythongh-101282: move BOLT config after PGO (pythongh-104493) pythongh-104469 Convert _testcapi/float.c to use AC (pythongh-104470) pythongh-104456: Fix ref leak in _ctypes.COMError (python#104457) pythongh-98539: Make _SSLTransportProtocol.abort() safe to call when closed (python#104474) pythongh-104337: Clarify random.gammavariate doc entry (python#104410) Minor improvements to typing docs (python#104465) pythongh-87092: avoid gcc warning on uninitialized struct field in assemble.c (python#104460) pythonGH-71383: IDLE - Document testing subsets of modules (python#104463) pythongh-104454: Fix refleak in AttributeError_reduce (python#104455) pythongh-75710: IDLE - add docstrings and comments to editor module (python#104446) pythongh-91896: Revert some very noisy DeprecationWarnings for `ByteString` (python#104424) Add a mention of PYTHONBREAKPOINT to breakpoint() docs (python#104430) ...
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: