Introduction
The Zephyr build infrastructure currently has support for a two-stage build, but the data calculated from the first stage to inform the second stage may itself cause changes that invalidate other analyses.
Adding general support for multiple stages reduces the risk of subtle bugs resulting from rewriting/adding objects, and opens the possibility of better space optimization by reducing object content to only what's required.
Problem description
The Zephyr build infrastructure currently has support for a two-stage build.
In the first stage all sources are compiled, and an initial application image is generated as zephyr_prebuilt.elf. This file is used as input for multiple post-processing steps that occur in stage 2, some of which are:
If CONFIG_USERSPACE=y the prebuilt image is scanned by gen_kobject_list to identify the addresses and characteristics of all kernel objects. A gperf hash table is generated that allows runtime lookup of kernel objects by address; this table is then added to the inputs for the final link.
If CONFIG_DEMAND_PAGING=y page tables covering the memory used by zephyr_prebuilt.elf are generated by arch/x86/gen_mmu.py.
If CONFIG_GEN_ISR_TABLES=y the prebuilt image is scanned by arch/common/gen_isr_tables.py to generate an interrupt vector object.
The second stage completes by linking a new zephyr.elf image that combines much of what went into the first image with the new content from the scripts identified above, among others.
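For concreteness, here is a minimal sketch of how one such stage-2 step might be wired in CMake. The target names, variables, and script flags are illustrative assumptions, not the exact ones used in the Zephyr tree:

```cmake
# Hypothetical wiring of one stage-2 post-processing step; target
# names and script flags are placeholders, not the real Zephyr build code.
add_custom_command(
  OUTPUT isr_tables.c
  COMMAND ${PYTHON_EXECUTABLE} ${ZEPHYR_BASE}/arch/common/gen_isr_tables.py
          --kernel $<TARGET_FILE:zephyr_prebuilt>
          --output-source isr_tables.c
  DEPENDS zephyr_prebuilt
)
# The generated source joins the objects that produced
# zephyr_prebuilt.elf as input to the second-stage link.
target_sources(zephyr_final PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/isr_tables.c)
```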
This all works only if the addresses of kernel objects and page tables don't change between the first- and second-stage links. Incorrect placement of output sections in the linker scripts can cause such changes.
In fact, this may not work reliably: the MMU code notes that the additional gperf data supporting userspace may invalidate the RAM footprint that produced the page table, and a workaround is provided for that case.
This also forces a sub-optimal solution to representing device dependency data (see #32127): the need to avoid changing the address of kernel objects forces generation of device arrays that are exactly the same size as the raw inputs from devicetree, when in fact many of the dependencies collapse (because they identify virtual nodes with a common device parent), and in several situations multiple devices could reference the same dependency set.
Proposed change
Extend the CMake build infrastructure to allow multiple stages, selected based on the required post-processing features and their interdependencies. Always start with the first prebuilt linked application. Then have CMake encode a series of phases, each of which would be enabled only if at least one of the mutually-independent processing steps in it is required. The result of an enabled stage would serve as the input to the next enabled stage. The final application would have as many link phases as it requires, but no more.
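As a rough sketch of what the staged infrastructure could be built around, consider a helper that relinks the application from the shared objects plus whatever sources the stage's post-processing steps generated. The name zephyr_link_stage, its signature, and the zephyr_core_objects library are invented for illustration; they are not existing Zephyr CMake API:

```cmake
# Hypothetical helper: relink a new stage from the shared object
# libraries plus the sources generated by post-processing the previous
# stage's ELF. All names here are illustrative only.
function(zephyr_link_stage stage_name)
  cmake_parse_arguments(STAGE "" "INPUT" "GENERATED" ${ARGN})
  # Each stage is its own executable built from the stage-specific
  # generated sources...
  add_executable(${stage_name} ${STAGE_GENERATED})
  # ...plus the core objects shared by every stage.
  target_link_libraries(${stage_name} PRIVATE zephyr_core_objects)
  # Make sure the previous stage (whose ELF the generators scanned)
  # is built first.
  add_dependencies(${stage_name} ${STAGE_INPUT})
endfunction()
```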
First Example (no new features)
The processing steps supporting CONFIG_USERSPACE=y and CONFIG_GEN_ISR_TABLES=y are safe to run in a single stage:
the first generates new objects that by design are placed in memory after all kernel objects
the second writes to dedicated memory
Neither of these can change the address of kernel objects, so they're safe to run on the same input and feed the same output. However, the first can change the amount of memory used in the kernel image. These steps would be run together (along with others that are equally independent), then a second-stage application linked.
The second-stage application would then be processed for CONFIG_DEMAND_PAGING=y; with full knowledge of the actual memory used, this step would generate correct page tables that don't require a runtime fixup.
The final application would be linked from the inputs to the second stage plus the page tables.
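Expressed with the hypothetical zephyr_link_stage helper sketched earlier (stage names and generated file names are again placeholders), the pipeline for this example might read:

```cmake
# Illustrative staging only; CONFIG_* symbols are assumed visible to
# CMake, as they are via Zephyr's Kconfig import.
set(last_stage zephyr_prebuilt)

if(CONFIG_USERSPACE OR CONFIG_GEN_ISR_TABLES)
  # Both analyses read the same input and cannot move kernel objects,
  # so they share one stage.
  zephyr_link_stage(zephyr_stage2
    INPUT ${last_stage}
    GENERATED kobject_hash.c isr_tables.c)
  set(last_stage zephyr_stage2)
endif()

if(CONFIG_DEMAND_PAGING)
  # Page tables are generated only once the memory footprint is final.
  zephyr_link_stage(zephyr_stage3
    INPUT ${last_stage}
    GENERATED mmu_tables.c)
  set(last_stage zephyr_stage3)
endif()

# Whichever stage ran last is what becomes zephyr.elf.
```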
Second Example (optimized dependency data)
Here the proposal is to replace the device-unique arrays holding dependency information, not with optimized arrays padded to retain their original size, but with references to packed arrays, some of which would be shared by all devices that have the same structural dependencies.
In this case the second phase would use the gen_handles.py script of #32127 reworked to generate these space-optimized dependency lists. These would then be linked into a second-phase application in which the addresses of multiple objects may change.
The analysis for CONFIG_USERSPACE=y and CONFIG_GEN_ISR_TABLES=y and others would then be run on the second-stage object, producing a third-stage object that has the final kernel footprint excluding page tables.
A fourth stage would be run if CONFIG_DEMAND_PAGING=y, providing the application plus its page tables.
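Under the same illustrative conventions as the first example, this pipeline differs only in that the address-changing dependency rework gets a stage of its own ahead of the address-sensitive analyses:

```cmake
# Illustrative staging; names are placeholders as before.
set(last_stage zephyr_prebuilt)

# Stage 2: packed dependency arrays; object addresses may move.
zephyr_link_stage(zephyr_stage2 INPUT ${last_stage} GENERATED dev_handles.c)
set(last_stage zephyr_stage2)

if(CONFIG_USERSPACE OR CONFIG_GEN_ISR_TABLES)
  # Stage 3: address-sensitive analyses now see the final object layout.
  zephyr_link_stage(zephyr_stage3 INPUT ${last_stage}
                    GENERATED kobject_hash.c isr_tables.c)
  set(last_stage zephyr_stage3)
endif()

if(CONFIG_DEMAND_PAGING)
  # Stage 4: page tables over the final kernel footprint.
  zephyr_link_stage(zephyr_stage4 INPUT ${last_stage} GENERATED mmu_tables.c)
  set(last_stage zephyr_stage4)
endif()
```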
Concerns and Unresolved Questions
There will be some overhead from multiple links, but this must be weighed against the complexity of the alternatives.
Alternatives
Do nothing, and lose the opportunity to perform executable post-processing that optimizes the final image in ways that invalidate other post-processing analysis.
@tejlmand @galak @dcpleung @carlescufi @andyross @nashif and others: FYI. I do not propose to do this (I don't have the CMake skills), but it would enable some significant improvements in use of devicetree data by allowing post-processing that doesn't have to respect the "do not move anything" requirements of kernel object and page table analysis.