From 75dd482ad44a779bf39f10ce89cf14ab2b358e78 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Thu, 12 Aug 2021 09:30:41 +0200 Subject: [PATCH 01/24] Initial draft of REP-2008 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ROS 2 Hardware Acceleration Architecture and Conventions Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 778 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 778 insertions(+) create mode 100644 rep-2008.rst diff --git a/rep-2008.rst b/rep-2008.rst new file mode 100644 index 000000000..61ea72adb --- /dev/null +++ b/rep-2008.rst @@ -0,0 +1,778 @@ +REP: 2008 +Title: ROS 2 Hardware Acceleration Architecture and Conventions +Author: Víctor Mayoral-Vilches +Status: Active +Type: Informational +Content-Type: text/x-rst +Created: 10-Aug-2021 +Post-History: 10-Aug-2021 + + +Abstract +======== + +This REP describes the architectural pillars and conventions required to introduce hardware acceleration in ROS 2 in a scalable and technology-agnostic manner. A set of categories meant to classify supported hardware acceleration solutions is provided. Inclusion in a category is based on the fulfillment of the hardware acceleration capabilities. + + +Motivation +========== + +With the decline of Moore’s Law, hardware acceleration has proven itself as the answer for achieving higher performance gains in robotics [1]_. By creating specialized compute architectures that rely on specific hardware (i.e. through FPGAs or GPUs), hardware acceleration empowers faster robots, with reduced computation times (real fast, as opposed to real-time), lower power consumption and more deterministic behaviours. The core idea is that instead of following the traditional control-driven approach for software development in robotics, a mixed control- and data-driven one allows to design custom compute architectures that further exploit parallelism. To do so, we need to integrate in the ROS 2 ecosystem external frameworks, tools and libraries that facilitate creating parallel compute architectures. + +The purpose of this REP is to provide standard guidelines for how to use hardware acceleration in combination with ROS 2. These guidelines are realized in the form of a series of ROS 2 packages that integrate these external resources and provide a ROS 2-centric open architecture for hardware acceleration. + +The architecture proposed extends the ROS 2 build system (``ament``), the ROS 2 build tools (``colcon``) and add a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new hardware solutions and other silicon vendors. The core components of the architecture are disclosed under an Apache 2.0 license, available and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. + +Value for stakeholders: + +- Package maintainers can use these guidelines to integrate hardware acceleration capabilities in their ROS 2 packages. + +- Consumers can use the guidelines in the REP, as well as the corresponding category of each hardware solution, to set expectations on the hardware acceleration capabilities that could be obtained from each vendor's hardware solution. + +- Silicon vendors and solution manufacturers can use these guidelines to connect their firmware and technologies to the ROS 2 ecosystem, obtaining direct support for hardware acceleration in all ROS 2 packages that support it. + + +The outcome of this REP should be that maintainers who want to leverage hardware acceleration in their packages, can do so with consistent guidelines and with support across multiple technologies (FPGAs and GPUs) by following the conventions set. This way, maintainers will be able to create ROS 2 packages with support for hardware acceleration that can run across hardware acceleration technologies, including FPGAs and GPUs. + +In turn, the documentation of categories and hardware acceleration capabilities will improve. +The guidelines in here provide a ROS 2-centric open architecture for hardware acceleration, which silicon vendors can decide to adopt when engaging with the ROS 2 community. + + +Antigoals +^^^^^^^^^ + +The motivation behind this REP does not include: + +* Dictating policy implementation specifics on maintainers to achieve hardware acceleration + + * Policy requirements (which framework to use, specific C++ preprocessor instructions, etc.) are *intentionally generic* and to remain technology-agnostic (from a hardware acceleration perspective). + * Maintainers can come up with their own policies depending on the technology targeted (FPGAs, GPUs, etc.) and framework used (``HLS``, ``ROCm``, ``CUDA``, etc). + + +* Enforcing specific hardware solutions on maintainers + + * No maintainer is required to target any specific hardware solution by any of the guidelines in this REP. + * Instead, maintainers are free to choose among any of the supported technologies and hardware solutions. + + +Architecture pillars +==================== + +:: + + ROS 2 stack Hardware Acceleration Architecture @ ROS 2 stack + + +-------------+ +---------------------------+ + | | | acceleration_examples | + | user land | +-------------------+-----------+----------------+--------------------------------+ + | | | Drivers | Libraries | Cloud | + +-------------+ +--------------+----+---------+-----------------------------------+---------------+ + | | | ament_sol1 | ament_soln | | accel_fw_sol1 | accel_fw_soln | + | | +--------------+--------------+------------------+----------------+---------------+ + | tooling | | ament_acceleration | colcon_acceleration | acceleration_firmware | + | | +---------------------------------------------------------------------------------+ + | | | build system | meta build | firmware | + +-------------+ +--------------------------+---------------------+--------------------------------+ + | rcl | ^ ^ ^ + +-------------+ | | | + | rmw | | | | + +-------------+ | | | + | rmw_adapter | Pillar I Pillar II Pillar III + +-------------+ + + + +.. _pillarI: + +Pillar I - Extensions to ament +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The first pillar represents extensions of the ``ament`` ROS 2 build system. These CMake extensions help achieve the objective of simplifying the creation of acceleration kernels. By providing an experience and a syntax similar to other ROS 2 libraries targeting CPUs, maintainers will be able to integrate kernels into their packages easily. The ``ament_acceleration`` ROS 2 package abstracts the build system extensions from technology-specific frameworks and software platforms. This allows to easily support hardware acceleration across FPGAs and GPUs while using the same syntax, simplifying the work of maintainers. The code listing below provides an example that instructs the ``CMakeLists.txt`` file of a ROS 2 package to build a ``vadd`` kernel with the corresponding sources and configuration: + +:: + + acceleration_kernel( + NAME vadd + FILE src/vadd.cpp + INCLUDE + include + TARGET kv260 + ) + + +Under the hood, each specialization of ``ament_acceleration`` should rely on the corresponding technology libraries to enable it. For example, `ament_vitis `_ relies on Vitis Unified Software Platform and on the Xilinx Runtime (XRT) library to generate acceleration kernels and facilitate OpenCL communication between the application code and the kernels. Vitis and XRT are completely hidden from the robotics engineer, simplifying the creation of kernels through simple CMake macros. The same kernel expressed with ``ament_vitis`` is presented below: + + +:: + + vitis_acceleration_kernel( + NAME vadd + FILE src/vadd.cpp + CONFIG src/kv260.cfg + INCLUDE + include + TYPE + sw_emu + # hw_emu + # hw + PACKAGE + ) + + + +While ``ament_acceleration`` CMake macros are preferred and will be encouraged, maintainers are free to choose among the CMake macros available. After all, it'll be hard to define a generic set of macros that fits all use cases across technologies. + +Through ``ament_acceleration`` and technology-specific specializations (like ``ament_vitis``), the ROS 2 build system is automatically enhanced to support producing acceleration kernel and related artifacts as part of the build process when invoking ``colcon build``. To facilitate the work of maintainers, this additional functionality is configurable through ``mixins`` that can be added to the build step of a ROS 2 workspace, triggering all the hardware acceleration logic only when appropriate (e.g. when ``--mixin kv260`` is appended to the build effort, it'll trigger the build of kernels targeting the KV260 hardware solution). For a reference implementation of these enhacements, refer to `ament_vitis `_. + + +.. _pillarII: + +Pillar II - Extensions to colcon +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The second pillar extends the ``colcon`` ROS 2 meta built tools to integrate hardware acceleration flows into the ROS 2 CLI tooling and allow to manage it. Examples of these extensions include emulation capabilities to speed-up the development process and/or facilitate it without access to the real hardware, or raw image production tools, which are convenient when packing together acceleration kernels for embedded targets. These extensions are implemented by the `colcon_acceleration `_ ROS 2 package. On a preliminary implementation, these extensions are provided the following ``colcon acceleration`` subverbs: + +:: + + colcon acceleration subverbs: + board Report the board supported in the deployed firmware + emulation Manage hardware emulation capabilities + hls Vitis HLS capabilities management extension + hypervisor Configure the Xen hypervisor + linux Configure the Linux kernel + list List supported firmware for hardware acceleration + mkinitramfs Creates compressed cpio initramfs (ramdisks) from raw image + mount Mount raw images + package Packages workspace with kernels for distribution + platform Report the platform enabled in the deployed firmware + select Select an existing firmware and default to it. + umount Umount raw images + v++ Vitis v++ compiler wrapper + version Report version of the tool + + +Using the ``list`` and ``select`` subverbs, it's possible to inspect and configure the different hardware acceleration solutions. The rest of the subverbs allow to manage hardware acceleration artifacts and platforms in a simplified manner. + +In turn, the list of subverbs will improve and grow to cover other technology solutions. + + +.. _pillarIII: + +Pillar III - firmware +^^^^^^^^^^^^^^^^^^^^^ + +The third pillar is firmware. Represented by the abstract ``acceleration_firmware`` package, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. + +Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts in line with its supported categories_ and capabilities_. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. + +By splitting vendors across packages with a common front-end (``acceleration_firmware``), consumers and maintainers can easily switch between hardware acceleration solutions and + + + +.. _specification: + +Specification +============= + +To drive the creation, maintenance and testing of acceleration kernels in ROS 2 packages that are agnostic to the computing target (i.e. consider support for edge, workstation, data center or cloud targets) and technology-agnostic (considers initial support for FPGAs and GPUs), this REP builds on top of open standards. Particularly, ``OpenCL 1.2`` ([2]_, [3]_) is encouraged for a well established standardized interoperability between the host-side (CPU) and the acceleration kernel. Unless stated otherwise, the hardware acceleration terminology used in this document follows the OpenCL nomenclature for hardware acceleration. + +A ROS 2 package supports hardware acceleration if it provides support for at least one of the *supported hardware acceleration solutions* that comply with this REP. + +A hardware acceleration solution from a given vendor is *supported* if it at least has a `Compatible` category. + + +.. _Kernel Levels: + +Kernel levels in ROS 2 +^^^^^^^^^^^^^^^^^^^^^^^ +To favour modularity, organize kernels and allow robotics architects to select only those accelerators needed to meet the requirements of their application, acceleration kernels in ROS 2 will be classified in 3 levels according to the ROS layer/underlayer they impact: + +.. _Level I kernels: + +- *Level 1* - **ROS 2 applications and libraries**: This group corresponds with acceleration kernels that speed-up OSI L7 applications or libraries on top of ROS 2. Any computation on top of ROS 2 is a good a candidate for this category. Examples include selected components in the navigation, manipulation, perception or control stacks. + +.. _Level II kernels: + +- *Level 2* - **ROS 2 core packages**: This includes kernels that accelerate or offload OSI L7 ROS 2 core components and tools to a dedicated acceleration solution (e.g. an FPGA). Namely, we consider ``rclcpp``, ``rcl``, ``rmw``, and the corresponding ``rmw_adapters`` for each supported communication middleware. Examples includes ROS 2 executors for more deterministic behaviours [4]_, or complete hardware offloaded ROS 2 Nodes [5]_. + +.. _Level III kernels: + +- *Level 3* - **ROS 2 underlayers**: Groups together all accelerators below the ROS 2 core layers belonging to OSI L2-L7, including the communication middleware (e.g. DDS). Examples of such accelerators include a complete or partial DDS implementation, an offloaded networking stack or a data link layer for real-time deterministic, low latency and high throughput interactions. + +Hardware acceleration solutions complying with this REP should aspire to support multiple kernel levels in ROS 2 to maximize consumer experience. + +.. _benchmarking: + +Benchmarking +^^^^^^^^^^^^ +Benchmarking is the act of running a computer program to assess its relative performance. In the context of hardware acceleration, it's fundamental to assess the relative performance of an acceleration kernel versus its CPU scalar computing baseline. Similarly, benchmarking helps comparing acceleration kernels across hardware acceleration technology solution (e.g. FPGA_A vs FPGA_B or FPGA_A vs GPU_A, etc.) and across kernel implementations (within the same hardware acceleration technology solution). + +There're different types of benchmarking approaches. The following diagram depicts the most popular inspired by [6]_: + +:: + + Probe Probe + + + + | | + +--------|------------|-------+ +-----------------------------+ + | | | | | | + | +--|------------|-+ | | | + | | v v | | | - latency <--------------+ Probe + | | | | | - throughput<--------------+ Probe + | | Function | | | - memory <--------------+ Probe + | | | | | - CPU <--------------+ Probe + | +-----------------+ | | | + | System under test | | System under test | + +-----------------------------+ +-----------------------------+ + + + Functional Non-functional + + + +-------------+ +----------------------------+ + | Test App. | | +-----------------------+ | + | + + + + | | | Application | | + +--|-|--|--|--+---------------+ | | <------------+ Probe + | | | | | | | +-----------------------+ | + | v v v v | | | + | Probes | | <------------+ Probe + | | | | + | System under test | | System under test | + | | | <------------+ Probe + | | | | + | | | | + +-----------------------------+ +----------------------------+ + + + Black-Box Grey-box + + +In addition, the following aspects should be considered when benchmarking acceleration kernels in ROS 2: + +- `embedded`: Benchmarks should run in embedded *easily* +- `ROS 2-native`: Benchmarks should consider the particularities of ROS 2 and its computational graph. If necessary, they should instrument the communications middleware and its underlying layers. +- `intra-process, inter-process and intra-network`: Measures conducted should consider communication within a process in the same SoC, between processes in an SoC and between different SoCs connected in the same network (intra-network). *Inter-network measures are beyond the scope of this REP*. +- `compute substrate-agnostic`: benchmarks should be able to run on different hardware acceleration technology solutions. For that purpose, a CPU-centric framework (as opposed to an acceleration technology-specific framework) for benchmarking and/or tracing is the ideal choice. +- `automated`: benchmarks and related source code should be designed with automation in mind. Once ready, creating a benchmark and producing results should be (ideally) a fully automated process. +- `hardware farm mindset`: benchmarks will be conducted on hardware embedded platforms sitting in a farm-like environment (redundancy of tests, multiple SoCs/boards) with the intent of validating and comparing different technologies. + +Accounting for all of this, in this REP, we adopt a grey-box and non-functional benchmarking approach for hardware acceleration that allows to evaluate the relative performance of accelerated ROS 2 individual nodes and complete computational graphs. To realize it in a technology agnostic-manner, we select the Linux Tracing Toolkit next generation (`LTTng `_) which will be used for tracing and benchmarking. + +Differences between tracing and benchmarking +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Tracing and benchmarking can be defined as follows: + +- `tracing`: a technique used to understand what goes on in a running software system. +- `benchmarking`: a method of comparing the performance of various systems by running a common test. + +From these definitions, inherently one can determine that both benchmarking and tracing are connected in the sense that the test/benchmark will use a series of measurements for comparison. These measurements will come from tracing probes. In other words, tracing will collect data that will then be fed into a benchmark program for comparison. + + +.. _acceleration examples: + +Acceleration examples +^^^^^^^^^^^^^^^^^^^^^ + +For the sake of illustrating maintainers and consumers how to build their own acceleration kernels and guarantee interoperability across technologies, a ROS 2 meta-package named `acceleration_examples `_ will be maintained and made available. This meta-package will contain various packages with simple common acceleration examples. Each one of these examples should support all hardware acceleration solutions complying with this REP. + +In turn, a CI system will be set to build periodically and for every commit the meta-package. + +.. _capabilities: + +Capabilities +^^^^^^^^^^^^ + +The following list describes the hardware acceleration capabilities that hardware acceleration vendors must consider when connecting their firmware and technology solutions to the ROS 2 ecosystem. + +.. _Kernel Levels Capabilities: + +1. **Kernel Levels:** + + .. _1.i: + + i. Must provide at least one package producing `Level I kernels`_ + + .. _1.ii: + + ii. Must provide at least one package producing `Level II kernels`_ + + .. _1.iii: + + iii. Must provide at least one package producing `Level III kernels`_ + + +.. _Build System Enhancements: + +2. **Build System Enhancements:** + + .. _2.i: + + i. Must have an ``ament_`` (e.g. ``ament_vitis``) package that integrates technology-specific frameworks and software platforms into the ROS 2 build system through a series of CMake macros that maintainers can use to generate acceleration kernels from their `CMakeLists.txt` files. + + .. _2.ii: + + ii. Must extend the generic ``ament_acceleration`` CMake macros to support the corresponding technology-specific frameworks and software platforms with appropriate defaults. + + +.. _Build Tools Enhancements: + +3. **Build Tools Enhancements:** + + .. _3.i: + + i. Must provide tools for hardware emulation (e.g. via QEMU) that emulate the hardware acceleration solution + + .. _3.ii: + + ii. Must provide tools for hardware emulation (e.g. via QEMU) that simulate the hardware acceleration solution (to speed-up development and facilitate debugging) + + .. _3.iii: + + iii. Must provide tools for managing (mount, umount, deploy, etc.) raw disk images or any other binary format necessary by the acceleration technology solution. + + .. _3.iv: + + iv. (*only applicable to edge/embedded computing targets*) Must provide tools for deploying custom Linux kernels, with at least two default options: a vanilla kernel and a fully preemptible one (`PREEMPT_RT` patches). + + .. _3.iv.a: + + a. *Must provide modern Linux kernels (5.4.0+)* + + .. _3.iv.b: + + b. *Must provide modern Linux LTS kernels (5.10.0)* + + + .. _3.v: + + v. (*only applicable to edge/embedded computing targets*) Must provide tools for deploying a hypervisor (e.g. Xen) for mixed critical applications in the resulting raw images. + + .. _3.v.a: + + a. *Must provide tools for enabling the deployment of hypervisor Virtual Machines (VMs) without a control domain (e.g. dom0less VMs in Xen)* + + .. _3.v.b: + + b. *Must provide tools for enabling the deployment of guest VMs (e.g. domUs in Xen) in a secondary non-volatile storage system (e.g. a raw image partition).* + + .. _3.v.c: + + c. *Must provide tools for enabling the deployment of the control domain VM (e.g. dom0 in Xen) in a secondary non-volatile storage system (e.g. a raw image partition).* + + .. _3.v.d: + + d. *Must provide tools for enabling the deployment VMs without a control domain (e.g. dom0less VMs in Xen) in a secondary non-volatile storage system (e.g. a raw image partition).* + + .. _3.vi: + + vi. (*only applicable to edge/embedded computing targets*) Must provide tools for network booting. + + .. _3.vi.a: + + a. *Must provide tools for network booting the kernel, device tree blob and boot script artifacts (by chainloading or similar)* + + .. _3.vi.b: + + b. *Must provide tools for network booting the root file system* + + .. _3.vi.c: + + c. *Must provide tools for network booting multiple embedded solutions (i.e. creating a proper folder structure to maintain boot artifacts)* + + .. _3.vi.d: + + d. *Must provide tools for network booting securely exchanging secure SSH keys to each one of embedded solutions* + + .. _3.vi.e: + + e. *Must provide tools for flashing into secondary storage all the network boot artifacts (kernel, device tree, boot scripts, rootfs, etc.). This should leave the embedded solution fully functional and aligned with the corresponding artifacts just flashed.* + + +.. _benchmarking_capability: + +4. **Benchmarking:** + + .. _4.i: + + i. Must provide tools to assess the relative performance of an acceleration kernel in an isolated manner + + .. _4.ii: + + ii. Must provide tools to assess the relative performance of an acceleration kernel alongside a ROS 2 computational graph by following benchmarking_ + + +.. _Documentation: + +5. **Documentation:** + + .. _5.i: + + i. Must have in-code documentation for each subverb "feature" (e.g. for ``list``: List supported firmware for hardware acceleration) + + + +.. _Testing and CI: + +6. **Testing and CI:** + + .. _6.i: + + i. Must run (and pass, often) CI tests for `acceleration examples`_. + + + +.. _categories: + +Categories +^^^^^^^^^^ + +There are 4 hardware acceleration categories below which will classify hardware acceleration solutions and technologies, each roughly summarized as: + +* Category **Official**: + + * Highest level of support, backed by a vendor + * Hardware acceleration solution compliant with this REP and fully integrated in the ROS 2 ecosystem. + * Developer tools available to facilitate the development process. + * All `acceleration examples`_ should run on the hardware acceleration solution. + * Acceleration kernels available for multiple `Kernel Levels`_, with at least `Level I kernels`_. + +* Category **Community**: + + * Community-level support + * Hardware acceleration solution compliant with only a subset of this REP. + * A subset of the developer tools available. + * Acceleration kernels available for at least `Level I kernels`_. + +* Category **Compatible**: + + * Interoperability demonstrated. + * Hardware acceleration solution compliant with a lesser subset of this REP. + * Some developer tools available. + * Acceleration kernels available for at least `Level I kernels`_. + +* Category **Non-compatible**: + + * Default category + +While each category will have different capabilities, it's always possible to overachieve in certain capabilities, even if other capabilities prevent a package from moving up to the next category level. + + +Category Comparison Chart +^^^^^^^^^^^^^^^^^^^^^^^^^ + +The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' requirements' numbering scheme above. + +✓ = required + +● = recommended + +○ = optional + +.. list-table:: Categories + :widths: 7 7 7 7 7 + :header-rows: 1 + :stub-columns: 1 + :align: left + + + * - + - Official + - Community + - Compatible + - Non-compatible + * - Kernel Levels + - + - + - + - + * - `1.i`_ + - ✓ + - ✓ + - ✓ + - + * - 1.ii_ + - ● + - ● + - ○ + - + * - 1.iii_ + - ● + - ○ + - ○ + - + * - Build System + - + - + - + - + * - 2.i_ + - ✓ + - ✓ + - ✓ + - + * - 2.ii_ + - ● + - ● + - ○ + - + * - Build Tools + - + - + - + - + * - 3.i_ + - ● + - ● + - ○ + - + * - 3.ii_ + - ● + - ● + - ○ + - + * - 3.iii_ + - ✓ + - ✓ + - ● + - + * - 3.iv_ + - ✓ + - ✓ + - ○ + - + * - 3.iv.a_ + - ✓ + - ● + - ○ + - + * - 3.iv.b_ + - ● + - ● + - ○ + - + * - 3.v_ + - ✓ + - ● + - ○ + - + * - 3.v.a_ + - ✓ + - ● + - ○ + - + * - 3.v.b_ + - ✓ + - ● + - ○ + - + * - 3.v.c_ + - ✓ + - ● + - ○ + - + * - 3.v.d_ + - ✓ + - ● + - ○ + - + * - 3.vi_ + - ● + - ● + - ○ + - + * - 3.vi.a_ + - ● + - ● + - ○ + - + * - 3.vi.b_ + - ● + - ● + - ○ + - + * - 3.vi.c_ + - ● + - ● + - ○ + - + * - 3.vi.d_ + - ● + - ● + - ○ + - + * - 3.vi.e_ + - ● + - ● + - ○ + - + * - Benchmarking + - + - + - + - + * - 4.i_ + - ● + - ● + - ○ + - + * - 4.ii_ + - ● + - ● + - ○ + - + * - Documentation + - + - + - + - + * - 5.i_ + - ✓ + - ✓ + - ● + - + * - Testing and CI + - + - + - + - + * - 6.i_ + - ✓ + - ● + - ○ + - + + + +Reference Implementation and recommendations +============================================ + +Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer ``ament_acceleration`` and firmware from vendor specializalizations like `ament_vitis `_. + +For additional implementations and recommendations, check out the `Hardware Accelerationg WG GitHub organization `_. + + + +.. _labels for maintainers: + +Labels for maintainers +^^^^^^^^^^^^^^^^^^^^^^ +Maintainers are encouraged to mark their packages including hardware acceleration with a label that indicates so, for each one of the hardware acceleration technology solutions they support. This REP proposes that such indication takes the following shape: + +.. image:: https://img.shields.io/badge/hardware_acceleration-KV260-ec1c24.svg + +which can be created with: + +:: + + https://img.shields.io/badge/hardware_acceleration--ec1c24.svg + + e.g. to deliver the result above + https://img.shields.io/badge/hardware_acceleration-KV260-ec1c24.svg + + +Template for Maintainers +^^^^^^^^^^^^^^^^^^^^^^^^ +Besides indicating the acceleration solutions supported by a given package through `labels for maintainers`_, maintainers are encouraged to also add additional information of the acceleration kernels the package to provide consumers with a quick intuition of the added value of using hardware acceleration. + +The following **Markdown** table provides an example of such additional information for the `accelerated_vadd_publisher `_ that ships within `acceleration examples`_. + + +:: + + ### Hardware acceleration + + | Kernel | Description | Acceleration factor | Technology | CPU baseline | Acceleration measurement | + |------------|-------------|-----------------------|------------|--------------|--------------------------| + | [`vadd`](https://github.com/ros-acceleration/acceleration_examples/blob/main/accelerated_vadd_publisher/src/vadd.cpp) | An offloaded version of the trivial [vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/vadd_publisher) ROS 2 publisher which adds two inputs to a vector in a loop and publishes them at 10 Hz. Vector add operations are offloaded into to the FPGA. The offloading operation into the FPGA allows the publisher to go from 2 Hz to 6 Hz but still misses its target (10 Hz) | 3 | KV260 | 2 Hz (measured with `ros2 topic hz` | 6 Hz `ros2 topic hz` | + + +In case it's a meta-package, and there're several packages with one or multipole kernels, the following format can be used instead: + +:: + + ### Hardware acceleration + + | Package | Kernel | Description | Acceleration factor | Technology | CPU baseline | Acceleration measurement | + |---------|------------|-------------|-----------------------|------------|--------------|--------------------------| + | [accelerated_vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/accelerated_vadd_publisher) | [`vadd`](https://github.com/ros-acceleration/acceleration_examples/blob/main/accelerated_vadd_publisher/src/vadd.cpp) | An offloaded version of the trivial [vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/vadd_publisher) ROS 2 publisher which adds two inputs to a vector in a loop and publishes them at 10 Hz. Vector add operations are offloaded into to the FPGA. The offloading operation into the FPGA allows the publisher to go from 2 Hz to 6 Hz but still misses its target (10 Hz) | 3 | KV260 | 2 Hz (measured with `ros2 topic hz` | 6 Hz `ros2 topic hz` | + | [faster_vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/faster_vadd_publisher) | [`vadd`](https://github.com/ros-acceleration/acceleration_examples/blob/main/faster_vadd_publisher/src/vadd.cpp) | An accelerated version of the trivial [vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/vadd_publisher) ROS 2 publisher which adds two inputs to a vector in a loop and publishes them at 10 Hz. Vector add operations are offloaded into to the FPGA. The acceleration and optimizations of the dataflow allows the publisher to go from 2 Hz to 10 Hz, meeting its target | 5 | KV260 | 2 Hz (measured with `ros2 topic hz` | 10 Hz `ros2 topic hz` | + + +This REP encourages maintainers to report the impact of hardware acceleration. Each maintainer is free to customize the reporting template. + +Template for Vendors +^^^^^^^^^^^^^^^^^^^^ +Silicon vendors and solution manufacturers can help set the expectations of the level of support their hardware acceleration technology provides in alignment with this REP by providing a template in the README.md files of their ``firmware`` and/or ``ament`` extensions. Doing so will facilitate the process for consumers and maintainers. + +The following template provides an example in **Markdown** syntax: + +:: + + | Capability | Status | + |------------|--------| + | **`1.` Kernel Levels** | | + | [`1.i` level I kernels](https://ros.org/reps/rep-2008.html#i) | ✓ | + | [`1.ii` level II kernels](https://ros.org/reps/rep-2008.html#ii) | :warning: (see [this](https://github.com/Lien182/ReconROS)) | + | [`1.iii` level III kernels](https://ros.org/reps/rep-2008.html#iii) | | + | **`2.` Build System** | | + | [`2.i` ament extensions](https://ros.org/reps/rep-2008.html#id13) | ✓ | + | [`2.ii` `ament_acceleration` support](https://ros.org/reps/rep-2008.html#id14) | | + | **`3`. Build Tools** | | + | [`3.i` hardware emulation (`hw_emu`) ](https://ros.org/reps/rep-2008.html#id15) | | + | [`3.ii` hardware emulation (`sw_emu`)](https://ros.org/reps/rep-2008.html#id16) | ✓ | + | [`3.iii` image tooling](https://ros.org/reps/rep-2008.html#id17) | ✓ | + | [`3.iv` Linux kernel ](https://ros.org/reps/rep-2008.html#iv) | ✓ | + | [`3.iv.a` modern Linux kernel](https://ros.org/reps/rep-2008.html#iv-a) | ✓ | + | [`3.iv.b` LTS Linux kernel](https://ros.org/reps/rep-2008.html#iv-b) | | + | [`3.v` hypervisor ](https://ros.org/reps/rep-2008.html#v) | ✓ | + | [`3.v.a` no control domain VMs](https://ros.org/reps/rep-2008.html#v-a) | ✓ | + | [`3.v.b` guest VMs in disk](https://ros.org/reps/rep-2008.html#v-b) | ✓ | + | [`3.v.c` control domain in disk](https://ros.org/reps/rep-2008.html#v-c) | ✓ | + | [`3.v.d` no control domain VMs in disk](https://ros.org/reps/rep-2008.html#v-d) | | + | [`3.vi` network booting ](https://ros.org/reps/rep-2008.html#vi) | | + | [`3.vi.a` boot artifacts ](https://ros.org/reps/rep-2008.html#vi-a) | | + | [`3.vi.b` rootfs ](https://ros.org/reps/rep-2008.html#vi-b) | | + | [`3.vi.c` multi-network boot](https://ros.org/reps/rep-2008.html#vi-c) | | + | [`3.vi.d` secure network booting](https://ros.org/reps/rep-2008.html#vi-d) | | + | [`3.vi.e` save in disk network boot](https://ros.org/reps/rep-2008.html#vi-e) | | + | **`4.` Benchmarking** | | + | [`4.i` kernel benchmarking](https://ros.org/reps/rep-2008.html#id18) | ✓ | + | [`4.ii` ROS 2 acceleration benchmarking](https://ros.org/reps/rep-2008.html#id19) | | + | **`5.` Documentation** | | + | [`5.i` in-code documentation](https://ros.org/reps/rep-2008.html#id20) | ✓ | + | **`6.` Testing and CI** | | + | [`6.i` `acceleration_examples` ](https://ros.org/reps/rep-2008.html#id21) | ✓ | + + + + + +References and Footnotes +======================== + +.. [1] Z. Wan, B. Yu, T. Y. Li, J. Tang, Y. Zhu, Y. Wang, A. Raychowdhury, and S. Liu, “A survey of fpga-based robotic computing,” IEEE Circuits and Systems Magazine, vol. 21, no. 2, pp. 48–74, 2021. + + +.. [2] OpenCL 1.2 API and C Language Specification (November 14, 2012). + https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf + + +.. [3] OpenCL 1.2 Reference Pages. + https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/ + + +.. [4] Y. Yang and T. Azumi, “Exploring real-time executor on ros 2,” in 2020 IEEE International Conference on Embedded Software and Systems (ICESS). IEEE, 2020, pp. 1–8. + + +.. [5] C. Lienen and M. Platzner, “Design of distributed reconfigurable robotics systems with reconros,” 2021. + https://arxiv.org/pdf/2107.07208.pdf + + +.. [6] A. Pemmaiah​, D. Pangercic, D. Aggarwal, K. Neumann, K. Marcey, "Performance Testing in ROS 2". + https://drive.google.com/file/d/15nX80RK6aS8abZvQAOnMNUEgh7px9V5S/view + + + +Copyright +========= + +This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. From 4126aa26acd02bd369d6f22c17308f7e8f85e5a7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Thu, 2 Sep 2021 12:52:21 +0200 Subject: [PATCH 02/24] Update rep-2008.rst --- rep-2008.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index 61ea72adb..c47bb84c9 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -95,7 +95,6 @@ The first pillar represents extensions of the ``ament`` ROS 2 build system. Thes FILE src/vadd.cpp INCLUDE include - TARGET kv260 ) From 86f2d7736c498da627c711da7547f002d87e559e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Fri, 1 Oct 2021 10:08:10 +0200 Subject: [PATCH 03/24] Update rep-2008.rst Co-authored-by: Greg Balke <11024792+gbalke@users.noreply.github.com> --- rep-2008.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index c47bb84c9..813f10b4f 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -162,7 +162,7 @@ The third pillar is firmware. Represented by the abstract ``acceleration_firmwar Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts in line with its supported categories_ and capabilities_. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. -By splitting vendors across packages with a common front-end (``acceleration_firmware``), consumers and maintainers can easily switch between hardware acceleration solutions and +By splitting vendors across packages with a common front-end (``acceleration_firmware``), consumers and maintainers can easily switch between hardware acceleration solutions. From 24bb8e4839427534997e95fa81d1438702b6abb5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Thu, 31 Mar 2022 10:47:30 +0200 Subject: [PATCH 04/24] Add VSCode meta-data to .gitignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- .gitignore | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index 4c3bc3c4d..0563bb60d 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,5 @@ rep-0000.rst *.html *.pyc -*~ \ No newline at end of file +*~ +.vscode \ No newline at end of file From 611d0e37d2eb72c114d58671bc960c42d0a1964c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Thu, 31 Mar 2022 10:51:17 +0200 Subject: [PATCH 05/24] Remove acceleration_firmware common firmware front-end MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Refer to https://github.com/ros-acceleration/acceleration_firmware/issues/1. Shortly, the rationale of having was to provide a mechanism to splitting vendors across firmware packages while keeping a common front-end. After looking at it critically, there's currently no strong argument to keep this abstract layer at this level since this same behavior can be obtained through colcon-hardware-acceleration extensions so for now, we've decided to remove this layer from the architecture. In the future, if needed, acceleration_firmware will be re-added and should include vendor-common scripts to deploy and manage firmware Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index 61ea72adb..97f38c2ab 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -67,9 +67,9 @@ Architecture pillars | user land | +-------------------+-----------+----------------+--------------------------------+ | | | Drivers | Libraries | Cloud | +-------------+ +--------------+----+---------+-----------------------------------+---------------+ - | | | ament_sol1 | ament_soln | | accel_fw_sol1 | accel_fw_soln | - | | +--------------+--------------+------------------+----------------+---------------+ - | tooling | | ament_acceleration | colcon_acceleration | acceleration_firmware | + | | | ament_sol1 | ament_soln | | | | + | | +--------------+--------------+------------------+ accel_fw_sol1 | accel_fw_soln + + | tooling | | ament_acceleration | colcon_acceleration | | | | | +---------------------------------------------------------------------------------+ | | | build system | meta build | firmware | +-------------+ +--------------------------+---------------------+--------------------------------+ @@ -159,11 +159,11 @@ In turn, the list of subverbs will improve and grow to cover other technology so Pillar III - firmware ^^^^^^^^^^^^^^^^^^^^^ -The third pillar is firmware. Represented by the abstract ``acceleration_firmware`` package, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. +The third pillar is firmware, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts in line with its supported categories_ and capabilities_. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. -By splitting vendors across packages with a common front-end (``acceleration_firmware``), consumers and maintainers can easily switch between hardware acceleration solutions and +By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions. From e589cf77d6f472a039b3d9f060b73ef71e51128a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Thu, 31 Mar 2022 11:55:49 +0200 Subject: [PATCH 06/24] Add methodology for ROS 2 hw acceleration, fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These changes introduce in REP-2008 a proposed methodology for ROS 2 hardware acceleration and provides 2 use cases wherein it's demonstrated using the ROS Perception stack. Minor changes and refines in the content also added Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 177 ++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 124 insertions(+), 53 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index 97f38c2ab..a24cd88ae 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -21,13 +21,13 @@ With the decline of Moore’s Law, hardware acceleration has proven itself as th The purpose of this REP is to provide standard guidelines for how to use hardware acceleration in combination with ROS 2. These guidelines are realized in the form of a series of ROS 2 packages that integrate these external resources and provide a ROS 2-centric open architecture for hardware acceleration. -The architecture proposed extends the ROS 2 build system (``ament``), the ROS 2 build tools (``colcon``) and add a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new hardware solutions and other silicon vendors. The core components of the architecture are disclosed under an Apache 2.0 license, available and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. +The architecture proposed extends the ROS 2 build system (``ament``), the ROS 2 build tools (``colcon``) and add a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new hardware solutions and other silicon vendors. The core components of the architecture are disclosed under an Apache 2.0 license, available and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. Value for stakeholders: - Package maintainers can use these guidelines to integrate hardware acceleration capabilities in their ROS 2 packages. -- Consumers can use the guidelines in the REP, as well as the corresponding category of each hardware solution, to set expectations on the hardware acceleration capabilities that could be obtained from each vendor's hardware solution. +- Consumers can use the guidelines in the REP, as well as the corresponding category of each hardware solution, to set expectations on the hardware acceleration capabilities that could be obtained from each vendor's hardware solution. - Silicon vendors and solution manufacturers can use these guidelines to connect their firmware and technologies to the ROS 2 ecosystem, obtaining direct support for hardware acceleration in all ROS 2 packages that support it. @@ -47,7 +47,7 @@ The motivation behind this REP does not include: * Policy requirements (which framework to use, specific C++ preprocessor instructions, etc.) are *intentionally generic* and to remain technology-agnostic (from a hardware acceleration perspective). * Maintainers can come up with their own policies depending on the technology targeted (FPGAs, GPUs, etc.) and framework used (``HLS``, ``ROCm``, ``CUDA``, etc). - + * Enforcing specific hardware solutions on maintainers @@ -58,7 +58,7 @@ The motivation behind this REP does not include: Architecture pillars ==================== -:: +:: ROS 2 stack Hardware Acceleration Architecture @ ROS 2 stack @@ -79,7 +79,7 @@ Architecture pillars +-------------+ | | | | rmw_adapter | Pillar I Pillar II Pillar III +-------------+ - + .. _pillarI: @@ -99,7 +99,7 @@ The first pillar represents extensions of the ``ament`` ROS 2 build system. Thes ) -Under the hood, each specialization of ``ament_acceleration`` should rely on the corresponding technology libraries to enable it. For example, `ament_vitis `_ relies on Vitis Unified Software Platform and on the Xilinx Runtime (XRT) library to generate acceleration kernels and facilitate OpenCL communication between the application code and the kernels. Vitis and XRT are completely hidden from the robotics engineer, simplifying the creation of kernels through simple CMake macros. The same kernel expressed with ``ament_vitis`` is presented below: +Under the hood, each specialization of ``ament_acceleration`` should rely on the corresponding technology libraries to enable it. For example, `ament_vitis `_ relies on Vitis Unified Software Platform and on the Xilinx Runtime (XRT) library to generate acceleration kernels and facilitate OpenCL communication between the application code and the kernels. Vitis and XRT are completely hidden from the robotics engineer, simplifying the creation of kernels through simple CMake macros. The same kernel expressed with ``ament_vitis`` is presented below: :: @@ -159,9 +159,9 @@ In turn, the list of subverbs will improve and grow to cover other technology so Pillar III - firmware ^^^^^^^^^^^^^^^^^^^^^ -The third pillar is firmware, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. +The third pillar is firmware, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. -Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts in line with its supported categories_ and capabilities_. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. +Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts in line with its supported categories_ and capabilities_. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions. @@ -254,7 +254,7 @@ In addition, the following aspects should be considered when benchmarking accele - `automated`: benchmarks and related source code should be designed with automation in mind. Once ready, creating a benchmark and producing results should be (ideally) a fully automated process. - `hardware farm mindset`: benchmarks will be conducted on hardware embedded platforms sitting in a farm-like environment (redundancy of tests, multiple SoCs/boards) with the intent of validating and comparing different technologies. -Accounting for all of this, in this REP, we adopt a grey-box and non-functional benchmarking approach for hardware acceleration that allows to evaluate the relative performance of accelerated ROS 2 individual nodes and complete computational graphs. To realize it in a technology agnostic-manner, we select the Linux Tracing Toolkit next generation (`LTTng `_) which will be used for tracing and benchmarking. +Accounting for all of this, in this REP, we adopt a grey-box and non-functional benchmarking approach for hardware acceleration that allows to evaluate the relative performance of accelerated ROS 2 individual nodes and complete computational graphs. To realize it in a technology agnostic-manner, we select the Linux Tracing Toolkit next generation (`LTTng `_) which will be used for tracing and benchmarking. Differences between tracing and benchmarking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -267,6 +267,67 @@ Tracing and benchmarking can be defined as follows: From these definitions, inherently one can determine that both benchmarking and tracing are connected in the sense that the test/benchmark will use a series of measurements for comparison. These measurements will come from tracing probes. In other words, tracing will collect data that will then be fed into a benchmark program for comparison. +Methodology for ROS 2 Hardware Acceleration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + rebuild + + +---------------+ + | | + | | + |4. benchmark +--+ + | acceleration| | + +--> | | + | +---------------+ | + | | acceleration + | | tracing + trace dataflow | | + +--------------+ | +---------------+ | + | | +---+ +<+ + +------------v---+ +--------v-------+ | | + | | | | | | + | 3.2 accelerate | | 3.1 accelerate <---------> 3. hardware | + | graph | | nodes | trace | acceleration | + | | | | nodes | <-+ + +----------------+ +----------------+ | | | + +---------------+ | + | + | CPU + | tracing + +--------------+ | + +----------------+ rebuild | | | + | +----------> | | + start +----------> 1. trace graph | | 2. benchmark +--+ + | | | CPU | + +----+------^--^-+ | | + | | | +-------+------+ + | | | | + +------+ | | + LTTng +--------------------+ + re-instrument + + + + + + + +The following proposes a methodology to analyze a ROS 2 application and design appropriate acceleration: + +1. instrument both the core components of ROS 2 and the target kernels using `LTTng `_. Refer to `ros2_tracing `_ for tools, documentation and ROS 2 core layers tracepoints; +2. trace and benchmark the kernels on the CPU to establish a compute baseline; +3. develop a hardware accelerated implementation on alternate hardware (e.g., GPU, FPGA, etc): + + - **3.1** accelerate computations at the Node or Component level for each one of those identified in **2.** as good candidates. + - **3.2** accelerate inter-Node exchanges and reduce the overhead of the ROS 2 message-passing system across all its abstraction layers. + +4. trace, benchmark against the CPU baseline, and improve the accelerated implementation. + +The proposed ROS 2 methodology for hardware acceleration is demonstrated in [7]_ and [8]_. + + .. _acceleration examples: Acceleration examples @@ -289,15 +350,15 @@ The following list describes the hardware acceleration capabilities that hardwar .. _1.i: - i. Must provide at least one package producing `Level I kernels`_ + i. Must provide at least one package producing `Level I kernels`_ .. _1.ii: - ii. Must provide at least one package producing `Level II kernels`_ + ii. Must provide at least one package producing `Level II kernels`_ .. _1.iii: - iii. Must provide at least one package producing `Level III kernels`_ + iii. Must provide at least one package producing `Level III kernels`_ .. _Build System Enhancements: @@ -386,7 +447,7 @@ The following list describes the hardware acceleration capabilities that hardwar e. *Must provide tools for flashing into secondary storage all the network boot artifacts (kernel, device tree, boot scripts, rootfs, etc.). This should leave the embedded solution fully functional and aligned with the corresponding artifacts just flashed.* - + .. _benchmarking_capability: 4. **Benchmarking:** @@ -394,7 +455,7 @@ The following list describes the hardware acceleration capabilities that hardwar .. _4.i: i. Must provide tools to assess the relative performance of an acceleration kernel in an isolated manner - + .. _4.ii: ii. Must provide tools to assess the relative performance of an acceleration kernel alongside a ROS 2 computational graph by following benchmarking_ @@ -430,7 +491,7 @@ There are 4 hardware acceleration categories below which will classify hardware * Category **Official**: * Highest level of support, backed by a vendor - * Hardware acceleration solution compliant with this REP and fully integrated in the ROS 2 ecosystem. + * Hardware acceleration solution compliant with this REP and fully integrated in the ROS 2 ecosystem. * Developer tools available to facilitate the development process. * All `acceleration examples`_ should run on the hardware acceleration solution. * Acceleration kernels available for multiple `Kernel Levels`_, with at least `Level I kernels`_. @@ -480,15 +541,15 @@ The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' re - Compatible - Non-compatible * - Kernel Levels - - - - - - - - + - + - + - + - * - `1.i`_ - ✓ - ✓ - ✓ - - + - * - 1.ii_ - ● - ● @@ -500,25 +561,25 @@ The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' re - ○ - * - Build System - - - - - - - - + - + - + - + - * - 2.i_ - ✓ - ✓ - ✓ - - + - * - 2.ii_ - ● - ● - ○ - * - Build Tools - - - - - - - - + - + - + - + - * - 3.i_ - ● - ● @@ -533,7 +594,7 @@ The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' re - ✓ - ✓ - ● - - + - * - 3.iv_ - ✓ - ✓ @@ -605,10 +666,10 @@ The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' re - ○ - * - Benchmarking - - - - - - - - + - + - + - + - * - 4.i_ - ● - ● @@ -620,25 +681,25 @@ The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' re - ○ - * - Documentation - - - - - - - - + - + - + - + - * - 5.i_ - ✓ - ✓ - ● - - + - * - Testing and CI - - - - - - - - + - + - + - + - * - 6.i_ - ✓ - ● - ○ - - + - @@ -665,19 +726,19 @@ which can be created with: https://img.shields.io/badge/hardware_acceleration--ec1c24.svg - e.g. to deliver the result above - https://img.shields.io/badge/hardware_acceleration-KV260-ec1c24.svg + e.g. to deliver the result above + https://img.shields.io/badge/hardware_acceleration-KV260-ec1c24.svg Template for Maintainers ^^^^^^^^^^^^^^^^^^^^^^^^ -Besides indicating the acceleration solutions supported by a given package through `labels for maintainers`_, maintainers are encouraged to also add additional information of the acceleration kernels the package to provide consumers with a quick intuition of the added value of using hardware acceleration. +Besides indicating the acceleration solutions supported by a given package through `labels for maintainers`_, maintainers are encouraged to also add additional information of the acceleration kernels the package to provide consumers with a quick intuition of the added value of using hardware acceleration. The following **Markdown** table provides an example of such additional information for the `accelerated_vadd_publisher `_ that ships within `acceleration examples`_. :: - + ### Hardware acceleration | Kernel | Description | Acceleration factor | Technology | CPU baseline | Acceleration measurement | @@ -707,7 +768,7 @@ The following template provides an example in **Markdown** syntax: :: - | Capability | Status | + | Capability | Status | |------------|--------| | **`1.` Kernel Levels** | | | [`1.i` level I kernels](https://ros.org/reps/rep-2008.html#i) | ✓ | @@ -749,10 +810,11 @@ The following template provides an example in **Markdown** syntax: References and Footnotes ======================== -.. [1] Z. Wan, B. Yu, T. Y. Li, J. Tang, Y. Zhu, Y. Wang, A. Raychowdhury, and S. Liu, “A survey of fpga-based robotic computing,” IEEE Circuits and Systems Magazine, vol. 21, no. 2, pp. 48–74, 2021. +.. [1] Z. Wan, B. Yu, T. Y. Li, J. Tang, Y. Zhu, Y. Wang, A. Raychowdhury, and S. Liu, “A survey of fpga-based robotic computing,” + IEEE Circuits and Systems Magazine, vol. 21, no. 2, pp. 48–74, 2021. -.. [2] OpenCL 1.2 API and C Language Specification (November 14, 2012). +.. [2] OpenCL 1.2 API and C Language Specification (November 14, 2012). https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf @@ -760,17 +822,26 @@ References and Footnotes https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/ -.. [4] Y. Yang and T. Azumi, “Exploring real-time executor on ros 2,” in 2020 IEEE International Conference on Embedded Software and Systems (ICESS). IEEE, 2020, pp. 1–8. +.. [4] Y. Yang and T. Azumi, “Exploring real-time executor on ros 2,”. + 2020 IEEE International Conference on Embedded Software and Systems (ICESS). IEEE, 2020, pp. 1–8. .. [5] C. Lienen and M. Platzner, “Design of distributed reconfigurable robotics systems with reconros,” 2021. https://arxiv.org/pdf/2107.07208.pdf -.. [6] A. Pemmaiah​, D. Pangercic, D. Aggarwal, K. Neumann, K. Marcey, "Performance Testing in ROS 2". +.. [6] A. Pemmaiah​, D. Pangercic, D. Aggarwal, K. Neumann, K. Marcey, "Performance Testing in ROS 2". https://drive.google.com/file/d/15nX80RK6aS8abZvQAOnMNUEgh7px9V5S/view +.. [7] "Methodology for ROS 2 Hardware Acceleration". ros-acceleration/community #20. ROS 2 Hardware Acceleration Working Group. + https://github.com/ros-acceleration/community/issues/20 + + +.. [8] Acceleration Robotics, "Hardware accelerated ROS 2 pipelines and towards the Robotic Processing Unit (RPU)". + https://news.accelerationrobotics.com/hardware-accelerated-ros2-pipelines/ + + Copyright ========= From be36b47cfed44218cd956490d1ff9d0cd93f5123 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Thu, 31 Mar 2022 12:05:39 +0200 Subject: [PATCH 07/24] Style adjustment in figure for acceleleration methodology MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index 214fa8509..33b5c38c6 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -282,9 +282,9 @@ Methodology for ROS 2 Hardware Acceleration | +---------------+ | | | acceleration | | tracing - trace dataflow | | - +--------------+ | +---------------+ | - | | +---+ +<+ + trace dataflow | | + +--------------+ | +---------------+ | + | | +---+ +<+ +------------v---+ +--------v-------+ | | | | | | | | | 3.2 accelerate | | 3.1 accelerate <---------> 3. hardware | From f8af0e471c4e76244fd30d8314e9fcbde52b806f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Mon, 18 Apr 2022 09:54:16 +0200 Subject: [PATCH 08/24] Add pillar IV, cloud extensions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added cloud extensions as a pillar wihch helps robotic architects bridge the gap and simplify the use of hardware acceleration (GPUs, FPGAs, etc.) in the cloud for ROS allowing to easily build kernels that target cloud instances while aligning to the standard ROS 2 launch system. Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 64 ++++++++++++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 24 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index 33b5c38c6..5ccc82c2a 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -1,11 +1,11 @@ REP: 2008 Title: ROS 2 Hardware Acceleration Architecture and Conventions -Author: Víctor Mayoral-Vilches +Author: Víctor Mayoral-Vilches Status: Active Type: Informational Content-Type: text/x-rst Created: 10-Aug-2021 -Post-History: 10-Aug-2021 +Post-History: 18-April-2022 Abstract @@ -17,7 +17,7 @@ This REP describes the architectural pillars and conventions required to introdu Motivation ========== -With the decline of Moore’s Law, hardware acceleration has proven itself as the answer for achieving higher performance gains in robotics [1]_. By creating specialized compute architectures that rely on specific hardware (i.e. through FPGAs or GPUs), hardware acceleration empowers faster robots, with reduced computation times (real fast, as opposed to real-time), lower power consumption and more deterministic behaviours. The core idea is that instead of following the traditional control-driven approach for software development in robotics, a mixed control- and data-driven one allows to design custom compute architectures that further exploit parallelism. To do so, we need to integrate in the ROS 2 ecosystem external frameworks, tools and libraries that facilitate creating parallel compute architectures. +With the decline of Moore’s Law, hardware acceleration has proven itself as the answer for achieving higher performance gains in robotics [1]_. By creating specialized compute architectures that rely on specific hardware (i.e. through FPGAs or GPUs), hardware acceleration empowers faster robots, with reduced computation times (real fast, as opposed to real-time), lower power consumption and more deterministic behaviours [9]_ [10]_. The core idea is that instead of following the traditional control-driven approach for software development in robotics, a mixed control- and data-driven one allows to design custom compute architectures that further exploit parallelism. To do so, we need to integrate in the ROS 2 ecosystem external frameworks, tools and libraries that facilitate creating parallel compute architectures. The purpose of this REP is to provide standard guidelines for how to use hardware acceleration in combination with ROS 2. These guidelines are realized in the form of a series of ROS 2 packages that integrate these external resources and provide a ROS 2-centric open architecture for hardware acceleration. @@ -32,7 +32,7 @@ Value for stakeholders: - Silicon vendors and solution manufacturers can use these guidelines to connect their firmware and technologies to the ROS 2 ecosystem, obtaining direct support for hardware acceleration in all ROS 2 packages that support it. -The outcome of this REP should be that maintainers who want to leverage hardware acceleration in their packages, can do so with consistent guidelines and with support across multiple technologies (FPGAs and GPUs) by following the conventions set. This way, maintainers will be able to create ROS 2 packages with support for hardware acceleration that can run across hardware acceleration technologies, including FPGAs and GPUs. +The outcome of this REP should be that maintainers who want to leverage hardware acceleration in their packages, can do so with consistent guidelines and with support across multiple technologies (FPGAs and GPUs initially, but extending towards more accelerators in the future) by following the conventions set. This way, maintainers will be able to create ROS 2 packages with support for hardware acceleration that can run across hardware acceleration technologies, including FPGAs and GPUs. In turn, the documentation of categories and hardware acceleration capabilities will improve. The guidelines in here provide a ROS 2-centric open architecture for hardware acceleration, which silicon vendors can decide to adopt when engaging with the ROS 2 community. @@ -60,26 +60,25 @@ Architecture pillars :: - ROS 2 stack Hardware Acceleration Architecture @ ROS 2 stack - - +-------------+ +---------------------------+ - | | | acceleration_examples | - | user land | +-------------------+-----------+----------------+--------------------------------+ - | | | Drivers | Libraries | Cloud | - +-------------+ +--------------+----+---------+-----------------------------------+---------------+ - | | | ament_sol1 | ament_soln | | | | - | | +--------------+--------------+------------------+ accel_fw_sol1 | accel_fw_soln + - | tooling | | ament_acceleration | colcon_acceleration | | | - | | +---------------------------------------------------------------------------------+ - | | | build system | meta build | firmware | - +-------------+ +--------------------------+---------------------+--------------------------------+ - | rcl | ^ ^ ^ - +-------------+ | | | - | rmw | | | | - +-------------+ | | | - | rmw_adapter | Pillar I Pillar II Pillar III - +-------------+ - + ROS 2 stack Hardware Acceleration Architecture @ ROS 2 stack + + +-----------+ +---------------------------+ + | | | acceleration_examples | + |user land | +-----------------+------------------+--+-----------------+ + | | | Drivers | Libraries | Firmware |Cloud | + +-----------+ +-----------------+-+-------------------------+-----------+ + | | | ament_1| ament_2 | | | | | + | | +---------------------------------------+ fw_1|fw_2|cloud1| + | tooling | | ament_acceleration|colcon_acceleration| | | | + | | +---------------------------------------------+-----------+ + | | | build system | meta build | firmware | cloud| + +-----------+ +--------+----------+-------+-----------+-+--------+--+---+ + | rcl | | | | | + +-----------+ | | | | + | rmw | | | | | + +-----------+ + + + + + |rmw_adapter| Pillar I Pillar II Pillar III Pillar IV + +-----------+ .. _pillarI: @@ -165,6 +164,18 @@ Each ROS 2 workspace can have one or multiple firmware packages deployed. The se By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions. +.. _pillarIV: + +Pillar IV - cloud extensions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + +Leveraging the cloud provides roboticists with unlimited resources to further accelerate computations. Besides lots of CPU, cloud computing providers such as `GCP `_, `Azure `_ or `AWS `_ offer instances that provide big FPGAs and GPUs for on-cloud hardware acceleration. This means once the ROS graph is partially in the cloud, architects can use computing there, including custom accelerators, to reduce and optimize robotic computations. But tapping into all that power while aligned with common ROS and robotics development flows is non-trivial. + +This fourth pillar, ``cloud extensions``, helps robotic architects bridge the gap and simplify the use of hardware acceleration in the cloud for ROS. It does so by extending the three prior pillars (pillarI_, pillarII_ and pillarIII_) and adding cloud capabilities to them. Ultimately, the cloud extensions allow to easily build hardware acceleration kernels that target cloud instances while aligning to the unified APIs for cloud provisioning, set up, deployment and launch derived from the standard ROS 2 launch system. + +In turn, a reference implementation leveraging hardware acceleration in the cloud with ROS 2 will be facilitated with one of the cloud service providers. + .. _specification: @@ -840,6 +851,11 @@ References and Footnotes .. [8] Acceleration Robotics, "Hardware accelerated ROS 2 pipelines and towards the Robotic Processing Unit (RPU)". https://news.accelerationrobotics.com/hardware-accelerated-ros2-pipelines/ +.. [9] Mayoral-Vilches, V., & Corradi, G. (2021). Adaptive computing in robotics, leveraging ros 2 to enable software-defined hardware for fpgas. + https://www.xilinx.com/support/documentation/white_papers/wp537-adaptive-computing-robotics.pdf + +.. [10] Mayoral-Vilches, V. (2021). Kria Robotics Stack. + https://www.xilinx.com/content/dam/xilinx/support/documentation/white_papers/wp540-kria-robotics-stack.pdf Copyright From cdc36f31020427901c6de71e89db66a80f5fac3c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Fri, 10 Jun 2022 13:20:20 +0200 Subject: [PATCH 09/24] Switch to Standards Track, minor fixes and references MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index 5ccc82c2a..dd7fa39fd 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -2,10 +2,10 @@ REP: 2008 Title: ROS 2 Hardware Acceleration Architecture and Conventions Author: Víctor Mayoral-Vilches Status: Active -Type: Informational +Type: Standards Track Content-Type: text/x-rst Created: 10-Aug-2021 -Post-History: 18-April-2022 +Post-History: 10-June-2022 Abstract @@ -85,7 +85,7 @@ Architecture pillars Pillar I - Extensions to ament ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The first pillar represents extensions of the ``ament`` ROS 2 build system. These CMake extensions help achieve the objective of simplifying the creation of acceleration kernels. By providing an experience and a syntax similar to other ROS 2 libraries targeting CPUs, maintainers will be able to integrate kernels into their packages easily. The ``ament_acceleration`` ROS 2 package abstracts the build system extensions from technology-specific frameworks and software platforms. This allows to easily support hardware acceleration across FPGAs and GPUs while using the same syntax, simplifying the work of maintainers. The code listing below provides an example that instructs the ``CMakeLists.txt`` file of a ROS 2 package to build a ``vadd`` kernel with the corresponding sources and configuration: +The first pillar represents extensions of the ``ament`` ROS 2 build system. These CMake extensions help achieve the objective of simplifying the creation of acceleration kernels. By providing an experience and a syntax similar to other ROS 2 libraries targeting CPUs, maintainers will be able to integrate kernels into their packages easily. The `ament_acceleration `_ ROS 2 package abstracts the build system extensions from technology-specific frameworks and software platforms. This allows to easily support hardware acceleration across FPGAs and GPUs while using the same syntax, simplifying the work of maintainers. The code listing below provides an example that instructs the ``CMakeLists.txt`` file of a ROS 2 package to build a ``vadd`` kernel with the corresponding sources and configuration: :: @@ -713,10 +713,15 @@ The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' re +Backwards Compatibility +======================= +The proposed features and conventions add new functionality while not modifying existing functionality. + + Reference Implementation and recommendations ============================================ -Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer ``ament_acceleration`` and firmware from vendor specializalizations like `ament_vitis `_. +Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer `ament_acceleration `_ and firmware from vendor specializalizations like `ament_vitis `_. A paper describing in more detail the reference implementation is available at [11]_. For additional implementations and recommendations, check out the `Hardware Accelerationg WG GitHub organization `_. @@ -851,12 +856,15 @@ References and Footnotes .. [8] Acceleration Robotics, "Hardware accelerated ROS 2 pipelines and towards the Robotic Processing Unit (RPU)". https://news.accelerationrobotics.com/hardware-accelerated-ros2-pipelines/ -.. [9] Mayoral-Vilches, V., & Corradi, G. (2021). Adaptive computing in robotics, leveraging ros 2 to enable software-defined hardware for fpgas. +.. [9] Mayoral-Vilches, V., & Corradi, G. (2021). "Adaptive computing in robotics, leveraging ros 2 to enable software-defined hardware for fpgas". https://www.xilinx.com/support/documentation/white_papers/wp537-adaptive-computing-robotics.pdf -.. [10] Mayoral-Vilches, V. (2021). Kria Robotics Stack. +.. [10] Mayoral-Vilches, V. (2021). "Kria Robotics Stack". https://www.xilinx.com/content/dam/xilinx/support/documentation/white_papers/wp540-kria-robotics-stack.pdf +.. [11] Mayoral-Vilches, V., Neuman, S. M., Plancher, B., & Reddi, V. J. (2022). "RobotCore: An Open Architecture for Hardware Acceleration in ROS 2". + https://arxiv.org/pdf/2205.03929.pdf + Copyright ========= From 8b30b2608c70ecee8211b482c7a922b9eec07eb3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Fri, 10 Jun 2022 13:23:37 +0200 Subject: [PATCH 10/24] Switch Status: Draft to meet REP 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index dd7fa39fd..eeeb853a4 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -1,7 +1,7 @@ REP: 2008 Title: ROS 2 Hardware Acceleration Architecture and Conventions Author: Víctor Mayoral-Vilches -Status: Active +Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 10-Aug-2021 From 0d6b3eaf03b7154b8e41240320486c3a054d5087 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 06:59:13 +0200 Subject: [PATCH 11/24] Undo changes in .gitignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- .gitignore | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/.gitignore b/.gitignore index 0563bb60d..4c3bc3c4d 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,4 @@ rep-0000.rst *.html *.pyc -*~ -.vscode \ No newline at end of file +*~ \ No newline at end of file From a35594cf689352b90b1eedb6ee381fc2d056bb9b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 07:07:22 +0200 Subject: [PATCH 12/24] Remove colcon extensions from architecture pillars MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Move it instead to reference implementation Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 60 ++++++++++++++-------------------------------------- 1 file changed, 16 insertions(+), 44 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index eeeb853a4..9b5388a6b 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -5,7 +5,7 @@ Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 10-Aug-2021 -Post-History: 10-June-2022 +Post-History: 19-July-2022 Abstract @@ -72,12 +72,12 @@ Architecture pillars | tooling | | ament_acceleration|colcon_acceleration| | | | | | +---------------------------------------------+-----------+ | | | build system | meta build | firmware | cloud| - +-----------+ +--------+----------+-------+-----------+-+--------+--+---+ - | rcl | | | | | - +-----------+ | | | | - | rmw | | | | | - +-----------+ + + + + - |rmw_adapter| Pillar I Pillar II Pillar III Pillar IV + +-----------+ +--------+----------+-------------------+-+--------+--+---+ + | rcl | | | | + +-----------+ | | | + | rmw | | | | + +-----------+ + + + + |rmw_adapter| Pillar I Pillar II Pillar III +-----------+ @@ -124,55 +124,25 @@ Through ``ament_acceleration`` and technology-specific specializations (like ``a .. _pillarII: -Pillar II - Extensions to colcon -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The second pillar extends the ``colcon`` ROS 2 meta built tools to integrate hardware acceleration flows into the ROS 2 CLI tooling and allow to manage it. Examples of these extensions include emulation capabilities to speed-up the development process and/or facilitate it without access to the real hardware, or raw image production tools, which are convenient when packing together acceleration kernels for embedded targets. These extensions are implemented by the `colcon_acceleration `_ ROS 2 package. On a preliminary implementation, these extensions are provided the following ``colcon acceleration`` subverbs: - -:: - - colcon acceleration subverbs: - board Report the board supported in the deployed firmware - emulation Manage hardware emulation capabilities - hls Vitis HLS capabilities management extension - hypervisor Configure the Xen hypervisor - linux Configure the Linux kernel - list List supported firmware for hardware acceleration - mkinitramfs Creates compressed cpio initramfs (ramdisks) from raw image - mount Mount raw images - package Packages workspace with kernels for distribution - platform Report the platform enabled in the deployed firmware - select Select an existing firmware and default to it. - umount Umount raw images - v++ Vitis v++ compiler wrapper - version Report version of the tool - - -Using the ``list`` and ``select`` subverbs, it's possible to inspect and configure the different hardware acceleration solutions. The rest of the subverbs allow to manage hardware acceleration artifacts and platforms in a simplified manner. - -In turn, the list of subverbs will improve and grow to cover other technology solutions. - - -.. _pillarIII: - -Pillar III - firmware -^^^^^^^^^^^^^^^^^^^^^ +Pillar II - firmware +^^^^^^^^^^^^^^^^^^^^ -The third pillar is firmware, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. +The second pillar is firmware, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts in line with its supported categories_ and capabilities_. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions. -.. _pillarIV: +.. _pillarIII: -Pillar IV - cloud extensions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Pillar III - cloud extensions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Leveraging the cloud provides roboticists with unlimited resources to further accelerate computations. Besides lots of CPU, cloud computing providers such as `GCP `_, `Azure `_ or `AWS `_ offer instances that provide big FPGAs and GPUs for on-cloud hardware acceleration. This means once the ROS graph is partially in the cloud, architects can use computing there, including custom accelerators, to reduce and optimize robotic computations. But tapping into all that power while aligned with common ROS and robotics development flows is non-trivial. -This fourth pillar, ``cloud extensions``, helps robotic architects bridge the gap and simplify the use of hardware acceleration in the cloud for ROS. It does so by extending the three prior pillars (pillarI_, pillarII_ and pillarIII_) and adding cloud capabilities to them. Ultimately, the cloud extensions allow to easily build hardware acceleration kernels that target cloud instances while aligning to the unified APIs for cloud provisioning, set up, deployment and launch derived from the standard ROS 2 launch system. +This third pillar, ``cloud extensions``, helps robotic architects bridge the gap and simplify the use of hardware acceleration in the cloud for ROS. It does so by extending the three prior pillars (pillarI_, pillarII_ and pillarIII_) and adding cloud capabilities to them. Ultimately, the cloud extensions allow to easily build hardware acceleration kernels that target cloud instances while aligning to the unified APIs for cloud provisioning, set up, deployment and launch derived from the standard ROS 2 launch system. In turn, a reference implementation leveraging hardware acceleration in the cloud with ROS 2 will be facilitated with one of the cloud service providers. @@ -723,6 +693,8 @@ Reference Implementation and recommendations Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer `ament_acceleration `_ and firmware from vendor specializalizations like `ament_vitis `_. A paper describing in more detail the reference implementation is available at [11]_. +``colcon`` ROS 2 meta built tools can be extended to help integrate hardware acceleration flows into the ROS 2 CLI tooling. Examples of these extensions include emulation capabilities to speed-up the development process and/or facilitate it without access to the real hardware, or raw image production tools, which are convenient when packing together acceleration kernels for embedded targets. A reference implementation of these extensions is implemented at the `colcon-hardware-acceleration `_ ROS 2 package, which is available in the buildfarm. Refer to the package for more details on its capabilities. + For additional implementations and recommendations, check out the `Hardware Accelerationg WG GitHub organization `_. From 5a97eb856bab0db151c34dfcc1bb65855e182de6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 07:27:47 +0200 Subject: [PATCH 13/24] Minor adjustments in diagram MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index 9b5388a6b..e2cc6878e 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -69,9 +69,9 @@ Architecture pillars +-----------+ +-----------------+-+-------------------------+-----------+ | | | ament_1| ament_2 | | | | | | | +---------------------------------------+ fw_1|fw_2|cloud1| - | tooling | | ament_acceleration|colcon_acceleration| | | | + | tooling | | ament_acceleration| colcon_hw_accel | | | | | | +---------------------------------------------+-----------+ - | | | build system | meta build | firmware | cloud| + | | | build system | meta build | firmware | cloud| +-----------+ +--------+----------+-------------------+-+--------+--+---+ | rcl | | | | +-----------+ | | | From c966858ec0b5fe4db2321a2bfb9a2f1d8620f2a3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 07:42:36 +0200 Subject: [PATCH 14/24] Remove cloud extensions from REP MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 41 ++++++++++++++--------------------------- 1 file changed, 14 insertions(+), 27 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index e2cc6878e..ecde10c8d 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -64,20 +64,20 @@ Architecture pillars +-----------+ +---------------------------+ | | | acceleration_examples | - |user land | +-----------------+------------------+--+-----------------+ - | | | Drivers | Libraries | Firmware |Cloud | - +-----------+ +-----------------+-+-------------------------+-----------+ - | | | ament_1| ament_2 | | | | | - | | +---------------------------------------+ fw_1|fw_2|cloud1| - | tooling | | ament_acceleration| colcon_hw_accel | | | | - | | +---------------------------------------------+-----------+ - | | | build system | meta build | firmware | cloud| - +-----------+ +--------+----------+-------------------+-+--------+--+---+ - | rcl | | | | - +-----------+ | | | - | rmw | | | | - +-----------+ + + + - |rmw_adapter| Pillar I Pillar II Pillar III + |user land | +-----------------+------------------+--+----------+ + | | | Drivers | Libraries | Firmware | + +-----------+ +-----------------+-+-------------------------+----+ + | | | ament_1| ament_2 | | | | + | | +---------------------------------------+ fw_1|fw_2| + | tooling | | ament_acceleration| colcon_hw_accel | | | + | | +---------------------------------------------+----+ + | | | build system | meta build | firmware | + +-----------+ +--------+----------+-------------------+-+--------+ + | rcl | | | + +-----------+ | | + | rmw | | | + +-----------+ + + + |rmw_adapter| Pillar I Pillar II +-----------+ @@ -134,19 +134,6 @@ Each ROS 2 workspace can have one or multiple firmware packages deployed. The se By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions. -.. _pillarIII: - -Pillar III - cloud extensions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - - -Leveraging the cloud provides roboticists with unlimited resources to further accelerate computations. Besides lots of CPU, cloud computing providers such as `GCP `_, `Azure `_ or `AWS `_ offer instances that provide big FPGAs and GPUs for on-cloud hardware acceleration. This means once the ROS graph is partially in the cloud, architects can use computing there, including custom accelerators, to reduce and optimize robotic computations. But tapping into all that power while aligned with common ROS and robotics development flows is non-trivial. - -This third pillar, ``cloud extensions``, helps robotic architects bridge the gap and simplify the use of hardware acceleration in the cloud for ROS. It does so by extending the three prior pillars (pillarI_, pillarII_ and pillarIII_) and adding cloud capabilities to them. Ultimately, the cloud extensions allow to easily build hardware acceleration kernels that target cloud instances while aligning to the unified APIs for cloud provisioning, set up, deployment and launch derived from the standard ROS 2 launch system. - -In turn, a reference implementation leveraging hardware acceleration in the cloud with ROS 2 will be facilitated with one of the cloud service providers. - - .. _specification: Specification From e41da1f2d1ccbc3e3d519c728ad524bef5cdb63d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 07:52:11 +0200 Subject: [PATCH 15/24] Remove details in build tools capabilities MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 149 +-------------------------------------------------- 1 file changed, 2 insertions(+), 147 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index ecde10c8d..8f5b655b7 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -347,77 +347,12 @@ The following list describes the hardware acceleration capabilities that hardwar .. _3.i: - i. Must provide tools for hardware emulation (e.g. via QEMU) that emulate the hardware acceleration solution - - .. _3.ii: - - ii. Must provide tools for hardware emulation (e.g. via QEMU) that simulate the hardware acceleration solution (to speed-up development and facilitate debugging) - - .. _3.iii: - - iii. Must provide tools for managing (mount, umount, deploy, etc.) raw disk images or any other binary format necessary by the acceleration technology solution. - - .. _3.iv: - - iv. (*only applicable to edge/embedded computing targets*) Must provide tools for deploying custom Linux kernels, with at least two default options: a vanilla kernel and a fully preemptible one (`PREEMPT_RT` patches). - - .. _3.iv.a: - - a. *Must provide modern Linux kernels (5.4.0+)* - - .. _3.iv.b: - - b. *Must provide modern Linux LTS kernels (5.10.0)* - - - .. _3.v: - - v. (*only applicable to edge/embedded computing targets*) Must provide tools for deploying a hypervisor (e.g. Xen) for mixed critical applications in the resulting raw images. - - .. _3.v.a: - - a. *Must provide tools for enabling the deployment of hypervisor Virtual Machines (VMs) without a control domain (e.g. dom0less VMs in Xen)* - - .. _3.v.b: - - b. *Must provide tools for enabling the deployment of guest VMs (e.g. domUs in Xen) in a secondary non-volatile storage system (e.g. a raw image partition).* - - .. _3.v.c: - - c. *Must provide tools for enabling the deployment of the control domain VM (e.g. dom0 in Xen) in a secondary non-volatile storage system (e.g. a raw image partition).* - - .. _3.v.d: - - d. *Must provide tools for enabling the deployment VMs without a control domain (e.g. dom0less VMs in Xen) in a secondary non-volatile storage system (e.g. a raw image partition).* - - .. _3.vi: - - vi. (*only applicable to edge/embedded computing targets*) Must provide tools for network booting. - - .. _3.vi.a: - - a. *Must provide tools for network booting the kernel, device tree blob and boot script artifacts (by chainloading or similar)* - - .. _3.vi.b: - - b. *Must provide tools for network booting the root file system* - - .. _3.vi.c: - - c. *Must provide tools for network booting multiple embedded solutions (i.e. creating a proper folder structure to maintain boot artifacts)* - - .. _3.vi.d: - - d. *Must provide tools for network booting securely exchanging secure SSH keys to each one of embedded solutions* - - .. _3.vi.e: - - e. *Must provide tools for flashing into secondary storage all the network boot artifacts (kernel, device tree, boot scripts, rootfs, etc.). This should leave the embedded solution fully functional and aligned with the corresponding artifacts just flashed.* + i. Provides ROS 2 meta build tool (colcon) extensions to facilitate hardware acceleration. .. _benchmarking_capability: -4. **Benchmarking:** +1. **Benchmarking:** .. _4.i: @@ -552,86 +487,6 @@ The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' re - ● - ○ - - * - 3.ii_ - - ● - - ● - - ○ - - - * - 3.iii_ - - ✓ - - ✓ - - ● - - - * - 3.iv_ - - ✓ - - ✓ - - ○ - - - * - 3.iv.a_ - - ✓ - - ● - - ○ - - - * - 3.iv.b_ - - ● - - ● - - ○ - - - * - 3.v_ - - ✓ - - ● - - ○ - - - * - 3.v.a_ - - ✓ - - ● - - ○ - - - * - 3.v.b_ - - ✓ - - ● - - ○ - - - * - 3.v.c_ - - ✓ - - ● - - ○ - - - * - 3.v.d_ - - ✓ - - ● - - ○ - - - * - 3.vi_ - - ● - - ● - - ○ - - - * - 3.vi.a_ - - ● - - ● - - ○ - - - * - 3.vi.b_ - - ● - - ● - - ○ - - - * - 3.vi.c_ - - ● - - ● - - ○ - - - * - 3.vi.d_ - - ● - - ● - - ○ - - - * - 3.vi.e_ - - ● - - ● - - ○ - - * - Benchmarking - - From eec1a2a947df7f19678b84c712c35b6d2c831e95 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 07:55:41 +0200 Subject: [PATCH 16/24] Removed some of the parts to further simplify. Mainly: - Removed Template for Maintainers - Removed labels for Maintainers - Removed much of the content of Template for vendors and pointed to one example. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 91 +--------------------------------------------------- 1 file changed, 1 insertion(+), 90 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index 8f5b655b7..a5d17ebfb 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -540,100 +540,11 @@ Reference implementations complying with this REP and extending the ROS 2 build For additional implementations and recommendations, check out the `Hardware Accelerationg WG GitHub organization `_. - -.. _labels for maintainers: - -Labels for maintainers -^^^^^^^^^^^^^^^^^^^^^^ -Maintainers are encouraged to mark their packages including hardware acceleration with a label that indicates so, for each one of the hardware acceleration technology solutions they support. This REP proposes that such indication takes the following shape: - -.. image:: https://img.shields.io/badge/hardware_acceleration-KV260-ec1c24.svg - -which can be created with: - -:: - - https://img.shields.io/badge/hardware_acceleration--ec1c24.svg - - e.g. to deliver the result above - https://img.shields.io/badge/hardware_acceleration-KV260-ec1c24.svg - - -Template for Maintainers -^^^^^^^^^^^^^^^^^^^^^^^^ -Besides indicating the acceleration solutions supported by a given package through `labels for maintainers`_, maintainers are encouraged to also add additional information of the acceleration kernels the package to provide consumers with a quick intuition of the added value of using hardware acceleration. - -The following **Markdown** table provides an example of such additional information for the `accelerated_vadd_publisher `_ that ships within `acceleration examples`_. - - -:: - - ### Hardware acceleration - - | Kernel | Description | Acceleration factor | Technology | CPU baseline | Acceleration measurement | - |------------|-------------|-----------------------|------------|--------------|--------------------------| - | [`vadd`](https://github.com/ros-acceleration/acceleration_examples/blob/main/accelerated_vadd_publisher/src/vadd.cpp) | An offloaded version of the trivial [vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/vadd_publisher) ROS 2 publisher which adds two inputs to a vector in a loop and publishes them at 10 Hz. Vector add operations are offloaded into to the FPGA. The offloading operation into the FPGA allows the publisher to go from 2 Hz to 6 Hz but still misses its target (10 Hz) | 3 | KV260 | 2 Hz (measured with `ros2 topic hz` | 6 Hz `ros2 topic hz` | - - -In case it's a meta-package, and there're several packages with one or multipole kernels, the following format can be used instead: - -:: - - ### Hardware acceleration - - | Package | Kernel | Description | Acceleration factor | Technology | CPU baseline | Acceleration measurement | - |---------|------------|-------------|-----------------------|------------|--------------|--------------------------| - | [accelerated_vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/accelerated_vadd_publisher) | [`vadd`](https://github.com/ros-acceleration/acceleration_examples/blob/main/accelerated_vadd_publisher/src/vadd.cpp) | An offloaded version of the trivial [vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/vadd_publisher) ROS 2 publisher which adds two inputs to a vector in a loop and publishes them at 10 Hz. Vector add operations are offloaded into to the FPGA. The offloading operation into the FPGA allows the publisher to go from 2 Hz to 6 Hz but still misses its target (10 Hz) | 3 | KV260 | 2 Hz (measured with `ros2 topic hz` | 6 Hz `ros2 topic hz` | - | [faster_vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/faster_vadd_publisher) | [`vadd`](https://github.com/ros-acceleration/acceleration_examples/blob/main/faster_vadd_publisher/src/vadd.cpp) | An accelerated version of the trivial [vadd_publisher](https://github.com/ros-acceleration/acceleration_examples/tree/main/vadd_publisher) ROS 2 publisher which adds two inputs to a vector in a loop and publishes them at 10 Hz. Vector add operations are offloaded into to the FPGA. The acceleration and optimizations of the dataflow allows the publisher to go from 2 Hz to 10 Hz, meeting its target | 5 | KV260 | 2 Hz (measured with `ros2 topic hz` | 10 Hz `ros2 topic hz` | - - -This REP encourages maintainers to report the impact of hardware acceleration. Each maintainer is free to customize the reporting template. - Template for Vendors ^^^^^^^^^^^^^^^^^^^^ Silicon vendors and solution manufacturers can help set the expectations of the level of support their hardware acceleration technology provides in alignment with this REP by providing a template in the README.md files of their ``firmware`` and/or ``ament`` extensions. Doing so will facilitate the process for consumers and maintainers. -The following template provides an example in **Markdown** syntax: - -:: - - | Capability | Status | - |------------|--------| - | **`1.` Kernel Levels** | | - | [`1.i` level I kernels](https://ros.org/reps/rep-2008.html#i) | ✓ | - | [`1.ii` level II kernels](https://ros.org/reps/rep-2008.html#ii) | :warning: (see [this](https://github.com/Lien182/ReconROS)) | - | [`1.iii` level III kernels](https://ros.org/reps/rep-2008.html#iii) | | - | **`2.` Build System** | | - | [`2.i` ament extensions](https://ros.org/reps/rep-2008.html#id13) | ✓ | - | [`2.ii` `ament_acceleration` support](https://ros.org/reps/rep-2008.html#id14) | | - | **`3`. Build Tools** | | - | [`3.i` hardware emulation (`hw_emu`) ](https://ros.org/reps/rep-2008.html#id15) | | - | [`3.ii` hardware emulation (`sw_emu`)](https://ros.org/reps/rep-2008.html#id16) | ✓ | - | [`3.iii` image tooling](https://ros.org/reps/rep-2008.html#id17) | ✓ | - | [`3.iv` Linux kernel ](https://ros.org/reps/rep-2008.html#iv) | ✓ | - | [`3.iv.a` modern Linux kernel](https://ros.org/reps/rep-2008.html#iv-a) | ✓ | - | [`3.iv.b` LTS Linux kernel](https://ros.org/reps/rep-2008.html#iv-b) | | - | [`3.v` hypervisor ](https://ros.org/reps/rep-2008.html#v) | ✓ | - | [`3.v.a` no control domain VMs](https://ros.org/reps/rep-2008.html#v-a) | ✓ | - | [`3.v.b` guest VMs in disk](https://ros.org/reps/rep-2008.html#v-b) | ✓ | - | [`3.v.c` control domain in disk](https://ros.org/reps/rep-2008.html#v-c) | ✓ | - | [`3.v.d` no control domain VMs in disk](https://ros.org/reps/rep-2008.html#v-d) | | - | [`3.vi` network booting ](https://ros.org/reps/rep-2008.html#vi) | | - | [`3.vi.a` boot artifacts ](https://ros.org/reps/rep-2008.html#vi-a) | | - | [`3.vi.b` rootfs ](https://ros.org/reps/rep-2008.html#vi-b) | | - | [`3.vi.c` multi-network boot](https://ros.org/reps/rep-2008.html#vi-c) | | - | [`3.vi.d` secure network booting](https://ros.org/reps/rep-2008.html#vi-d) | | - | [`3.vi.e` save in disk network boot](https://ros.org/reps/rep-2008.html#vi-e) | | - | **`4.` Benchmarking** | | - | [`4.i` kernel benchmarking](https://ros.org/reps/rep-2008.html#id18) | ✓ | - | [`4.ii` ROS 2 acceleration benchmarking](https://ros.org/reps/rep-2008.html#id19) | | - | **`5.` Documentation** | | - | [`5.i` in-code documentation](https://ros.org/reps/rep-2008.html#id20) | ✓ | - | **`6.` Testing and CI** | | - | [`6.i` `acceleration_examples` ](https://ros.org/reps/rep-2008.html#id21) | ✓ | - - - +For a **Markdown** syntax example of such template, refer to `acceleration_firmware_kr260 `_. References and Footnotes From 3b668c03f4598c7a210389068541daec46056ffa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 07:57:04 +0200 Subject: [PATCH 17/24] Remark colcon extensions as optional MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index a5d17ebfb..2a8f6de30 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -21,7 +21,7 @@ With the decline of Moore’s Law, hardware acceleration has proven itself as th The purpose of this REP is to provide standard guidelines for how to use hardware acceleration in combination with ROS 2. These guidelines are realized in the form of a series of ROS 2 packages that integrate these external resources and provide a ROS 2-centric open architecture for hardware acceleration. -The architecture proposed extends the ROS 2 build system (``ament``), the ROS 2 build tools (``colcon``) and add a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new hardware solutions and other silicon vendors. The core components of the architecture are disclosed under an Apache 2.0 license, available and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. +The architecture proposed extends the ROS 2 build system (``ament``), optionally extends the ROS 2 build tools (``colcon``) and adds a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new hardware solutions and other silicon vendors. The core components of the architecture are disclosed under an Apache 2.0 license, available and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. Value for stakeholders: From 7b6217f54f5693971081454d17e7b94f099521c3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Tue, 19 Jul 2022 08:00:23 +0200 Subject: [PATCH 18/24] Fix error introduced while simplifying capabilities MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index 2a8f6de30..b1e7c8377 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -352,7 +352,7 @@ The following list describes the hardware acceleration capabilities that hardwar .. _benchmarking_capability: -1. **Benchmarking:** +4. **Benchmarking:** .. _4.i: From b9d2b48ac3f76c6021bbeecfa5208d5a3b5dcfe5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Wed, 12 Oct 2022 19:49:14 +0200 Subject: [PATCH 19/24] TSC review pass 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Still need to address a few concerns Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 415 ++++++++++----------------------------------------- 1 file changed, 79 insertions(+), 336 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index b1e7c8377..ccecb21e1 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -5,45 +5,50 @@ Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 10-Aug-2021 -Post-History: 19-July-2022 +Post-History: 12-October-2022 Abstract ======== -This REP describes the architectural pillars and conventions required to introduce hardware acceleration in ROS 2 in a scalable and technology-agnostic manner. A set of categories meant to classify supported hardware acceleration solutions is provided. Inclusion in a category is based on the fulfillment of the hardware acceleration capabilities. +This REP describes the architectural pillars and conventions required to introduce hardware acceleration in ROS 2 in a vendor-neutral, scalable and technology-agnostic manner. Motivation ========== -With the decline of Moore’s Law, hardware acceleration has proven itself as the answer for achieving higher performance gains in robotics [1]_. By creating specialized compute architectures that rely on specific hardware (i.e. through FPGAs or GPUs), hardware acceleration empowers faster robots, with reduced computation times (real fast, as opposed to real-time), lower power consumption and more deterministic behaviours [9]_ [10]_. The core idea is that instead of following the traditional control-driven approach for software development in robotics, a mixed control- and data-driven one allows to design custom compute architectures that further exploit parallelism. To do so, we need to integrate in the ROS 2 ecosystem external frameworks, tools and libraries that facilitate creating parallel compute architectures. +With the decline of Moore’s Law, hardware acceleration has proven itself as the answer for achieving higher performance gains in robotics [1]_. CPUs are widely used in the ROS 2 ecosystem due to their availability however they hardly provide real-time and safety guarantees while delivering high throughput. Hardware acceleration with accelerators like FPGAs, GPUs, DPUs or TPUs present an answer to this problem. One that allows the robotics architect to create custom computing architectures for robots that comply with real-time and throughput requirements simultaneously, while increasing the performance-per-watt and saving energy. -The purpose of this REP is to provide standard guidelines for how to use hardware acceleration in combination with ROS 2. These guidelines are realized in the form of a series of ROS 2 packages that integrate these external resources and provide a ROS 2-centric open architecture for hardware acceleration. + From a systems architecture perspective, hardware acceleration helps create specialized compute architectures that rely on specific hardware accelerators which empower faster robots, with reduced computation times (*real fast*, as opposed to *real-time*), lower power consumption and more deterministic behaviours [2]_ [3]_. The core idea is that instead of following the traditional CPU control-driven approach for software development in robotics, a mixed control- and data-driven one with accelerators allows to design custom compute architectures that further exploit parallelism in robotic algorithms. This is a paradigm shift that requires a change in the way we think about robotics software development and the way we design and architect robotic systems. -The architecture proposed extends the ROS 2 build system (``ament``), optionally extends the ROS 2 build tools (``colcon``) and adds a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new hardware solutions and other silicon vendors. The core components of the architecture are disclosed under an Apache 2.0 license, available and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. +Hardware acceleration can revolutionize ROS 2 computations, enabling new applications by speeding up robot response times while remaining power-efficient. However, *the diversity of acceleration options* (FPGAs, GPUs, DPUs or TPUs, among many other accelerators) *makes it difficult for roboticists to easily deploy accelerated systems without expertise in each specific accelerator platform*. To address this problem, this REP provides a reference architecture and a series of conventions to integrate in the ROS 2 ecosystem accelerators and hardware acceleration external frameworks, tools and libraries that facilitate creating parallel compute architectures. + +.. This REP aims to provide a vendor-neutral, scalable and technology-agnostic approach to hardware acceleration in ROS 2 that can be easily adopted by roboticists and researchers alike. + +**The purpose of this REP is thereby to provide standard guidelines on how to use hardware acceleration in combination with ROS 2**. These guidelines are realized in the form of a series of ROS 2 packages that implement an open architecture and abstract away these external resources providing a vendor-neutral, scalable and technology-agnostic ROS 2-centric development experience for hardware acceleration. The proposed open architecture also allows for the integration of external frameworks, tools and libraries, so that changing from one accelerator to another becomes trivial. This REP also recommends a methodology for developing ROS 2 packages that integrate hardware acceleration which can help developers guide their efforts. **This REP does not aim to dictate policy on which frameworks or languages shall be used to create acceleration kernels, nor it aims to abstract away from these technologies**. This decision is left to the architect building the acceleration. For example, the use of OpenCL, CUDA, Vulkan, HLS, Halide, Exo, etc. is beyond the scope of this REP and often attached to the specific accelerator in-use, as each silicon vendor supports only certain technologies. What this REP provides is an entry point for developers to define their own acceleration kernels, integrate them into ROS 2 applications and if appropriate, command the ROS 2 build system to build such accelerators. + +To do so, the architecture proposed extends the ROS 2 build system (``ament``), optionally extends the ROS 2 build tools (``colcon``) and adds a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new accelerators across silicon vendors. The core components of the architecture are implemented and open sourced under an Apache 2.0 license, and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. Value for stakeholders: -- Package maintainers can use these guidelines to integrate hardware acceleration capabilities in their ROS 2 packages. +- Package maintainers can use these guidelines to integrate hardware acceleration capabilities in their ROS 2 packages in an accelerator-agnostic manner. -- Consumers can use the guidelines in the REP, as well as the corresponding category of each hardware solution, to set expectations on the hardware acceleration capabilities that could be obtained from each vendor's hardware solution. +- Consumers can use the guidelines in the REP, as well as the corresponding benchmarks of each accelerator (if available), to set expectations on the hardware acceleration capabilities, *ease of use* and scalability that could be obtained from each vendor's hardware acceleration solution. -- Silicon vendors and solution manufacturers can use these guidelines to connect their firmware and technologies to the ROS 2 ecosystem, obtaining direct support for hardware acceleration in all ROS 2 packages that support it. +- Silicon vendors and solution manufacturers can use these guidelines to connect their accelerators and firmware (including frameworks, tools and libraries) to the ROS 2 ecosystem. By doing so, they will obtain direct support for hardware acceleration in all ROS 2 packages that support it. -The outcome of this REP should be that maintainers who want to leverage hardware acceleration in their packages, can do so with consistent guidelines and with support across multiple technologies (FPGAs and GPUs initially, but extending towards more accelerators in the future) by following the conventions set. This way, maintainers will be able to create ROS 2 packages with support for hardware acceleration that can run across hardware acceleration technologies, including FPGAs and GPUs. +The outcome of this REP should be that maintainers who want to leverage hardware acceleration in their packages, can do so with consistent guidelines and with support across multiple hardware acceleration technologies (FPGAs and GPUs initially, but extending towards more accelerators in the future) by following the conventions set. This way, maintainers will be able to create ROS 2 packages with support for hardware acceleration that can run across accelerators. In turn, as silicon vendors add support for their technologies and accelerators, ROS 2 packages aligned with this REP will automatically gain support for these new accelerators. -In turn, the documentation of categories and hardware acceleration capabilities will improve. The guidelines in here provide a ROS 2-centric open architecture for hardware acceleration, which silicon vendors can decide to adopt when engaging with the ROS 2 community. Antigoals ^^^^^^^^^ -The motivation behind this REP does not include: +The motivation behind this REP does **not** include: -* Dictating policy implementation specifics on maintainers to achieve hardware acceleration +* Dictating policy implementation specifics on maintainers to achieve hardware acceleration and/or create acceleration kernels (e.g use of CUDA, HLS, OpenCL, or similar) * Policy requirements (which framework to use, specific C++ preprocessor instructions, etc.) are *intentionally generic* and to remain technology-agnostic (from a hardware acceleration perspective). * Maintainers can come up with their own policies depending on the technology targeted (FPGAs, GPUs, etc.) and framework used (``HLS``, ``ROCm``, ``CUDA``, etc). @@ -52,32 +57,34 @@ The motivation behind this REP does not include: * Enforcing specific hardware solutions on maintainers * No maintainer is required to target any specific hardware solution by any of the guidelines in this REP. - * Instead, maintainers are free to choose among any of the supported technologies and hardware solutions. + * Instead, maintainers are free to choose from either using the proposed generic CMake macros meant to support all accelerators (e.g. ``acceleration_kernel`` ), or instead use vendor-specific CMake extensions for particular supported technologies and hardware solutions (e.g. ``vitis_acceleration_kernel`` ). Architecture pillars ==================== +Unless stated otherwise, the hardware acceleration terminology used in this document follows the OpenCL nomenclature ([9]_, [10]_) for hardware acceleration. The proposed architecture in this REP is depicted below: + :: ROS 2 stack Hardware Acceleration Architecture @ ROS 2 stack - +-----------+ +---------------------------+ - | | | acceleration_examples | - |user land | +-----------------+------------------+--+----------+ - | | | Drivers | Libraries | Firmware | - +-----------+ +-----------------+-+-------------------------+----+ - | | | ament_1| ament_2 | | | | - | | +---------------------------------------+ fw_1|fw_2| - | tooling | | ament_acceleration| colcon_hw_accel | | | - | | +---------------------------------------------+----+ - | | | build system | meta build | firmware | - +-----------+ +--------+----------+-------------------+-+--------+ - | rcl | | | - +-----------+ | | - | rmw | | | - +-----------+ + + - |rmw_adapter| Pillar I Pillar II + +-----------+ +-------------------------------------------------+ + | | | acceleration_examples | + |user land | +-----------------+-----------------------------------+----------+ + | | | Drivers | Libraries | Firmware | + +-----------+ +-------------------------------+-+-------------------------+----+ + | | | ament_vitis | ament_rocm | ... | | | | + | | +-----------------------------------------------------+ fw_1|fw_2| + | tooling | | ament_hardware_acceleration | colcon_hw_accel | | | + | | +-----------------------------------------------------------+----+ + | | | build system | meta build | firmware | + +-----------+ +------------------+--------------+-------------------+-+--------+ + | rcl | | | + +-----------+ | | + | rmw | | | + +-----------+ + + + |rmw_adapter| Pillar I Pillar II +-----------+ @@ -85,7 +92,7 @@ Architecture pillars Pillar I - Extensions to ament ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The first pillar represents extensions of the ``ament`` ROS 2 build system. These CMake extensions help achieve the objective of simplifying the creation of acceleration kernels. By providing an experience and a syntax similar to other ROS 2 libraries targeting CPUs, maintainers will be able to integrate kernels into their packages easily. The `ament_acceleration `_ ROS 2 package abstracts the build system extensions from technology-specific frameworks and software platforms. This allows to easily support hardware acceleration across FPGAs and GPUs while using the same syntax, simplifying the work of maintainers. The code listing below provides an example that instructs the ``CMakeLists.txt`` file of a ROS 2 package to build a ``vadd`` kernel with the corresponding sources and configuration: +The first pillar represents extensions of the ``ament`` ROS 2 build system. These CMake extensions help achieve the objective of simplifying the creation of acceleration kernels. By providing an experience and a syntax similar to other ROS 2 libraries targeting CPUs, maintainers will be able to integrate acceleration kernels into their packages easily. The `ament_hardware_acceleration `_ ROS 2 package abstracts the build system extensions from technology-specific frameworks and software platforms. This allows to easily support hardware acceleration across FPGAs and GPUs while using the same syntax, simplifying the work of maintainers. The code listing below provides an example that instructs the ``CMakeLists.txt`` file of a ROS 2 package to build a ``vadd`` acceleration kernel indicating the corresponding sources without the need to define a target accelerator: :: @@ -97,7 +104,7 @@ The first pillar represents extensions of the ``ament`` ROS 2 build system. Thes ) -Under the hood, each specialization of ``ament_acceleration`` should rely on the corresponding technology libraries to enable it. For example, `ament_vitis `_ relies on Vitis Unified Software Platform and on the Xilinx Runtime (XRT) library to generate acceleration kernels and facilitate OpenCL communication between the application code and the kernels. Vitis and XRT are completely hidden from the robotics engineer, simplifying the creation of kernels through simple CMake macros. The same kernel expressed with ``ament_vitis`` is presented below: +Under the hood, each specialization of ``ament_hardware_acceleration`` should rely on the corresponding frameworks, tools and libraries to enable building the acceleration kernel. For example, `ament_vitis `_ relies on Vitis Unified Software Platform (*Vitis* for short) to generate the appropriate acceleration kernels. The developer of such kernel would need to choose and implement how the CPU ROS abstractions (e.g. Nodes) would communicate with the acceleration kernel, if either through OpenCL or through the Xilinx Runtime (XRT) library, but that's abstracted away. In other words, the definition of the communication between the application code (ROS Node) and the acceleration kernels is decided by the developer and reflected in the source code of the Node. When using ``ament_hardware_acceleration`` macros such as ``acceleration_kernel`` , Vitis, OpenCL and XRT are completely hidden from the robotics engineer, simplifying the creation of kernels through simple CMake macros in the ``CMakeLists.txt`` file of the ROS package. If desired, the developer can express the same kernel using specializations ``ament_hardware_acceleration``. In the case of ``ament_vitis`` , the developer can use the ``vitis_acceleration_kernel`` macro to express the same kernel as above, but with finer-grained details: :: @@ -117,9 +124,30 @@ Under the hood, each specialization of ``ament_acceleration`` should rely on the -While ``ament_acceleration`` CMake macros are preferred and will be encouraged, maintainers are free to choose among the CMake macros available. After all, it'll be hard to define a generic set of macros that fits all use cases across technologies. +While ``ament_hardware_acceleration`` CMake macros are preferred and encouraged, maintainers are free to choose among all the CMake macros available within each of the specializalizations. After all, it'll be hard to define a generic set of macros that fits all use cases across technologies and silicon vendors. Maintainers are, however, encouraged to produce ROS packages that consider various accelerators. To do so, each extension of the ``ament`` ROS 2 build system for hardware acceleration purposes shall define CMake hardware acceleration variables. These variables are meant to be used by the maintainers to conditionally compile their ROS 2 packages for specific accelerators. The following table lists the variables defined by ``ament_hardware_acceleration`` and its specializalization ``ament_vitis`` for hardware acceleration purposes. Other specializalizations should follow along the same lines: + + +.. list-table:: + :widths: 25 25 50 + :header-rows: 1 + + * - ``ament`` ROS 2 build system extension + - variable + - description + * - `ament_hardware_acceleration `_ + - ``ROS_ACCELERATION`` + - This CMake variable will evaluate to ``True`` when targeting *any* of the supported ROS 2-enabled accelerators (see ``mixins`` enablement below). *Use while integrating acceleration kernels in a technology and vendor-agnostic manner*. + * - `ament_vitis `_ (specializes ``ament_hardware_acceleration``) + - ``ROS_VITIS`` + - *use to build ...* + * - + - ``ROS_XRT`` + - *use to consider cases with XRT as the communication framework between the application code and the acceleration kernels* -Through ``ament_acceleration`` and technology-specific specializations (like ``ament_vitis``), the ROS 2 build system is automatically enhanced to support producing acceleration kernel and related artifacts as part of the build process when invoking ``colcon build``. To facilitate the work of maintainers, this additional functionality is configurable through ``mixins`` that can be added to the build step of a ROS 2 workspace, triggering all the hardware acceleration logic only when appropriate (e.g. when ``--mixin kv260`` is appended to the build effort, it'll trigger the build of kernels targeting the KV260 hardware solution). For a reference implementation of these enhacements, refer to `ament_vitis `_. + +Through ``ament_hardware_acceleration`` and technology-specific specializations (like ``ament_vitis``), the ROS 2 build system is automatically enhanced to support producing acceleration kernel and related artifacts as part of the build process when invoking ``colcon build``. To facilitate the work of maintainers, this additional functionality is configurable through ``mixins`` that can be added to the build step of a ROS 2 workspace, triggering all the hardware acceleration logic only when appropriate (e.g. when ``--mixin kv260`` is appended to the build effort, it'll trigger the build of kernels targeting the KV260 hardware solution). For a reference implementation of these enhacements, refer to `ament_vitis `_. + +In turn, extensions to the existing CMake macros might be proposed which would allow to support more technologies and hardware solutions. For example, ``ament_vitis`` provides a ``vitis_acceleration_kernel`` macro that can be used to generate kernels for the Xilinx Vitis platform. Similarly, ``ament_rocm`` could provide a ``rocm_acceleration_kernel`` macro that can be used to generate kernels for the AMD ROCm platform. This way, maintainers can choose to use the generic ``acceleration_kernel`` macro, or instead use the technology-specific macros to target specific hardware solutions. Also, future extensions to the ``ament_hardware_acceleration`` package could be proposed to support the use of accelerators in binary formats, instead of build them from source code. This would allow to support accelerators which are are not fully integrated into ROS 2 through their corresponding technology libraries (e.g. FPGAs that are not supported by Vitis). .. _pillarII: @@ -129,7 +157,7 @@ Pillar II - firmware The second pillar is firmware, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. -Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts in line with its supported categories_ and capabilities_. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. +Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions. @@ -139,7 +167,9 @@ By splitting vendors across packages, consumers and maintainers can easily switc Specification ============= -To drive the creation, maintenance and testing of acceleration kernels in ROS 2 packages that are agnostic to the computing target (i.e. consider support for edge, workstation, data center or cloud targets) and technology-agnostic (considers initial support for FPGAs and GPUs), this REP builds on top of open standards. Particularly, ``OpenCL 1.2`` ([2]_, [3]_) is encouraged for a well established standardized interoperability between the host-side (CPU) and the acceleration kernel. Unless stated otherwise, the hardware acceleration terminology used in this document follows the OpenCL nomenclature for hardware acceleration. +**TODO: discuss the use of mixins, and CMake variables to enable/disable the build of acceleration kernels**. + +To drive the creation, maintenance and testing of acceleration kernels in ROS 2 packages that are agnostic to the computing target (i.e. consider support for edge, workstation, data center or cloud targets) and technology-agnostic (considers initial support for FPGAs and GPUs), this REP builds on top of open standards. Particularly, ``OpenCL 1.2`` ([9]_, [10]_) is encouraged for a well established standardized interoperability between the host-side (CPU) and the acceleration kernel. A ROS 2 package supports hardware acceleration if it provides support for at least one of the *supported hardware acceleration solutions* that comply with this REP. @@ -166,76 +196,9 @@ To favour modularity, organize kernels and allow robotics architects to select o Hardware acceleration solutions complying with this REP should aspire to support multiple kernel levels in ROS 2 to maximize consumer experience. -.. _benchmarking: - -Benchmarking -^^^^^^^^^^^^ -Benchmarking is the act of running a computer program to assess its relative performance. In the context of hardware acceleration, it's fundamental to assess the relative performance of an acceleration kernel versus its CPU scalar computing baseline. Similarly, benchmarking helps comparing acceleration kernels across hardware acceleration technology solution (e.g. FPGA_A vs FPGA_B or FPGA_A vs GPU_A, etc.) and across kernel implementations (within the same hardware acceleration technology solution). -There're different types of benchmarking approaches. The following diagram depicts the most popular inspired by [6]_: - -:: - - Probe Probe - + + - | | - +--------|------------|-------+ +-----------------------------+ - | | | | | | - | +--|------------|-+ | | | - | | v v | | | - latency <--------------+ Probe - | | | | | - throughput<--------------+ Probe - | | Function | | | - memory <--------------+ Probe - | | | | | - CPU <--------------+ Probe - | +-----------------+ | | | - | System under test | | System under test | - +-----------------------------+ +-----------------------------+ - - - Functional Non-functional - - - +-------------+ +----------------------------+ - | Test App. | | +-----------------------+ | - | + + + + | | | Application | | - +--|-|--|--|--+---------------+ | | <------------+ Probe - | | | | | | | +-----------------------+ | - | v v v v | | | - | Probes | | <------------+ Probe - | | | | - | System under test | | System under test | - | | | <------------+ Probe - | | | | - | | | | - +-----------------------------+ +----------------------------+ - - - Black-Box Grey-box - - -In addition, the following aspects should be considered when benchmarking acceleration kernels in ROS 2: - -- `embedded`: Benchmarks should run in embedded *easily* -- `ROS 2-native`: Benchmarks should consider the particularities of ROS 2 and its computational graph. If necessary, they should instrument the communications middleware and its underlying layers. -- `intra-process, inter-process and intra-network`: Measures conducted should consider communication within a process in the same SoC, between processes in an SoC and between different SoCs connected in the same network (intra-network). *Inter-network measures are beyond the scope of this REP*. -- `compute substrate-agnostic`: benchmarks should be able to run on different hardware acceleration technology solutions. For that purpose, a CPU-centric framework (as opposed to an acceleration technology-specific framework) for benchmarking and/or tracing is the ideal choice. -- `automated`: benchmarks and related source code should be designed with automation in mind. Once ready, creating a benchmark and producing results should be (ideally) a fully automated process. -- `hardware farm mindset`: benchmarks will be conducted on hardware embedded platforms sitting in a farm-like environment (redundancy of tests, multiple SoCs/boards) with the intent of validating and comparing different technologies. - -Accounting for all of this, in this REP, we adopt a grey-box and non-functional benchmarking approach for hardware acceleration that allows to evaluate the relative performance of accelerated ROS 2 individual nodes and complete computational graphs. To realize it in a technology agnostic-manner, we select the Linux Tracing Toolkit next generation (`LTTng `_) which will be used for tracing and benchmarking. - -Differences between tracing and benchmarking -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Tracing and benchmarking can be defined as follows: - -- `tracing`: a technique used to understand what goes on in a running software system. -- `benchmarking`: a method of comparing the performance of various systems by running a common test. - -From these definitions, inherently one can determine that both benchmarking and tracing are connected in the sense that the test/benchmark will use a series of measurements for comparison. These measurements will come from tracing probes. In other words, tracing will collect data that will then be fed into a benchmark program for comparison. - - -Methodology for ROS 2 Hardware Acceleration -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Methodology for developing ROS 2 packages that integrate hardware acceleration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :: @@ -304,227 +267,6 @@ For the sake of illustrating maintainers and consumers how to build their own ac In turn, a CI system will be set to build periodically and for every commit the meta-package. -.. _capabilities: - -Capabilities -^^^^^^^^^^^^ - -The following list describes the hardware acceleration capabilities that hardware acceleration vendors must consider when connecting their firmware and technology solutions to the ROS 2 ecosystem. - -.. _Kernel Levels Capabilities: - -1. **Kernel Levels:** - - .. _1.i: - - i. Must provide at least one package producing `Level I kernels`_ - - .. _1.ii: - - ii. Must provide at least one package producing `Level II kernels`_ - - .. _1.iii: - - iii. Must provide at least one package producing `Level III kernels`_ - - -.. _Build System Enhancements: - -2. **Build System Enhancements:** - - .. _2.i: - - i. Must have an ``ament_`` (e.g. ``ament_vitis``) package that integrates technology-specific frameworks and software platforms into the ROS 2 build system through a series of CMake macros that maintainers can use to generate acceleration kernels from their `CMakeLists.txt` files. - - .. _2.ii: - - ii. Must extend the generic ``ament_acceleration`` CMake macros to support the corresponding technology-specific frameworks and software platforms with appropriate defaults. - - -.. _Build Tools Enhancements: - -3. **Build Tools Enhancements:** - - .. _3.i: - - i. Provides ROS 2 meta build tool (colcon) extensions to facilitate hardware acceleration. - - -.. _benchmarking_capability: - -4. **Benchmarking:** - - .. _4.i: - - i. Must provide tools to assess the relative performance of an acceleration kernel in an isolated manner - - .. _4.ii: - - ii. Must provide tools to assess the relative performance of an acceleration kernel alongside a ROS 2 computational graph by following benchmarking_ - - -.. _Documentation: - -5. **Documentation:** - - .. _5.i: - - i. Must have in-code documentation for each subverb "feature" (e.g. for ``list``: List supported firmware for hardware acceleration) - - - -.. _Testing and CI: - -6. **Testing and CI:** - - .. _6.i: - - i. Must run (and pass, often) CI tests for `acceleration examples`_. - - - -.. _categories: - -Categories -^^^^^^^^^^ - -There are 4 hardware acceleration categories below which will classify hardware acceleration solutions and technologies, each roughly summarized as: - -* Category **Official**: - - * Highest level of support, backed by a vendor - * Hardware acceleration solution compliant with this REP and fully integrated in the ROS 2 ecosystem. - * Developer tools available to facilitate the development process. - * All `acceleration examples`_ should run on the hardware acceleration solution. - * Acceleration kernels available for multiple `Kernel Levels`_, with at least `Level I kernels`_. - -* Category **Community**: - - * Community-level support - * Hardware acceleration solution compliant with only a subset of this REP. - * A subset of the developer tools available. - * Acceleration kernels available for at least `Level I kernels`_. - -* Category **Compatible**: - - * Interoperability demonstrated. - * Hardware acceleration solution compliant with a lesser subset of this REP. - * Some developer tools available. - * Acceleration kernels available for at least `Level I kernels`_. - -* Category **Non-compatible**: - - * Default category - -While each category will have different capabilities, it's always possible to overachieve in certain capabilities, even if other capabilities prevent a package from moving up to the next category level. - - -Category Comparison Chart -^^^^^^^^^^^^^^^^^^^^^^^^^ - -The chart below compares Quality Levels 1 through 5 relative to the 'Level 1' requirements' numbering scheme above. - -✓ = required - -● = recommended - -○ = optional - -.. list-table:: Categories - :widths: 7 7 7 7 7 - :header-rows: 1 - :stub-columns: 1 - :align: left - - - * - - - Official - - Community - - Compatible - - Non-compatible - * - Kernel Levels - - - - - - - - - * - `1.i`_ - - ✓ - - ✓ - - ✓ - - - * - 1.ii_ - - ● - - ● - - ○ - - - * - 1.iii_ - - ● - - ○ - - ○ - - - * - Build System - - - - - - - - - * - 2.i_ - - ✓ - - ✓ - - ✓ - - - * - 2.ii_ - - ● - - ● - - ○ - - - * - Build Tools - - - - - - - - - * - 3.i_ - - ● - - ● - - ○ - - - * - Benchmarking - - - - - - - - - * - 4.i_ - - ● - - ● - - ○ - - - * - 4.ii_ - - ● - - ● - - ○ - - - * - Documentation - - - - - - - - - * - 5.i_ - - ✓ - - ✓ - - ● - - - * - Testing and CI - - - - - - - - - * - 6.i_ - - ✓ - - ● - - ○ - - - - - Backwards Compatibility ======================= The proposed features and conventions add new functionality while not modifying existing functionality. @@ -533,7 +275,7 @@ The proposed features and conventions add new functionality while not modifying Reference Implementation and recommendations ============================================ -Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer `ament_acceleration `_ and firmware from vendor specializalizations like `ament_vitis `_. A paper describing in more detail the reference implementation is available at [11]_. +Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer `ament_hardware_acceleration `_ and firmware from vendor specializalizations like `ament_vitis `_. A paper describing in more detail the reference implementation is available at [11]_. ``colcon`` ROS 2 meta built tools can be extended to help integrate hardware acceleration flows into the ROS 2 CLI tooling. Examples of these extensions include emulation capabilities to speed-up the development process and/or facilitate it without access to the real hardware, or raw image production tools, which are convenient when packing together acceleration kernels for embedded targets. A reference implementation of these extensions is implemented at the `colcon-hardware-acceleration `_ ROS 2 package, which is available in the buildfarm. Refer to the package for more details on its capabilities. @@ -553,13 +295,11 @@ References and Footnotes .. [1] Z. Wan, B. Yu, T. Y. Li, J. Tang, Y. Zhu, Y. Wang, A. Raychowdhury, and S. Liu, “A survey of fpga-based robotic computing,” IEEE Circuits and Systems Magazine, vol. 21, no. 2, pp. 48–74, 2021. +.. [2] Mayoral-Vilches, V., & Corradi, G. (2021). "Adaptive computing in robotics, leveraging ros 2 to enable software-defined hardware for fpgas". + https://www.xilinx.com/support/documentation/white_papers/wp537-adaptive-computing-robotics.pdf -.. [2] OpenCL 1.2 API and C Language Specification (November 14, 2012). - https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf - - -.. [3] OpenCL 1.2 Reference Pages. - https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/ +.. [3] Mayoral-Vilches, V. (2021). "Kria Robotics Stack". + https://www.xilinx.com/content/dam/xilinx/support/documentation/white_papers/wp540-kria-robotics-stack.pdf .. [4] Y. Yang and T. Azumi, “Exploring real-time executor on ros 2,”. @@ -581,11 +321,14 @@ References and Footnotes .. [8] Acceleration Robotics, "Hardware accelerated ROS 2 pipelines and towards the Robotic Processing Unit (RPU)". https://news.accelerationrobotics.com/hardware-accelerated-ros2-pipelines/ -.. [9] Mayoral-Vilches, V., & Corradi, G. (2021). "Adaptive computing in robotics, leveraging ros 2 to enable software-defined hardware for fpgas". - https://www.xilinx.com/support/documentation/white_papers/wp537-adaptive-computing-robotics.pdf -.. [10] Mayoral-Vilches, V. (2021). "Kria Robotics Stack". - https://www.xilinx.com/content/dam/xilinx/support/documentation/white_papers/wp540-kria-robotics-stack.pdf +.. [9] OpenCL 1.2 API and C Language Specification (November 14, 2012). + https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf + + +.. [10] OpenCL 1.2 Reference Pages. + https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/ + .. [11] Mayoral-Vilches, V., Neuman, S. M., Plancher, B., & Reddi, V. J. (2022). "RobotCore: An Open Architecture for Hardware Acceleration in ROS 2". https://arxiv.org/pdf/2205.03929.pdf From ddd5a0e73d508a166b30167610bbf114cca8b9f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Thu, 13 Oct 2022 23:08:57 +0200 Subject: [PATCH 20/24] TSC review pass 2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addressed most concerns. Ended up re-writing the REP to a certain extend to enhance readibility and improve its clarity. Main changes include: - removed categories and capabilities - removed completely benchmarking, pushed into REP-2014 - significant rewriting of the intro and motivation - renamed ament_acceleration to ament_hardware_acceleration - expanded on pillars and further explained its value - re-organized kernel levels - removed all mentions to CI Signed-off-by: Víctor Mayoral Vilches --- rep-2008.rst | 153 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 99 insertions(+), 54 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index ccecb21e1..ab8648646 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -29,13 +29,15 @@ Hardware acceleration can revolutionize ROS 2 computations, enabling new applica To do so, the architecture proposed extends the ROS 2 build system (``ament``), optionally extends the ROS 2 build tools (``colcon``) and adds a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new accelerators across silicon vendors. The core components of the architecture are implemented and open sourced under an Apache 2.0 license, and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. +.. _stakeholders: + Value for stakeholders: - Package maintainers can use these guidelines to integrate hardware acceleration capabilities in their ROS 2 packages in an accelerator-agnostic manner. - Consumers can use the guidelines in the REP, as well as the corresponding benchmarks of each accelerator (if available), to set expectations on the hardware acceleration capabilities, *ease of use* and scalability that could be obtained from each vendor's hardware acceleration solution. -- Silicon vendors and solution manufacturers can use these guidelines to connect their accelerators and firmware (including frameworks, tools and libraries) to the ROS 2 ecosystem. By doing so, they will obtain direct support for hardware acceleration in all ROS 2 packages that support it. +- Silicon vendors and hardware solution manufacturers can use these guidelines to connect their accelerators and firmware (including frameworks, tools and libraries) to the ROS 2 ecosystem. By doing so, they will obtain direct support for hardware acceleration in all ROS 2 packages that support it. The outcome of this REP should be that maintainers who want to leverage hardware acceleration in their packages, can do so with consistent guidelines and with support across multiple hardware acceleration technologies (FPGAs and GPUs initially, but extending towards more accelerators in the future) by following the conventions set. This way, maintainers will be able to create ROS 2 packages with support for hardware acceleration that can run across accelerators. In turn, as silicon vendors add support for their technologies and accelerators, ROS 2 packages aligned with this REP will automatically gain support for these new accelerators. @@ -50,7 +52,7 @@ The motivation behind this REP does **not** include: * Dictating policy implementation specifics on maintainers to achieve hardware acceleration and/or create acceleration kernels (e.g use of CUDA, HLS, OpenCL, or similar) - * Policy requirements (which framework to use, specific C++ preprocessor instructions, etc.) are *intentionally generic* and to remain technology-agnostic (from a hardware acceleration perspective). + * Policy requirements about which framework to use (e.g. ROCm, JetPack, etc.) or wihch specific C++ preprocessor instructions are *intentionally generic* and to remain technology-agnostic. * Maintainers can come up with their own policies depending on the technology targeted (FPGAs, GPUs, etc.) and framework used (``HLS``, ``ROCm``, ``CUDA``, etc). @@ -59,11 +61,33 @@ The motivation behind this REP does **not** include: * No maintainer is required to target any specific hardware solution by any of the guidelines in this REP. * Instead, maintainers are free to choose from either using the proposed generic CMake macros meant to support all accelerators (e.g. ``acceleration_kernel`` ), or instead use vendor-specific CMake extensions for particular supported technologies and hardware solutions (e.g. ``vitis_acceleration_kernel`` ). +.. _Kernel Levels: + +Kernel levels in ROS 2 +^^^^^^^^^^^^^^^^^^^^^^^ +To favour modularity, organize acceleration kernels and allow robotics architects to select only those accelerators needed to meet the requirements of their application, acceleration kernels in ROS 2 shall be classified in 3 levels according to the ROS layer/underlayer they impact: + +.. _Level I kernels: + +- *Level 1* - **ROS 2 applications and libraries**: This group corresponds with acceleration kernels that speed-up OSI L7 applications or libraries on top of ROS 2. Any computation on top of ROS 2 is a good a candidate for this category. Examples include selected components in the navigation, manipulation, perception or control stacks. + +.. _Level II kernels: + +- *Level 2* - **ROS 2 core packages**: This includes kernels that accelerate or offload OSI L7 ROS 2 core components and tools to a dedicated acceleration solution (e.g. an FPGA). Namely, we consider ``rclcpp``, ``rcl``, ``rmw``, and the corresponding ``rmw_adapters`` for each supported communication middleware. Examples includes ROS 2 executors for more deterministic behaviours [4]_, or complete hardware offloaded ROS 2 Nodes [5]_. + +.. _Level III kernels: + +- *Level 3* - **ROS 2 underlayers**: Groups together all accelerators below the ROS 2 core layers belonging to OSI L2-L7, including the communication middleware (e.g. DDS). Examples of such accelerators include a complete or partial DDS implementation, an offloaded networking stack or a data link layer for real-time deterministic, low latency and high throughput interactions. + +Hardware acceleration solutions complying with this REP should aspire to support multiple kernel levels in ROS 2 to maximize consumer experience. + Architecture pillars ==================== -Unless stated otherwise, the hardware acceleration terminology used in this document follows the OpenCL nomenclature ([9]_, [10]_) for hardware acceleration. The proposed architecture in this REP is depicted below: +Unless stated otherwise, the hardware acceleration terminology used in this document follows the OpenCL nomenclature ([8]_, [9]_) for hardware acceleration. Hardware acceleration commercial solutions are also called *accelerators*. The proposed architecture in this REP is depicted below: + +.. _architecture: :: @@ -124,7 +148,7 @@ Under the hood, each specialization of ``ament_hardware_acceleration`` should re -While ``ament_hardware_acceleration`` CMake macros are preferred and encouraged, maintainers are free to choose among all the CMake macros available within each of the specializalizations. After all, it'll be hard to define a generic set of macros that fits all use cases across technologies and silicon vendors. Maintainers are, however, encouraged to produce ROS packages that consider various accelerators. To do so, each extension of the ``ament`` ROS 2 build system for hardware acceleration purposes shall define CMake hardware acceleration variables. These variables are meant to be used by the maintainers to conditionally compile their ROS 2 packages for specific accelerators. The following table lists the variables defined by ``ament_hardware_acceleration`` and its specializalization ``ament_vitis`` for hardware acceleration purposes. Other specializalizations should follow along the same lines: +While ``ament_hardware_acceleration`` CMake macros are preferred and encouraged, maintainers are free to choose among all the CMake macros available within each of the specializalizations. After all, it'll be hard to define a generic set of macros that fits all use cases across technologies and silicon vendors. Maintainers are, however, encouraged to produce ROS packages that consider various accelerators. To do so, each extension of the ``ament`` ROS 2 build system for hardware acceleration purposes shall define *CMake hardware acceleration variables*. These variables are meant to be used by the maintainers to conditionally compile their ROS 2 packages for specific accelerators. The following table lists the variables defined by ``ament_hardware_acceleration`` and its specializalization ``ament_vitis`` for hardware acceleration purposes. Other specializalizations should follow along the same lines: .. list-table:: @@ -136,18 +160,19 @@ While ``ament_hardware_acceleration`` CMake macros are preferred and encouraged, - description * - `ament_hardware_acceleration `_ - ``ROS_ACCELERATION`` - - This CMake variable will evaluate to ``True`` when targeting *any* of the supported ROS 2-enabled accelerators (see ``mixins`` enablement below). *Use while integrating acceleration kernels in a technology and vendor-agnostic manner*. + - This CMake variable will evaluate to ``True`` when targeting *any* of the supported ROS 2-enabled accelerators (see ``mixins`` enablement below). Use while integrating acceleration kernels in a technology and vendor-agnostic manner. * - `ament_vitis `_ (specializes ``ament_hardware_acceleration``) - ``ROS_VITIS`` - - *use to build ...* - * - + - This CMake variable will evaluate to ``True`` when targeting ROS 2-enabled accelerators that use the Vitis platform for hardware acceleration. + +.. * - - ``ROS_XRT`` - - *use to consider cases with XRT as the communication framework between the application code and the acceleration kernels* + - This CMake variable will evaluate to ``True`` if the Xilinx Runtime (XRT) Library is enabled in the build system. This variable is meant to be used by the maintainers to conditionally compile their ROS 2 packages for acceleration kernels that leverage XRT. Through ``ament_hardware_acceleration`` and technology-specific specializations (like ``ament_vitis``), the ROS 2 build system is automatically enhanced to support producing acceleration kernel and related artifacts as part of the build process when invoking ``colcon build``. To facilitate the work of maintainers, this additional functionality is configurable through ``mixins`` that can be added to the build step of a ROS 2 workspace, triggering all the hardware acceleration logic only when appropriate (e.g. when ``--mixin kv260`` is appended to the build effort, it'll trigger the build of kernels targeting the KV260 hardware solution). For a reference implementation of these enhacements, refer to `ament_vitis `_. -In turn, extensions to the existing CMake macros might be proposed which would allow to support more technologies and hardware solutions. For example, ``ament_vitis`` provides a ``vitis_acceleration_kernel`` macro that can be used to generate kernels for the Xilinx Vitis platform. Similarly, ``ament_rocm`` could provide a ``rocm_acceleration_kernel`` macro that can be used to generate kernels for the AMD ROCm platform. This way, maintainers can choose to use the generic ``acceleration_kernel`` macro, or instead use the technology-specific macros to target specific hardware solutions. Also, future extensions to the ``ament_hardware_acceleration`` package could be proposed to support the use of accelerators in binary formats, instead of build them from source code. This would allow to support accelerators which are are not fully integrated into ROS 2 through their corresponding technology libraries (e.g. FPGAs that are not supported by Vitis). +In turn, additional extensions to the existing CMake macros might be proposed which would allow to support more technologies and hardware solutions. For example, ``ament_vitis`` provides a ``vitis_acceleration_kernel`` macro that can be used to generate kernels for the Xilinx Vitis platform. Similarly, ``ament_rocm`` could provide a ``rocm_acceleration_kernel`` macro that can be used to generate kernels for the AMD ROCm platform. This way, maintainers can choose to use the generic ``acceleration_kernel`` macro, or instead use the technology-specific macros to target specific hardware solutions. Also, future extensions to the ``ament_hardware_acceleration`` package could be proposed to support the use of accelerators in binary formats, instead of build them from source code. This would allow to support accelerators which are are not fully integrated into ROS 2 through their corresponding technology libraries (e.g. FPGAs that are not supported by Vitis). .. _pillarII: @@ -157,48 +182,77 @@ Pillar II - firmware The second pillar is firmware, it is meant to provide firmware artifacts for each supported technology so that the kernels can be compiled against them, simplifying the process for consumers and maintainers, and further aligning with the ROS typical development flow. -Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by the ``colcon acceleration select`` subverb (pillarII_). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. +Each ROS 2 workspace can have one or multiple firmware packages deployed. The selection of the active firmware in the workspace is performed by simlinking the preferred one or by using auxiliary tools that extend the ``colcon`` ROS 2 meta-build system (see the `colcon-hardware-acceleration `_ ROS package and with it, the ``colcon acceleration select`` subverb). To get a technology solution aligned with this REP's architecture, each vendor should provide and maintain an ``acceleration_firmware_`` package specialization that delivers the corresponding artifacts. Firmware artifacts should be deployed at ``/acceleration/firmware/`` and be ready to be used by the ROS 2 build system extensions at (pillarI_) . For a reference implementation of specialized vendor firmware package, see `acceleration_firmware_kv260 `_. -By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions. +By splitting vendors across packages, consumers and maintainers can easily switch between hardware acceleration solutions within the same workspace. .. _specification: Specification ============= +A ROS 2 package supports hardware acceleration if it provides support for at least one of the *supported hardware acceleration commercial solutions* (or *accelerators*) that comply with this REP. An accelerator complies with this REP if it aligns with the open source architecture_ for hardware acceleration proposed in this REP. The architecture proposed is composed of two pillars. The first one comprehends ROS 2 build system extensions (``ament``) to support various accelerators (pillarI_) as build targets. The second one introduces firmware additions into ROS 2 workspaces (pillarII_), which abstract away vendor-specific hardware acceleration frameworks, libraries and tools while building acceleration kernels for each technology/accelerator. An accelerator complying with this architecture should implement pillarI_ and pillarII_. ``colcon`` mixins are suggested as the triggering method for enabling ``ament`` build extensions. Given ``accelerator_A`` which implements the architecture through its two pillars, building ROS ``package_1`` for ``accelerator_A`` would look like this: -**TODO: discuss the use of mixins, and CMake variables to enable/disable the build of acceleration kernels**. +:: -To drive the creation, maintenance and testing of acceleration kernels in ROS 2 packages that are agnostic to the computing target (i.e. consider support for edge, workstation, data center or cloud targets) and technology-agnostic (considers initial support for FPGAs and GPUs), this REP builds on top of open standards. Particularly, ``OpenCL 1.2`` ([9]_, [10]_) is encouraged for a well established standardized interoperability between the host-side (CPU) and the acceleration kernel. + colcon build --mixin accelerator_A --packages-select package_1 -A ROS 2 package supports hardware acceleration if it provides support for at least one of the *supported hardware acceleration solutions* that comply with this REP. +Building the same package for another accelerator, ``accelerator_B``, would look like this: -A hardware acceleration solution from a given vendor is *supported* if it at least has a `Compatible` category. +:: + colcon build --mixin accelerator_B --packages-select package_1 -.. _Kernel Levels: -Kernel levels in ROS 2 -^^^^^^^^^^^^^^^^^^^^^^^ -To favour modularity, organize kernels and allow robotics architects to select only those accelerators needed to meet the requirements of their application, acceleration kernels in ROS 2 will be classified in 3 levels according to the ROS layer/underlayer they impact: +Separating the build and install directories for each accelerator is encouraged as each would rely on different firmware and potentially different cross-compilers. This would allow to build and install ROS 2 packages for different accelerators in the same workspace. -.. _Level I kernels: +:: -- *Level 1* - **ROS 2 applications and libraries**: This group corresponds with acceleration kernels that speed-up OSI L7 applications or libraries on top of ROS 2. Any computation on top of ROS 2 is a good a candidate for this category. Examples include selected components in the navigation, manipulation, perception or control stacks. + # build for accelerator_A + colcon build --merge-install --build-base=build-accelerator-A --install-base=install-accelerator-A --merge-install --mixin accelerator_A --packages-select package_1 -.. _Level II kernels: - -- *Level 2* - **ROS 2 core packages**: This includes kernels that accelerate or offload OSI L7 ROS 2 core components and tools to a dedicated acceleration solution (e.g. an FPGA). Namely, we consider ``rclcpp``, ``rcl``, ``rmw``, and the corresponding ``rmw_adapters`` for each supported communication middleware. Examples includes ROS 2 executors for more deterministic behaviours [4]_, or complete hardware offloaded ROS 2 Nodes [5]_. + # build for accelerator_B + colcon build --merge-install --build-base=build-accelerator-B --install-base=install-accelerator-B --merge-install --mixin accelerator_B --packages-select package_1 -.. _Level III kernels: +The architecture proposed in this REP is meant to be generic and technology-agnostic, so that it can be extended to support other hardware acceleration technologies and hardware solutions in the future while bringing value to all stakeholders_. Silicon vendors and hardware solution manufacturers aiming to integrate their solutions in the ROS 2 ecosystem are encouraged to provide themselves their own technology-specific extensions to the ROS 2 build system (pillarII_) and firmware packages (pillarI_). This way, consumers and maintainers can easily adopt their technologies to build kernels, and switch between hardware acceleration solutions within the same ROS workspace. -- *Level 3* - **ROS 2 underlayers**: Groups together all accelerators below the ROS 2 core layers belonging to OSI L2-L7, including the communication middleware (e.g. DDS). Examples of such accelerators include a complete or partial DDS implementation, an offloaded networking stack or a data link layer for real-time deterministic, low latency and high throughput interactions. +Package maintainers are encouraged to integrate hardware acceleration in their ROS 2 packages by using the *CMake hardware acceleration variables* and the ``acceleration_kernel`` CMake macro, or any of its specializations. These CMake macros are meant to be used by maintainers to conditionally compile their ROS 2 packages and acceleration kernels for those accelerators that implement the architecture_. This would like like this on the package's ``CMakeLists.txt``: -Hardware acceleration solutions complying with this REP should aspire to support multiple kernel levels in ROS 2 to maximize consumer experience. +:: + find_package(ament_hardware_acceleration) + find_package(ament_vitis QUIET) + + if(ROS_VITIS) # e.g. well tested acceleration kernel with a 100 MHz clock + # and a configuration that is known to work well + vitis_acceleration_kernel( + NAME my_acceleration_kernel + FILE my_acceleration_kernel.cpp + CONFIG config.cfg + CLOCK 100000000:my_kernel_main + INCLUDE + include + TYPE + hw + LINK + PACKAGE + ) + elseif(ROS_ACCELERATION) # best effort acceleration kernel for the supported + # (and targeted, via mixins) accelerator + acceleration_kernel( + NAME my_acceleration_kernel + FILE my_acceleration_kernel.cpp + INCLUDE + include + ) + endif() # ROS_VITIS, ROS_ACCELERATION + + ament_package() + +To further support package maintainers and consumers, this REP also proposes a methodology to analyze a ROS 2 application and integrate hardware acceleration. Developers are encouraged to follow this methodology while developing acceleleration kernels. Examples demonstrating the use of this methodology for hardware acceleration are available in [6]_ and [7]_. Methodology for developing ROS 2 packages that integrate hardware acceleration -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :: @@ -239,33 +293,28 @@ Methodology for developing ROS 2 packages that integrate hardware acceleration re-instrument +This can be summarized as follows: +- 1. instrument the core components of a ROS 2 application using `LTTng `_ and trace its execution to obtain information about its time bottlenecks. Refer to `ros2_tracing `_ for tools, documentation and ROS 2 core layers tracepoints; +- 2. trace, benchmark and analyze the ROS 2 application on the CPU to establish a compute baseline; Determine which functions are subject to be hardware accelerated. + +- 3. develop a hardware acceleration kernels for those functions identified before, and optimize the dataflow across Nodes + - 3.1 accelerate computations at the Node or Component level for each one of those functions identified in 2. as good candidates. + - 3.2 accelerate inter-Node exchanges and reduce the overhead of the ROS 2 message-passing system across all its abstraction layers. +- 4. trace, benchmark and analyze the ROS 2 application including the acceleration kernels and compare against the CPU baseline obtained in 2. to determine the performance improvement. Re-iterate 3. and 4. until the desired performance is achieved. -The following proposes a methodology to analyze a ROS 2 application and design appropriate acceleration: - -1. instrument both the core components of ROS 2 and the target kernels using `LTTng `_. Refer to `ros2_tracing `_ for tools, documentation and ROS 2 core layers tracepoints; -2. trace and benchmark the kernels on the CPU to establish a compute baseline; -3. develop a hardware accelerated implementation on alternate hardware (e.g., GPU, FPGA, etc): - - - **3.1** accelerate computations at the Node or Component level for each one of those identified in **2.** as good candidates. - - **3.2** accelerate inter-Node exchanges and reduce the overhead of the ROS 2 message-passing system across all its abstraction layers. - -4. trace, benchmark against the CPU baseline, and improve the accelerated implementation. - -The proposed ROS 2 methodology for hardware acceleration is demonstrated in [7]_ and [8]_. +Details about tracing and benchmarking are beyond the scope of this REP. For more details about tracing and benchmarking, see REP-2014. .. _acceleration examples: Acceleration examples ^^^^^^^^^^^^^^^^^^^^^ -For the sake of illustrating maintainers and consumers how to build their own acceleration kernels and guarantee interoperability across technologies, a ROS 2 meta-package named `acceleration_examples `_ will be maintained and made available. This meta-package will contain various packages with simple common acceleration examples. Each one of these examples should support all hardware acceleration solutions complying with this REP. - -In turn, a CI system will be set to build periodically and for every commit the meta-package. +For the sake of illustrating maintainers and consumers how to build their own acceleration kernels and guarantee interoperability across technologies, a ROS 2 meta-package named `acceleration_examples `_ will be maintained and made available. This meta-package will contain various packages with simple common acceleration examples. Each one of these examples should aim to support all hardware acceleration solutions complying with this REP. Backwards Compatibility ======================= @@ -275,9 +324,9 @@ The proposed features and conventions add new functionality while not modifying Reference Implementation and recommendations ============================================ -Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer `ament_hardware_acceleration `_ and firmware from vendor specializalizations like `ament_vitis `_. A paper describing in more detail the reference implementation is available at [11]_. +Reference implementations complying with this REP and extending the ROS 2 build system and tools for hardware acceleration are available at the `Hardware Accelerationg WG GitHub organization `_. This includes also the abstraction layer `ament_hardware_acceleration `_ and firmware from vendor specializalizations like `ament_vitis `_. A paper describing in more detail the reference implementation is available at [10]_. -``colcon`` ROS 2 meta built tools can be extended to help integrate hardware acceleration flows into the ROS 2 CLI tooling. Examples of these extensions include emulation capabilities to speed-up the development process and/or facilitate it without access to the real hardware, or raw image production tools, which are convenient when packing together acceleration kernels for embedded targets. A reference implementation of these extensions is implemented at the `colcon-hardware-acceleration `_ ROS 2 package, which is available in the buildfarm. Refer to the package for more details on its capabilities. +``colcon`` ROS 2 meta built tools can be extended to help integrate hardware acceleration flows into the ROS 2 CLI tooling. Examples of these extensions include emulation capabilities to speed-up the development process and/or facilitate it without access to the real hardware, or raw image production tools, which are convenient when packing together acceleration kernels for embedded targets. A reference implementation of these extensions is implemented at the `colcon-hardware-acceleration`_ ROS 2 package, which is available in the buildfarm. Refer to the package for more details on its capabilities. For additional implementations and recommendations, check out the `Hardware Accelerationg WG GitHub organization `_. @@ -310,27 +359,23 @@ References and Footnotes https://arxiv.org/pdf/2107.07208.pdf -.. [6] A. Pemmaiah​, D. Pangercic, D. Aggarwal, K. Neumann, K. Marcey, "Performance Testing in ROS 2". - https://drive.google.com/file/d/15nX80RK6aS8abZvQAOnMNUEgh7px9V5S/view - - -.. [7] "Methodology for ROS 2 Hardware Acceleration". ros-acceleration/community #20. ROS 2 Hardware Acceleration Working Group. +.. [6] "Methodology for ROS 2 Hardware Acceleration". ros-acceleration/community #20. ROS 2 Hardware Acceleration Working Group. https://github.com/ros-acceleration/community/issues/20 -.. [8] Acceleration Robotics, "Hardware accelerated ROS 2 pipelines and towards the Robotic Processing Unit (RPU)". +.. [7] Acceleration Robotics, "Hardware accelerated ROS 2 pipelines and towards the Robotic Processing Unit (RPU)". https://news.accelerationrobotics.com/hardware-accelerated-ros2-pipelines/ -.. [9] OpenCL 1.2 API and C Language Specification (November 14, 2012). +.. [8] OpenCL 1.2 API and C Language Specification (November 14, 2012). https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf -.. [10] OpenCL 1.2 Reference Pages. +.. [9] OpenCL 1.2 Reference Pages. https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/ -.. [11] Mayoral-Vilches, V., Neuman, S. M., Plancher, B., & Reddi, V. J. (2022). "RobotCore: An Open Architecture for Hardware Acceleration in ROS 2". +.. [10] Mayoral-Vilches, V., Neuman, S. M., Plancher, B., & Reddi, V. J. (2022). "RobotCore: An Open Architecture for Hardware Acceleration in ROS 2". https://arxiv.org/pdf/2205.03929.pdf From 4732bff43545ad66044f8bfc48963a49bfc2f5a0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Wed, 2 Nov 2022 06:44:34 +0100 Subject: [PATCH 21/24] Add pointers to REP-2007 and REP-2009 Following from community feedback, these changes hint ROS users that inter-Node data exchange can be further optimized by following leveraging REP-2007 and REP-2009. It must be noted though that each accelerator (and associated host CPUs) may presents various other opportunities which are specific to that architecture. Such aspects shouldn't be discarded and would often lead to best optimizations. --- rep-2008.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index ab8648646..cd91913e8 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -302,7 +302,7 @@ This can be summarized as follows: - 3. develop a hardware acceleration kernels for those functions identified before, and optimize the dataflow across Nodes - 3.1 accelerate computations at the Node or Component level for each one of those functions identified in 2. as good candidates. - - 3.2 accelerate inter-Node exchanges and reduce the overhead of the ROS 2 message-passing system across all its abstraction layers. + - 3.2 accelerate inter-Node exchanges and reduce the overhead of the ROS 2 message-passing system across all its abstraction layers. To do so, consider leveraging REP-2007 and REP-2009. - 4. trace, benchmark and analyze the ROS 2 application including the acceleration kernels and compare against the CPU baseline obtained in 2. to determine the performance improvement. Re-iterate 3. and 4. until the desired performance is achieved. From 71120c922e83da25505a3fa0efb088ec50716bba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Sat, 5 Nov 2022 20:00:50 +0100 Subject: [PATCH 22/24] Adjust text to make it OS-agnostic MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Removed pointers to OS-specific tracers and pushed tracing and benchmarking related topics to REP-2014. Signed-off-by: Víctor Mayoral Vilches Co-authored-by: Daniel Rosenstein --- rep-2008.rst | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/rep-2008.rst b/rep-2008.rst index cd91913e8..11921454e 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -25,9 +25,9 @@ Hardware acceleration can revolutionize ROS 2 computations, enabling new applica .. This REP aims to provide a vendor-neutral, scalable and technology-agnostic approach to hardware acceleration in ROS 2 that can be easily adopted by roboticists and researchers alike. -**The purpose of this REP is thereby to provide standard guidelines on how to use hardware acceleration in combination with ROS 2**. These guidelines are realized in the form of a series of ROS 2 packages that implement an open architecture and abstract away these external resources providing a vendor-neutral, scalable and technology-agnostic ROS 2-centric development experience for hardware acceleration. The proposed open architecture also allows for the integration of external frameworks, tools and libraries, so that changing from one accelerator to another becomes trivial. This REP also recommends a methodology for developing ROS 2 packages that integrate hardware acceleration which can help developers guide their efforts. **This REP does not aim to dictate policy on which frameworks or languages shall be used to create acceleration kernels, nor it aims to abstract away from these technologies**. This decision is left to the architect building the acceleration. For example, the use of OpenCL, CUDA, Vulkan, HLS, Halide, Exo, etc. is beyond the scope of this REP and often attached to the specific accelerator in-use, as each silicon vendor supports only certain technologies. What this REP provides is an entry point for developers to define their own acceleration kernels, integrate them into ROS 2 applications and if appropriate, command the ROS 2 build system to build such accelerators. +**The purpose of this REP is thereby to provide standard guidelines on how to use hardware acceleration in combination with ROS 2**. These guidelines are realized in the form of a series of ROS 2 packages that implement an open architecture and abstract away these external resources providing a vendor-neutral, Operating System (OS) and technology-agnostic ROS 2-centric development experience for hardware acceleration. The proposed open architecture also allows for the integration of external frameworks, tools and libraries, so that changing from one accelerator to another becomes trivial. This REP also recommends a methodology for developing ROS 2 packages that integrate hardware acceleration which can help developers guide their efforts. **This REP does not aim to dictate policy on which OS, frameworks or languages shall be used to create acceleration kernels, nor it aims to abstract away from these technologies**. This decision is left to the architect building the acceleration. For example, the use of OpenCL, CUDA, Vulkan, HLS, Halide, Exo, etc. is beyond the scope of this REP and often attached to the specific accelerator in-use, as each silicon vendor supports only certain technologies. What this REP provides is an entry point for developers to define their own acceleration kernels, integrate them into ROS 2 applications and if appropriate, command the ROS 2 build system to build such accelerators. -To do so, the architecture proposed extends the ROS 2 build system (``ament``), optionally extends the ROS 2 build tools (``colcon``) and adds a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new accelerators across silicon vendors. The core components of the architecture are implemented and open sourced under an Apache 2.0 license, and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. +To do so, the architecture proposed extends the ROS 2 build system (``ament``), optionally extends the ROS 2 build tools (``colcon``) and adds a new firmware pillar to simplify the production and deployment of acceleration kernels. The architecture is agnostic to the computing target (i.e. considers support for edge, workstation, data center or cloud targets), OS and technology-agnostic (considers initial support for FPGAs and GPUs), application-agnostic and modular, which enhances portability to new accelerators across silicon vendors. The core components of the architecture are implemented and open sourced under an Apache 2.0 license, and maintained at the `ROS 2 Hardware Acceleration Working Group GitHub organization `_. .. _stakeholders: @@ -37,7 +37,7 @@ Value for stakeholders: - Consumers can use the guidelines in the REP, as well as the corresponding benchmarks of each accelerator (if available), to set expectations on the hardware acceleration capabilities, *ease of use* and scalability that could be obtained from each vendor's hardware acceleration solution. -- Silicon vendors and hardware solution manufacturers can use these guidelines to connect their accelerators and firmware (including frameworks, tools and libraries) to the ROS 2 ecosystem. By doing so, they will obtain direct support for hardware acceleration in all ROS 2 packages that support it. +- Silicon vendors, OS suppliers and hardware solution manufacturers can use these guidelines to connect their accelerators and firmware (including frameworks, tools and libraries) to the ROS 2 ecosystem. By doing so, they will obtain direct support for hardware acceleration in all ROS 2 packages that support it. The outcome of this REP should be that maintainers who want to leverage hardware acceleration in their packages, can do so with consistent guidelines and with support across multiple hardware acceleration technologies (FPGAs and GPUs initially, but extending towards more accelerators in the future) by following the conventions set. This way, maintainers will be able to create ROS 2 packages with support for hardware acceleration that can run across accelerators. In turn, as silicon vendors add support for their technologies and accelerators, ROS 2 packages aligned with this REP will automatically gain support for these new accelerators. @@ -61,6 +61,8 @@ The motivation behind this REP does **not** include: * No maintainer is required to target any specific hardware solution by any of the guidelines in this REP. * Instead, maintainers are free to choose from either using the proposed generic CMake macros meant to support all accelerators (e.g. ``acceleration_kernel`` ), or instead use vendor-specific CMake extensions for particular supported technologies and hardware solutions (e.g. ``vitis_acceleration_kernel`` ). +* Excluding any supported ROS 2 OS. + .. _Kernel Levels: Kernel levels in ROS 2 @@ -214,7 +216,7 @@ Separating the build and install directories for each accelerator is encouraged # build for accelerator_B colcon build --merge-install --build-base=build-accelerator-B --install-base=install-accelerator-B --merge-install --mixin accelerator_B --packages-select package_1 -The architecture proposed in this REP is meant to be generic and technology-agnostic, so that it can be extended to support other hardware acceleration technologies and hardware solutions in the future while bringing value to all stakeholders_. Silicon vendors and hardware solution manufacturers aiming to integrate their solutions in the ROS 2 ecosystem are encouraged to provide themselves their own technology-specific extensions to the ROS 2 build system (pillarII_) and firmware packages (pillarI_). This way, consumers and maintainers can easily adopt their technologies to build kernels, and switch between hardware acceleration solutions within the same ROS workspace. +The architecture proposed in this REP is meant to be generic and technology-agnostic (including OS), so that it can be extended to support other hardware acceleration technologies and hardware solutions in the future while bringing value to all stakeholders_. Silicon vendors and hardware solution manufacturers aiming to integrate their solutions in the ROS 2 ecosystem are encouraged to provide themselves their own technology-specific extensions to the ROS 2 build system (pillarII_) and firmware packages (pillarI_). This way, consumers and maintainers can easily adopt their technologies to build kernels, and switch between hardware acceleration solutions within the same ROS workspace. Package maintainers are encouraged to integrate hardware acceleration in their ROS 2 packages by using the *CMake hardware acceleration variables* and the ``acceleration_kernel`` CMake macro, or any of its specializations. These CMake macros are meant to be used by maintainers to conditionally compile their ROS 2 packages and acceleration kernels for those accelerators that implement the architecture_. This would like like this on the package's ``CMakeLists.txt``: @@ -289,13 +291,13 @@ Methodology for developing ROS 2 packages that integrate hardware acceleration | | | +-------+------+ | | | | +------+ | | - LTTng +--------------------+ + Tracer +--------------------+ re-instrument This can be summarized as follows: -- 1. instrument the core components of a ROS 2 application using `LTTng `_ and trace its execution to obtain information about its time bottlenecks. Refer to `ros2_tracing `_ for tools, documentation and ROS 2 core layers tracepoints; +- 1. instrument the core components of a ROS 2 application using a tracing framework and trace its execution to obtain information about its time bottlenecks. Refer to REP-2014 for guidelines on tracing and benchmarking performance in ROS 2, including examples of tracing frameworks and how to use them. - 2. trace, benchmark and analyze the ROS 2 application on the CPU to establish a compute baseline; Determine which functions are subject to be hardware accelerated. @@ -307,7 +309,7 @@ This can be summarized as follows: - 4. trace, benchmark and analyze the ROS 2 application including the acceleration kernels and compare against the CPU baseline obtained in 2. to determine the performance improvement. Re-iterate 3. and 4. until the desired performance is achieved. -Details about tracing and benchmarking are beyond the scope of this REP. For more details about tracing and benchmarking, see REP-2014. +Additional about tracing and benchmarking are beyond the scope of this REP. For more details about tracing and benchmarking, see REP-2014. .. _acceleration examples: From 337bb491241910c5e1e54672b40601008c6fd4cb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Fri, 18 Nov 2022 14:20:10 +0100 Subject: [PATCH 23/24] Update rep-2008.rst Co-authored-by: Henning Kayser --- rep-2008.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index 11921454e..c1d164182 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -52,7 +52,7 @@ The motivation behind this REP does **not** include: * Dictating policy implementation specifics on maintainers to achieve hardware acceleration and/or create acceleration kernels (e.g use of CUDA, HLS, OpenCL, or similar) - * Policy requirements about which framework to use (e.g. ROCm, JetPack, etc.) or wihch specific C++ preprocessor instructions are *intentionally generic* and to remain technology-agnostic. + * Policy requirements about which framework to use (e.g. ROCm, JetPack, etc.) or which specific C++ preprocessor instructions are *intentionally generic* and to remain technology-agnostic. * Maintainers can come up with their own policies depending on the technology targeted (FPGAs, GPUs, etc.) and framework used (``HLS``, ``ROCm``, ``CUDA``, etc). From e370808559f839069af051a9daeb15d633c4427b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=ADctor=20Mayoral=20Vilches?= Date: Fri, 18 Nov 2022 14:20:38 +0100 Subject: [PATCH 24/24] Update rep-2008.rst Co-authored-by: Henning Kayser --- rep-2008.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rep-2008.rst b/rep-2008.rst index c1d164182..ca98088ca 100644 --- a/rep-2008.rst +++ b/rep-2008.rst @@ -174,7 +174,7 @@ While ``ament_hardware_acceleration`` CMake macros are preferred and encouraged, Through ``ament_hardware_acceleration`` and technology-specific specializations (like ``ament_vitis``), the ROS 2 build system is automatically enhanced to support producing acceleration kernel and related artifacts as part of the build process when invoking ``colcon build``. To facilitate the work of maintainers, this additional functionality is configurable through ``mixins`` that can be added to the build step of a ROS 2 workspace, triggering all the hardware acceleration logic only when appropriate (e.g. when ``--mixin kv260`` is appended to the build effort, it'll trigger the build of kernels targeting the KV260 hardware solution). For a reference implementation of these enhacements, refer to `ament_vitis `_. -In turn, additional extensions to the existing CMake macros might be proposed which would allow to support more technologies and hardware solutions. For example, ``ament_vitis`` provides a ``vitis_acceleration_kernel`` macro that can be used to generate kernels for the Xilinx Vitis platform. Similarly, ``ament_rocm`` could provide a ``rocm_acceleration_kernel`` macro that can be used to generate kernels for the AMD ROCm platform. This way, maintainers can choose to use the generic ``acceleration_kernel`` macro, or instead use the technology-specific macros to target specific hardware solutions. Also, future extensions to the ``ament_hardware_acceleration`` package could be proposed to support the use of accelerators in binary formats, instead of build them from source code. This would allow to support accelerators which are are not fully integrated into ROS 2 through their corresponding technology libraries (e.g. FPGAs that are not supported by Vitis). +In turn, additional extensions to the existing CMake macros might be proposed which would allow to support more technologies and hardware solutions. For example, ``ament_vitis`` provides a ``vitis_acceleration_kernel`` macro that can be used to generate kernels for the Xilinx Vitis platform. Similarly, ``ament_rocm`` could provide a ``rocm_acceleration_kernel`` macro that can be used to generate kernels for the AMD ROCm platform. This way, maintainers can choose to use the generic ``acceleration_kernel`` macro, or instead use the technology-specific macros to target specific hardware solutions. Also, future extensions to the ``ament_hardware_acceleration`` package could be proposed to support the use of accelerators in binary format instead of source builds. This would allow to support accelerators which are are not fully integrated into ROS 2 through their corresponding technology libraries (e.g. FPGAs that are not supported by Vitis). .. _pillarII: