Ubuntu 20.04
Rolling
Both packages and my own build
Commit: 61fcc76
Independent of rmw implementation (tested both FastDDS and CycloneDDS).
Steps to reproduce issue
I first noticed and confirmed this when recording with rosbag2 (which uses SingleThreadedExecutor). Investigating the CPU load, I found that most of the resource use comes from the executor function get_next_executable(), meaning that even with empty callbacks (no actual work to execute) the CPU load remains very high (around 70% on my machine). The recording comes from a real (automotive) use case and amounts to about 4k executables (subscriptions) per second.
To reproduce, it should be enough to call spin() with enough traffic to produce a high number of executables. The performance package in rosbag2 can be used to automate running the desired number of publishers.
Expected behavior
The executor should use less CPU for acquiring the next executable. This matters, for example, with rosbag2, where the recorder's CPU use affects how the recorded system performs.
Actual behavior
The executor has high CPU consumption even when subscription callbacks are empty, i.e. the cost comes just from acquiring the next executables.
Additional information
A partial workaround is to call spin_some() or spin_all() followed by a short (e.g. 1 ms) sleep in a while (rclcpp::ok()) loop, instead of calling spin().
Note that spin_once() with a similar sleep won't work as well, since we need to execute more executables each second than it would permit.
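The workaround loop could look roughly like the sketch below. The helper name spin_batched and its parameters are made up for illustration and are not part of rclcpp; it only assumes an executor type with a spin_some() member and a callable that reports whether to keep running.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper, not part of rclcpp: run all work that is ready,
// then sleep briefly so that more executables accumulate before the next pass.
// Executor: any type with a spin_some() member callable without arguments.
// OkFn: callable returning false once the loop should stop.
template <typename Executor, typename OkFn>
void spin_batched(
  Executor & executor, OkFn ok,
  std::chrono::milliseconds pause = std::chrono::milliseconds(1))
{
  while (ok()) {
    executor.spin_some();                // execute whatever is ready right now
    std::this_thread::sleep_for(pause);  // actual sleep length is platform dependent
  }
}
```

With rclcpp this would presumably be called as spin_batched(executor, [] { return rclcpp::ok(); }); on a SingleThreadedExecutor that already has the node added. The sleep trades some latency for fewer wake-ups, so the pause should stay small.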
In the case of ~4k executables per second, wait_for_work makes about 3.3k calls to rcl_wait per second, so most calls return only one executable, which seems quite inefficient. I am not sure whether this is by design (perhaps it minimizes latency), but the collections meant to gather a batch of executables per rcl_wait call are certainly underused.
When a 1 ms sleep is introduced after we miss the cache (before/after rcl_wait), only ~600 calls to rcl_wait per second are made while successfully executing the same number of callbacks per second.
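To make the batching effect explicit, the rounded figures above can be turned into an average batch size per rcl_wait call. This is only arithmetic on the approximate numbers reported here; the function name is illustrative.

```cpp
// Average number of executables handled per rcl_wait call.
constexpr double executables_per_wait(double executables_per_sec, double waits_per_sec)
{
  return executables_per_sec / waits_per_sec;
}

// Without the sleep: ~4000 executables over ~3300 rcl_wait calls per second.
constexpr double without_sleep = executables_per_wait(4000.0, 3300.0);  // about 1.2

// With the 1 ms sleep: the same ~4000 executables over ~600 rcl_wait calls.
constexpr double with_sleep = executables_per_wait(4000.0, 600.0);      // about 6.7
```

So the sleep raises the average batch from roughly 1.2 to roughly 6.7 executables per rcl_wait call, cutting the number of calls by a factor of about 5.5.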
Sleeping (and the chrono steady clock) of course behaves differently depending on the platform, so it is hard to suggest this as an executor-level change, but it is certainly a factor to be aware of.
Perhaps another type of executor (events based) would be more suitable for this type of use case, where low CPU consumption is required. There is quite some work ongoing in ros2/design#305.
Thank you for the suggestion - it is a good idea and so it was something I checked right away looking through available executors. I checked the Static one, observed only minimal improvement in my case as described above, probably because it doesn't solve the core issue.