SITL perf improvements #11177
This is a small method that is used a lot.
…ce locking - the loop is not needed - we optimize for the fast case and lock only if really needed
Not required, since the lock is held during the whole loop iteration.
fewer function calls
Previously hrt_absolute_time() was at around 5% of the total CPU usage, now it's around 0.35%.
Because $sitl_command contains quotes
- use a linked-list instead of std::vector. Insertion and removal are now O(1)
- avoid malloc and use a thread_local instance of TimedWait. It gets destroyed when the thread exits, so we have to add protection in case a thread exits too quickly. This in turn requires a fix to the unit-tests.
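The two points in this commit message can be illustrated with a minimal sketch. This is not the PR's actual code: `WaitNode`, `WaitList`, `expire()` and `this_thread_node()` are hypothetical names, loosely modelled on the `TimedWait` fields shown in the review diff. It assumes an intrusive singly linked list guarded by a mutex, and it leaves out the protection against a thread exiting too quickly that the commit message mentions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical node type; the field names mirror those visible in the diff.
struct WaitNode {
    uint64_t time_us{0};
    std::atomic<bool> done{false};
    std::atomic<bool> removed{true};
    WaitNode *next{nullptr}; // intrusive link, so the list itself never allocates
};

class WaitList {
public:
    // O(1) insertion: push the caller-owned node at the head.
    void add(WaitNode *node) {
        assert(node != nullptr);
        std::lock_guard<std::mutex> guard(_mutex);
        node->done = false;
        node->removed = false;
        node->next = _head;
        _head = node;
    }

    // Walk the list once and unlink every node that is done or has expired.
    // Each unlink is O(1) because the predecessor's link is at hand.
    int expire(uint64_t now_us) {
        std::lock_guard<std::mutex> guard(_mutex);
        int unlinked = 0;
        WaitNode **link = &_head;
        while (*link != nullptr) {
            WaitNode *node = *link;
            if (node->done || node->time_us <= now_us) {
                *link = node->next;
                node->next = nullptr;
                node->removed = true;
                ++unlinked;
            } else {
                link = &node->next;
            }
        }
        return unlinked;
    }

    bool empty() {
        std::lock_guard<std::mutex> guard(_mutex);
        return _head == nullptr;
    }

private:
    std::mutex _mutex;
    WaitNode *_head{nullptr};
};

// One node per thread, reused for every wait instead of a malloc per call.
// It is destroyed when the owning thread exits.
inline WaitNode &this_thread_node() {
    static thread_local WaitNode node;
    return node;
}
```

Because the node lives in thread-local storage, the list must never keep a pointer to it after the owning thread has exited. That is presumably why the real implementation tracks a `removed` flag.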
Thanks for looking into this and improving it, that's awesome!
- removed the heavy use of malloc in the lockstep scheduler by using a thread_local object (required some changes to the unit-tests as well).
Which malloc do you mean? The one for `std::vector`?
Ah, this thing, I didn't realize, crazy!
static thread_local TimedWait timed_wait;
- replaced the std::vector with a more efficient linked list in the lockstep scheduler.
How much did the switch from `std::vector` to the linked list really improve?
@@ -6,6 +6,8 @@ if(NOT PROJECT_NAME STREQUAL "px4")

set (CMAKE_CXX_STANDARD 11)

add_definitions(-DUNIT_TESTS)
We should probably move the whole file to tabs.
Right, I didn't realize. Commit pushed.
Quick comment because I just noticed this (not PR related). The CMakeLists.txt doesn't need to be different for inclusion in px4 or building standalone.
It does because it can be used for standalone unit tests. And I would like to keep these tests until we have figured out a way to port them over into PX4.
void set_absolute_time(uint64_t time_us);
uint64_t get_absolute_time() const;
inline uint64_t get_absolute_time() const { return _time_us; }
I like that you made it `inline` explicitly, but it is already `inline` automatically if defined in the class declaration, right?
I like that you made it inline explicitly but it is already inline automatically if defined in the class declaration, right?
Yes, this is just slightly stronger, and sometimes I like to add it to make it clear to a reader that it's important here. Not strictly necessary though.
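For readers following this exchange, here is a small sketch of the language rule being discussed. In standard C++ a member function defined inside the class body is implicitly `inline`, so the explicit keyword mainly documents intent. `Clock` is a hypothetical stand-in for the scheduler class, not the PR's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class used only to illustrate the rule.
class Clock {
public:
    // Defined in the class body: implicitly inline.
    void set_absolute_time(uint64_t time_us) { _time_us = time_us; }

    // Explicit keyword: same linkage semantics, but signals to the reader
    // that this getter sits on a hot path.
    inline uint64_t get_absolute_time() const { return _time_us; }

private:
    uint64_t _time_us{0};
};

// Small helper showing the getter in use.
inline uint64_t elapsed_since(const Clock &clock, uint64_t start_us) {
    assert(clock.get_absolute_time() >= start_us);
    return clock.get_absolute_time() - start_us;
}
```

Whether the compiler actually inlines the call remains its own decision in both cases.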
pthread_cond_t *passed_cond{nullptr};
pthread_mutex_t *passed_lock{nullptr};
uint64_t time_us{0};
bool timeout{false};
bool done{false};
std::atomic<bool> done{false};
std::atomic<bool> removed{true};
At first I found the name `removed` confusing, but I think it actually is what it is, and I can't find a better word.
std::atomic<bool> done{false};
std::atomic<bool> removed{true};

TimedWait *next{nullptr}; ///< linked list
I don't love the raw linked list. I would prefer an abstraction but I suppose it's easy enough to understand.
}

void LockstepScheduler::set_absolute_time(uint64_t time_us)
{
time_us_ = time_us;
_time_us = time_us;
Ok, makes sense to switch it to the PX4 convention.
!temp_timed_wait->done) {
temp_timed_wait->timeout = true;
if (timed_wait->time_us <= time_us &&
!timed_wait->timeout) {
Are you sure that we don't need `&& !temp_timed_wait->done` anymore?
No, since we check for that above already. If it were needed here, it would mean we have a race condition.
pthread_cond_t cond;
pthread_cond_init(&cond, nullptr);
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
I thought these were only allowed for `static` locks/conds, but it looks like that's only true for older versions of the POSIX standard. I guess it's a nice optimization to save calls.
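As a rough illustration of the two initialization styles compared in this comment, the sketch below sets up a mutex and a condition variable on the stack both ways. The function names are hypothetical. It assumes a platform whose POSIX implementation accepts the initializer macros for automatic (non-static) objects, as the comment suggests newer versions of the standard do.

```cpp
#include <cassert>
#include <pthread.h>

// Style used after the change: initializer macros on local objects,
// saving the explicit init calls.
bool macro_init_roundtrip() {
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    bool ok = pthread_mutex_lock(&lock) == 0;
    ok = ok && pthread_cond_signal(&cond) == 0; // no waiter: harmless no-op
    ok = ok && pthread_mutex_unlock(&lock) == 0;

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return ok;
}

// Style used before the change: explicit init calls with default attributes.
bool explicit_init_roundtrip() {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ok = pthread_mutex_init(&lock, nullptr) == 0;
    ok = ok && pthread_cond_init(&cond, nullptr) == 0;
    assert(ok);

    ok = ok && pthread_mutex_lock(&lock) == 0;
    ok = ok && pthread_cond_signal(&cond) == 0; // no waiter: harmless no-op
    ok = ok && pthread_mutex_unlock(&lock) == 0;

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return ok;
}
```

Both variants yield objects with default attributes. The macro form only helps when no custom attributes are needed.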
std::this_thread::yield(); // usleep is too slow here
}

ls.set_absolute_time(ls.get_absolute_time());
Huh, you set the same time again? I don't get it.
It's just so that the linked list gets cleaned up, setting the `removed` flag in particular.

pthread_mutex_unlock(&_hrt_mutex);

}
Nice cleanup, missed that.
@@ -334,10 +330,6 @@ extern "C" {

#endif

while (sim_delay) {
	px4_usleep(100);
}
👍
@bkueng AWESOME!!! I just pulled master and now all the strange issues I reported in #10693 are gone! The only downside is that the shutdown command is now also broken and I always have to kill the task (see #11027). But I'll likely figure that one out.
This brings some lockstep/timing/locking-related performance improvements.
In more detail:
- `hrt_absolute_time` and the lockstep scheduler. This reduces the amount of time spent in (un)locking by a few percent, but it's still really high: around 30%. Heaviest remaining users are the driver framework and uORB.
- `hrt_absolute_time` used 11% of the total usage; it is now down to 0.35%.
- removed the heavy use of `malloc` in the lockstep scheduler by using a thread_local object (required some changes to the unit-tests as well).
- replaced the `std::vector` with a more efficient linked list in the lockstep scheduler.