
SITL perf improvements #11177

Merged
merged 11 commits into master on Jan 14, 2019

Conversation

bkueng
Member

@bkueng bkueng commented Jan 9, 2019

This brings some lockstep/timing/locking-related performance improvements.

In more detail:

  • reduces SITL CPU usage by ~2% (28% -> 26%)
  • reduces locking in hrt_absolute_time and the lockstep scheduler. This reduces the amount of time spent in (un)locking by a few percent, but it's still really high - around 30%. Heaviest remaining users are the driver framework and uORB.
  • hrt_absolute_time used 11% of the total CPU time; it is now down to 0.35%
  • removed the heavy use of malloc in the lockstep scheduler by using a thread_local object (required some changes to the unit-tests as well).
  • replaced the std::vector with a more efficient linked list in the lockstep scheduler.
  • fixes valgrind & gdb invocation
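The hrt_absolute_time reduction above comes from cutting locking on the hot read path. A minimal sketch of that idea, assuming a single-writer/many-readers simulated clock (the names `g_sim_time_us`, `hrt_absolute_time_fast`, and `set_sim_time` are illustrative, not PX4's actual symbols):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: in lockstep mode the simulated clock has one writer
// (the lockstep scheduler) and many readers, so the hot read path can be a
// relaxed atomic load instead of a mutex lock/unlock pair.
static std::atomic<uint64_t> g_sim_time_us{0};

// hot path: called very frequently, now lock-free
inline uint64_t hrt_absolute_time_fast()
{
	return g_sim_time_us.load(std::memory_order_relaxed);
}

// cold path: the scheduler advances simulated time
inline void set_sim_time(uint64_t t_us)
{
	g_sim_time_us.store(t_us, std::memory_order_relaxed);
}
```

With a relaxed atomic there is no cache-line ping-pong from a contended mutex, which is why a function that is "small but used a lot" can drop from double digits to a fraction of a percent in the profile.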

bkueng added 10 commits January 9, 2019 13:32
This is a small method that is used a lot.
…ce locking

- the loop is not needed
- we optimize for the fast case and lock only if really needed
Not required, since the lock is held during the whole loop iteration.
Previously hrt_absolute_time() was at around 5% of the total CPU usage, now
it's around 0.35%.
- use a linked-list instead of std::vector. Insertion and removal are now
  O(1)
- avoid malloc and use a thread_local instance of TimedWait.
  It gets destroyed when the thread exits, so we have to add protection
  in case a thread exits too quickly. This in turn requires a fix to the
  unit-tests.
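The commit message above describes an intrusive singly linked list of per-thread wait nodes. A sketch of that structure, assuming field names that approximate the PR (this is illustrative, not the exact PX4 code):

```cpp
#include <atomic>
#include <cstdint>
#include <pthread.h>

// Hypothetical sketch: each waiting thread owns one thread_local TimedWait
// node, so registering a wait requires no malloc and is O(1).
struct TimedWait {
	pthread_cond_t *passed_cond{nullptr};
	pthread_mutex_t *passed_lock{nullptr};
	uint64_t time_us{0};
	bool timeout{false};
	std::atomic<bool> done{false};
	// 'removed' guards against the thread_local node being destroyed while
	// the scheduler still holds a pointer to it (a thread exiting quickly)
	std::atomic<bool> removed{true};
	TimedWait *next{nullptr};   // intrusive link
};

static TimedWait *g_head = nullptr;   // list head, protected by the scheduler lock

void register_wait(TimedWait *w, uint64_t deadline_us)
{
	w->time_us = deadline_us;
	w->done.store(false);
	w->removed.store(false);
	w->next = g_head;    // O(1) push at the head, no allocation
	g_head = w;
}

uint64_t count_waiters()
{
	uint64_t n = 0;
	for (TimedWait *w = g_head; w; w = w->next) { ++n; }
	return n;
}
```

In the real scheduler the node would be declared `static thread_local TimedWait timed_wait;` inside the waiting function, which is what eliminates the per-wait heap allocation that std::vector growth incurred.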
@bkueng bkueng requested a review from julianoes January 9, 2019 12:52
Contributor

@julianoes julianoes left a comment


Thanks for looking into this and improving it, that's awesome!

  • removed the heavy use of malloc in the lockstep scheduler by using a thread_local object (required some changes to the unit-tests as well).

Which malloc do you mean? The one for std::vector?

Ah, this thing, I didn't realize, crazy!

static thread_local TimedWait timed_wait;
  • replaced the std::vector with a more efficient linked list in the lockstep scheduler.

How much did the switch from std::vector to the linked list really improve?

@@ -6,6 +6,8 @@ if(NOT PROJECT_NAME STREQUAL "px4")

set (CMAKE_CXX_STANDARD 11)

add_definitions(-DUNIT_TESTS)
Contributor


We should probably move the whole file to tabs.

Member Author


Right, I didn't realize. Commit pushed.

Member


Quick comment because I just noticed this (not PR related). The CMakeLists.txt doesn't need to be different for inclusion in px4 or building standalone.

Contributor


It does because it can be used for standalone unit tests. And I would like to keep these tests until we have figured out a way to port them over into PX4.

void set_absolute_time(uint64_t time_us);
uint64_t get_absolute_time() const;
inline uint64_t get_absolute_time() const { return _time_us; }
Contributor


I like that you made it inline explicitly but it is already inline automatically if defined in the class declaration, right?

Member Author


I like that you made it inline explicitly but it is already inline automatically if defined in the class declaration, right?

Yes, this is just slightly stronger, and sometimes I like to add it to make it clear to a reader that it's important here. Not strictly necessary though.
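For context on the exchange above: a member function defined inside the class body is already implicitly inline, so the explicit keyword is redundant there and serves only as documentation. A minimal illustration (the `Clock` class is hypothetical, modeled on the quoted snippet):

```cpp
#include <cstdint>

// Illustrative only: a member function defined in the class body is
// implicitly inline per the C++ standard, with or without the keyword.
class Clock {
public:
	// implicitly inline even without the keyword
	uint64_t get_absolute_time() const { return _time_us; }

	// 'inline' here is legal but redundant; it can signal a hot path to readers
	inline void set_absolute_time(uint64_t t) { _time_us = t; }

private:
	uint64_t _time_us{0};
};
```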

pthread_cond_t *passed_cond{nullptr};
pthread_mutex_t *passed_lock{nullptr};
uint64_t time_us{0};
bool timeout{false};
bool done{false};
std::atomic<bool> done{false};
std::atomic<bool> removed{true};
Contributor


At first I found the name removed confusing but I think it actually is what it is, and I can't find a better word.

std::atomic<bool> done{false};
std::atomic<bool> removed{true};

TimedWait *next{nullptr}; ///< linked list
Contributor


I don't love the raw linked list. I would prefer an abstraction but I suppose it's easy enough to understand.

}

void LockstepScheduler::set_absolute_time(uint64_t time_us)
{
time_us_ = time_us;
_time_us = time_us;
Contributor


Ok, makes sense to switch it to the PX4 convention.

!temp_timed_wait->done) {
temp_timed_wait->timeout = true;
if (timed_wait->time_us <= time_us &&
!timed_wait->timeout) {
Contributor


Are you sure that we don't need && !temp_timed_wait->done anymore?

Member Author


No, since we check for that above already. If it were needed here, it would mean we have a race condition.

pthread_cond_t cond;
pthread_cond_init(&cond, nullptr);
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
Contributor


I thought these were only allowed for static locks/conds but looks like that's only true for older versions of the POSIX standard. I guess it's a nice optimization to save calls.
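To illustrate the point discussed above, a small sketch of the two initialization styles (not PX4 code): POSIX originally documented the static initializers for statically allocated objects, and with default attributes they are equivalent to calling the `_init` functions, saving the call.

```cpp
#include <pthread.h>

// Static-initializer style: no init call needed, default attributes.
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond_a = PTHREAD_COND_INITIALIZER;

// Dynamic style: required when non-default attributes are needed
// (e.g. a recursive mutex or a condvar with a monotonic clock attr).
int init_dynamic(pthread_mutex_t *m, pthread_cond_t *c)
{
	int rc = pthread_mutex_init(m, nullptr);
	if (rc != 0) { return rc; }
	return pthread_cond_init(c, nullptr);
}
```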

std::this_thread::yield(); // usleep is too slow here
}

ls.set_absolute_time(ls.get_absolute_time());
Contributor


Huh, you set the same time again? I don't get it.

Member Author


It's just that the linked list gets cleaned up. Setting the removed flag in particular.


pthread_mutex_unlock(&_hrt_mutex);

}
Contributor


Nice cleanup, missed that.

@@ -334,10 +330,6 @@ extern "C" {

#endif

while (sim_delay) {
px4_usleep(100);
}
Contributor


👍

@bkueng
Member Author

bkueng commented Jan 9, 2019

Thanks for the review.

How much did the switch from std::vector to the linked list really improve?

Enough to show up in the profiler:
This is before (CPU time spent in set_absolute_time and its callees):
[profiler screenshot: before]
This PR:
[profiler screenshot: after]
In particular the left green area is mostly from the vector (the total CPU reduction is from 2.58% to 0.73%, which also includes the call to free).

@julianoes julianoes merged commit 4bc5909 into master Jan 14, 2019
@bkueng bkueng deleted the locking_improvements branch January 14, 2019 10:10
@MaEtUgR
Member

MaEtUgR commented Jan 14, 2019

@bkueng AWESOME!!! I just pulled master and now all the strange issues I reported in #10693 are gone!
I can run the Intel stress test utility next to SITL jmavsim, and all that happens is an irregular slowdown of the simulation; it runs "forever", and performance comes back once the extreme CPU load is removed. Pausing also works fine; the QGC connection is only lost during the pause and regained right afterwards.

The only downside is that the shutdown command is now also broken and I always have to kill the task (see #11027). But I'll likely figure that one out.
