ARC: runner: mdb: tweak searching for cld process pid #30301

evgeniy-paltsev · 2020-11-29T17:56:52Z

The mdb binary starts several subprocesses, one of which is the cld process. In runners/mdb.py we record the process ID of cld on each mdb launch so that the simulator can be terminated correctly later. However, a test can finish and mdb can be terminated before the cld process has been found, in which case cld is not terminated correctly by the sanitycheck infrastructure. This can happen when mdb is launched on a fast host machine.

That leads to several issues. First, we get an ugly error in the sanitycheck output:

FileNotFoundError: [Errno 2] No such file or directory: '/xxxx/mdb.pid'

Second, and more importantly, we terminate the simulator incorrectly: we terminate mdb but leave the cld process alive, running and consuming one CPU core permanently (until we kill it manually).

So let's not wait an extra 0.5 seconds before the first lookup (as the test may finish before we start the lookup), and let's perform the lookups more often (so there are no long delays during which a test can start and finish between lookups). This shouldn't affect host machine performance, as we usually find cld (and hence exit the lookup loop) in 1-2 iterations, based on my tests on my local machine and our servers.
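The lookup order described above (check first, sleep afterwards, short interval) can be sketched as follows. This is a minimal illustration, not the actual code in runners/mdb.py: the function name `wait_for_pid`, the injected `lookup` callable, and the limit of 100 attempts are assumptions made for the example. The 0.05 second interval matches the 50 milliseconds discussed in the review.

```python
import time
from typing import Callable, Optional


def wait_for_pid(
    lookup: Callable[[], Optional[int]],
    attempts: int = 100,       # hypothetical upper bound on lookups
    interval: float = 0.05,    # 50 ms between lookups
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll `lookup` until it returns a pid or the attempts run out.

    The first lookup happens immediately; the sleep comes only after
    a failed lookup, so a fast test cannot finish during an initial delay.
    """
    for _ in range(attempts):
        pid = lookup()
        if pid is not None:
            return pid
        sleep(interval)
    return None
```

With this shape, a lookup that succeeds on the first or second iteration costs at most one short sleep, which is why the finer interval adds little overhead in the common case.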

abrodkin

A couple of things:

Could you please add some background on why you got into the business of tweaking the MDB runner? (I guess there was some problem you were looking at.)
Moving the sleep() to the end of the loop is well understood, but what about the 10 times shorter interval (50 milliseconds instead of 500)? Won't it introduce unneeded overhead?

ruuddw · 2020-11-30T08:35:27Z

Why is it so important to always have the pid file early? If there is no pid file, there was no process, so there is nothing to kill. I suggest moving the pid check to where the process gets killed, instead of 'tweaking' parameters around development host machine timing/performance.

evgeniy-paltsev · 2020-11-30T15:01:47Z

@ruuddw @abrodkin I've updated the commit description and added more background.

> If there is no pid file, there was no process so nothing to kill -> suggest to move the pid check to where the process gets killed

But we can end up in a situation where there is no pid file and no mdb process, yet cld is alive and consuming CPU. And we can't determine which process named cld to kill, as we launch several simulations in parallel.

I've played with the pstree utility a bit.

Here is the process tree with the simulation launched:

systemd───systemd───mdb───cld

Here we've killed the mdb process (the parent of cld):

systemd───systemd───cld

cld still runs after its parent mdb is terminated.
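The pstree output shows why the cld pid has to be captured while mdb is still alive: once mdb is gone, cld is reparented and can no longer be reached through its original parent. A minimal sketch of picking cld out of mdb's descendants follows. The helper name `find_cld_pid` and the prefix match on the process name are assumptions for illustration. The function accepts any objects that expose a `pid` attribute and a `name()` method.

```python
from typing import Iterable, Optional


def find_cld_pid(descendants: Iterable) -> Optional[int]:
    """Return the pid of the first descendant whose name starts with 'cld'.

    `descendants` is any iterable of process-like objects that have a
    `pid` attribute and a `name()` method. Returns None if nothing matches.
    """
    for proc in descendants:
        if proc.name().startswith("cld"):
            return proc.pid
    return None
```

If psutil is installed, `psutil.Process(mdb_pid).children(recursive=True)` returns objects of this shape, where `mdb_pid` stands for the pid of the running mdb process. The call only works while that process still exists, which matches the behaviour shown above.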

mdb binary starts several subprocesses and one of them is cld process. In runners/mdb.py we record process id of cld on each mdb launch to terminate simulator correctly later. However we can finish test and terminate mdb before the cld process was found (so cld won't be terminated correctly by sanitycheck infrastructure). It may happen if we launch mdb on fast host machine.

That leads to several issues. First of all we get ugly error in sanitycheck output:
------------------------>8--------------------------------
FileNotFoundError: [Errno 2] No such file or directory: '/xxxx/mdb.pid'
------------------------>8--------------------------------
Secondly (and it's more important) we terminate simulator incorrectly. We terminate mdb leaving cld process alive, running and consuming one cpu core permanently (until we kill it manually)

So, let's increase granularity of lookups and not wait extra 0.5 seconds before the first lookup.

Signed-off-by: Eugeniy Paltsev <[email protected]>

ruuddw · 2020-12-01T11:13:51Z

Understood. Without the mdb process running anymore, it will be hard to get the specific cld pid to kill.

mbolivar-nordic · 2020-12-11T17:56:32Z

@nashif I know I was late to the review, but this should not have been merged with an optional import being added without checking whether it's installed. I'll send a follow-up fix.
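One common way to guard an optional dependency is sketched below. This is not the follow-up fix itself, which is not shown in this thread. The helper name `try_import` and the `MISSING_PSUTIL` flag are hypothetical, and the choice of psutil as the optional module is an assumption.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Assumption: psutil is the optional dependency in question.
psutil = try_import("psutil")
MISSING_PSUTIL = psutil is None  # callers can check this and report a clear error
```

Code that needs the module can then test the flag up front and fail with a readable message instead of an ImportError at an arbitrary point.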

evgeniy-paltsev requested a review from mbolivar-nordic as a code owner November 29, 2020 17:56

evgeniy-paltsev requested review from abrodkin and ruuddw November 29, 2020 17:57

abrodkin requested changes Nov 30, 2020

evgeniy-paltsev force-pushed the rff-mdb-cld-pid-fix branch from 661590a to 032bb72 Compare November 30, 2020 11:02

evgeniy-paltsev force-pushed the rff-mdb-cld-pid-fix branch from 032bb72 to 5d13843 Compare December 1, 2020 08:47

ruuddw approved these changes Dec 1, 2020

evgeniy-paltsev requested a review from abrodkin December 2, 2020 18:27

abrodkin approved these changes Dec 2, 2020

nashif merged commit 9858893 into zephyrproject-rtos:master Dec 2, 2020
