-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARC: runner: mdb: tweak searching for cld process pid #30301
ARC: runner: mdb: tweak searching for cld process pid #30301
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things:
- Could you please add some background why did you get into that business of tweaking MDB runner (i.e. I guess there was some problem you were looking at)
- Move of the sleep() to the end of the loop is well understood, but what about 10 times smaller granularity now (50 milliseconds instead of 500)? Won't it introduce not-needed overhead?
Why is it so important to always have the pid file early? If there is no pid file, there was no process so nothing to kill -> suggest to move the pid check to where the process gets killed, instead of 'tweaking' parameters around development host machine timing/performance. |
661590a
to
032bb72
Compare
@ruuddw @abrodkin I've updated commit description and added more background.
I've played with a Here is the simulation launched:
Here we've killed
The cld still runs after his parent |
mdb binary starts several subproceses and one of them is cld process. In runners/mdb.py we record process id of cld on each mdb launch to terminate simulator correctly later. However we can finish test and terminate mdb before the cld process was found (so cld won't be terminated correctly by sanitycheck infrastructure). It may happen if we launch mdb on fast host machine. That leads to several issues. First of all we get ugly error in sanitycheck output: ------------------------>8-------------------------------- FileNotFoundError: [Errno 2] No such file or directory: '/xxxx/mdb.pid' ------------------------>8-------------------------------- Secondly (and it's more important) we terminate simulator incorrectly. We terminate mdb leaving cld process alive, running and consuming one cpu core permanently (until we kill it manually) So, let's increase granularity of lookups and don't wait extra 0.5 seconds before the first lookup. Signed-off-by: Eugeniy Paltsev <[email protected]>
032bb72
to
5d13843
Compare
Understood, without the mdb process running anymore it will hard to get the specific cld pid to kill. |
@nashif I know I was late to the review but this should not have been merged with an optional import being added without checking if it's installed. I'll send a follow up fix. |
mdb binary starts several subproceses and one of them is cld process. In runners/mdb.py we record process id of cld on each mdb launch to terminate simulator correctly later. However we can finish test and terminate mdb before the cld process was found (so cld won't be terminated correctly by sanitycheck infrastructure). It may happen if we launch mdb on fast host machine.
That leads to several issues. First of all we get ugly error in sanitycheck output:
Secondly (and it's more important) we terminate simulator incorrectly. We terminate mdb leaving cld process alive, running and consuming one cpu core permanently (until we kill it manually)
So, let's don't wait extra 0.5 seconds before the first lookup (as test may finish before we start lookup) and increase granularity of lookups (so we won't have big delays where test can start and finish between lookups). It shouldn't affect host machine performance as we usually find
cld
(and hence exit lookup cycle) in 1-2 iterations (based on my tests on local machine and our servers)