[AutoTVM][RPCRunner] timeout is not passed correctly #6924

Merged (4 commits into apache:main on Nov 17, 2020)
Conversation

bernhardklein (Contributor)
If we hand over the timeout to the constructor of LocalExecutor, as is already done in the LocalBuilder class, the issue appears to be solved and the timeout is passed correctly.
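In code terms, a hedged sketch of the proposed change (assuming the runner previously constructed its executor without a timeout, mirroring what LocalBuilder already does; not the exact diff):

    # Before: the runner's executor is created without a timeout, so the
    # user-supplied value never reaches it.
    self.executor = LocalExecutor()

    # After (this PR's initial proposal), mirroring LocalBuilder:
    self.executor = LocalExecutor(timeout=timeout)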

bernhardklein (Contributor, Author)

Relates to issue #6924.

comaniac (Contributor) left a comment

LGTM

cc @merrymercy

merrymercy (Member) left a comment

We cannot do this, because it changes the meaning of the current timeout. Currently, the timeout argument of RPCRunner only controls the time spent on the remote device. After your change, the timeout would cover everything, including the time spent waiting to get a device.

For example, suppose the timeout is 10, we have one remote device, and we launch two measurement tasks. With the existing code, each request gets 10 seconds for its measurement on the remote device, so the later request effectively has a 20-second timeout in LocalExecutor (10 s waiting for the first request to finish plus 10 s of its own). After your change, the later request would get only 10 seconds in LocalExecutor.

The correct way is

self.executor = LocalExecutor(timeout=timeout * (self.n_parallel + 1))

We use self.n_parallel + 1 instead of self.n_parallel to leave extra time for other overhead.
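The arithmetic behind the formula, as a toy sketch (not TVM code, just the worst-case calculation from the example above):

    def worst_case_wall_time(timeout, n_parallel):
        # With one remote device, the last of n_parallel queued requests may
        # wait up to (n_parallel - 1) * timeout seconds for earlier requests
        # to finish, then needs its own timeout-second measurement window.
        return (n_parallel - 1) * timeout + timeout  # == timeout * n_parallel

    # The local executor's budget must cover this worst case; scaling by
    # (n_parallel + 1) leaves one extra `timeout` of slack for overhead.
    assert worst_case_wall_time(10, 2) == 20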

bernhardklein (Contributor, Author)

@merrymercy, as you suggested, I set the constructor argument to timeout * (self.n_parallel + 1) and changed self.timeout in the same way; I hope this is the right approach. For n_parallel=None, the base class Runner falls back to the CPU count, which seems to work correctly.

merrymercy (Member)

merrymercy commented Nov 17, 2020

You should only change the timeout setting in LocalExecutor; self.timeout is used in other places and is correct as it is. I have fixed the problems for you.
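For reference, a simplified sketch of the resulting constructor (the argument list here is an assumption; the real RPCRunner in python/tvm/autotvm/measure/measure_methods.py takes more parameters):

    class RPCRunner(Runner):
        def __init__(self, key, host, port, timeout=10, n_parallel=None):
            super().__init__(timeout, n_parallel)
            # self.timeout (set by Runner) keeps its original meaning: the
            # budget for one measurement on the remote device, and is used
            # elsewhere unchanged.
            # Only the local executor's timeout is scaled, so it also covers
            # the worst-case wait for a free device plus some slack.
            self.executor = LocalExecutor(timeout=timeout * (self.n_parallel + 1))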

bernhardklein (Contributor, Author)

ok, thank you!

merrymercy merged commit 47809c3 into apache:main on Nov 17, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
* [AutoTVM][RPCRunner] timeout is not passed correctly

* like @merrymercy suggests, scale timeout with (n_parallel + 1)

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Lianmin Zheng <[email protected]>
trevor-m pushed the same commit (with the commit message above) to trevor-m/tvm on Dec 4, 2020, and to neo-ai/tvm on Dec 4, 2020.
Successfully merging this pull request may close these issues.

[AutoTVM][RPCRunner] timeout is not passed correctly
3 participants