Make AutoScheduler handling of errors during measure consistent with AutoTvm #6909
@tqchen @merrymercy Thoughts and review, please?
How do you hit this part of the code? Generally, it means you have some fatal errors in the code.
It is very rare to recover from a case where you have so many continuous errors.
Please rebase and fix the CI error.
I'm not entirely certain what causes us to hit this condition. In our case, the AutoTvm debug prints showed that it was due to error_no=4, which is a RUNTIME_DEVICE error (as you can see from the excerpt of the AutoTvm log I included previously). We hit this condition only intermittently: we could run a particular op/shape once and hit it, and the next run, without any changes, would succeed. In addition, one op/shape reaching this condition didn't mean that the remaining op/shapes run in the same script would fail, so the system overall was able to recover. I think the main issue is that terminating the program as soon as we meet this condition gives it no chance to recover, and we also lose the precise feedback about which error we are hitting while using the auto_scheduler. I'll rebase and try to fix the CI issue.
ccb15e9 to 9e4c8ec
Thanks, @TaylorZowtuk. It is merged.
…AutoTvm (apache#6909)
* Match ansor handling of 'too many errors' during measure to that of autoTVM and match default level of logging
* Set correct level of verbosity for debug mode
Co-authored-by: Lianmin Zheng <[email protected]>
* Lint
* trigger CI
Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: Taylor Zowtuk 84152750 <[email protected]>
While running scripts using both AutoScheduler and AutoTvm to consecutively search for schedules for a number of operators/shapes, I observed different behaviors during measurement following the output “Too many errors happened during tuning.”
After looking into the code, I determined that the difference came from how AutoScheduler and AutoTvm handle the case where the number of accumulated errors during measurement exceeds a threshold.
I observed that while using AutoTvm, the program would switch to debug level logging and continue search.
While using AutoScheduler, the program would crash after throwing an uncaught error.
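To make the difference concrete, here is a minimal Python sketch of the two policies. It is not the actual AutoScheduler (C++) or AutoTvm code: `count_successes`, `TooManyErrors`, and the boolean outcome stream are hypothetical names for illustration, and the sketch assumes, as the reviewer's "continuous errors" wording suggests, that the counter tracks consecutive failures and resets after a success.

```python
class TooManyErrors(RuntimeError):
    """Hypothetical stand-in for the uncaught error the old AutoScheduler path raised."""


def count_successes(outcomes, max_continuous_error, abort_on_too_many):
    """Walk a stream of measurement outcomes (True = success) and count successes.

    Assumption: the error counter tracks *consecutive* failures and resets on success.
    abort_on_too_many=True mimics the old AutoScheduler behavior (stop the run);
    False mimics AutoTvm (keep measuring; its logging switch is omitted here).
    """
    error_ct = 0
    successes = 0
    for ok in outcomes:
        if ok:
            error_ct = 0  # a success clears the run of failures
            successes += 1
        else:
            error_ct += 1
        if error_ct > max_continuous_error and abort_on_too_many:
            raise TooManyErrors("Too many errors happened during tuning.")
    return successes
```

With a threshold of 3, a transient burst of four failures ends an aborting run, while the continuing policy still records the two successful measurements that follow the burst.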
In my case, because AutoScheduler crashed instead of continuing the search, my script terminated prematurely, even though it might have recovered from whatever was causing the errors.
In addition, it was unclear to me why this behavior occurred only in AutoScheduler and not in AutoTvm. This discrepancy can confuse new users who want to explore both methods of schedule searching. This PR proposes bringing AutoScheduler's handling of measurement errors in line with AutoTvm's.
By removing the LOG(FATAL) and changing the AutoScheduler verbosity in the same way AutoTvm changes its logging level, the two programs behave the same. In addition, I changed the default verbosity of AutoScheduler to 0 (silent) to match the default logging level of AutoTvm.
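As a hedged illustration of the verbosity handling described above, here is a minimal Python sketch that raises verbosity while failures pile up and restores the user's level once a measurement succeeds, instead of aborting. It assumes an integer verbosity where 0 is silent; `MeasureVerbosity`, `record`, and the level `DEBUG = 2` are hypothetical names and values, not AutoScheduler's API (the real change edits the code that called LOG(FATAL)).

```python
SILENT = 0  # the new default verbosity described in this PR
DEBUG = 2   # hypothetical "debug" level; the real numeric value may differ


class MeasureVerbosity:
    """Hypothetical sketch: escalate verbosity on too many consecutive errors."""

    def __init__(self, max_continuous_error, verbose=SILENT):
        self.max_continuous_error = max_continuous_error
        self.user_verbose = verbose  # remembered so it can be restored later
        self.verbose = verbose
        self.error_ct = 0

    def record(self, ok):
        """Update the consecutive-error count and return the current verbosity.

        Unlike the old LOG(FATAL) path, this never raises.
        """
        self.error_ct = 0 if ok else self.error_ct + 1
        if self.error_ct > self.max_continuous_error:
            self.verbose = DEBUG  # switch to debug output, keep searching
        else:
            self.verbose = self.user_verbose  # back to the user's chosen level
        return self.verbose


if __name__ == "__main__":
    m = MeasureVerbosity(max_continuous_error=2)
    print([m.record(ok) for ok in [False, False, False, True]])  # [0, 0, 2, 0]
```

Remembering the user's level, rather than resetting to a fixed value, mirrors how AutoTvm is described as switching its logging level only while errors persist.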