[Bugfix] Auto scheduler tutorial failure on CI #6723
@comaniac thanks for doing this!
Yeah, we should probably unblock the CI first; then we can work together to find the root cause in exactly the same Docker environment that the CI uses.
Although CI still failed, I've finally got the incorrect JSON. Here are the instructions to reproduce:
{"i": [["[\"conv2d_layer\", 1, 7, 7, 512, 512, 3, 3, [1, 1], [1, 1]]", "cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32", [-1, 16, 64, 49152, 65536, 1024, 8, 32]], [[], [["CI", 5], ["SP", 3, 0, 1, [1, 1, 1, 1], 1], ["SP", 3, 5, 512, [4, 32, 1, 1], 1], ["SP", 3, 10, 7, [1, 1, 1, 7], 1], ["SP", 3, 15, 7, [1, 1, 7, 1], 1], ["SP", 3, 20, 512, [8, 1], 1], ["SP", 3, 23, 3, [1, 1], 1], ["SP", 3, 26, 3, [1, 3], 1], ["RE", 3, [0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 17, 20, 23, 26, 21, 24, 27, 3, 8, 13, 18, 22, 25, 28, 4, 9, 14, 19]], ["FSP", 6, 0, 1, 3], ["FSP", 6, 4, 2, 3], ["FSP", 6, 8, 3, 3], ["FSP", 6, 12, 4, 3], ["RE", 6, [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]], ["CA", 3, 6, 11], ["CHR", 2, "shared", [3]], ["CA", 3, 4, 14], ["CHR", 1, "shared", [4]], ["CA", 2, 5, 14], ["CI", 1], ["FU", 8, [0, 1, 2, 3]], ["AN", 8, 0, 5], ["FU", 8, [1, 2, 3, 4]], ["AN", 8, 1, 4], ["FU", 8, [2, 3, 4, 5]], ["AN", 8, 2, 6], ["FU", 4, [0, 1, 2, 3]], ["SP", 4, 0, 16, [1], 1], ["AN", 4, 1, 2], ["FFSP", 4, 0, [4, 3, 2, 1], 1, 1], ["AN", 4, 1, 6], ["FU", 2, [0, 1, 2, 3]], ["SP", 2, 0, 784, [7], 1], ["AN", 2, 1, 2], ["FFSP", 2, 0, [4, 3, 2, 1], 1, 1], ["AN", 2, 1, 6], ["PR", 5, 0, "auto_unroll_max_step$0"]]]], "r": [[0.001027474], 0, 1.97181, 1603158399], "v": "v0.2"}
import tvm
from tvm import auto_scheduler
auto_scheduler.load_best("conv2d.json")

I'll try to fix it tomorrow. @jcf94 will help investigate as well.
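For anyone who wants to inspect a log like the one above without going through TVM, here is a minimal sketch that parses each line as JSON and tallies the `"v"` field seen in the record. The helper name `summarize_log` and the sample lines are hypothetical, made up for illustration; only the `"i"`, `"r"`, and `"v"` keys come from the record posted here.

```python
import json
from collections import Counter


def summarize_log(lines):
    """Tally the "v" field of every JSON record and note lines that do not parse.

    Hypothetical helper for illustration only; it is not part of TVM.
    """
    versions = Counter()
    bad_lines = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(lineno)
            continue
        if not isinstance(record, dict):
            # A record is expected to be a JSON object with "i", "r", "v" keys.
            bad_lines.append(lineno)
            continue
        versions[record.get("v", "<missing>")] += 1
    return versions, bad_lines


if __name__ == "__main__":
    # Made-up sample lines; a real check would read conv2d.json instead.
    sample = [
        '{"i": [], "r": [[0.001027474], 0, 1.97181, 1603158399], "v": "v0.2"}',
        '{"i": [], "r": [[0.002], 0, 2.0, 1603158400]}',
        "not json at all",
    ]
    versions, bad_lines = summarize_log(sample)
    print(dict(versions), bad_lines)
```

If the tally shows more than one value, or any line fails to parse, the file probably contains records written by different runs or different code versions.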
@comaniac Thanks!
Nice, we finally got the JSON file to reproduce it!
😢 Unfortunately, it seems we still can't reproduce this bug locally. I still can't figure out how this log was generated.
I disabled the tutorials to make the CI green first. I will file another PR to fix the issue.
We found the root cause: the log file generated by the tutorial is never removed, so every CI run appends several more records to the same file. On top of that, #6671 changed the log format, so its CI runs appended records in a different format to the file that other CI runs also read. After this PR, I'll file another PR to make sure every CI run is independent.
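As a rough illustration of what "independent" could mean here, the sketch below shows two generic options: deleting a stale log before a run starts, or writing the log into a per-run temporary directory. Both helpers (`fresh_log_file`, `run_with_private_log`) and the `tune_fn` callback are hypothetical names; this is not necessarily what the follow-up PR does.

```python
import os
import tempfile


def fresh_log_file(path):
    """Delete a leftover log at `path` so the current run starts from an empty file."""
    if os.path.exists(path):
        os.remove(path)
    return path


def run_with_private_log(tune_fn, name="conv2d.json"):
    """Call `tune_fn(log_file)` with a log path inside a per-run temporary directory.

    `tune_fn` is a placeholder for whatever writes and then reads the log.
    The directory and its contents are removed when the call returns.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        return tune_fn(os.path.join(tmpdir, name))


if __name__ == "__main__":
    def fake_tune(log_file):
        # Stand-in for a tuning run: append one record, then count the lines.
        with open(log_file, "a") as f:
            f.write('{"v": "v0.2"}\n')
        with open(log_file) as f:
            return len(f.readlines())

    # Each call sees exactly one record, however many runs came before it.
    print(run_with_private_log(fake_tune), run_with_private_log(fake_tune))
```

The temporary-directory variant avoids shared state entirely, while the delete-first variant keeps the log at a known path for later inspection.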
We can merge this PR first if it passes the CI. Then we can have a follow-up PR to fix the root cause and remove the logging.
cc @merrymercy @jcf94 @junrushao1994