How to correctly use the new autoscheduler API? #7531
Comments
Could you post a full repro? The autoscheduler seems to be working correctly for me on the local laplacian app with default parameters.
Thank you for helping! Here is an example:
Compiled as follows:
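A minimal repro along these lines, a plain 2D convolution testcase pushed through the Halide 15 autoscheduler entry points, might look like this (a sketch only: the kernel, the estimates, and the parallelism value are assumptions rather than the exact code posted):

    #include "Halide.h"
    #include <iostream>

    using namespace Halide;

    int main() {
        // Plain 3x3 box convolution over a float input (shapes assumed).
        ImageParam input(Float(32), 2);
        Func blur("blur");
        Var x("x"), y("y");
        RDom r(-1, 3, -1, 3);
        blur(x, y) += input(x + r.x, y + r.y) / 9.0f;

        // The autoscheduler needs estimates on every input and output.
        input.set_estimates({{0, 2560}, {0, 1536}});
        blur.set_estimates({{0, 2560}, {0, 1536}});

        Pipeline p(blur);
        Target target("x86-64-linux-avx-avx2-f16c-fma-sse41");

        // New-style (Halide 15) invocation: load the plugin, then pass the
        // autoscheduler name and options via AutoschedulerParams.
        load_plugin("autoschedule_adams2019");
        AutoschedulerParams params("Adams2019", {{"parallelism", "8"}});
        AutoSchedulerResults results = p.apply_autoscheduler(target, params);
        std::cout << results.schedule_source << "\n";
        return 0;
    }

When going through a Generator instead, the same options are spelled on the generator command line as autoscheduler=Adams2019 and autoscheduler.parallelism=8, with the plugin passed via -p.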
Sorry to keep bothering you: did you have a chance to look into this, please?
Sorry for the delay. I was able to reproduce the issue, but am currently stumped as to the cause. It's not exploring sliding window schedules, and I can't figure out why. I don't see any recent changes that look relevant. I'll keep digging.
Hi there! How's the digging going?
Got distracted by other deadlines, but I'm back at it now because it's blocking the release of Halide 16. Doing a bisection, it looks like it was caused by #6861 |
Thank you! |
Hi. Sorry for cross-posting (I asked previously on Gitter, but had no luck getting a response).
I updated my Halide installation from 14.0.0 to 15.0.1, and it looks like I'm missing something pretty basic about using the new API of the Adams2019 autoscheduler. For a plain 2D convolution testcase, the Adams2019 autoscheduler with the same level of parallelism now generates a much worse schedule, which takes twice as long to execute. In addition, I can see that schedule generation completes in almost no time, as the following log shows:
generate_schedule for target=x86-64-linux-avx-avx2-f16c-fma-no_runtime-sse41-user_context
Adams2019.parallelism:8
Adams2019.beam_size:32
Adams2019.random_dropout:100
Adams2019.random_dropout_seed:0
Adams2019.weights_path:
Adams2019.disable_subtiling:0
Adams2019.disable_memoized_features:0
Adams2019.disable_memoized_blocks:0
Adams2019.memory_limit:-1
Loading weights from built-in data...
Pass 0 of 5, cost: 2.52664, time (ms): 11
Pass 1 of 5, cost: 2.52664, time (ms): 5
Pass 2 of 5, cost: 2.52664, time (ms): 5
Pass 3 of 5, cost: 2.52664, time (ms): 5
Pass 4 of 5, cost: 2.52664, time (ms): 5
Best cost: 2.52664
Cache (block) hits: 939
Cache (block) misses: 135
Cost evaluated this many times: 2485
And in the old Halide (v14) it behaved differently:
generate_schedule for target=x86-64-linux-avx-avx2-f16c-fma-no_runtime-sse41-user_context
Pass 0 of 5, cost: 1.45978, time (ms): 17335
Pass 1 of 5, cost: 1.45978, time (ms): 1301
Pass 2 of 5, cost: 1.45978, time (ms): 1307
Pass 3 of 5, cost: 1.45978, time (ms): 1371
Pass 4 of 5, cost: 1.45978, time (ms): 1364
Best cost: 1.45978
Cache (block) hits: 537
Cache (block) misses: 135
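For contrast, a sketch of how the same pipeline was driven under Halide 14, through the old auto_schedule entry point with a MachineParams argument (the values here are assumptions, and this reuses the Pipeline p and Target target from the earlier sketch):

    // Halide 14 style: MachineParams(parallelism, llc_size_bytes, balance).
    load_plugin("autoschedule_adams2019");
    Pipeline::set_default_autoscheduler_name("Adams2019");
    MachineParams arch(8, 16 * 1024 * 1024, 40);
    AutoSchedulerResults results = p.auto_schedule(target, arch);
    std::cout << results.schedule_source << "\n";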
I tried playing with the autoscheduler.beam_size parameter, setting it to 1 for greedy search, but it produces an even worse outcome.
When beam_size is set to a huge value (e.g., 8192), schedule generation does take considerable time, but the resulting schedule is still bad. (A sketch of these experiments follows.)
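In the new API these knobs travel in the extra_options map, so the beam_size experiments above presumably amount to something like this (key names inferred from the Adams2019.* lines in the first log, again reusing p and target from the earlier sketch):

    // beam_size=1 approximates greedy search; 8192 was the "huge value" tried.
    AutoschedulerParams greedy("Adams2019", {{"parallelism", "8"},
                                             {"beam_size", "1"}});
    p.apply_autoscheduler(target, greedy);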
Would you be so kind as to guide me on how the new autoscheduler API should be used correctly? Thank you very much.