How to use the new autoscheduler API correctly? #7531

Closed
lyudalev opened this issue Apr 24, 2023 · 7 comments · Fixed by #7626
Comments

@lyudalev

Hi. Sorry for cross-posting (asked previously on Gitter, but got no luck with responses).

I updated my Halide installation from 14.0.0 to 15.0.1, and it looks like I'm missing something pretty basic about using the new API of the Adams2019 autoscheduler. For a plain 2D convolution test case, the Adams2019 autoscheduler with the same level of parallelism now generates a much worse schedule, which takes twice as long to execute. In addition, schedule generation now finishes almost instantly, as this log shows:

generate_schedule for target=x86-64-linux-avx-avx2-f16c-fma-no_runtime-sse41-user_context
Adams2019.parallelism:8
Adams2019.beam_size:32
Adams2019.random_dropout:100
Adams2019.random_dropout_seed:0
Adams2019.weights_path:
Adams2019.disable_subtiling:0
Adams2019.disable_memoized_features:0
Adams2019.disable_memoized_blocks:0
Adams2019.memory_limit:-1
Loading weights from built-in data...
Pass 0 of 5, cost: 2.52664, time (ms): 11
Pass 1 of 5, cost: 2.52664, time (ms): 5
Pass 2 of 5, cost: 2.52664, time (ms): 5
Pass 3 of 5, cost: 2.52664, time (ms): 5
Pass 4 of 5, cost: 2.52664, time (ms): 5
Best cost: 2.52664
Cache (block) hits: 939
Cache (block) misses: 135
Cost evaluated this many times: 2485

With the old Halide (v14) it behaved differently:

generate_schedule for target=x86-64-linux-avx-avx2-f16c-fma-no_runtime-sse41-user_context
Pass 0 of 5, cost: 1.45978, time (ms): 17335
Pass 1 of 5, cost: 1.45978, time (ms): 1301
Pass 2 of 5, cost: 1.45978, time (ms): 1307
Pass 3 of 5, cost: 1.45978, time (ms): 1371
Pass 4 of 5, cost: 1.45978, time (ms): 1364
Best cost: 1.45978
Cache (block) hits: 537
Cache (block) misses: 135

I tried playing with the autoscheduler.beam_size parameter, setting it to 1 for greedy search, but that produces an even worse outcome. When beam_size is set to a huge value (e.g. 8192), generation does take considerable time, but the resulting schedule is still bad.
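For reference, here is my understanding of the equivalent programmatic route in v15, as a rough sketch (the AutoschedulerParams fields, the parameter key names, and the plugin library name here are my assumptions, so please correct me if this is not the intended usage):

#include <iostream>
#include "Halide.h"

using namespace Halide;

// Hypothetical helper, not part of my generator: auto-schedule a pipeline with
// Adams2019 through the v15 API as I currently understand it.
void run_adams2019(Pipeline &pipeline, const Target &target) {
    // Make the Adams2019 plugin available; the exact library name/path
    // depends on the platform and install layout.
    load_plugin("libautoschedule_adams2019.so");

    AutoschedulerParams params;
    params.name = "Adams2019";
    params.extra = {{"parallelism", "8"},
                    {"beam_size", "32"}};  // beam_size=1 would be the greedy search I tried

    // Ask the plugin to schedule the pipeline and print the schedule it chose.
    AutoSchedulerResults results = pipeline.apply_autoscheduler(target, params);
    std::cout << results.schedule_source << "\n";
}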

Could you please guide me on how the new autoscheduler API should be used correctly? Thank you very much.

@abadams
Member

abadams commented Apr 28, 2023

Could you post a full repro? The autoscheduler seems to be working correctly for me on the local laplacian app with default parameters.

@lyudalev
Author

lyudalev commented Apr 30, 2023

Thank you for helping! Here is an example:

#include "Halide.h"

using namespace Halide;

class Convolve_f32_to_f32 : public Generator<Convolve_f32_to_f32>
{
public:
    GeneratorParam<int>      kernel_size {"kernel_size", 5};
    Input<Buffer<float, 2>>  input {"input"};
    Input<Buffer<float, 2>>  kernel {"kernel"};
    Output<Buffer<float, 2>> output {"output"};

    Func clamped {"clamped"}, convolved {"convolved"};
    Var  x {"x"}, y {"y"};
    Pipeline pipeline;

    void generate ()
    {
        clamped = BoundaryConditions::repeat_edge (input);
        RDom r (kernel);
        Expr shift = kernel.width () / 2;
        output (x, y) = sum (kernel (r.x, r.y) * clamped (x + r.x - shift, y + r.y - shift));
        pipeline = get_pipeline ();
    }

    void schedule()
    {
        if (using_autoscheduler ()) // v14: if (auto_schedule)
        {
            input.set_estimates({ {0, 1280}, {0, 1024} });
            output.set_estimates({ {0, 1280}, {0, 1024} });
            if (kernel_size == 5)
                kernel.set_estimates({ {0, 5}, {0, 5} });
            else
                kernel.set_estimates({ {0, 9}, {0, 9} });
        }
    }
};
HALIDE_REGISTER_GENERATOR (Convolve_f32_to_f32, convolve_f32_to_f32)

Compiled as follows:
add_halide_library (convolve_5x5_f32_to_f32 FROM ${PROJECT_NAME}
    GENERATOR convolve_f32_to_f32
    FUNCTION_NAME convolve_5x5_f32_to_f32
    FEATURES user_context
    TARGETS ${PRODUCTION_TARGET}
    AUTOSCHEDULER Halide::Adams2019
    PARAMS autoscheduler.parallelism=8 # v14: PARAMS auto_schedule=true machine_params=8,0,0
    PARAMS kernel_size=5
    HEADER ${GENS_OUT_DIR}
    SCHEDULE convolve_5x5_f32_to_f32
    STMT_HTML convolve_5x5_f32_to_f32)

And here is the diff between the generated schedules:
![Capture](https://user-images.githubusercontent.com/2185111/235334933-93b1d11a-0761-4051-aa1d-61be9cb6dbec.GIF)
(Left side -- new, right side -- old).
Thank you!

@lyudalev
Author

Sorry to bother you: have you had a chance to look into this?

@abadams
Member

abadams commented May 10, 2023

Sorry for the delay. I was able to reproduce the issue, but am currently stumped as to the cause. It's not exploring sliding window schedules, and I can't figure out why. I don't see any recent changes that look relevant. I'll keep digging.
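(For context, a "sliding window" schedule stores a producer at an outer loop while computing it at an inner one, so rows shared by overlapping stencil taps are reused instead of recomputed. In terms of the repro above, an illustrative hand-written version, not the schedule v14 actually produced and with hypothetical split factors, would look roughly like this:)

// Illustrative only: a hand-written sliding-window schedule for the repro above.
Var yo{"yo"}, yi{"yi"};
output.split(y, yo, yi, 32)
      .parallel(yo)
      .vectorize(x, 8);
// Store "clamped" once per strip of scanlines but compute it inside the inner
// y loop, so each clamped row is evaluated only once even though the 5x5
// stencil overlaps in y.
clamped.store_at(output, yo)
       .compute_at(output, yi);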

@lyudalev
Author

Hi there! How's the digging going?

@abadams
Member

abadams commented Jun 12, 2023

Got distracted by other deadlines, but I'm back at it now because it's blocking the release of Halide 16. Doing a bisection, it looks like it was caused by #6861.

@lyudalev
Author

Thank you!
