How to use the new autoscheduler API correctly? #7531

Closed
lyudalev opened this issue Apr 24, 2023 · 7 comments · Fixed by #7626
Comments

@lyudalev

Hi. Sorry for cross-posting (asked previously on Gitter, but got no luck with responses).

I updated my Halide installation from 14.0.0 to 15.0.1, and it looks like I'm missing something pretty basic about using the new API of the Adams2019 autoscheduler. For a plain 2D convolution test case, the Adams2019 autoscheduler with the same level of parallelism now generates a much worse schedule, which takes twice as long to execute. In addition, schedule generation now finishes almost instantly, as this log shows:

generate_schedule for target=x86-64-linux-avx-avx2-f16c-fma-no_runtime-sse41-user_context
Adams2019.parallelism:8
Adams2019.beam_size:32
Adams2019.random_dropout:100
Adams2019.random_dropout_seed:0
Adams2019.weights_path:
Adams2019.disable_subtiling:0
Adams2019.disable_memoized_features:0
Adams2019.disable_memoized_blocks:0
Adams2019.memory_limit:-1
Loading weights from built-in data...
Pass 0 of 5, cost: 2.52664, time (ms): 11
Pass 1 of 5, cost: 2.52664, time (ms): 5
Pass 2 of 5, cost: 2.52664, time (ms): 5
Pass 3 of 5, cost: 2.52664, time (ms): 5
Pass 4 of 5, cost: 2.52664, time (ms): 5
Best cost: 2.52664
Cache (block) hits: 939
Cache (block) misses: 135
Cost evaluated this many times: 2485

With the old Halide (v14) it behaved differently:

generate_schedule for target=x86-64-linux-avx-avx2-f16c-fma-no_runtime-sse41-user_context
Pass 0 of 5, cost: 1.45978, time (ms): 17335
Pass 1 of 5, cost: 1.45978, time (ms): 1301
Pass 2 of 5, cost: 1.45978, time (ms): 1307
Pass 3 of 5, cost: 1.45978, time (ms): 1371
Pass 4 of 5, cost: 1.45978, time (ms): 1364
Best cost: 1.45978
Cache (block) hits: 537
Cache (block) misses: 135

I tried playing with the autoscheduler.beam_size parameter, setting it to 1 for greedy search, but that produces an even worse outcome. When beam_size is set to a huge value (e.g. 8192), generation does take considerable time, but the resulting schedule is still bad.
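For reference, here is my understanding of the equivalent programmatic route in v15, as a rough sketch (the AutoschedulerParams fields, the parameter key names, and the plugin library name here are my assumptions, so please correct me if this is not the intended usage):

#include <iostream>
#include "Halide.h"

using namespace Halide;

// Hypothetical helper, not part of my generator: auto-schedule a pipeline with
// Adams2019 through the v15 API as I currently understand it.
void run_adams2019(Pipeline &pipeline, const Target &target) {
    // Make the Adams2019 plugin available; the exact library name/path
    // depends on the platform and install layout.
    load_plugin("libautoschedule_adams2019.so");

    AutoschedulerParams params;
    params.name = "Adams2019";
    params.extra = {{"parallelism", "8"},
                    {"beam_size", "32"}};  // beam_size=1 would be the greedy search I tried

    // Ask the plugin to schedule the pipeline and print the schedule it chose.
    AutoSchedulerResults results = pipeline.apply_autoscheduler(target, params);
    std::cout << results.schedule_source << "\n";
}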

Could you please guide me on how the new autoscheduler API should be used correctly? Thank you very much.

@abadams
Member

abadams commented Apr 28, 2023

Could you post a full repro? The autoscheduler seems to be working correctly for me on the local laplacian app with default parameters.

@lyudalev
Author

lyudalev commented Apr 30, 2023

Thank you for helping! Here is an example:

#include "Halide.h"

using namespace Halide;

class Convolve_f32_to_f32 : public Generator<Convolve_f32_to_f32>
{
public:
    GeneratorParam<int>      kernel_size {"kernel_size", 5};
    Input<Buffer<float, 2>>  input {"input"};
    Input<Buffer<float, 2>>  kernel {"kernel"};
    Output<Buffer<float, 2>> output {"output"};

    Func clamped {"clamped"}, convolved {"convolved"};
    Var  x {"x"}, y {"y"};
    Pipeline pipeline;

    void generate ()
    {
        clamped = BoundaryConditions::repeat_edge (input);
        RDom r (kernel);
        Expr shift = kernel.width () / 2;
        output (x, y) = sum (kernel (r.x, r.y) * clamped (x + r.x - shift, y + r.y - shift));
        pipeline = get_pipeline ();
    }

    void schedule()
    {
        if (using_autoscheduler ()) // v14: if (auto_schedule)
        {
            input.set_estimates({ {0, 1280}, {0, 1024} });
            output.set_estimates({ {0, 1280}, {0, 1024} });
            if (kernel_size == 5)
                kernel.set_estimates({ {0, 5}, {0, 5} });
            else
                kernel.set_estimates({ {0, 9}, {0, 9} });
        }
    }
};
HALIDE_REGISTER_GENERATOR (Convolve_f32_to_f32, convolve_f32_to_f32)

Compiled as follows:
add_halide_library (convolve_5x5_f32_to_f32 FROM ${PROJECT_NAME}
    GENERATOR convolve_f32_to_f32
    FUNCTION_NAME convolve_5x5_f32_to_f32
    FEATURES user_context
    TARGETS ${PRODUCTION_TARGET}
    AUTOSCHEDULER Halide::Adams2019
    PARAMS autoscheduler.parallelism=8 # v14: PARAMS auto_schedule=true machine_params=8,0,0
    PARAMS kernel_size=5
    HEADER ${GENS_OUT_DIR}
    SCHEDULE convolve_5x5_f32_to_f32
    STMT_HTML convolve_5x5_f32_to_f32)

And here is the diff between the generated schedules:
![Capture](https://user-images.githubusercontent.com/2185111/235334933-93b1d11a-0761-4051-aa1d-61be9cb6dbec.GIF)
(Left side -- new, right side -- old).
Thank you!

@lyudalev
Author

Sorry to bother you: have you had a chance to look into this?

@abadams
Member

abadams commented May 10, 2023

Sorry for the delay. I was able to reproduce the issue, but am currently stumped as to the cause. It's not exploring sliding window schedules, and I can't figure out why. I don't see any recent changes that look relevant. I'll keep digging.
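(For context, a "sliding window" schedule stores a producer at an outer loop while computing it at an inner one, so rows shared by overlapping stencil taps are reused instead of recomputed. In terms of the repro above, an illustrative hand-written version, not the schedule v14 actually produced and with hypothetical split factors, would look roughly like this:)

// Illustrative only: a hand-written sliding-window schedule for the repro above.
Var yo{"yo"}, yi{"yi"};
output.split(y, yo, yi, 32)
      .parallel(yo)
      .vectorize(x, 8);
// Store "clamped" once per strip of scanlines but compute it inside the inner
// y loop, so each clamped row is evaluated only once even though the 5x5
// stencil overlaps in y.
clamped.store_at(output, yo)
       .compute_at(output, yi);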

@lyudalev
Author

Hi there! How's the digging going?

@abadams
Member

abadams commented Jun 12, 2023

Got distracted by other deadlines, but I'm back at it now because it's blocking the release of Halide 16. Doing a bisection, it looks like it was caused by #6861.

@lyudalev
Author

Thank you!
