-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support vectorization in OpenGLCompute backend #6348
Support vectorization in OpenGLCompute backend #6348
Conversation
This patch adds support for vector load and store operations. First, a pass identifies the buffers whose loads and stores are all dense, aligned, and have the same number of lanes. Such buffers are declared with a vector base type and accessed accordingly. Loads and stores that do not satisfy those criteria are implemented as gathers and scatters from buffers whose base type is scalar. Resolves halide#4976. Partially resolves halide#1687.
I noticed in my simple tests that loads and stores were never aligned. They always had a modulo of 1 and a remainder of 0. For a bounded vectorization of 4, I expected the modulo to be 4 as well, but this was not the case. Forcing alignment works, so I suspect there might be an issue with updating the alignments for loads and stores. Any idea if this is an issue or just an expected behavior? |
// If this is a dense vector load and the buffer has a vector base type, | ||
// then index the buffer using the base of the ramp divided by the number | ||
// of lanes. | ||
Expr ramp_base = strided_ramp_base(op->index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to check, but strided_ramp_base takes a stride parameter that defaults to one, so this actually is checking for a stride 1 ramp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsharletg That was my intention. Dense loads/stores are those that have a stride of 1. A stride higher than 1 means that the ramp is skipping elements and is not dense.
// of lanes. | ||
Expr ramp_base = strided_ramp_base(op->index); | ||
if (ramp_base.defined() && buffer_is_vector[op->name]) { | ||
string index_id = print_expr(ramp_base / op->type.lanes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there be a check that the number of lanes in the buffer type is the same as the load? Maybe buffer_is_vector should hold a Type instead of a bool to facilitate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsharletg The CheckAlignedDenseVectorLoadStore
pass checks for the number of lanes in all loads and stores. So if the pass decides to declare the buffer as a vector, it is guaranteed that all of the loads/stores will have the same number of lanes. So a check for lanes here is redundant as far as I can see.
if (lanes != -1 && op->value.type().lanes() != lanes) {
are_all_dense = false;
return;
}
// of lanes. | ||
Expr ramp_base = strided_ramp_base(op->index); | ||
if (ramp_base.defined() && buffer_is_vector[op->name]) { | ||
string index_id = print_expr(ramp_base / op->value.type().lanes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above.
public: | ||
// True if all loads and stores from the buffer are dense, aligned, and all | ||
// have the same number of lanes, false otherwise. | ||
bool are_all_dense = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this answers my questions above. Buffers are only stored as vectors if all the loads and stores are aligned dense vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsharletg That's right.
Does this PR need more review / revision? |
The ternary operator in GLSL does not work with vector types. While the mix function have overloads to boolean vectors, it is only supported in version 4.5, so it is not exactly portable. To work around this, we use the ternary operator on all elements of the vector type. Necessary for halide#6348.
Thanks for the fix! |
The ternary operator in GLSL does not work with vector types. While the mix function have overloads to boolean vectors, it is only supported in version 4.5, so it is not exactly portable. To work around this, we use the ternary operator on all elements of the vector type. Necessary for #6348.
This patch adds support for vector load and store operations in the
OpenGLCompute backend.
First, a pass identifies the buffers whose loads and stores are all
dense, aligned, and have the same number of lanes. Such buffers are
declared with a vector base type and accessed accordingly. Loads and
stores that do not satisfy those criteria are implemented as gathers and
scatters from buffers whose base type is scalar.
Resolves #4976.
Partially resolves #1687.