This repository has been archived by the owner on Jan 29, 2025. It is now read-only.

[spv-out] support const arrays #669

Closed · wants to merge 6 commits
Conversation

acowley
Contributor

@acowley acowley commented Apr 7, 2021

Fixes #635

Loading an element from a const array requires that memory be allocated for that array. If a function makes use of a local or global const array, we allocate a local variable into which we store the array. Array accesses then load from that variable.
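The lowering described above can be pictured with a plain Rust analogy (a sketch only; `LUT`, `i`, and the bindings are illustrative names, not code from this PR): a const array has no addressable storage in SPIR-V, so before a dynamic index can be loaded, the array is first stored into a synthesized local variable.

```rust
// Model of the PR's approach: copy the const array into a local
// variable (OpVariable + OpStore), then index that variable
// (OpAccessChain + OpLoad).
fn main() {
    const LUT: [f32; 3] = [1.0, 2.0, 4.0]; // const array: no addressable storage
    let i = 2usize;                        // dynamic index
    let local = LUT;                       // synthesized local variable
    let elem = local[i];                   // load through the variable
    assert_eq!(elem, 4.0);
}
```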

@acowley acowley force-pushed the const-array branch 2 times, most recently from 819e485 to b0623af on April 7, 2021 at 16:43
@@ -439,6 +441,44 @@ impl Writer {
results: Vec::new(),
};

for (handle, expression) in ir_function.expressions.iter() {
match expression {
Member

let's match *expression instead, like in all of the other places

Contributor Author

Done!

@@ -439,6 +441,44 @@ impl Writer {
results: Vec::new(),
};

for (handle, expression) in ir_function.expressions.iter() {
match expression {
&crate::Expression::Compose { ty, .. } => {
Member

As discussed on the Matrix, this code is overly generic.
The problem we need to solve is about dynamic indexing of arrays that are given by values (i.e. expressions or constants).

Contributor Author

I think I have narrowed it down at this point. Thank you for identifying this so quickly!

@acowley acowley marked this pull request as ready for review April 7, 2021 18:05
Member

@kvark kvark left a comment

I'm afraid this needs another major iteration. I expressed some ideas below.

function.local_const_arrays.insert(handle, id);
}
}
crate::Expression::Constant(h) => match ir_module.constants[h].inner {
Member

We probably want to check that h isn't already registered. I don't think there is a rule in the IR that prevents multiple Expression::Constant pointing to the same thing.

block
.body
.push(Instruction::store(var_id, composite_id, None));
var_id
Member

I don't think we can return var_id here. The type of the returned SPIR-V is still expected to be a value, not a pointer. It seems to me that we should always return composite_id from this block?

@@ -1375,7 +1450,20 @@ impl Writer {
}
}
crate::Expression::GlobalVariable(handle) => self.global_variables[handle.index()].id,
crate::Expression::Constant(handle) => self.constant_ids[handle.index()],
crate::Expression::Constant(handle) => {
Member

in the near future, when we know how each constant is used by each function - #656 - we'll be able to do this at the beginning of the function only. Maybe we can leave a small TODO pointing to this issue?

block
.body
.push(Instruction::store(var_id, composite_id, None));
var_id
Member

similarly to the other case, I don't think we can return var_id here, since it has a different type

let id = self.id_gen.next();
let var_id = match ir_function.expressions[base] {
crate::Expression::Constant(..) => self.cached[base],
crate::Expression::Compose { .. } => self.cached[base],
Member

Unfortunately, this is incomplete: there are other ways to get an array-typed expression:

  • an Expression::FunctionArgument that happens to be an array
  • a Expression::Load on a pointer to an array
  • an Expression::Access* on a structure (or other composite) that contains an array

There needs to be a more general approach here...

Here is a rough idea. Let's have a generic mechanism here for promoting expressions to variables. If an expression is promoted, then the Emit of it becomes an OpStore, and any use of it becomes an OpLoad, other than the Access/AccessIndex uses, which do OpLoad after issuing OpAccessChain.

We could build a FastHashMap<Handle<crate::Expression>, Word> for storing the variable ids, similar to what the PR does.
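The promotion map suggested here could look roughly like this (an editorial sketch; `Handle` and `Word` are simplified stand-ins for naga's handle and SPIR-V word types, and `Promotions` is a hypothetical name):

```rust
use std::collections::HashMap;

// Stand-ins for naga's Handle<crate::Expression> and SPIR-V Word.
type Handle = usize;
type Word = u32;

// Tracks which expressions have been promoted to variables, keyed by
// expression handle, as suggested in the comment above.
struct Promotions {
    vars: HashMap<Handle, Word>,
}

impl Promotions {
    // Promote an expression, reusing the variable if already promoted.
    fn promote(&mut self, expr: Handle, next_id: &mut Word) -> Word {
        *self.vars.entry(expr).or_insert_with(|| {
            let id = *next_id; // would be id_gen.next() in the writer
            *next_id += 1;
            id
        })
    }
}

fn main() {
    let mut p = Promotions { vars: HashMap::new() };
    let mut id_gen = 100;
    let a = p.promote(7, &mut id_gen);
    let b = p.promote(7, &mut id_gen); // same expression: same variable id
    assert_eq!(a, b);
    assert_eq!(a, 100);
}
```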

@acowley
Contributor Author

acowley commented Apr 9, 2021

I've been making some headway on this, and noticed something: The WGSL front-end always generates an Access expression when it encounters an array indexing expression. I think this could be an AccessIndex if the index is a literal. Is this something that could be improved, or do we want array indexing to always go through Access?

@kvark
Member

kvark commented Apr 9, 2021

That's a great question! I believe we want to generate AccessIndex if we can, since it makes the bounds checking easier (i.e. does it at compile time instead of run-time).
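The choice being described could be sketched like this (hypothetical, simplified enums standing in for naga IR; `lower_index` is an invented name):

```rust
// Simplified stand-ins for naga IR expressions.
#[allow(dead_code)]
enum Index {
    Literal(u32),   // the index is a compile-time constant
    Dynamic(usize), // handle of a runtime index expression
}

#[allow(dead_code)]
enum Expr {
    Access { base: usize, index: usize },
    AccessIndex { base: usize, index: u32 },
}

// Emit AccessIndex for literal indices so bounds checks can happen at
// compile time; fall back to Access for runtime indices.
fn lower_index(base: usize, idx: Index) -> Expr {
    match idx {
        Index::Literal(i) => Expr::AccessIndex { base, index: i },
        Index::Dynamic(h) => Expr::Access { base, index: h },
    }
}

fn main() {
    assert!(matches!(lower_index(0, Index::Literal(2)), Expr::AccessIndex { index: 2, .. }));
    assert!(matches!(lower_index(0, Index::Dynamic(5)), Expr::Access { .. }));
}
```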

@acowley
Contributor Author

acowley commented Apr 10, 2021

I have several test cases working now, including const arrays of arrays, but a problem is that I haven't found an opportunity to produce a single multi-step OpAccessChain instruction for something like foo[2][2] where const foo: array<array<f32,3>,3>.

With non-const vars, OpAccessChain generation is triggered off of a Load expression in the syntax tree produced by the WGSL front-end. But with consts, the back-end ends up seeing an Access for the foo[2] sub-expression and then another Access for the outer [2] indexing. What happens, then, is that we store a global constant foo into a local variable we synthesize. Next we load and store foo[2] into another variable we synthesize, and then we load from that variable to obtain the needed f32 value.

With non-const variables, the spv back-end caches an id of 0 when the base of an Access is a variable, or if the cached id of the base is 0. That pushes information forward so that nothing is generated for the chain of array indexing expressions. The trailing Load expression produced by the front-end encodes the information needed to produce a multi-step OpAccessChain.

It's an interesting problem! It took me a while to figure out that coupling between the front- and back-ends with the Load expression. I'm going to think about it some more, but if the above description rings any bells I'm open to suggestions. I've thus far kept all changes to the spv back-end, but perhaps it will need help from the front-end to have more context regarding nested access if we want to generate nicer code.
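The redundant loads described above can be modeled in ordinary Rust (an illustration only; `FOO`, `v0`, and `v1` are hypothetical names):

```rust
// Contrast of the two lowerings discussed above for foo[2][2].
fn main() {
    const FOO: [[f32; 3]; 3] = [[0.0; 3], [0.0; 3], [1.0, 2.0, 3.0]];
    let (i, j) = (2usize, 2usize);

    // What the writer currently produces: each sub-expression gets its
    // own synthesized variable, so foo and foo[2] are each copied whole.
    let v0 = FOO;   // store all of foo
    let v1 = v0[i]; // load and store all of foo[2]
    let a = v1[j];  // final load of foo[2][2]

    // The desired lowering: one variable, one multi-index access chain.
    let v = FOO;
    let b = v[i][j];

    assert_eq!(a, b);
    assert_eq!(b, 3.0);
}
```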

@kvark
Member

kvark commented Apr 11, 2021

Thank you for digging into this! It's a complex topic, and I'm happy that you aren't giving up easily 👍

What happens, then, is that we store a global constant foo into a local variable we synthesize. Next we load and store foo[2] into another variable we synthesize, and then we load from that variable to obtain the needed f32 value.

Are you concerned about having to load the whole foo array in this case, while we know we should only be loading foo[2][2], i.e. just an efficiency concern? I think we can worry about this later, for now we just need to have the simplest workaround for this to work correctly.

With non-const variables, the spv back-end caches an id of 0

Could you clarify what is 0 here?

I'm going to think about it some more, but if the above description rings any bells I'm open to suggestions.

Did you consider a path where we just promote any expression (containing a fixed-size array) into a variable?

perhaps it will need help from the front-end to have more context regarding nested access if we want to generate nicer code.

I'm not sure what extra help or information the frontends could provide in this case.

@acowley
Contributor Author

acowley commented Apr 11, 2021

Are you concerned about having to load the whole foo array in this case, while we know we should only be loading foo[2][2], i.e. just an efficiency concern? I think we can worry about this later, for now we just need to have the simplest workaround for this to work correctly.

Yes, it's an efficiency concern. The sour taste is because the current approach loads all of foo, then it loads all of foo[2], and then loads foo[2][2]. Loading the third element of foo (i.e. foo[2]) is the particular disappointment.

I'm sorry that I wasn't clear enough with my last message. The issue with building an OpAccessChain with multiple indices is that the approach of this PR is to promote expressions to have an attached variable in the SPV writer. When we are evaluating an expression in cache_expression_value in the writer, we see the sub-expressions in the order foo, foo[2], and then foo[2][2]. Each of the first two could plausibly prompt us to promote them to variables. For example, if the program said var row3: array<f32,3> = foo[2];, then it makes sense that we should be able to load foo[2] and store it to row3.

The challenge with generating a multi-level OpAccessChain is that by the time we see the full foo[2][2] in cache_expression_value we have already generated loads for the intermediate sub-expressions.

For the var case, the cached value is set to zero when the base of the access is a variable or the cached value of the base is zero. We don't really do anything at all for Access expressions on a var root in the SPV writer.

That works out because the wgsl front-end produces an Expression::Load if it sees that the root of the expression is a variable. Crucially, it is able to parse a contiguous sequence of accesses and only generate a single Load at the end of that sequence.

It is that Load that produces the desired OpAccessChain in cache_expression_value.

To be clear, there is nothing wrong with how things are currently handled; it nicely sidesteps the challenge of the writer visiting expressions inside-out: that is, the writer sees the innermost foo, followed by foo[2], followed by foo[2][2], which causes us to wastefully generate loads for those first two sub-expressions in the const array case. Arrays in vars do not deal with this at all (cf. the 0 values in the cache), instead relying on the Load the WGSL front-end produces to generate an efficient OpAccessChain instruction.
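The cached-0 convention described here can be modeled as follows (an editorial sketch; `cache_access` is an invented name, and 0 plays the same sentinel role as in the writer):

```rust
use std::collections::HashMap;

// When the base of an Access is a variable, or the base itself is
// cached as 0, cache 0 for the Access too: codegen is deferred to the
// trailing Load, which emits a single OpAccessChain.
fn cache_access(cached: &mut HashMap<usize, u32>, expr: usize, base: usize, base_is_var: bool) {
    let id = if base_is_var || cached.get(&base) == Some(&0) {
        0 // sentinel: emit nothing for this link of the chain
    } else {
        1 // placeholder for a real id from id_gen.next()
    };
    cached.insert(expr, id);
}

fn main() {
    let mut cached = HashMap::new();
    cache_access(&mut cached, 1, 0, true);  // foo[2]: base is a var
    cache_access(&mut cached, 2, 1, false); // foo[2][2]: base cached as 0
    assert_eq!(cached[&1], 0);
    assert_eq!(cached[&2], 0);
}
```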

Having the WGSL front-end produce a Load for the const array case isn't entirely trivial as far as I can see because we only make the pointers (the vars) in the spv writer as per this PR, so the typifier isn't happy with being given a Load of an array type, for example.

One approach would be if we had a way to determine that the expressions foo and foo[2] are part of a larger access. I could possibly search through the statements in the ir_function.body until I find the expression I'm looking at and perhaps piece together the context by looking for adjacent Access expressions, but that sounds crude. Please let me know if I've overlooked a nicer way to reconstruct that context.

@kvark
Member

kvark commented Apr 11, 2021

Right, thank you for explaining this in great detail!

That works out because the wgsl front-end produces an Expression::Load if it sees that the root of the expression is a variable.

Please note that this isn't just the WGSL frontend. The same goes for SPV-in.

The sour taste is because the current approach loads all of foo, then it loads all of foo[2], and then loads foo[2][2]

Is there much to be saved here though, with a sufficiently smart compiler? My understanding is that we'd still have to load foo, in which case trying to save the subsequent loads is already giving us diminishing returns. I.e., at most you can cut the overhead by 50%.

If that's the case, let's not worry about this right now. We can keep an issue about possible overhead of constant arrays, when nested into other structures by value, and that's it.

@kvark
Member

kvark commented Apr 15, 2021

@acowley any chance we can resume this and land?

Loading an element from a const array requires that memory be
allocated for that array. If a function makes use of a local or global
const array, we allocate a local variable into which we store the
array. Array accesses then load from that variable.
@acowley acowley marked this pull request as draft April 15, 2021 16:07
@acowley
Contributor Author

acowley commented Apr 15, 2021

First: apologies that this is a larger PR than you probably want.

I think this now hits all of the complaints I had before about spurious allocations, but it needs more testing. It can do things like generate an OpAccessChain into an array field of a constant structure through a let binding to that field.

@kvark
Member

kvark commented Apr 15, 2021

@acowley I am indeed very concerned about the complexity of this PR. More than 400 LOC of pure code to handle (or work around) the cases of constant arrays indexed dynamically. I'm sure we can approach this much simpler. Let me try to explain.

We just need two things:

  1. a generic mechanism of promoting expressions to variables
  2. detection for this case of Expression::Access on value arrays, which promotes them
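Step 2 could be detected roughly like this (a sketch with simplified stand-in types; `needs_promotion` is an invented name):

```rust
// Simplified stand-ins for naga's type and expression enums.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum TypeInner { Array, Pointer, Scalar }

#[allow(dead_code)]
enum Expr { Access { base: usize }, Other }

// A dynamic Access whose base is an array by value (not behind a
// pointer) has no storage to index into, so it must be promoted.
fn needs_promotion(expr: &Expr, type_of: impl Fn(usize) -> TypeInner) -> bool {
    match expr {
        Expr::Access { base } => matches!(type_of(*base), TypeInner::Array),
        Expr::Other => false,
    }
}

fn main() {
    let ty = |_h: usize| TypeInner::Array;
    assert!(needs_promotion(&Expr::Access { base: 0 }, ty));
    assert!(!needs_promotion(&Expr::Other, ty));
}
```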

For the performance considerations, the important thing to keep in mind is that value arrays never appear out of nowhere. They need to be either constructed, or loaded from somewhere (e.g. a uniform buffer). In both cases, there is a copy of the data involved. Therefore, promoting it to a variable, and doing an extra OpStore, is not going to blow the performance away. It's just another copy of the data, so in the worst case it's 50% slower.

In practice, Vulkan drivers all run the local variable optimization pass, where they figure out which variables are not needed. So all this "promotion to a variable" will likely be resolved and eliminated by the driver, anyway. We should put more focus on clean and simple code as opposed to trying to design an elaborate minimum-overhead solution for this.

@acowley
Contributor Author

acowley commented Apr 15, 2021

Okay. I'd have preferred to simplify the traversal logic, which is why I moved the PR status back to Draft, but it's your call.

ETA: I took out the optimization which is why the old snapshots now fail.

@acowley
Contributor Author

acowley commented Apr 15, 2021

I moved the Access-optimized version elsewhere as something closer to where I wanted this code to go. The bulk of the lines related to the optimization is in a couple traversal helpers on Naga IR that don't have anything to do with Access or SPIR-V.

I understand that we're really not on the same page regarding this feature, though, so I'm fine with closing this PR with no hard feelings to let you get on with it the way you'd like to see it done!

@acowley acowley closed this Apr 15, 2021
@kvark
Member

kvark commented Apr 15, 2021

The bulk of the lines related to the optimization is in a couple traversal helpers on Naga IR that don't have anything to do with Access or SPIR-V.

I didn't realize this. Will review this PR and try to figure out what is related and what not.

@acowley
Contributor Author

acowley commented Apr 15, 2021

I think it’s still more than you’re looking for. The diff I linked deals with finding roots of access chains, producing vars for those roots, building OpAccessChain instructions, and optimizing out code generation for any Access expressions that are intermediates of other access chains.

I support however you want to iterate on const array support, but, as a user, I’d be a bit disappointed if using var produced more efficient code than using let.

That the generated SPIR-V is noisier without some optimization is another minor consideration, and I understand your point that a driver is highly likely to optimize away the noise.

I appreciate that the alternative of producing a var for every Access and not chaining accesses would be less code in naga, and less code is appealing!

@kvark
Member

kvark commented Apr 15, 2021

I'm going to try implementing it the way I described (unless you want to try this out?) and see how it fares in terms of code complexity and produced code quality. We can also run it through tools like Radeon GPU Analyser, and perhaps it will show the actual machine code after optimization.

@acowley
Contributor Author

acowley commented Apr 16, 2021

Here's a simplified commit. It leaves intermediate OpAccessChain and OpLoad instructions for the downstream compiler to shake out, but otherwise does handle access chaining.

@kvark
Member

kvark commented Apr 16, 2021

@acowley thank you! So do I understand correctly that the logic of traversing the access chain would only help for arrays that are nested? I took a stab at a naive implementation in #723, and I'm tempted to land it as a quick and dirty solution (doesn't require any extra state, well localized). It comes with a simple test, which I suspect is producing similar code to what this PR does (since the array is non-nested).

@acowley
Contributor Author

acowley commented Apr 16, 2021

Yes, I checked against a few test cases, and for single array accesses the generated spv is essentially the same modulo the friendly name generation that I cribbed from elsewhere in the writer. Nested accesses of course look different in the spv, but I didn’t try to look at any disassembly from the driver.

Successfully merging this pull request may close these issues.

Indexing constant arrays in SPIR-V