This repository has been archived by the owner on Jan 29, 2025. It is now read-only.

[spv-out] support const arrays #669

Closed · wants to merge 6 commits
Conversation

acowley
Contributor

@acowley acowley commented Apr 7, 2021

Fixes #635

Loading an element from a const array requires that memory be allocated for that array. If a function makes use of a local or global const array, we allocate a local variable into which we store the array. Array accesses then load from that variable.
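The lowering described above can be pictured with a plain Rust analogy (a sketch only; `LUT`, `i`, and the bindings are illustrative names, not code from this PR): a const array has no addressable storage in SPIR-V, so before a dynamic index can be loaded, the array is first stored into a synthesized local variable.

```rust
// Model of the PR's approach: copy the const array into a local
// variable (OpVariable + OpStore), then index that variable
// (OpAccessChain + OpLoad).
fn main() {
    const LUT: [f32; 3] = [1.0, 2.0, 4.0]; // const array: no addressable storage
    let i = 2usize;                        // dynamic index
    let local = LUT;                       // synthesized local variable
    let elem = local[i];                   // load through the variable
    assert_eq!(elem, 4.0);
}
```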

@acowley acowley force-pushed the const-array branch 2 times, most recently from 819e485 to b0623af on April 7, 2021 at 16:43
@@ -439,6 +441,44 @@ impl Writer {
results: Vec::new(),
};

for (handle, expression) in ir_function.expressions.iter() {
match expression {
Member

let's match *expression instead, like in all of the other places

Contributor Author

Done!

@@ -439,6 +441,44 @@ impl Writer {
results: Vec::new(),
};

for (handle, expression) in ir_function.expressions.iter() {
match expression {
&crate::Expression::Compose { ty, .. } => {
Member

As discussed on the Matrix, this code is overly generic.
The problem we need to solve is about dynamic indexing of arrays that are given by values (i.e. expressions or constants).

Contributor Author

I think I have narrowed it down at this point. Thank you for identifying this so quickly!

@acowley acowley marked this pull request as ready for review April 7, 2021 18:05
Member

@kvark kvark left a comment

I'm afraid this needs another major iteration. I expressed some ideas below.

function.local_const_arrays.insert(handle, id);
}
}
crate::Expression::Constant(h) => match ir_module.constants[h].inner {
Member

We probably want to check that h isn't already registered. I don't think there is a rule in the IR that prevents multiple Expression::Constant pointing to the same thing.

block
.body
.push(Instruction::store(var_id, composite_id, None));
var_id
Member

I don't think we can return var_id here. The type of the returned SPIR-V is still expected to be a value, not a pointer. It seems to me that we should always return composite_id from this block?

@@ -1375,7 +1450,20 @@ impl Writer {
}
}
crate::Expression::GlobalVariable(handle) => self.global_variables[handle.index()].id,
crate::Expression::Constant(handle) => self.constant_ids[handle.index()],
crate::Expression::Constant(handle) => {
Member

in the near future, when we know how each constant is used by each function - #656 - we'll be able to do this at the beginning of the function only. Maybe we can leave a small TODO pointing to this issue?

block
.body
.push(Instruction::store(var_id, composite_id, None));
var_id
Member

similarly to the other case, I don't think we can return var_id here, since it has a different type

let id = self.id_gen.next();
let var_id = match ir_function.expressions[base] {
crate::Expression::Constant(..) => self.cached[base],
crate::Expression::Compose { .. } => self.cached[base],
Member

Unfortunately, this is incomplete: there are other ways to get an array-typed expression:

  • an Expression::FunctionArgument that happens to be an array
  • a Expression::Load on a pointer to an array
  • an Expression::Access* on a structure (or other composite) that contains an array

There needs to be a more general approach here...

Here is a rough idea. Let's have a generic mechanism here for promoting expressions to variables. If an expression is promoted, then the Emit of it becomes an OpStore, and any use of it becomes an OpLoad, other than the Access/AccessIndex uses, which do OpLoad after issuing OpAccessChain.

We could build a FastHashMap<Handle<crate::Expression>, Word> for storing the variable ids, similar to what the PR does.
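The promotion map suggested here could look roughly like this (an editorial sketch; `Handle` and `Word` are simplified stand-ins for naga's handle and SPIR-V word types, and `Promotions` is a hypothetical name):

```rust
use std::collections::HashMap;

// Stand-ins for naga's Handle<crate::Expression> and SPIR-V Word.
type Handle = usize;
type Word = u32;

// Tracks which expressions have been promoted to variables, keyed by
// expression handle, as suggested in the comment above.
struct Promotions {
    vars: HashMap<Handle, Word>,
}

impl Promotions {
    // Promote an expression, reusing the variable if already promoted.
    fn promote(&mut self, expr: Handle, next_id: &mut Word) -> Word {
        *self.vars.entry(expr).or_insert_with(|| {
            let id = *next_id; // would be id_gen.next() in the writer
            *next_id += 1;
            id
        })
    }
}

fn main() {
    let mut p = Promotions { vars: HashMap::new() };
    let mut id_gen = 100;
    let a = p.promote(7, &mut id_gen);
    let b = p.promote(7, &mut id_gen); // same expression: same variable id
    assert_eq!(a, b);
    assert_eq!(a, 100);
}
```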

@acowley
Contributor Author

acowley commented Apr 9, 2021

I've been making some headway on this, and noticed something: The WGSL front-end always generates an Access expression when it encounters an array indexing expression. I think this could be an AccessIndex if the index is a literal. Is this something that could be improved, or do we want array indexing to always go through Access?

@kvark
Member

kvark commented Apr 9, 2021

That's a great question! I believe we want to generate AccessIndex if we can, since it makes the bounds checking easier (i.e. does it at compile time instead of run-time).
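The choice being described could be sketched like this (hypothetical, simplified enums standing in for naga IR; `lower_index` is an invented name):

```rust
// Simplified stand-ins for naga IR expressions.
#[allow(dead_code)]
enum Index {
    Literal(u32),   // the index is a compile-time constant
    Dynamic(usize), // handle of a runtime index expression
}

#[allow(dead_code)]
enum Expr {
    Access { base: usize, index: usize },
    AccessIndex { base: usize, index: u32 },
}

// Emit AccessIndex for literal indices so bounds checks can happen at
// compile time; fall back to Access for runtime indices.
fn lower_index(base: usize, idx: Index) -> Expr {
    match idx {
        Index::Literal(i) => Expr::AccessIndex { base, index: i },
        Index::Dynamic(h) => Expr::Access { base, index: h },
    }
}

fn main() {
    assert!(matches!(lower_index(0, Index::Literal(2)), Expr::AccessIndex { index: 2, .. }));
    assert!(matches!(lower_index(0, Index::Dynamic(5)), Expr::Access { .. }));
}
```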

@acowley
Contributor Author

acowley commented Apr 10, 2021

I have several test cases working now, including const arrays of arrays, but a problem is that I haven't found an opportunity to produce a single multi-step OpAccessChain instruction for something like foo[2][2] where const foo: array<array<f32,3>,3>.

With non-const vars, OpAccessChain generation is triggered off of a Load expression in the syntax tree produced by the WGSL front-end. But with consts, the back-end ends up seeing an Access for the foo[2] sub-expression and then another Access for the outer [2] indexing. What happens, then, is that we store a global constant foo into a local variable we synthesize. Next we load and store foo[2] into another variable we synthesize, and then we load from that variable to obtain the needed f32 value.

With non-const variables, the spv back-end caches an id of 0 when the base of an Access is a variable, or if the cached id of the base is 0. That pushes information forward so that nothing is generated for the chain of array indexing expressions. The trailing Load expression produced by the front-end encodes the information needed to produce a multi-step OpAccessChain.

It's an interesting problem! It took me a while to figure out that coupling between the front- and back-ends with the Load expression. I'm going to think about it some more, but if the above description rings any bells I'm open to suggestions. I've thus far kept all changes to the spv back-end, but perhaps it will need help from the front-end to have more context regarding nested access if we want to generate nicer code.
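The redundant loads described above can be modeled in ordinary Rust (an illustration only; `FOO`, `v0`, and `v1` are hypothetical names):

```rust
// Contrast of the two lowerings discussed above for foo[2][2].
fn main() {
    const FOO: [[f32; 3]; 3] = [[0.0; 3], [0.0; 3], [1.0, 2.0, 3.0]];
    let (i, j) = (2usize, 2usize);

    // What the writer currently produces: each sub-expression gets its
    // own synthesized variable, so foo and foo[2] are each copied whole.
    let v0 = FOO;   // store all of foo
    let v1 = v0[i]; // load and store all of foo[2]
    let a = v1[j];  // final load of foo[2][2]

    // The desired lowering: one variable, one multi-index access chain.
    let v = FOO;
    let b = v[i][j];

    assert_eq!(a, b);
    assert_eq!(b, 3.0);
}
```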

@kvark
Member

kvark commented Apr 11, 2021

Thank you for digging into this! It's a complex topic, and I'm happy that you aren't giving up easily 👍

What happens, then, is that we store a global constant foo into a local variable we synthesize. Next we load and store foo[2] into another variable we synthesize, and then we load from that variable to obtain the needed f32 value.

Are you concerned about having to load the whole foo array in this case, while we know we should only be loading foo[2][2], i.e. just an efficiency concern? I think we can worry about this later, for now we just need to have the simplest workaround for this to work correctly.

With non-const variables, the spv back-end caches an id of 0

Could you clarify what is 0 here?

I'm going to think about it some more, but if the above description rings any bells I'm open to suggestions.

Did you consider a path where we just promote any expression (containing a fixed-size array) into a variable?

perhaps it will need help from the front-end to have more context regarding nested access if we want to generate nicer code.

I'm not sure what extra help or information the frontends could provide in this case.

@acowley
Contributor Author

acowley commented Apr 11, 2021

Are you concerned about having to load the whole foo array in this case, while we know we should only be loading foo[2][2], i.e. just an efficiency concern? I think we can worry about this later, for now we just need to have the simplest workaround for this to work correctly.

Yes, it's an efficiency concern. The sour taste is because the current approach loads all of foo, then it loads all of foo[2], and then loads foo[2][2]. Loading the third element of foo (i.e. foo[2]) is the particular disappointment.

I'm sorry that I wasn't clear enough with my last message. The issue with building an OpAccessChain with multiple indices is that the approach of this PR is to promote expressions to have an attached variable in the SPV writer. When we are evaluating an expression in cache_expression_value in the writer, we see the sub-expressions in the order foo, foo[2], and then foo[2][2]. Each of the first two could plausibly prompt us to promote them to variables. For example, if the program said var row3: array<f32,3> = foo[2];, then it makes sense that we should be able to load foo[2] and store it to row3.

The challenge with generating a multi-level OpAccessChain is that by the time we see the full foo[2][2] in cache_expression_value we have already generated loads for the intermediate sub-expressions.

For the var case, the cached value is set to zero when the base of the access is a variable or the cached value of the base is zero. We don't really do anything at all for Access expressions on a var root in the SPV writer.

That works out because the wgsl front-end produces an Expression::Load if it sees that the root of the expression is a variable. Crucially, it is able to parse a contiguous sequence of accesses and only generate a single Load at the end of that sequence.

It is that Load that produces the desired OpAccessChain in cache_expression_value.

To be clear, there is nothing wrong with how things are currently handled; it nicely sidesteps the challenge of the writer visiting expressions inside-out: that is, the writer sees the innermost foo, followed by foo[2], followed by foo[2][2], which causes us to wastefully generate loads for those first two sub-expressions in the const array case. Arrays in vars do not deal with this at all (cf. the 0 values in the cache), instead relying on the Load the WGSL front-end produces to generate an efficient OpAccessChain instruction.
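The cached-0 convention described here can be modeled as follows (an editorial sketch; `cache_access` is an invented name, and 0 plays the same sentinel role as in the writer):

```rust
use std::collections::HashMap;

// When the base of an Access is a variable, or the base itself is
// cached as 0, cache 0 for the Access too: codegen is deferred to the
// trailing Load, which emits a single OpAccessChain.
fn cache_access(cached: &mut HashMap<usize, u32>, expr: usize, base: usize, base_is_var: bool) {
    let id = if base_is_var || cached.get(&base) == Some(&0) {
        0 // sentinel: emit nothing for this link of the chain
    } else {
        1 // placeholder for a real id from id_gen.next()
    };
    cached.insert(expr, id);
}

fn main() {
    let mut cached = HashMap::new();
    cache_access(&mut cached, 1, 0, true);  // foo[2]: base is a var
    cache_access(&mut cached, 2, 1, false); // foo[2][2]: base cached as 0
    assert_eq!(cached[&1], 0);
    assert_eq!(cached[&2], 0);
}
```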

Having the WGSL front-end produce a Load for the const array case isn't entirely trivial as far as I can see because we only make the pointers (the vars) in the spv writer as per this PR, so the typifier isn't happy with being given a Load of an array type, for example.

One approach would be if we had a way to determine that the expressions foo and foo[2] are part of a larger access. I could possibly search through the statements in the ir_function.body until I find the expression I'm looking at and perhaps piece together the context by looking for adjacent Access expressions, but that sounds crude. Please let me know if I've overlooked a nicer way to reconstruct that context.

@kvark
Member

kvark commented Apr 11, 2021

Right, thank you for explaining this in great detail!

That works out because the wgsl front-end produces an Expression::Load if it sees that the root of the expression is a variable.

Please note that this isn't just the WGSL frontend. The same goes for SPV-in.

The sour taste is because the current approach loads all of foo, then it loads all of foo[2], and then loads foo[2][2]

Is there much to be saved here though, with a sufficiently smart compiler? My understanding is that we'd still have to load foo, in which case trying to save the subsequent loads is already giving us diminishing returns. I.e., at most you can cut the overhead by 50%.

If that's the case, let's not worry about this right now. We can keep an issue about possible overhead of constant arrays, when nested into other structures by value, and that's it.

@kvark
Member

kvark commented Apr 15, 2021

@acowley any chance we can resume this and land?

Loading an element from a const array requires that memory be
allocated for that array. If a function makes use of a local or global
const array, we allocate a local variable into which we store the
array. Array accesses then load from that variable.
@acowley acowley marked this pull request as draft April 15, 2021 16:07
@acowley
Contributor Author

acowley commented Apr 15, 2021

First: apologies that this is a larger PR than you probably want.

I think this now hits all of the complaints I had before about spurious allocations, but it needs more testing. It can do things like generate an OpAccessChain into an array field of a constant structure through a let binding to that field.

@kvark
Member

kvark commented Apr 15, 2021

@acowley I am indeed very concerned about the complexity of this PR. More than 400 LOC of pure code to handle (or work around) the cases of constant arrays indexed dynamically. I'm sure we can approach this much simpler. Let me try to explain.

We just need two things:

  1. a generic mechanism of promoting expressions to variables
  2. detection for this case of Expression::Access on value arrays, which promotes them
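Step 2 could be detected roughly like this (a sketch with simplified stand-in types; `needs_promotion` is an invented name):

```rust
// Simplified stand-ins for naga's type and expression enums.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum TypeInner { Array, Pointer, Scalar }

#[allow(dead_code)]
enum Expr { Access { base: usize }, Other }

// A dynamic Access whose base is an array by value (not behind a
// pointer) has no storage to index into, so it must be promoted.
fn needs_promotion(expr: &Expr, type_of: impl Fn(usize) -> TypeInner) -> bool {
    match expr {
        Expr::Access { base } => matches!(type_of(*base), TypeInner::Array),
        Expr::Other => false,
    }
}

fn main() {
    let ty = |_h: usize| TypeInner::Array;
    assert!(needs_promotion(&Expr::Access { base: 0 }, ty));
    assert!(!needs_promotion(&Expr::Other, ty));
}
```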

For the performance considerations, the important thing to keep in mind is that value arrays never appear out of nowhere. They need to be either constructed, or loaded from somewhere (e.g. a uniform buffer). In both cases, there is a copy of the data involved. Therefore, promoting it to a variable, and doing an extra OpStore, is not going to blow the performance away. It's just another copy of the data, so in the worst case it's 50% slower.

In practice, Vulkan drivers all run the local variable optimization pass, where they figure out which variables are not needed. So all this "promotion to a variable" will likely be resolved and eliminated by the driver, anyway. We should put more focus on clean and simple code as opposed to trying to design an elaborate minimum-overhead solution for this.

@acowley
Contributor Author

acowley commented Apr 15, 2021

Okay. I'd have preferred to simplify the traversal logic, which is why I moved the PR status back to Draft, but it's your call.

ETA: I took out the optimization which is why the old snapshots now fail.

@acowley
Contributor Author

acowley commented Apr 15, 2021

I moved the Access-optimized version elsewhere as something closer to where I wanted this code to go. The bulk of the lines related to the optimization is in a couple traversal helpers on Naga IR that don't have anything to do with Access or SPIR-V.

I understand that we're really not on the same page regarding this feature, though, so I'm fine with closing this PR with no hard feelings to let you get on with it the way you'd like to see it done!

@acowley acowley closed this Apr 15, 2021
@kvark
Member

kvark commented Apr 15, 2021

The bulk of the lines related to the optimization is in a couple traversal helpers on Naga IR that don't have anything to do with Access or SPIR-V.

I didn't realize this. Will review this PR and try to figure out what is related and what not.

@acowley
Contributor Author

acowley commented Apr 15, 2021

I think it’s still more than you’re looking for. The diff I linked deals with finding roots of access chains, producing vars for those roots, building OpAccessChain instructions, and optimizing out code generation for any Access expressions that are intermediates of other access chains.

I support however you want to iterate on const array support, but, as a user, I’d be a bit disappointed if using var produced more efficient code than using let.

That the generated SPIR-V is noisier without some optimization is another minor consideration, and I understand your point that a driver is highly likely to optimize away the noise.

I appreciate that the alternative of producing a var for every Access and not chaining accesses would be less code in naga, and less code is appealing!

@kvark
Member

kvark commented Apr 15, 2021

I'm going to try implementing it the way I described (unless you want to try this out?) and see how it fares in terms of code complexity and produced code quality. We can also run it through tools like Radeon GPU Analyser, and perhaps it will show the actual machine code after optimization.

@acowley
Contributor Author

acowley commented Apr 16, 2021

Here's a simplified commit. It leaves intermediate OpAccessChain and OpLoad instructions for the downstream compiler to shake out, but otherwise does handle access chaining.

@kvark
Member

kvark commented Apr 16, 2021

@acowley thank you! So do I understand correctly that the logic of traversing the access chain would only help for arrays that are nested? I took a stab at a naive implementation in #723, and I'm tempted to land it as a quick and dirty solution (doesn't require any extra state, well localized). It comes with a simple test, which I suspect is producing similar code to what this PR does (since the array is non-nested).

@acowley
Contributor Author

acowley commented Apr 16, 2021

Yes, I checked against a few test cases, and for single array accesses the generated spv is essentially the same modulo the friendly name generation that I cribbed from elsewhere in the writer. Nested accesses of course look different in the spv, but I didn’t try to look at any disassembly from the driver.

Successfully merging this pull request may close these issues.

Indexing constant arrays in SPIR-V