-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reads in memory barrier srcAccessMask #131
Comments
I think I want to handle this at the same time as #128, by doing a wide review of the synchronization chapter + glossary terms. Does that seem reasonable to you? As I think you mentioned in #128, memory barriers are for cache flushing. That's it. Execution dependencies handle the rest. For whatever reason, we've stayed away from actually mentioning caches or even something a little more inclusive and abstract. Instead we've opted to say "visible" and "available", and the definition of "visible" is a circular reference back to the place it's used.. There are times when a "cache" might be on-chip registers or a physically distinct piece of memory (say, in a networked Vulkan implementation), so we can't just say "caches". I also spotted this gem in the glossary:
I don't even... I think there might be some serious problems with the synchronization language. Maybe I'll try to tackle them on the plane out to SF next week... |
I definitely agree the synchronisation section needs a lot of work, not just minor tweaks - I've been trying to write a more precise description of the behaviour myself to help me understand it, and there are a lot of ambiguities. (I can try to upload my version somewhere soonish). My immediate problem is with this example in the WSI spec which seems incompatible with my current understanding, and I can't tell whether that's because I'm misunderstanding or because the WSI spec authors were misunderstanding (or maybe they used to be correct then the spec changed incompatibly under them). I think the available/visible abstraction is fine as long as it's accurately defined. "Cache flush" is probably too concrete for the specification, but I think it gives a good intuition to help understand the abstract definitions. But in this case the abstraction is not that well defined, and the WSI spec conflicts with my intuition, so I get confused both ways! |
Ah, I missed this bit of your issue before. It's been a little while since I've looked at this but IIRC: The point here is that the pipeline barrier is mostly just defining memory dependencies. The execution dependency part is actually only saying that it's waiting on previous COLOR_ATTACHMENT_OUTPUT stages that have run (i.e. previous frames' output). The semaphore (or implicit queue ordering) from the present operation is what actually handles the execution dependency for the presentation. The combination of "wait on last frame at this point" and the (maybe implicit) semaphore is enough information for the implementation to wait appropriately. It's a little odd, and yea; not explained particularly well :/ |
It means that it has completed execution. Unlike writes, there's no need to "flush" it out (make it available), so a memory read is complete as soon as it has executed.
This is subtle enough that I had to dig back in the comments on old MRs to remember the intent. The first subtle bit is that the presentation engine is considered to be separate from the queue, so the actual presentation work does not occur on the queue, only the synchronization work and memory barriers execute on the queue. The second subtle bit is that the image memory barrier in this example is not a direct dependency from the presentation operation. Rather, it is a step in a dependency chain. That dependency chain is a presentation operation signaling a semaphore, the queue waiting on the semaphore with a waitmask of VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, then the image layout transition using a matching source mask of VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT to depend on the semaphore wait. This actually is documented in that note, but it's subtle and particularly if you look at the one dependency out of context it wouldn't make sense. Maybe we can find a way to make this more obvious.
Maybe I'm being dense, but I don't immediately see what's wrong with this definition. |
That sounds almost exactly like what execution dependencies are for. It seems weird and redundant for memory dependencies to have the same effect, given that you can't even express a memory dependency unless you've already got an execution dependency on that stage.
Ah, yes, I was confused by that example since I forgot it was doing a layout transition, so ignore the last paragraph in my original comment. I think the rest of my comment still applies, though. The execution dependencies all look fine. There's a semaphore from end of previous command buffer to vkQueuePresentKHR, then a semaphore from vkAcquireNextImageKHR to next command buffer's STAGE_COLOR_ATTACHMENT_OUTPUT. The next command buffer has a layout transition in STAGE_COLOR_ATTACHMENT_OUTPUT before any draw calls, so there's an execution dependency chain from semaphore to presentation to semaphore to transition to draw. The semaphore at the end of the first command buffer will provide an implicit memory dependency (I think), so its writes will be available for all subsequent commands. My problem is with the memory dependencies around the layout transition. The spec says:
I assume the layout transition is magic and doesn't need writes to be explicitly made visible to it, and doesn't need its own writes to be explicitly made available. (If it wasn't magic, and behaved like any other stage, I think you'd need three VkImageMemoryBarrier every time you do a layout transition: one to make visible to the transition, one to perform the transition, one to make available from the transition, in that order.) Writes from STAGE_COLOR_ATTACHMENT_OUTPUT in the first command buffer were made available ages ago (before the presentation engine read it), so we don't need to use srcAccessMask to make anything else available before the layout transition. Any potential write-after-read hazards are already solved by the execution dependencies. Therefore it's sufficient to say srcAccessMask = 0. There is no need to include anything else in there, contrary to the spec's claim that "srcAccessMask must include VK_ACCESS_MEMORY_READ_BIT". |
Actually, there's a good reason for that. Sometimes, you want to make sure that an operation completes, but you don't care about visibility. This would be for performing writes. If a prior command tries to read a memory location that a later command may write to, you need an execution dependency between the reading operation and the writing operation. Without one, it becomes possible for the write to happen and become visible before all reads were completed, thus corrupting the reading command. But you don't need a memory dependency, because you aren't trying to make the previous operation visible (since the previous operation was a write). So yes, if you need a memory dependency, you also need an execution dependency. But not vice versa. |
That's the reason to use an execution dependency when doing write-after-read, which is fine. But once you've got that execution dependency, what's the reason for adding a VkImageMemoryBarrier with srcAccessMask=VK_ACCESS_MEMORY_READ_BIT (or, more generally, any memory barrier with any READ in srcAccessMask)? I still don't see any situations where such a memory barrier could have any observable or meaningful difference to a similar memory barrier with srcAccessMask=0, but the spec text implies there is some meaning. |
Image memory barriers are useful among others for performing layout transitions. In that case, it's definitely important to correctly specify {src,dst}{Stage,Access}Masks even for read-only accesses, because the layout transition is logically a "write" operation on the image - it must occur after those accesses are complete, and before new accesses start. |
If you have a layout transition in an image memory barrier, the spec says there's an execution dependency from the srcStageMask of previous commands to the transition, and from the transition to the dstStageMask of subsequent commands. The first execution dependency is sufficient for the write-after-read hazard involving the transition's writes, so I believe it's still pointless and meaningless to put srcAccessMask=ANY_READ_BIT there. (You might need to put some WRITE bits in srcAccessMask to make previous commands' writes available to the transition's reads, but my original question is just about the READs.) |
Is there any resolution to this? I would like to have these answered:
|
We're still working through the minutiae, but we did discuss this a bit and it appears that they are indeed useless:
Source access masks are intended to determine visibility of write accesses, and nothing else - so yes.
I doubt it - but I don't speak for every implementation. Nobody spoke up about this last time we discussed it though. I don't really see how anyone could use it, because a bit's absence would conservatively be equivalent to the presence of the bit - adding the bit just confirms what you have to already assume.
It's likely unnecessary and doesn't need to be there. However, interactions with windowing systems are always more complicated than I think, so I'd like to hear from at least @cubanismo on the matter, just to make sure the WSI extensions are not using this to mean something it wasn't otherwise intended for. |
So we've now completely resolved that READ in srcAccessMask is completely a no-op. If we hadn't been previously implying that you needed it, then we'd probably make it invalid. Instead we're just going to leave it as a no-op, and hopefully add a warning to the validation layers that it's a no-op. This should be clear in the spec in the next couple of weeks. |
This should be fixed in the 1.0.35 spec update, as part of the synchronization language rewrite. |
https://www.khronos.org/registry/vulkan/specs/1.0/xhtml/vkspec.html#synchronization-execution-and-memory-dependencies says:
I think that's saying two things:
I don't understand the first part - what does it mean for a memory read to "complete" or to be "complete"? (I'm not even sure if it's an adjective or a verb here.)
I think it must mean something, because the spec later says:
and the WSI extension has an example:
so these are intentionally putting memory reads in srcAccessMask and expecting something to happen.
I suppose one interpretation is that it forbids reads from before the barrier being reordered relative to writes that are after the barrier - but the spec says:
which indicates those guarantees are provided by execution dependencies, not by memory barriers. So I still don't see what the memory barrier is meant to do.
(In the WSI example I also don't understand what
STAGE_COLOR_ATTACHMENT_OUTPUT
has got to do with reads by the presentation engine - that stage is apparently for writing fragments to the framebuffer, and the memory barrier should only influence memory accesses performed by that stage, which are completely different to memory accesses performed by the presentation engine (which doesn't even have a stage defined, as far as I can see).)The text was updated successfully, but these errors were encountered: