Reads in memory barrier srcAccessMask #131

philiptaylor · 2016-03-08T00:29:06Z

https://www.khronos.org/registry/vulkan/specs/1.0/xhtml/vkspec.html#synchronization-execution-and-memory-dependencies says:

The first half of the dependency makes memory accesses using the set of access types in srcAccessMask performed in pipeline stages in srcStageMask by the first set of commands complete and writes be available for subsequent commands.

I think that's saying two things:

It makes memory accesses, i.e. both reads and writes, complete
It makes memory writes be available for subsequent commands

I don't understand the first part - what does it mean for a memory read to "complete" or to be "complete"? (I'm not even sure if it's an adjective or a verb here.)

I think it must mean something, because the spec later says:

The vkCmdWaitEvents or vkCmdPipelineBarrier used to transition the image away from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout must have dstStageMask and dstAccessMask parameters set based on the next use of the image. The srcAccessMask must include VK_ACCESS_MEMORY_READ_BIT to ensure that all prior reads by the presentation engine are complete before the image layout transition occurs.

and the WSI extension has an example:

If an image layout transition needs to be performed on a swapchain image before it is used in a framebuffer, that can be performed as the first operation submitted to the queue after acquiring the image, and should not prevent other work from overlapping with the presentation operation. For example, a VkImageMemoryBarrier could use:

srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT

srcAccessMask = VK_ACCESS_MEMORY_READ_BIT

so these are intentionally putting memory reads in srcAccessMask and expecting something to happen.

I suppose one interpretation is that it forbids reads from before the barrier being reordered relative to writes that are after the barrier - but the spec says:

Write-after-read and read-after-read hazards only require execution dependencies to synchronize.

which indicates those guarantees are provided by execution dependencies, not by memory barriers. So I still don't see what the memory barrier is meant to do.

(In the WSI example I also don't understand what STAGE_COLOR_ATTACHMENT_OUTPUT has got to do with reads by the presentation engine - that stage is apparently for writing fragments to the framebuffer, and the memory barrier should only influence memory accesses performed by that stage, which are completely different to memory accesses performed by the presentation engine (which doesn't even have a stage defined, as far as I can see).)

The text was updated successfully, but these errors were encountered:

ghost · 2016-03-08T10:23:58Z

I think I want to handle this at the same time as #128, by doing a wide review of the synchronization chapter + glossary terms. Does that seem reasonable to you?

As I think you mentioned in #128, memory barriers are for cache flushing. That's it. Execution dependencies handle the rest.

For whatever reason, we've stayed away from actually mentioning caches or even something a little more inclusive and abstract. Instead we've opted to say "visible" and "available", and the definition of "visible" is a circular reference back to the place it's used.. There are times when a "cache" might be on-chip registers or a physically distinct piece of memory (say, in a networked Vulkan implementation), so we can't just say "caches".

I also spotted this gem in the glossary:

Memory Dependency::
A sequence of operations that makes writes available, performs an execution dependency between the writes and subsequent accesses

I don't even... I think there might be some serious problems with the synchronization language. Maybe I'll try to tackle them on the plane out to SF next week...

philiptaylor · 2016-03-08T12:21:53Z

I definitely agree the synchronisation section needs a lot of work, not just minor tweaks - I've been trying to write a more precise description of the behaviour myself to help me understand it, and there are a lot of ambiguities. (I can try to upload my version somewhere soonish).

My immediate problem is with this example in the WSI spec which seems incompatible with my current understanding, and I can't tell whether that's because I'm misunderstanding or because the WSI spec authors were misunderstanding (or maybe they used to be correct then the spec changed incompatibly under them).

I think the available/visible abstraction is fine as long as it's accurately defined. "Cache flush" is probably too concrete for the specification, but I think it gives a good intuition to help understand the abstract definitions. But in this case the abstraction is not that well defined, and the WSI spec conflicts with my intuition, so I get confused both ways!

ghost · 2016-03-08T14:54:26Z

(In the WSI example I also don't understand what STAGE_COLOR_ATTACHMENT_OUTPUT has got to do with reads by the presentation engine - that stage is apparently for writing fragments to the framebuffer, and the memory barrier should only influence memory accesses performed by that stage, which are completely different to memory accesses performed by the presentation engine (which doesn't even have a stage defined, as far as I can see).)

Ah, I missed this bit of your issue before. It's been a little while since I've looked at this but IIRC:

The point here is that the pipeline barrier is mostly just defining memory dependencies. The execution dependency part is actually only saying that it's waiting on previous COLOR_ATTACHMENT_OUTPUT stages that have run (i.e. previous frames' output). The semaphore (or implicit queue ordering) from the present operation is what actually handles the execution dependency for the presentation. The combination of "wait on last frame at this point" and the (maybe implicit) semaphore is enough information for the implementation to wait appropriately. It's a little odd, and yea; not explained particularly well :/

jeffbolznv · 2016-03-08T15:21:45Z

I don't understand the first part - what does it mean for a memory read to "complete" or to be "complete"?

It means that it has completed execution. Unlike writes, there's no need to "flush" it out (make it available), so a memory read is complete as soon as it has executed.

In the WSI example I also don't understand what STAGE_COLOR_ATTACHMENT_OUTPUT has got to do with reads by the presentation engine

This is subtle enough that I had to dig back in the comments on old MRs to remember the intent. The first subtle bit is that the presentation engine is considered to be separate from the queue, so the actual presentation work does not occur on the queue, only the synchronization work and memory barriers execute on the queue.

The second subtle bit is that the image memory barrier in this example is not a direct dependency from the presentation operation. Rather, it is a step in a dependency chain. That dependency chain is a presentation operation signaling a semaphore, the queue waiting on the semaphore with a waitmask of VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, then the image layout transition using a matching source mask of VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT to depend on the semaphore wait.

This actually is documented in that note, but it's subtle and particularly if you look at the one dependency out of context it wouldn't make sense. Maybe we can find a way to make this more obvious.

I also spotted this gem in the glossary:
Memory Dependency:: ...

Maybe I'm being dense, but I don't immediately see what's wrong with this definition.

philiptaylor · 2016-03-08T22:20:58Z

I don't understand the first part - what does it mean for a memory read to "complete" or to be "complete"?

It means that it has completed execution.

That sounds almost exactly like what execution dependencies are for. It seems weird and redundant for memory dependencies to have the same effect, given that you can't even express a memory dependency unless you've already got an execution dependency on that stage.

the image memory barrier in this example is not a direct dependency from the presentation operation. Rather, it is a step in a dependency chain.

Ah, yes, I was confused by that example since I forgot it was doing a layout transition, so ignore the last paragraph in my original comment. I think the rest of my comment still applies, though.

The execution dependencies all look fine. There's a semaphore from end of previous command buffer to vkQueuePresentKHR, then a semaphore from vkAcquireNextImageKHR to next command buffer's STAGE_COLOR_ATTACHMENT_OUTPUT. The next command buffer has a layout transition in STAGE_COLOR_ATTACHMENT_OUTPUT before any draw calls, so there's an execution dependency chain from semaphore to presentation to semaphore to transition to draw. The semaphore at the end of the first command buffer will provide an implicit memory dependency (I think), so its writes will be available for all subsequent commands.

My problem is with the memory dependencies around the layout transition.

The spec says:

When an image memory barrier includes a layout transition, the barrier first makes writes via srcStageMask and srcAccessMask available, then performs the layout transition, then makes the contents of the image subresource(s) in the new layout visible to memory accesses in dstStageMask and dstAccessMask, as if there is an execution and memory dependency between the source masks and the transition, as well as between the transition and the destination masks.

I assume the layout transition is magic and doesn't need writes to be explicitly made visible to it, and doesn't need its own writes to be explicitly made available. (If it wasn't magic, and behaved like any other stage, I think you'd need three VkImageMemoryBarrier every time you do a layout transition: one to make visible to the transition, one to perform the transition, one to make available from the transition, in that order.)

Writes from STAGE_COLOR_ATTACHMENT_OUTPUT in the first command buffer were made available ages ago (before the presentation engine read it), so we don't need to use srcAccessMask to make anything else available before the layout transition. Any potential write-after-read hazards are already solved by the execution dependencies. Therefore it's sufficient to say srcAccessMask = 0. There is no need to include anything else in there, contrary to the spec's claim that "srcAccessMask must include VK_ACCESS_MEMORY_READ_BIT".

NicolBolas · 2016-03-09T14:30:05Z

@philiptaylor

That sounds almost exactly like what execution dependencies are for. It seems weird and redundant for memory dependencies to have the same effect, given that you can't even express a memory dependency unless you've already got an execution dependency on that stage.

Actually, there's a good reason for that. Sometimes, you want to make sure that an operation completes, but you don't care about visibility. This would be for performing writes.

If a prior command tries to read a memory location that a later command may write to, you need an execution dependency between the reading operation and the writing operation. Without one, it becomes possible for the write to happen and become visible before all reads were completed, thus corrupting the reading command.

But you don't need a memory dependency, because you aren't trying to make the previous operation visible (since the previous operation was a write).

So yes, if you need a memory dependency, you also need an execution dependency. But not vice versa.

philiptaylor · 2016-03-09T19:50:07Z

Actually, there's a good reason for that. Sometimes, you want to make sure that an operation completes, but you don't care about visibility. This would be for performing writes.

That's the reason to use an execution dependency when doing write-after-read, which is fine. But once you've got that execution dependency, what's the reason for adding a VkImageMemoryBarrier with srcAccessMask=VK_ACCESS_MEMORY_READ_BIT (or, more generally, any memory barrier with any READ in srcAccessMask)?

I still don't see any situations where such a memory barrier could have any observable or meaningful difference to a similar memory barrier with srcAccessMask=0, but the spec text implies there is some meaning.

pimanttr · 2016-03-09T21:02:11Z

Image memory barriers are useful among others for performing layout transitions. In that case, it's definitely important to correctly specify {src,dst}{Stage,Access}Masks even for read-only accesses, because the layout transition is logically a "write" operation on the image - it must occur after those accesses are complete, and before new accesses start.

philiptaylor · 2016-03-09T21:17:30Z

If you have a layout transition in an image memory barrier, the spec says there's an execution dependency from the srcStageMask of previous commands to the transition, and from the transition to the dstStageMask of subsequent commands. The first execution dependency is sufficient for the write-after-read hazard involving the transition's writes, so I believe it's still pointless and meaningless to put srcAccessMask=ANY_READ_BIT there. (You might need to put some WRITE bits in srcAccessMask to make previous commands' writes available to the transition's reads, but my original question is just about the READs.)

krOoze · 2016-09-21T15:01:57Z

Is there any resolution to this?

I would like to have these answered:

Can any *_READ flag in srcAccess be just substituted for 0 (in all situations)?
(if so) Is there some expected benefit if the appropriate *_READ flags are provided anyway?
(if so) why does the hand-off from presentation prescribes there must be VK_ACCESS_MEMORY_READ_BIT in srcAccess (and does not allow just 0)?

ghost · 2016-09-21T16:52:42Z

@krOoze

We're still working through the minutiae, but we did discuss this a bit and it appears that they are indeed useless:

Can any *_READ flag in srcAccess be just substituted for 0 (in all situations)?

Source access masks are intended to determine visibility of write accesses, and nothing else - so yes.

(if so) Is there some expected benefit if the appropriate *_READ flags are provided anyway?

I doubt it - but I don't speak for every implementation. Nobody spoke up about this last time we discussed it though. I don't really see how anyone could use it, because a bit's absence would conservatively be equivalent to the presence of the bit - adding the bit just confirms what you have to already assume.

(if so) why does the hand-off from presentation prescribes there must be VK_ACCESS_MEMORY_READ_BIT in srcAccess (and does not allow just 0)?

It's likely unnecessary and doesn't need to be there. However, interactions with windowing systems are always more complicated than I think, so I'd like to hear from at least @cubanismo on the matter, just to make sure the WSI extensions are not using this to mean something it wasn't otherwise intended for.

ghost · 2016-11-15T17:43:58Z

So we've now completely resolved that READ in srcAccessMask is completely a no-op. If we hadn't been previously implying that you needed it, then we'd probably make it invalid. Instead we're just going to leave it as a no-op, and hopefully add a warning to the validation layers that it's a no-op. This should be clear in the spec in the next couple of weeks.

oddhack · 2016-11-26T10:48:59Z

This should be fixed in the 1.0.35 spec update, as part of the synchronization language rewrite.

See: - KhronosGroup/Vulkan-Docs#131 - KhronosGroup/Vulkan-Docs#131 (comment)

ghost mentioned this issue Mar 8, 2016

Memory barriers on TOP_OF_PIPE #128

Closed

ghost closed this as completed Mar 9, 2016

ghost reopened this Mar 9, 2016

philiptaylor mentioned this issue Mar 10, 2016

Synchronization is impossible to understand #132

Closed

oddhack added Specification Resolving Inside Khronos labels Mar 21, 2016

oddhack assigned ghost Mar 21, 2016

oddhack closed this as completed Nov 26, 2016

JesseRMeyer mentioned this issue Mar 15, 2020

Consider removing *_READ flag in VkSubpassDependency's srcAccess for acquire and present. Overv/VulkanTutorial#168

Closed

vmilea mentioned this issue Nov 9, 2020

Redundant access flags for implicit subpass dependencies #1389

Closed

httpdigest added a commit to LWJGL/lwjgl3-demos that referenced this issue Sep 10, 2021

refactor: remove redundant READ bit in memory barrier srcAccessMask

ce76be3

See: - KhronosGroup/Vulkan-Docs#131 - KhronosGroup/Vulkan-Docs#131 (comment)

mikusp mentioned this issue Dec 24, 2024

renderer_vulkan: Cleanup and improve barriers in caches shadps4-emu/shadPS4#1865

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reads in memory barrier srcAccessMask #131

Reads in memory barrier srcAccessMask #131

philiptaylor commented Mar 8, 2016

ghost commented Mar 8, 2016

philiptaylor commented Mar 8, 2016

ghost commented Mar 8, 2016

jeffbolznv commented Mar 8, 2016

philiptaylor commented Mar 8, 2016

NicolBolas commented Mar 9, 2016

philiptaylor commented Mar 9, 2016

pimanttr commented Mar 9, 2016

philiptaylor commented Mar 9, 2016

krOoze commented Sep 21, 2016

ghost commented Sep 21, 2016 •

edited by ghost

Loading

ghost commented Nov 15, 2016

oddhack commented Nov 26, 2016

Reads in memory barrier srcAccessMask #131

Reads in memory barrier srcAccessMask #131

Comments

philiptaylor commented Mar 8, 2016

ghost commented Mar 8, 2016

philiptaylor commented Mar 8, 2016

ghost commented Mar 8, 2016

jeffbolznv commented Mar 8, 2016

philiptaylor commented Mar 8, 2016

NicolBolas commented Mar 9, 2016

philiptaylor commented Mar 9, 2016

pimanttr commented Mar 9, 2016

philiptaylor commented Mar 9, 2016

krOoze commented Sep 21, 2016

ghost commented Sep 21, 2016 • edited by ghost Loading

ghost commented Nov 15, 2016

oddhack commented Nov 26, 2016

ghost commented Sep 21, 2016 •

edited by ghost

Loading