Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reads in memory barrier srcAccessMask #131

Closed
philiptaylor opened this issue Mar 8, 2016 · 13 comments
Closed

Reads in memory barrier srcAccessMask #131

philiptaylor opened this issue Mar 8, 2016 · 13 comments

Comments

@philiptaylor
Copy link
Contributor

https://www.khronos.org/registry/vulkan/specs/1.0/xhtml/vkspec.html#synchronization-execution-and-memory-dependencies says:

The first half of the dependency makes memory accesses using the set of access types in srcAccessMask performed in pipeline stages in srcStageMask by the first set of commands complete and writes be available for subsequent commands.

I think that's saying two things:

  • It makes memory accesses, i.e. both reads and writes, complete
  • It makes memory writes be available for subsequent commands

I don't understand the first part - what does it mean for a memory read to "complete" or to be "complete"? (I'm not even sure if it's an adjective or a verb here.)

I think it must mean something, because the spec later says:

The vkCmdWaitEvents or vkCmdPipelineBarrier used to transition the image away from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout must have dstStageMask and dstAccessMask parameters set based on the next use of the image. The srcAccessMask must include VK_ACCESS_MEMORY_READ_BIT to ensure that all prior reads by the presentation engine are complete before the image layout transition occurs.

and the WSI extension has an example:

If an image layout transition needs to be performed on a swapchain image before it is used in a framebuffer, that can be performed as the first operation submitted to the queue after acquiring the image, and should not prevent other work from overlapping with the presentation operation. For example, a VkImageMemoryBarrier could use:

  • srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
  • srcAccessMask = VK_ACCESS_MEMORY_READ_BIT

so these are intentionally putting memory reads in srcAccessMask and expecting something to happen.

I suppose one interpretation is that it forbids reads from before the barrier being reordered relative to writes that are after the barrier - but the spec says:

Write-after-read and read-after-read hazards only require execution dependencies to synchronize.

which indicates those guarantees are provided by execution dependencies, not by memory barriers. So I still don't see what the memory barrier is meant to do.

(In the WSI example I also don't understand what STAGE_COLOR_ATTACHMENT_OUTPUT has got to do with reads by the presentation engine - that stage is apparently for writing fragments to the framebuffer, and the memory barrier should only influence memory accesses performed by that stage, which are completely different to memory accesses performed by the presentation engine (which doesn't even have a stage defined, as far as I can see).)

@ghost
Copy link

ghost commented Mar 8, 2016

I think I want to handle this at the same time as #128, by doing a wide review of the synchronization chapter + glossary terms. Does that seem reasonable to you?

As I think you mentioned in #128, memory barriers are for cache flushing. That's it. Execution dependencies handle the rest.

For whatever reason, we've stayed away from actually mentioning caches or even something a little more inclusive and abstract. Instead we've opted to say "visible" and "available", and the definition of "visible" is a circular reference back to the place it's used.. There are times when a "cache" might be on-chip registers or a physically distinct piece of memory (say, in a networked Vulkan implementation), so we can't just say "caches".

I also spotted this gem in the glossary:

Memory Dependency::
A sequence of operations that makes writes available, performs an execution dependency between the writes and subsequent accesses

I don't even... I think there might be some serious problems with the synchronization language. Maybe I'll try to tackle them on the plane out to SF next week...

@philiptaylor
Copy link
Contributor Author

I definitely agree the synchronisation section needs a lot of work, not just minor tweaks - I've been trying to write a more precise description of the behaviour myself to help me understand it, and there are a lot of ambiguities. (I can try to upload my version somewhere soonish).

My immediate problem is with this example in the WSI spec which seems incompatible with my current understanding, and I can't tell whether that's because I'm misunderstanding or because the WSI spec authors were misunderstanding (or maybe they used to be correct then the spec changed incompatibly under them).

I think the available/visible abstraction is fine as long as it's accurately defined. "Cache flush" is probably too concrete for the specification, but I think it gives a good intuition to help understand the abstract definitions. But in this case the abstraction is not that well defined, and the WSI spec conflicts with my intuition, so I get confused both ways!

@ghost
Copy link

ghost commented Mar 8, 2016

(In the WSI example I also don't understand what STAGE_COLOR_ATTACHMENT_OUTPUT has got to do with reads by the presentation engine - that stage is apparently for writing fragments to the framebuffer, and the memory barrier should only influence memory accesses performed by that stage, which are completely different to memory accesses performed by the presentation engine (which doesn't even have a stage defined, as far as I can see).)

Ah, I missed this bit of your issue before. It's been a little while since I've looked at this but IIRC:

The point here is that the pipeline barrier is mostly just defining memory dependencies. The execution dependency part is actually only saying that it's waiting on previous COLOR_ATTACHMENT_OUTPUT stages that have run (i.e. previous frames' output). The semaphore (or implicit queue ordering) from the present operation is what actually handles the execution dependency for the presentation. The combination of "wait on last frame at this point" and the (maybe implicit) semaphore is enough information for the implementation to wait appropriately. It's a little odd, and yea; not explained particularly well :/

@jeffbolznv
Copy link
Contributor

I don't understand the first part - what does it mean for a memory read to "complete" or to be "complete"?

It means that it has completed execution. Unlike writes, there's no need to "flush" it out (make it available), so a memory read is complete as soon as it has executed.

In the WSI example I also don't understand what STAGE_COLOR_ATTACHMENT_OUTPUT has got to do with reads by the presentation engine

This is subtle enough that I had to dig back in the comments on old MRs to remember the intent. The first subtle bit is that the presentation engine is considered to be separate from the queue, so the actual presentation work does not occur on the queue, only the synchronization work and memory barriers execute on the queue.

The second subtle bit is that the image memory barrier in this example is not a direct dependency from the presentation operation. Rather, it is a step in a dependency chain. That dependency chain is a presentation operation signaling a semaphore, the queue waiting on the semaphore with a waitmask of VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, then the image layout transition using a matching source mask of VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT to depend on the semaphore wait.

This actually is documented in that note, but it's subtle and particularly if you look at the one dependency out of context it wouldn't make sense. Maybe we can find a way to make this more obvious.

I also spotted this gem in the glossary:
Memory Dependency:: ...

Maybe I'm being dense, but I don't immediately see what's wrong with this definition.

@philiptaylor
Copy link
Contributor Author

I don't understand the first part - what does it mean for a memory read to "complete" or to be "complete"?

It means that it has completed execution.

That sounds almost exactly like what execution dependencies are for. It seems weird and redundant for memory dependencies to have the same effect, given that you can't even express a memory dependency unless you've already got an execution dependency on that stage.

the image memory barrier in this example is not a direct dependency from the presentation operation. Rather, it is a step in a dependency chain.

Ah, yes, I was confused by that example since I forgot it was doing a layout transition, so ignore the last paragraph in my original comment. I think the rest of my comment still applies, though.

The execution dependencies all look fine. There's a semaphore from end of previous command buffer to vkQueuePresentKHR, then a semaphore from vkAcquireNextImageKHR to next command buffer's STAGE_COLOR_ATTACHMENT_OUTPUT. The next command buffer has a layout transition in STAGE_COLOR_ATTACHMENT_OUTPUT before any draw calls, so there's an execution dependency chain from semaphore to presentation to semaphore to transition to draw. The semaphore at the end of the first command buffer will provide an implicit memory dependency (I think), so its writes will be available for all subsequent commands.

My problem is with the memory dependencies around the layout transition.

The spec says:

When an image memory barrier includes a layout transition, the barrier first makes writes via srcStageMask and srcAccessMask available, then performs the layout transition, then makes the contents of the image subresource(s) in the new layout visible to memory accesses in dstStageMask and dstAccessMask, as if there is an execution and memory dependency between the source masks and the transition, as well as between the transition and the destination masks.

I assume the layout transition is magic and doesn't need writes to be explicitly made visible to it, and doesn't need its own writes to be explicitly made available. (If it wasn't magic, and behaved like any other stage, I think you'd need three VkImageMemoryBarrier every time you do a layout transition: one to make visible to the transition, one to perform the transition, one to make available from the transition, in that order.)

Writes from STAGE_COLOR_ATTACHMENT_OUTPUT in the first command buffer were made available ages ago (before the presentation engine read it), so we don't need to use srcAccessMask to make anything else available before the layout transition. Any potential write-after-read hazards are already solved by the execution dependencies. Therefore it's sufficient to say srcAccessMask = 0. There is no need to include anything else in there, contrary to the spec's claim that "srcAccessMask must include VK_ACCESS_MEMORY_READ_BIT".

@ghost ghost closed this as completed Mar 9, 2016
@ghost ghost reopened this Mar 9, 2016
@NicolBolas
Copy link

@philiptaylor

That sounds almost exactly like what execution dependencies are for. It seems weird and redundant for memory dependencies to have the same effect, given that you can't even express a memory dependency unless you've already got an execution dependency on that stage.

Actually, there's a good reason for that. Sometimes, you want to make sure that an operation completes, but you don't care about visibility. This would be for performing writes.

If a prior command tries to read a memory location that a later command may write to, you need an execution dependency between the reading operation and the writing operation. Without one, it becomes possible for the write to happen and become visible before all reads were completed, thus corrupting the reading command.

But you don't need a memory dependency, because you aren't trying to make the previous operation visible (since the previous operation was a write).

So yes, if you need a memory dependency, you also need an execution dependency. But not vice versa.

@philiptaylor
Copy link
Contributor Author

Actually, there's a good reason for that. Sometimes, you want to make sure that an operation completes, but you don't care about visibility. This would be for performing writes.

That's the reason to use an execution dependency when doing write-after-read, which is fine. But once you've got that execution dependency, what's the reason for adding a VkImageMemoryBarrier with srcAccessMask=VK_ACCESS_MEMORY_READ_BIT (or, more generally, any memory barrier with any READ in srcAccessMask)?

I still don't see any situations where such a memory barrier could have any observable or meaningful difference to a similar memory barrier with srcAccessMask=0, but the spec text implies there is some meaning.

@pimanttr
Copy link

pimanttr commented Mar 9, 2016

Image memory barriers are useful among others for performing layout transitions. In that case, it's definitely important to correctly specify {src,dst}{Stage,Access}Masks even for read-only accesses, because the layout transition is logically a "write" operation on the image - it must occur after those accesses are complete, and before new accesses start.

@philiptaylor
Copy link
Contributor Author

If you have a layout transition in an image memory barrier, the spec says there's an execution dependency from the srcStageMask of previous commands to the transition, and from the transition to the dstStageMask of subsequent commands. The first execution dependency is sufficient for the write-after-read hazard involving the transition's writes, so I believe it's still pointless and meaningless to put srcAccessMask=ANY_READ_BIT there. (You might need to put some WRITE bits in srcAccessMask to make previous commands' writes available to the transition's reads, but my original question is just about the READs.)

@krOoze
Copy link
Contributor

krOoze commented Sep 21, 2016

Is there any resolution to this?

I would like to have these answered:

  • Can any *_READ flag in srcAccess be just substituted for 0 (in all situations)?
  • (if so) Is there some expected benefit if the appropriate *_READ flags are provided anyway?
  • (if so) why does the hand-off from presentation prescribes there must be VK_ACCESS_MEMORY_READ_BIT in srcAccess (and does not allow just 0)?

@ghost
Copy link

ghost commented Sep 21, 2016

@krOoze

We're still working through the minutiae, but we did discuss this a bit and it appears that they are indeed useless:

Can any *_READ flag in srcAccess be just substituted for 0 (in all situations)?

Source access masks are intended to determine visibility of write accesses, and nothing else - so yes.

(if so) Is there some expected benefit if the appropriate *_READ flags are provided anyway?

I doubt it - but I don't speak for every implementation. Nobody spoke up about this last time we discussed it though. I don't really see how anyone could use it, because a bit's absence would conservatively be equivalent to the presence of the bit - adding the bit just confirms what you have to already assume.

(if so) why does the hand-off from presentation prescribes there must be VK_ACCESS_MEMORY_READ_BIT in srcAccess (and does not allow just 0)?

It's likely unnecessary and doesn't need to be there. However, interactions with windowing systems are always more complicated than I think, so I'd like to hear from at least @cubanismo on the matter, just to make sure the WSI extensions are not using this to mean something it wasn't otherwise intended for.

@ghost
Copy link

ghost commented Nov 15, 2016

So we've now completely resolved that READ in srcAccessMask is completely a no-op. If we hadn't been previously implying that you needed it, then we'd probably make it invalid. Instead we're just going to leave it as a no-op, and hopefully add a warning to the validation layers that it's a no-op. This should be clear in the spec in the next couple of weeks.

@oddhack
Copy link
Contributor

oddhack commented Nov 26, 2016

This should be fixed in the 1.0.35 spec update, as part of the synchronization language rewrite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants