
"Synchronizing" a render pass layout transition with a semaphore in Acquire-Present scenario in Vulk

Published on 2020-05-07 17:57:59

So there is this official example https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples#combined-graphicspresent-queue:

/* Only need a dependency coming in to ensure that the first
   layout transition happens at the right time.
   Second external dependency is implied by having a different
   finalLayout and subpass layout. */
VkSubpassDependency dependency = {
    .srcSubpass = VK_SUBPASS_EXTERNAL,
    .dstSubpass = 0,
    // .srcStageMask needs to be a part of pWaitDstStageMask in the WSI semaphore.
    .srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .srcAccessMask = 0,
    .dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .dependencyFlags = 0};

Could someone please point me to the relevant sections of the specification that, combined, guarantee (constitute a chain of reasoning) that with such a dependency the layout transition will not happen until the queue's wait semaphore (image acquired) is signaled?

Particularly I can't find how to interpret this "dependency from that same stage to itself".

To be clear: I've found a lot of places that seem to be relevant here. I've been reading the docs for over a month now, but I'm struggling to find coherence in them.

For example, when (according to the specification) does an availability operation happen? When the relevant memory dependency operation is submitted (as in submission order)? If yes, then when is the subpass dependency submitted? Or is it somewhere between the source scope instructions and the destination scope instructions (as in "If srcSubpass is equal to VK_SUBPASS_EXTERNAL, the first synchronization scope includes commands that occur earlier in submission order than the vkCmdBeginRenderPass")? And if yes, which instructions does srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT in the above example refer to?

EDIT after krOoze's answer

I thought I'd write here: one, because it's too long for a comment, and two, because I believe it may be useful for others.

I admit, I misinterpreted the part about the execution dependency chain in the spec.

So to sum it up. To define the mechanism in question in terms of the Specification we have what follows:

  1. The waiting on semaphore operation happens-before the subpass dependency operation (here I have some trouble actually):

    6.4.2. Semaphore Waiting*
    The semaphore wait operation happens-after the first set of operations in the execution dependency, and happens-before the second set of operations in the execution dependency.

    But how can we be sure that our subpass dependency operation is in the second set? It is in the same batch, it doesn't have defined submission order with regard to a subpass dependency (at least I cannot see one) and the definition of the semaphore second synchronisation scope isn't helpful because our subpass dependency doesn't happen on the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT pipeline stage (and it's the limitation of the second synchronisation scope in case of vkQueueSubmit). What's more, a synchronisation scope doesn't define the second set of operations anyway. It's a distinct term. But I've found one more statement that may be helpful here (well, if we agree that a subpass dependency is a part of a work item):

    4.3.5. Queue Submission
    Each batch consists of three distinct parts:

    1. Zero or more semaphores to wait on before execution of the rest of the batch.
    2. Zero or more work items to execute.
    3. Zero or more semaphores to signal upon completion of the work items.

    And we need to be sure of this ordering to construct the execution dependency chain:

  2. Waiting on the semaphore and the subpass dependency constitute an execution dependency chain according to:

    6.1. Execution and Memory Dependencies
    An execution dependency chain is a sequence of execution dependencies that form a happens-before relation between the first dependency’s A' and the final dependency’s B'. For each consecutive pair of execution dependencies, a chain exists if the intersection of BS in the first dependency and AS in the second dependency is not an empty set.

    (see krOoze's answer for details)

    From this we know that the destination scope of our subpass dependency will happen-after signaling the semaphore (a signal operation is in the source scope of the semaphore wait operation).
    Now we should be fine with the layout transition rule:

  3. Layout transition happens-after the availability operation for our subpass dependency:

    7.1. Render Pass Creation
    Automatic layout transitions away from initialLayout happens-after the availability operations for all dependencies with a srcSubpass equal to VK_SUBPASS_EXTERNAL, where dstSubpass uses the attachment that will be transitioned.

    To be honest I'm still missing the ordering between signaling the semaphore and the availability operation part in the spec but I think it could be assumed.
    (the above would work because an availability operation is part of the memory dependency operation:

    An operation that performs a memory dependency generates:
    • An availability operation with source scope of all writes in the first access scope of the dependency and a destination scope of the device domain.

    well our first access scope is empty but still it's an availability operation, right?)

There's also this statement:

For attachments however, subpass dependencies work more like a VkImageMemoryBarrier defined similarly to the VkMemoryBarrier above, the queue family indices set to VK_QUEUE_FAMILY_IGNORED, and layouts as follows:
• The equivalent to oldLayout is the attachment’s layout according to the subpass description for srcSubpass.
• The equivalent to newLayout is the attachment’s layout according to the subpass description for dstSubpass.

...which brings another scope to analyse, but my head aches already. I'll be more than happy to edit this further once I get some review of the above thoughts.

*All the spec quotes from "Vulkan® 1.2.132 - A Specification (with all registered Vulkan extensions)"

Questioner: listerreg
krOoze 2020-02-22 02:02

I go over this a bit at krOoze/Hello_Triangle/doc. What is supposed to happen in this scenario is:

(diagram not reproduced)

    Particularly I can't find how to interpret this "dependency from that same stage to itself".

Now, let's first take care of this issue. This is what I like to call a cart-before-the-horse intuition about the synchronization system.

You do not "synchronize stages" or something like that. That is intuition that will only cause you confusion. You synchronize scopes.

People also confuse a pipeline with a flow chart. There is a huge difference in intuition. In a flow chart, you start at the start, then you go over all the stages in order, then you are finished and forever done. That is not what a pipeline is. It never starts and never finishes. A pipeline just is. It is like a tabletop game board. You stuff commands through the pipeline, and they go through the stages like pegs on the board.

A synchronization command is something that introduces a dependency between two things: between the source synchronization scope and destination synchronization scope. It guarantees the src scope happens-before the dst scope.

A scope is some subset of queue operations, together with the stages their execution can currently be at.

So, with this better intuition,

    .srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,

is a perfectly normal thing to do. It means the commands in the source scope (for a barrier, the commands recorded earlier, or more formally those "earlier in submission order") have reached the COLOR_ATTACHMENT_OUTPUT stage before any commands in the destination scope reach the COLOR_ATTACHMENT_OUTPUT stage. (In contrast, without the dependency, any command could be at any stage of its execution at any given time.)

    For example, when (according to the specification) does an availability operation happen?

These are, in a sense, inserted into the dependency that the barrier defines, assuming you included a memory dependency in your barrier.

The availability operation (if any) happens-after the source synchronization scope. Then the layout transition happens (if any). Then the visibility operation happens (if any). Only after that can the destination synchronization scope execute.

    Could someone please provide me with relevant sections in the specification that combined guarantee (constitute a chain of reasoning)

I just want to pat you on the head right now for wanting the authoritative information... :D

So, you need to know the formalism and the nomenclature. It is what all the synchronization primitives are described with. It is only one page, but a relatively hard read. I tried to explain the important parts above. I am not going to quote it here; it is the 6.1. Execution and Memory Dependencies chapter.

Now, the semaphore wait has its own chapter. It is important to note that it behaves a bit differently for other commands than for vkQueueSubmit (which gets annoying). Anyway (6.4.2. Semaphore Waiting):

    The second synchronization scope includes every command submitted in the same batch. In the case of vkQueueSubmit, the second synchronization scope is limited to operations on the pipeline stages determined by the destination stage mask specified by the corresponding element of pWaitDstStageMask. Also, in the case of vkQueueSubmit, the second synchronization scope additionally includes all commands that occur later in submission order.

    The second access scope includes all memory access performed by the device.

Batch (for vkQueueSubmit) is the single VkSubmitInfo. Submission order also has its own chapter; basically it means "all the other Batches that are later in the submission array, as well as any future vkQueueSubmit on the same queue too".

So, this means: "if you wait on a semaphore, all commands in the VkSubmitInfo can reach a pWaitDstStageMask stage only after the semaphore was signaled".
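
A toy model of that statement, as a hedged sketch and not anything from the Vulkan headers: the enum is a hypothetical, shortened list of graphics stages in their logical order, and `blocked_by_wait` is an illustrative helper, not an API call. It assumes a single stage in pWaitDstStageMask and relies on the spec's rule that naming a stage in a destination stage mask also covers the logically later stages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened list of graphics stages in logical order.
   These are illustration labels, not values from vulkan.h. */
enum toy_stage {
    TOY_VERTEX_SHADER = 0,
    TOY_FRAGMENT_SHADER,
    TOY_COLOR_ATTACHMENT_OUTPUT,
    TOY_BOTTOM_OF_PIPE,
    TOY_STAGE_COUNT
};

/* True if a command in the waiting batch must not start `stage` until the
   semaphore is signaled, given the single stage named in pWaitDstStageMask.
   Assumption: the wait covers that stage and every logically later one. */
bool blocked_by_wait(enum toy_stage stage, enum toy_stage wait_dst_stage)
{
    assert(stage < TOY_STAGE_COUNT && wait_dst_stage < TOY_STAGE_COUNT);
    return stage >= wait_dst_stage;
}
```

With TOY_COLOR_ATTACHMENT_OUTPUT as the wait stage, vertex and fragment shading in the batch are free to run before the image is acquired; only color output and the later stages are held back.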

Now it is important to understand what a Render Pass does. Apart from the recorded commands, it has other "synchronizables": the automatic layout transitions, the load operations, and the store operations.

The automatic layout transition:

    Automatic layout transitions away from initialLayout happens-after the availability operations for all dependencies with a srcSubpass equal to VK_SUBPASS_EXTERNAL, where dstSubpass uses the attachment that will be transitioned

    Automatic layout transitions into the layout used in a subpass happen-before the visibility operations for all dependencies with that subpass as the dstSubpass.

So in simple terms, the layout transition is sneaked inside the dependency that the VkSubpassDependency you list defines. It happens after the .srcStageMask with .srcAccessMask, and it happens before the .dstSubpass and .dstStageMask with .dstAccessMask.

The load op:

    The load operation for each sample in an attachment happens-before any recorded command which accesses the sample in the first subpass where the attachment is used. [...] Load operations for attachments with a color format execute in the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT pipeline stage.

    VK_ATTACHMENT_LOAD_OP_LOAD [...] For attachments with a color format, this uses the access type VK_ACCESS_COLOR_ATTACHMENT_READ_BIT.

    VK_ATTACHMENT_LOAD_OP_CLEAR (or VK_ATTACHMENT_LOAD_OP_DONT_CARE) [...] For attachments with a color format, this uses the access type VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT.

The load op happens as part of the first subpass that uses the attachment (your .dstSubpass), and the above unambiguously determines your .dstStageMask and .dstAccessMask.
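
The mapping in the quotes above can be written down as a small lookup. This is only a sketch: the `TOY_*` names are hypothetical stand-ins so that it compiles without vulkan.h (the numbers mirror the corresponding Vulkan enum values), and `color_load_op_access` is an illustrative helper, not part of the API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without vulkan.h. The numbers mirror
   VK_ATTACHMENT_LOAD_OP_* and VK_ACCESS_COLOR_ATTACHMENT_*_BIT, but the
   names here are hypothetical. */
enum toy_load_op {
    TOY_LOAD_OP_LOAD = 0,
    TOY_LOAD_OP_CLEAR = 1,
    TOY_LOAD_OP_DONT_CARE = 2
};
#define TOY_ACCESS_COLOR_ATTACHMENT_READ  0x00000080u
#define TOY_ACCESS_COLOR_ATTACHMENT_WRITE 0x00000100u

/* Access type that the load operation itself uses on a color attachment,
   following the spec text quoted above. */
uint32_t color_load_op_access(enum toy_load_op op)
{
    assert(op == TOY_LOAD_OP_LOAD || op == TOY_LOAD_OP_CLEAR ||
           op == TOY_LOAD_OP_DONT_CARE);
    return (op == TOY_LOAD_OP_LOAD) ? TOY_ACCESS_COLOR_ATTACHMENT_READ
                                    : TOY_ACCESS_COLOR_ATTACHMENT_WRITE;
}
```

For a CLEAR (or DONT_CARE) load op this gives the write bit, which matches the .dstAccessMask in the example from the question. With a LOAD op you would presumably want the read bit in .dstAccessMask as well, next to the write bit that the draw commands need.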

Now it comes down to our choice of pWaitDstStageMask, .srcStageMask and .srcAccessMask. You list pWaitDstStageMask = COLOR_ATTACHMENT_OUTPUT, .srcStageMask = COLOR_ATTACHMENT_OUTPUT and .srcAccessMask = 0.

The semaphore wait operation has to happen-before the VkSubpassDependency stuff. This is specified as a dependency chain:

    An execution dependency chain is a sequence of execution dependencies that form a happens-before relation between the first dependency’s A' and the final dependency’s B'. For each consecutive pair of execution dependencies, a chain exists if the intersection of BS in the first dependency and AS in the second dependency is not an empty set.

I.e. two subsequent synchronization primitives also synchronize with each other, which gives a transitive property. Our A' here is the semaphore signal, and our B' here is the dst scope of the VkSubpassDependency. Our BS here is the semaphore's dst scope, i.e. pWaitDstStageMask. And our AS is the src scope of our VkSubpassDependency.

So our pWaitDstStageMask intersection with .srcStageMask is still COLOR_ATTACHMENT_OUTPUT. Therefore a dependency chain is formed, that guarantees the semaphore signal happens-before the COLOR_ATTACHMENT_OUTPUT of the commands in the 0 subpass of the render pass.
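
As a rough sketch of that intersection test: the `TOY_STAGE_*` macros are hypothetical stand-ins whose numbers mirror the VK_PIPELINE_STAGE_* bit values, and `chain_forms` is an illustrative helper, not an API call. It compares the raw masks only and ignores the implicit inclusion of logically earlier stages in the source mask and logically later stages in the destination mask, so it can say "no chain" in cases where the spec would still find one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without vulkan.h; the numbers mirror
   the VK_PIPELINE_STAGE_* bits, the names are hypothetical. */
#define TOY_STAGE_TOP_OF_PIPE             0x00000001u
#define TOY_STAGE_COLOR_ATTACHMENT_OUTPUT 0x00000400u

/* BS of the semaphore wait is approximated by pWaitDstStageMask and AS of
   the subpass dependency by .srcStageMask. A chain needs a non-empty
   intersection of the two. */
bool chain_forms(uint32_t wait_dst_stage_mask, uint32_t src_stage_mask)
{
    return (wait_dst_stage_mask & src_stage_mask) != 0u;
}

/* The configuration from the question: both masks are COLOR_ATTACHMENT_OUTPUT. */
void check_question_setup(void)
{
    assert(chain_forms(TOY_STAGE_COLOR_ATTACHMENT_OUTPUT,
                       TOY_STAGE_COLOR_ATTACHMENT_OUTPUT));
}
```

Swapping .srcStageMask to TOY_STAGE_TOP_OF_PIPE while still waiting at COLOR_ATTACHMENT_OUTPUT makes `chain_forms` return false. In that case the layout transition would no longer be ordered after the semaphore wait.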

Now, putting it all together: the semaphore signal from vkAcquireNextImage makes the swapchain image available from the reads of the presentation engine. The semaphore wait in vkQueueSubmit makes the swapchain image visible to all commands in the batch, limited to COLOR_ATTACHMENT_OUTPUT. The VkSubpassDependency chains to that semaphore wait. The image is still visible at that point, so no additional memory dependency is needed, and so our .srcAccessMask is 0. The layout transition writes the image, makes it (implicitly) available from the layout transition, and makes it visible to whatever .dst* was provided to the VkSubpassDependency.