Warm tip: This article is reproduced from stackoverflow.com, please click
c++ gpu synchronization vulkan

Do I need to transfer ownership *back* to the transfer queue on next transfer?

发布于 2020-05-05 07:52:54

I'm planning on using one of the vulkan synchronization examples as reference for how to handle infrequently updated uniform buffers. Specifically I was looking at this one:

vkBeginCommandBuffer(...);

// Submission guarantees the host write being complete, as per
// https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#synchronization-submission-host-writes
// So no need for a barrier before the transfer

// Copy the staging buffer contents to the vertex buffer
VkBufferCopy vertexCopyRegion = {
    .srcOffset = stagingMemoryOffset,
    .dstOffset = vertexMemoryOffset,
    .size      = vertexDataSize};

vkCmdCopyBuffer(
    commandBuffer,
    stagingBuffer,
    vertexBuffer,
    1,
    &vertexCopyRegion);


// If the graphics queue and transfer queue are the same queue
if (isUnifiedGraphicsAndTransferQueue)
{
    // If there is a semaphore signal + wait between this being submitted and
    // the vertex buffer being used, then skip this pipeline barrier.

    // Pipeline barrier before using the vertex data
    // Note that this can apply to all buffers uploaded in the same way, so
    // ideally batch all copies before this.
    VkMemoryBarrier memoryBarrier = {
        ...
        .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,                           
        .dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT};

    vkCmdPipelineBarrier(
        ...
        VK_PIPELINE_STAGE_TRANSFER_BIT ,      // srcStageMask
        VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,   // dstStageMask
        1,                                    // memoryBarrierCount
        &memoryBarrier,                       // pMemoryBarriers
        ...);


    vkEndCommandBuffer(...);

    vkQueueSubmit(unifiedQueue, ...);
}
else
{
    // Pipeline barrier to start a queue ownership transfer after the copy
    VkBufferMemoryBarrier bufferMemoryBarrier = {
        ...
        .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,                           
        .dstAccessMask = 0,
        .srcQueueFamilyIndex = transferQueueFamilyIndex,
        .dstQueueFamilyIndex = graphicsQueueFamilyIndex,
        .buffer = vertexBuffer,
        ...};

    vkCmdPipelineBarrier(
        ...
        VK_PIPELINE_STAGE_TRANSFER_BIT ,      // srcStageMask
        VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, // dstStageMask
        1,                                    // bufferMemoryBarrierCount
        &bufferMemoryBarrier,                 // pBufferMemoryBarriers
        ...);


    vkEndCommandBuffer(...);

    // Ensure a semaphore is signalled here which will be waited on by the graphics queue.
    vkQueueSubmit(transferQueue, ...);

    // Record a command buffer for the graphics queue.
    vkBeginCommandBuffer(...);

    // Pipeline barrier before using the vertex buffer, after finalising the ownership transfer
    VkBufferMemoryBarrier bufferMemoryBarrier = {
        ...
        .srcAccessMask = 0,                           
        .dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,
        .srcQueueFamilyIndex = transferQueueFamilyIndex,
        .dstQueueFamilyIndex = graphicsQueueFamilyIndex,
        .buffer = vertexBuffer,
        ...};

    vkCmdPipelineBarrier(
        ...
        VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,    // srcStageMask
        VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,   // dstStageMask
        ...
        1,                                    // bufferMemoryBarrierCount
        &bufferMemoryBarrier,                 // pBufferMemoryBarriers
        ...);


    vkEndCommandBuffer(...);

    vkQueueSubmit(graphicsQueue, ...);
}

In this example, I simplify it to mean:

map updated buffer which is host coherent
perform transfer in transfer queue to device local memory
    make sure to put a buffer memory barrier to handle the queue ownership transfer
perform normal draw commands
    make sure to put a buffer memory barrier to handle receiving of buffer in queue ownership

Must I then give back the ability for the transfer queue to copy that data again? None of the examples seem to mention it, but I could have missed it. I can't really see how adding another buffer barrier would work for the same draw command buffer as it would stall on the next submission even if I didn't have anything to transfer, so would I just need to submit another command buffer just for queue ownership transfer before I submit my next transfer operation?

ie

//begin with transfer ownership
submit(copy)
submit(ownership to graphics)
submit(draw)
submit(ownership to transfer)
submit(copy)
submit(ownership to graphics)
submit(draw)
submit(draw)
submit(draw)
submit(ownership to transfer)
submit(copy)
submit(ownership to graphics)
submit(draw)

If that is the case, I'm unsure of how to handle the semaphore signaling between draw and transfer, and copy and draw. At the beginning it is easy, but then it gets strange because of multiple i nflight frames, as there would be no dependency between draw submits. Basically I guess I would need to set what ever the most recent draw command I submitted to have a semaphore to signal the transfer of ownership, which would signal the copy, which would signal the ownership of graphics, and if it was on a separate thread I would then check if this copy was submitted, and require a wait on the ownership of graphics transfer and resetting the copy submitted check. But I'm not sure what happens to the next frame which doesn't have this dependency, and could finish before what would chronologically be the previous frame?

Questioner
whn
Viewed
25
krOoze 2020-02-20 09:26

You can use a Resource on any queue family (without a transfer) as long as you do not mind that the data becomes undefined. You still need a Semaphore to make sure there is no memory hazard.

Ye olde spec:

NOTE

If an application does not need the contents of a resource to remain valid when transferring from one queue family to another, then the ownership transfer should be skipped.

Examples do not mention it, because they are only examples.

As for synchronization (which is a separate problem from QFOT), a semaphore signal which is part of vkQueueSubmit covers everything previously in submission order. So when you submit the copy you would let it wait on the Semaphore that the last draw submitted has signaled. That means that draw and any previous draw on that queue is finished, before the copy can start on the other queue.

Then you signal a semaphore by the copy, and wait on it on the first next draw you submit. That means the copy finished writing, before the draw (and any subsequent draw) reads it on the graphics queue.

e.g.:

submit(copy, release ownership from tranfer, semaphore signal)
submit(semaphore wait, acquire ownership to graphics, draw)
submit(draw)
submit(draw)
submit(draw)
submit(draw)
submit(draw, semaphore signal)
submit(semaphore wait, copy, release ownership from tranfer, semaphore signal)
submit(semaphore wait, acquire ownership to graphics, draw)
submit(draw)
submit(draw)
etc

Though note the above approach practically serializes the two kinds of accesses, so it might be suboptimal. Employing double-buffering (or generally N-buffering) can be better. If you have more buffers, you could start copying into one without worying it is already used by something else. That means a copy can happen in paralel with a draw, which would be great.