[Vulkan] Support uniform buffer object for passing many scalar arguments #7717
@tqchen This is ready for review. On the codegen side, UBO is declared in a similar way as push constants, so I made On the runtime side, I introduced One potentially controversial point could be the use of runtime function |
cc @ajtulloch @antinucleon please help to see if you can take a look |
Thank you @masahi for these changes, I left a comment on cosmetics
Thank you @masahi, your PR has been merged |
Sorry that I didn't get a chance to review. Here are some items that need to be fixed in follow-up PRs:
|
I'm ok if we decide to use a fixed max push constant size, but for other HW params we need to have a solution for runtime querying. |
Thanks @masahi. I agree that the use of delete is minor. In the case of push constants, we can try to obtain these constants and put them into the target. The main problem with the code as it stands is that we query the local machine, but we need the parameters of the remote device. It would be useful to consider decoupling the two, to avoid inconsistency between the information obtained by the compiler and the information in the runtime. For example, the kernel should declare in its function metadata whether UBO or push constants are being used, so that we do not depend on runtime parameters: the compiler decides whether to use UBO based on arg size and writes that into optional metadata of the shader to enable UBO. For instance, we can add a bitmask flag to VulkanShader::flag to indicate the difference. The runtime then simply reads that information and dispatches accordingly, which avoids inconsistency between compile time and runtime. We can then enhance the target registration to include information such as the max push constant size per target, defaulting to the minimum value. After taking a closer look, there are a few more potential things that need to be fixed:
|
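The metadata-flag suggestion above could look roughly like the following. This is only a sketch under assumptions: ShaderMeta is a stand-in for VulkanShader, and kUseUBO, MakeShaderMeta, and UsesUBO are hypothetical names that do not come from the PR. The point is that the compiler makes the decision once from the argument size, and the runtime only reads the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flag bits stored in the shader metadata (names are illustrative).
enum ShaderMetaFlagMask : uint32_t {
  kUseUBO = 1u << 0,  // scalar arguments are passed through a uniform buffer
};

// Stand-in for VulkanShader; only the flag field matters for this sketch.
struct ShaderMeta {
  uint32_t flag{0};
};

// Compile side: decide from the argument size alone and record the decision.
// Assumes each scalar occupies one 64-bit slot.
ShaderMeta MakeShaderMeta(std::size_t num_scalar_args, std::size_t max_push_constant_bytes) {
  ShaderMeta meta;
  if (num_scalar_args * sizeof(uint64_t) > max_push_constant_bytes) {
    meta.flag |= kUseUBO;
  }
  return meta;
}

// Runtime side: read the recorded decision, with no hardware query involved.
bool UsesUBO(const ShaderMeta& meta) { return (meta.flag & kUseUBO) != 0; }
```

With this split, max_push_constant_bytes would come from the target description at compile time (defaulting to the 128-byte minimum), so a kernel compiled for a remote device never depends on what the local machine reports.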
I confirmed that it also works for deferred mode (by always returning false from
Need to look at the reallocate problem |
Sorry, I was wrong about the re-allocation. It seems you are allocating a single UBO per kernel, so it might be fine. My original thought was along the lines of a single UBO per runtime. |
Yes, one UBO per one pipeline https://github.com/masahi/tvm/blob/1a3dbee99c9a2c362373707678d5657e59ea6827/src/runtime/vulkan/vulkan.cc#L108 |
To follow up on the deferred execution case: the memcpy happens here: https://github.com/masahi/tvm/blob/1a3dbee99c9a2c362373707678d5657e59ea6827/src/runtime/vulkan/vulkan.cc#L1150 Imagine there are two consecutive launches of the same kernel A (which uses UBO), both deferred. Then the memcpy of the second launch would overwrite the values of the first launch (neither of which has actually been launched yet). So we want to move the memcpy into the lambda in the immediate and deferred kernels (beside the push constant calls) |
To follow up on the per-pipeline UBO, after thinking a bit more about this design: a per-pipeline UBO is fine for the single-threaded case, but it can be problematic in a multi-threaded setting when multiple threads launch the same kernel A. To make the runtime thread-safe, we normally need to divide the data structures into constants (e.g. the pipeline) and runtime structures (e.g. staging buffers, streams). The runtime structures belong to VulkanThreadEntry, which keeps a thread-local copy to avoid threading issues. So we would want to do the same here, and use logic similar to StagingBuffer to create a staging buffer for the UBO. Finally, right now it seems we rely on having unified memory (host-mapped memory) for the UBO. Do all GPUs support this kind of memory? Should we allow an explicit data copy into the UBO buffer when such memory is not supported? |
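To make the overwrite scenario concrete, here is a small self-contained simulation. It does not use Vulkan at all: FakeUBO stands in for the mapped host buffer and RunDeferred is a hypothetical helper, so treat it as a sketch of the ordering problem rather than the actual runtime code. Two launches of the same kernel are queued, then executed later.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <initializer_list>
#include <vector>

// Stand-in for the host-mapped UBO shared by every launch of one pipeline.
struct FakeUBO {
  int64_t data[4] = {0, 0, 0, 0};
};

// Queues two deferred launches with arguments 1 and 2, runs them afterwards,
// and returns the first UBO slot each "kernel" observed when it ran.
std::vector<int64_t> RunDeferred(bool copy_inside_lambda) {
  FakeUBO ubo;
  std::vector<std::function<void()>> deferred;
  std::vector<int64_t> observed;
  for (int64_t arg : {int64_t{1}, int64_t{2}}) {
    std::vector<int64_t> args{arg};
    if (!copy_inside_lambda) {
      // Eager copy at enqueue time: the second enqueue clobbers the first.
      std::memcpy(ubo.data, args.data(), args.size() * sizeof(int64_t));
      deferred.push_back([&ubo, &observed]() { observed.push_back(ubo.data[0]); });
    } else {
      // Capture the arguments by value and copy only when the launch runs.
      deferred.push_back([&ubo, &observed, args]() {
        std::memcpy(ubo.data, args.data(), args.size() * sizeof(int64_t));
        observed.push_back(ubo.data[0]);
      });
    }
  }
  for (auto& launch : deferred) launch();
  return observed;
}
```

RunDeferred(false) yields 2 for both launches, which is the bug described above, while RunDeferred(true) yields 1 and then 2. The by-value capture of args is what keeps each launch's arguments alive until its copy actually happens.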
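A rough sketch of the thread-local arrangement described above, assuming a StagingBuffer-like accessor keyed by device id. ThreadEntry, HostBuffer, and UniformBuffer are hypothetical names for illustration, and the buffer is plain host memory rather than a real Vulkan allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Stand-in for a host-visible buffer; a real one would wrap VkBuffer and VkDeviceMemory.
struct HostBuffer {
  std::vector<uint8_t> bytes;
};

// Per-thread runtime state, kept apart from the shared, immutable pipeline objects.
class ThreadEntry {
 public:
  // Returns this thread's UBO staging buffer for a device, growing it if needed.
  HostBuffer* UniformBuffer(int device_id, std::size_t size) {
    auto& buf = uniform_buffers_[device_id];
    if (!buf) buf = std::make_unique<HostBuffer>();
    if (buf->bytes.size() < size) buf->bytes.resize(size);
    return buf.get();
  }

  // Each thread gets its own instance, so concurrent launches never share a buffer.
  static ThreadEntry* ThreadLocal() {
    static thread_local ThreadEntry inst;
    return &inst;
  }

 private:
  std::unordered_map<int, std::unique_ptr<HostBuffer>> uniform_buffers_;
};
```

A launch would then call ThreadEntry::ThreadLocal()->UniformBuffer(device_id, nbytes) instead of writing into a buffer owned by the pipeline, so two threads launching kernel A write into different memory.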
Thanks @masahi for the great discussions. It seems to me that we would have benefited from more reviews and discussion :) It is great that we have caught these points for improvement during the discussion after merging. Given that it is already merged, we could consider:
To summarize the items that need to be addressed:
|
…r arguments (apache#7717)" This reverts commit 5bc1cec.
ok sent the revert #7821 |
…nts (apache#7717) * ubo codegen first cut * begin runtime change for UBO * allocate and bind ubo * query memory type for uniform * refactor * do not use float64 * trying an approach similar to push constant * add more log * do not delete ubo when not using it * cumsum and nms test working with ubo * remove log * cleaning up * formatting * revert BufferArgument change * refactored codegen * minor fix * introduce value kind for ubo * fix cpplint and revert float64 change * query push constant size using runtime API * let vkmap/unmap allocate and delete host_buf * doc update * fix typo Co-authored-by: Masahiro Masuda <masahi@[email protected]>
…r arguments (apache#7717)" (apache#7821) This reverts commit 5bc1cec.
We are using push constants to pass scalar arguments to SPIR-V kernels. The Vulkan spec guarantees that the size of push constant storage is at least 128 bytes. Since we pass each scalar via a 64-bit union, that means we can only pass 16 parameters via push constants. Unfortunately, for fused, dynamic-input kernels, the number of any_dim and other params can easily go beyond that limit. See for example this crazy kernel that needs 20 scalars to be passed in: https://gist.github.com/masahi/ce51c0d51c6115109203b3732f185aab
This PR enables passing parameters beyond the push constant limit, using a uniform buffer object. In particular, with this PR I was able to run PyTorch MaskRCNN end to end on Vulkan!
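The 16-parameter limit follows directly from the sizes involved. Here is a minimal sketch of the arithmetic, assuming a union laid out like the one below; the name ArgUnion64 and the helper functions are illustrative and not quoted from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 64-bit slot per scalar argument, whatever its actual type.
union ArgUnion64 {
  int32_t v_int32[2];
  int64_t v_int64;
  float v_float32[2];
  double v_float64;
};
static_assert(sizeof(ArgUnion64) == 8, "each scalar is assumed to occupy 8 bytes");

// Minimum push constant storage guaranteed by the Vulkan spec.
constexpr std::size_t kMinPushConstantBytes = 128;

// Number of 64-bit scalars that fit in a given push constant budget.
constexpr std::size_t MaxScalarsInPushConstants(std::size_t push_constant_bytes) {
  return push_constant_bytes / sizeof(ArgUnion64);
}

// Hypothetical helper: true when the scalars no longer fit and a UBO is needed.
constexpr bool NeedsUniformBuffer(std::size_t num_scalars,
                                  std::size_t push_constant_bytes = kMinPushConstantBytes) {
  return num_scalars > MaxScalarsInPushConstants(push_constant_bytes);
}
```

With the guaranteed minimum, 128 / 8 = 16 scalars fit, so the 20-scalar kernel linked above exceeds the budget and has to take the UBO path.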
Running NMS tests in onnx/test_forward.py serves as a test case.
There are some minor TODOs left, but I'd like to get some feedback. Supporting UBO requires making non-trivial changes to both codegen and runtime, but hopefully they are straightforward.
@tqchen @tmoreau89 @jwfromm @ajtulloch
TODO
Fix segfault on deleting the UBO host buffer (it seems I shouldn't allocate/deallocate the host buf explicitly, but rather use vkMapMemory/vkUnmapMemory)