[Contrib] Workspace for cuBLAS backend #16413
Conversation
Force-pushed from a102768 to 86a6213.
There is a pass to allocate such a workspace and append it to the arguments of a BYOC function: https://github.com/apache/tvm/blob/unity/python/tvm/relax/transform/transform.py#L1305-L1317. If this can be used, I think that would be preferred.
Force-pushed from 86a6213 to 6e50364.
@masahi Thank you for the great suggestion! Yes, I think this can be a next step, so that the workspace size becomes adjustable. For now, given that I am not particularly familiar with the BYOC flow, I may not be able to quickly make the workspace work with the AllocateWorkspace pass. Adding a fixed-size workspace in the thread entry is the fastest way to enable a workspace. I agree that we can leverage the pass later on.
This PR adds a 32MB workspace for the cuBLAS backend, so that functions like `cublasLtMatmul` can take the workspace as input. The workspace is managed under `CuBlasThreadEntry` so that it is allocated only once per thread.
Force-pushed from 6e50364 to 8edf01f.
@masahi Would you mind sharing more guidance with @MasterJH5574?
```cpp
void* workspace_ptr{nullptr};
// 32MB workspace as suggested by NVIDIA
// https://docs.nvidia.com/cuda/cublas/index.html#cublassetworkspace.
static constexpr const size_t workspace_size = 33554432;
```
I'm assuming that 32MB is also good for pre-Hopper since it is bigger than the recommended size, 4MB. @vinx13
Ok, given the very specific nature of this workspace, I think a fast path like this is reasonable. The `AllocateWorkspace`-based approach can be used in more general settings, but it is overkill when we only need one workspace of a fixed size.