
[Contrib] Workspace for cuBLAS backend #16413

Merged

Conversation

MasterJH5574 (Contributor)

This PR adds a 32MB workspace for the cuBLAS backend, so that functions like `cublasLtMatmul` can take the workspace as input.

The workspace is managed under `CuBlasThreadEntry` so that it is allocated only once per thread.
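For illustration, here is a minimal sketch (not the PR's actual diff) of how a per-thread, lazily allocated, fixed-size workspace of this kind can be structured. The class name `WorkspaceSketch` and its members are illustrative assumptions; `cudaMalloc`/`cudaFree` are the real CUDA runtime calls, and the resulting pointer would be handed to cuBLAS, e.g. as the workspace argument of `cublasLtMatmul`.

```cpp
// Sketch only: illustrative names; the actual PR wires this into
// CuBlasThreadEntry rather than a standalone class.
#include <cuda_runtime.h>

#include <cstddef>

class WorkspaceSketch {
 public:
  // 32MB, the size recommended by the cuBLAS docs for Hopper GPUs.
  static constexpr size_t kWorkspaceSize = 32 * 1024 * 1024;

  // Allocate lazily so each thread-local instance pays the cudaMalloc
  // cost at most once, on first use.
  void* GetWorkspace() {
    if (workspace_ptr_ == nullptr) {
      cudaMalloc(&workspace_ptr_, kWorkspaceSize);
    }
    return workspace_ptr_;
  }

  ~WorkspaceSketch() {
    if (workspace_ptr_ != nullptr) {
      cudaFree(workspace_ptr_);
    }
  }

 private:
  void* workspace_ptr_{nullptr};
};

// One instance per thread, mirroring the "allocated only once per thread"
// behavior described above (hypothetical accessor, not the TVM code).
thread_local WorkspaceSketch cublas_workspace;
```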

MasterJH5574 force-pushed the unity-dev/2024-01-16-cublas-workspace branch from a102768 to 86a6213 on January 17, 2024 01:54
MasterJH5574 (Contributor, Author)

cc @vinx13 @masahi @junrushao @tqchen

masahi (Member) commented on Jan 17, 2024:

There is a pass to allocate such workspace and append it to arguments of a BYOC function: https://github.com/apache/tvm/blob/unity/python/tvm/relax/transform/transform.py#L1305-L1317

If this can be used, I think that would be preferred.

MasterJH5574 force-pushed the unity-dev/2024-01-16-cublas-workspace branch from 86a6213 to 6e50364 on January 25, 2024 04:48
MasterJH5574 changed the base branch from unity to main on January 25, 2024 04:48
MasterJH5574 changed the title from [Unity][Contrib] Workspace for cuBLAS backend to [Contrib] Workspace for cuBLAS backend on Jan 25, 2024
MasterJH5574 (Contributor, Author)

> There is a pass to allocate such workspace and append it to arguments of a BYOC function: https://github.com/apache/tvm/blob/unity/python/tvm/relax/transform/transform.py#L1305-L1317
>
> If this can be used, I think that would be preferred.

@masahi Thank you for the great suggestion! Yes, I think that can be a next step, so that the workspace size becomes adjustable. For now, since I am not particularly familiar with the BYOC flow, I may not be able to quickly make the workspace work with the AllocateWorkspace pass. Adding a fixed-size workspace in the thread entry is the fastest way to enable it. I agree that we can leverage the pass later on.

MasterJH5574 force-pushed the unity-dev/2024-01-16-cublas-workspace branch from 6e50364 to 8edf01f on January 25, 2024 05:11
junrushao (Member)

@masahi would you mind sharing more guidance with @MasterJH5574?

```cpp
void* workspace_ptr{nullptr};
// 32MB workspace as suggested by NVIDIA
// https://docs.nvidia.com/cuda/cublas/index.html#cublassetworkspace.
static constexpr const size_t workspace_size = 33554432;
```
Member (review comment on the snippet above):
I'm assuming that 32MB is also good for pre-Hopper since it is bigger than the recommended size, 4MB. @vinx13
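For context on how this size is consumed on the cuBLASLt path: the same byte count is advertised to the algorithm heuristic through a matmul preference attribute, and the pointer/size pair is then passed to `cublasLtMatmul` itself. A hedged sketch, assuming the handle, descriptors, and chosen algorithm are set up elsewhere (the function name below is a placeholder):

```cpp
#include <cublasLt.h>

// Sketch: advertise the workspace budget to cuBLASLt's algorithm
// heuristic; the same pointer/size pair is later given to cublasLtMatmul.
void SetWorkspaceBudget(cublasLtMatmulPreference_t preference,
                        size_t workspace_size) {
  cublasLtMatmulPreferenceSetAttribute(
      preference, CUBLASLT_MATMUL_PREF_MAX_WORKSPACE_BYTES,
      &workspace_size, sizeof(workspace_size));
  // The actual matmul then looks roughly like:
  //   cublasLtMatmul(handle, op_desc, &alpha, A, A_desc, B, B_desc, &beta,
  //                  C, C_desc, D, D_desc, &algo,
  //                  workspace_ptr, workspace_size, stream);
}
```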

masahi (Member) left a comment:
OK, given the very specific nature of this workspace, I think a fast path like this is reasonable. The AllocateWorkspace-based approach can be used in more general settings, but it is overkill when we only need one workspace of a fixed size.

masahi merged commit bbbc895 into apache:main on Jan 25, 2024
20 checks passed