[CUBLAS] Add cuBLAS as a Relay partitioning target (BYOC) #10820
This PR adds a partitioning pass for cuBLAS so that supported Relay patterns can be offloaded to cuBLAS. This initial commit only adds offloading support for nn.matmul. Although cuBLAS is already enabled in TVM by using strategy selection in TE, by exposing it explicitly as a Relay partitioning target we can more precisely describe how to execute a model in Relay. This is desirable particularly in the Collage effort to improve multi-backend graph partitioning.
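To illustrate the idea of a partitioning pass in isolation, here is a minimal toy sketch in plain Python. It does not use TVM or the code in this PR: `Node`, `partition_for_toy_cublas`, and `collect` are hypothetical names invented for the illustration, and the graph is a simplified stand-in for a Relay expression. It only shows the general shape of the technique: walk the graph, and tag the ops the external target supports (here just nn.matmul, matching the initial commit) so they can be offloaded while everything else stays on the default path.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Ops the toy "cublas" target claims to support. The initial commit of this
# PR covers only nn.matmul, so the set mirrors that.
CUBLAS_SUPPORTED = {"nn.matmul"}


@dataclass
class Node:
    """A node in a toy dataflow graph (hypothetical; not a Relay expression)."""
    op: str
    inputs: List["Node"] = field(default_factory=list)
    target: str = "default"


def partition_for_toy_cublas(root: Node) -> Node:
    """Tag every supported op in the graph for the external target."""
    for inp in root.inputs:
        partition_for_toy_cublas(inp)
    if root.op in CUBLAS_SUPPORTED:
        root.target = "cublas"
    return root


def collect(root: Node) -> List[Tuple[str, str]]:
    """Return (op, target) pairs in post-order."""
    out: List[Tuple[str, str]] = []
    for inp in root.inputs:
        out.extend(collect(inp))
    out.append((root.op, root.target))
    return out


# Toy graph: relu(matmul(x, w))
x = Node("var")
w = Node("var")
graph = Node("nn.relu", [Node("nn.matmul", [x, w])])
assignments = collect(partition_for_toy_cublas(graph))
```

In this sketch only the matmul node ends up assigned to `cublas`; the variables and the relu keep the default target. A real partitioning pass works on patterns rather than single op names and also groups the tagged regions into separate functions, which this toy omits.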
Quite nice I think, thanks. It will be fun to see this used for complex patterns.
When we add the second+ examples I think we can consider:
- removing per-op boilerplate, since placeholder construction, create_schedule and build will always be the same
- avoiding the tvm.build entry point, since I believe the support for schedules & tensors is now considered anachronistic. Perhaps a tvm.build_te(op, placeholders, target, name).
Thanks @mbs-octoml, I've refactored both the 'lower funcs' and the tests to extract all the boilerplate into reusable code. I've also switched to using create_prim_func first and then calling tvm.build on the resulting TIR.
thanks @mbaret LGTM
LGTM
Thanks @mbaret @mbs-octoml @mikepapadim
* [CUBLAS] Add cuBLAS as a Relay partitioning target (BYOC)
* Refactor to remove boilerplate