BMM into BMM triggers internal assert #5211
Comments
It's an interesting case that we'll investigate.
Hot take: We should implement implicit broadcasting of layouts à la NumPy and support ND dots in one go.
I also met the same issue in my code, where a
Tried to install the fix from latest main but failed to build from source 😂
Describe the bug
Performing two back-to-back bmm calls on an NVIDIA GPU triggers an internal assert. On Triton 3.1.0:
and on main (d5ba6ac):
Below is my WIP kernel that triggers the issue. The kernel is supposed to perform two batched matrix multiplies and some reshapes to compute the forward pass of a block tensor train, as in the reference einsum. The code gives the correct result when run using the interpreter. If this is a known issue with a workaround, I would appreciate some help :)
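For readers without the kernel in front of them, the "BMM into BMM" pattern can be sketched in plain Python. This is not the kernel from this report and not its reference einsum: the function names `bmm` and `chained_bmm`, the nested-list representation, and the shapes are all assumptions made for illustration, and the reshapes between the two products are left out. It only shows the result of the first batched matrix multiply feeding directly into the second.

```python
def bmm(a, b):
    """Batched matrix multiply on nested lists: (B, M, K) x (B, K, N) -> (B, M, N).

    Hypothetical helper for illustration; not part of Triton.
    """
    out = []
    for a_mat, b_mat in zip(a, b):
        # Transpose b_mat so each column can be iterated as a sequence.
        cols = list(zip(*b_mat))
        out.append(
            [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_mat]
        )
    return out


def chained_bmm(x, w1, w2):
    """Two back-to-back batched matmuls: (x @ w1) @ w2, per batch element.

    Hypothetical stand-in for the pattern described above; the intermediate
    result of the first product is the left operand of the second.
    """
    hidden = bmm(x, w1)
    return bmm(hidden, w2)


if __name__ == "__main__":
    # One batch element, 2x2 matrices (made-up values).
    x = [[[1, 2], [3, 4]]]
    w1 = [[[1, 0], [0, 1]]]  # identity
    w2 = [[[2, 0], [0, 2]]]  # 2 * identity
    print(chained_bmm(x, w1, w2))
```

A reference like this can be compared elementwise against the kernel output, for example when the kernel is run under the interpreter (`TRITON_INTERPRET=1`), which is where the report says the results are correct.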
Environment details
Triton: Tested on 3.1.0 and main (d5ba6ac)
GPU: A100 and 4070 Ti