[RFC] Scalable Matrix Extension enablement #107
Conversation
An RFC for enabling Scalable Matrix Extension code generation in TVM. Change-Id: If2cc84de2ccc09ec8c526bf154ba099715e46596
I like how we can leverage tensorization and keep most things within the existing infrastructure. Would love to see how we can align some of the scheduling support towards IRModule=>IRModule transformation in dlight-style mechanisms, so we can get even better composability. I took some time to write down related thoughts here: https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-operator-scheduling-and-tuning/16352, which should help clarify some of the context.
Thanks for taking a look @tqchen! Since scheduling will be completed with TensorIR, it will provide the building blocks for being plugged into an IRModule=>IRModule transformation pass. For our current use case, it's important to be able to fall back to previous optimizations in the form of TE schedules / TOPI where coverage of the TensorIR schedules doesn't exist. From the proposed strategy, I understand it's important to ensure the schedule can operate on a generic compute definition of the operation. In the case of matmul-style operations, we'd want to apply "array packing" to the input, which is currently expressed via the compute definition. Is it possible to express this through TIR scheduling alone?
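For readers unfamiliar with the "array packing" being discussed, here is an illustrative sketch in plain NumPy (not TVM code): the B operand of C = A @ B is rearranged into a blocked layout so the innermost loop reads contiguous memory. The tile width `bn` and the helper names are hypothetical choices for illustration.

```python
import numpy as np

def pack_b(B, bn=4):
    """Pack B[k, j] into packedB[j // bn, k, j % bn]."""
    K, N = B.shape
    assert N % bn == 0
    # reshape splits j into (j // bn, j % bn); transpose moves the
    # outer column-block index to the front; copy makes it contiguous
    return B.reshape(K, N // bn, bn).transpose(1, 0, 2).copy()

def matmul_packed(A, B, bn=4):
    M, K = A.shape
    _, N = B.shape
    packedB = pack_b(B, bn)
    C = np.zeros((M, N), dtype=A.dtype)
    for jo in range(N // bn):
        # each packedB[jo] is a contiguous (K, bn) panel of B
        C[:, jo * bn:(jo + 1) * bn] = A @ packedB[jo]
    return C
```

In TVM today this packing is typically written into the TE compute definition itself, which is exactly the tension raised above.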
To clarify a bit: we do not have to ask for everything to be done in the form of a schedule, so it is OK, for example, to generate a compute definition that already contains packing (you can view that as one special dispatch pass). The main ask is that the TIR schedule pass should detect the already-packed TIR and continue to schedule it (one way might be to detect an attached tag in the block). That way the ApplySchedule pass can be done independently from the compute definition. This being said, I think it should be possible to insert array packing through cache_read and transform_layout.
Got it, thanks @tqchen :) It sounds as though we're already doing something similar by adding a tag in the compute definition to identify the block during scheduling.
… error in example Change-Id: I042523e0bd34dc3b8bc62176e983604a6af33b4d
Thanks for the discussion so far @tqchen, I added a small example detailing how we're registering schedules for the Relay flow. I believe this will have minimal impact on how the schedule might be used in a Relax-based flow, but it would be good to hear your thoughts.
Thanks @lhutton1. For Relax and moving forward, one canonical example that can be helpful is the https://github.com/apache/tvm/tree/main/python/tvm/dlight package, which defines pattern matching and application of transforms that can then be used as part of a pass. Right now dlight started from GPU-based schedules for LLMs, but it would be great to expand it to include CPU flows. Notably, the operator definition still resides in TOPI or other places; dlight focuses on detecting TIR patterns and applying transformations.
Thanks @tqchen, at the moment the Relax flow would be out of scope for our current use cases, although we'd want to make sure this RFC doesn't introduce obstacles for porting to the Relax flow in the future. Do you foresee any blockers with the current approach, or could we consider merging?
I think it is helpful to add a discussion about how the flow would fit into the DLight use cases. I don't think it would likely cause too much of an overhead :)
Change-Id: Icefa54694706faef0330c1988af3a2528394540d
Change-Id: I2f239b3eaeb76245c8e79057126578ee5830796e
This was approved a few days back, so I'm merging it now so that we can continue the discussions in the context of the tracking issue and upcoming PRs. Thank you for all the discussion everyone!
This commit adds a new scalable fp32 dense schedule that calls SME intrinsics according to the SME RFC: apache/tvm-rfcs#107. Currently the schedule does not make use of predication, meaning the output from the matmul compute must be copied in a subsequent compute stage. This will be removed once support for predication is added. Change-Id: I9d5ec03d10b03b0637a48116d0cb4076f0ca8192