
[RFC] Scalable Matrix Extension enablement #107

Merged: 4 commits into apache:main on Mar 19, 2024
Conversation

@lhutton1 (Contributor) commented Jan 31, 2024

An RFC for enabling Scalable Matrix Extension (SME) code generation in TVM.

Rendered

An RFC for enabling Scalable Matrix Extension code generation in TVM.

Change-Id: If2cc84de2ccc09ec8c526bf154ba099715e46596
@tqchen (Member) commented Feb 1, 2024

I like how we can leverage tensorization and keep most things within the existing infrastructure. I would love to see how we can align some of the scheduling support towards IRModule=>IRModule transformations in dlight-style mechanisms, so we can get even better composability.

I took some time to write down related thoughts here: https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-operator-scheduling-and-tuning/16352, which should help clarify some of the context.
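
(To make the above concrete, here is a minimal, illustrative skeleton of an IRModule=>IRModule pass; the pass name and body are placeholders, not code from the RFC.)

```python
import tvm

# A module pass consumes an IRModule and returns a (possibly rewritten)
# IRModule; dlight-style scheduling slots into exactly this shape.
@tvm.transform.module_pass(opt_level=0, name="ApplySMESchedules")
def apply_sme_schedules(mod, ctx):
    # A real pass would pattern-match PrimFuncs here and rewrite them
    # with TIR schedules before returning the updated module.
    return mod
```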

@lhutton1 (Contributor, Author) commented:

Thanks for taking a look @tqchen! Since scheduling will be completed with TensorIR, it will provide the building blocks for being plugged into an IRModule=>IRModule transformation pass. For our current use-case, it's important to be able to fall back to previous optimizations in the form of TE schedules / TOPI where coverage of the TensorIR schedules doesn't exist.

From the proposed strategy, I understand it's important to ensure the schedule can operate on a generic compute definition of the operation. In the case of matmul-style operations, we'd want to apply "array packing" to the input, which is currently expressed via the compute definition. Is it possible to express this through TIR scheduling alone?
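
(For context, "array packing" expressed via the compute definition looks roughly like the classic TVM tutorial pattern below; shapes, names, and the tile size bn are illustrative.)

```python
# Sketch of array packing baked directly into the TE compute definition.
from tvm import te

M = N = K = 128
bn = 4  # illustrative pack width
A = te.placeholder((M, K), name="A", dtype="float32")
B = te.placeholder((K, N), name="B", dtype="float32")
# Pre-pack B into (N // bn, K, bn) so the innermost axis is contiguous.
packedB = te.compute(
    (N // bn, K, bn),
    lambda bigN, k, littleN: B[k, bigN * bn + littleN],
    name="packedB",
)
k = te.reduce_axis((0, K), name="k")
C = te.compute(
    (M, N),
    lambda m, n: te.sum(A[m, k] * packedB[n // bn, k, n % bn], axis=k),
    name="C",
)
```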

@tqchen (Member) commented Feb 14, 2024

To clarify a bit: we do not have to ask for everything to be done in the form of a schedule, so it is OK, for example, to generate a compute definition that already contains packing (you can view that as one special dispatch pass).

The main ask is that the TIR schedule pass should detect the already-packed TIR and continue scheduling it (one way might be to detect an attached tag in the block). That way, the ApplySchedule pass can be done independently of the compute definition.

That being said, I think it should be possible to insert array packing through cache_read and transform_layout.
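
(A rough sketch of that suggestion, reusing the matmul shapes above; the buffer index and the packed layout are illustrative assumptions, not code from the RFC.)

```python
# Insert array packing purely with TIR scheduling primitives:
# stage B through a cache, then rewrite the cached copy's layout.
from tvm import te, tir

M = N = K = 128
A = te.placeholder((M, K), name="A", dtype="float32")
B = te.placeholder((K, N), name="B", dtype="float32")
k = te.reduce_axis((0, K), name="k")
C = te.compute(
    (M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C"
)

sch = tir.Schedule(te.create_prim_func([A, B, C]))
block = sch.get_block("C")
# B is read buffer index 1 of the matmul block (A is index 0).
packed = sch.cache_read(block, read_buffer_index=1, storage_scope="global")
# Rewrite the cache's output buffer into a packed (c // 4, r, c % 4) form,
# mirroring the packing expressed in the compute definition earlier.
sch.transform_layout(
    packed, buffer=("write", 0), index_map=lambda r, c: (c // 4, r, c % 4)
)
```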

@lhutton1 (Contributor, Author) commented:

Got it, thanks @tqchen :) It sounds as though we're already doing something similar by adding a tag in the compute definition to identify the block during scheduling.
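
(A hypothetical sketch of that tagging approach: attach an attribute in the compute definition and read it back as a block annotation when scheduling. The tag name sme_matmul is made up for illustration.)

```python
# Tag the compute, then dispatch on the block annotation in TIR.
from tvm import te, tir

A = te.placeholder((32, 32), name="A", dtype="float32")
B = te.placeholder((32, 32), name="B", dtype="float32")
k = te.reduce_axis((0, 32), name="k")
C = te.compute(
    (32, 32),
    lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
    name="C",
    attrs={"schedule_rule": "sme_matmul"},  # illustrative tag
)

sch = tir.Schedule(te.create_prim_func([A, B, C]))
block = sch.get_block("C")
# The attrs surface as block annotations, so a schedule pass can dispatch
# on them without knowing how the compute definition was produced.
print(sch.get(block).annotations)
```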

… error in example

Change-Id: I042523e0bd34dc3b8bc62176e983604a6af33b4d
@lhutton1 (Contributor, Author) commented:

Thanks for the discussion so far @tqchen, I added a small example detailing how we're registering schedules for the Relay flow. I believe this will have minimal impact on how the schedule might be used in a Relax-based flow, but it would be good to hear your thoughts.
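
(For readers following along, the registration mechanism in question is Relay's op strategy; a hedged sketch is below. The SME compute/schedule stubs and the has_sme feature check are assumptions rather than the RFC's actual code, and this is purely illustrative since a real TVM build already registers a dense strategy for "arm_cpu".)

```python
from tvm import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import (
    dense_strategy,
    wrap_compute_dense,
    wrap_topi_schedule,
)

def dense_sme_compute(data, weight, bias=None, out_dtype=None):
    # Hypothetical stand-in for an SME-aware dense compute.
    return topi.nn.dense(data, weight, bias, out_dtype)

def schedule_dense_sme(outs):
    # Hypothetical stand-in for the SME TensorIR schedule.
    return topi.generic.schedule_dense(outs)

@dense_strategy.register("arm_cpu")
def dense_strategy_arm_cpu(attrs, inputs, out_type, target):
    strategy = _op.OpStrategy()
    # Generic TE/TOPI fallback, kept for coverage where the TensorIR
    # schedule does not apply.
    strategy.add_implementation(
        wrap_compute_dense(topi.nn.dense),
        wrap_topi_schedule(topi.generic.schedule_dense),
        name="dense.generic",
    )
    if getattr(target.features, "has_sme", False):  # assumed feature flag
        strategy.add_implementation(
            wrap_compute_dense(dense_sme_compute),
            wrap_topi_schedule(schedule_dense_sme),
            name="dense.arm_cpu.sme",
            plevel=12,  # prefer over the fallback when SME is available
        )
    return strategy
```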

@tqchen (Member) commented Mar 12, 2024

Thanks @lhutton1. For Relax and moving forward, one canonical example that can be helpful is the https://github.com/apache/tvm/tree/main/python/tvm/dlight package, which defines pattern matching and application of transforms that can then be used as part of a pass.

Right now dlight started from GPU-based schedules for LLMs, but it would be great to expand it to include CPU flows. Notably, the operator definition still resides in TOPI or other places; dlight focuses on detecting TIR patterns and applying transformations.
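
(A hedged sketch of what a dlight-style rule for a CPU flow could look like. The rule, its matching heuristic, and the transform are all illustrative; dlight's real rules, e.g. dl.gpu.Matmul, are considerably more involved.)

```python
from typing import Optional

from tvm import dlight as dl
from tvm import tir
from tvm.target import Target

class NaiveCPUMatmul(dl.base.ScheduleRule):
    def apply(
        self, func: tir.PrimFunc, target: Target, tunable: bool
    ) -> Optional[tir.Schedule]:
        sch = tir.Schedule(func)
        root = sch.get_block("root")
        for block in sch.get_child_blocks(root):
            # Dispatch on the tag attached in the compute definition,
            # as discussed earlier in this thread.
            if "schedule_rule" in sch.get(block).annotations:
                loops = sch.get_loops(block)
                sch.parallel(loops[0])  # placeholder transform
                return sch
        return None  # not a match; let other rules try

# Used as an IRModule=>IRModule pass:
#   with target:
#       mod = dl.ApplyDefaultSchedule(NaiveCPUMatmul())(mod)
```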

@lhutton1 (Contributor, Author) commented:

Thanks @tqchen. The Relax flow is out of scope for our current use-cases, although we'd want to make sure this RFC doesn't introduce obstacles to porting to the Relax flow in the future. Do you foresee any blockers with the current approach, or could we consider merging?

@tqchen (Member) commented Mar 12, 2024

I think it would be helpful to add a discussion about how the flow would fit into the DLight use-cases. I don't think it would likely cause too much overhead :)

Change-Id: Icefa54694706faef0330c1988af3a2528394540d
Change-Id: I2f239b3eaeb76245c8e79057126578ee5830796e
@leandron (Contributor) commented:

This was approved a few days back, so I'm merging it now so that we can continue the discussions in the context of the tracking issue and the upcoming PRs.

Thank you for all the discussion, everyone!

@leandron leandron merged commit 176a14e into apache:main Mar 19, 2024
@lhutton1 lhutton1 deleted the sme-rfc branch March 19, 2024 09:28
lhutton1 added a commit to lhutton1/tvm that referenced this pull request Apr 24, 2024
This commit adds a new scalable fp32 dense schedule that calls SME
intrinsics according to the SME RFC:
apache/tvm-rfcs#107.

Currently the schedule does not make use of predication, meaning the
output from the matmul compute must be copied in a subsequent compute
stage. This will be removed once support for predication is added.

Change-Id: I9d5ec03d10b03b0637a48116d0cb4076f0ca8192
lhutton1 added a commit to lhutton1/tvm that referenced this pull request May 8, 2024
lhutton1 added a commit to apache/tvm that referenced this pull request May 15, 2024