[ascend] optimize tp > 1 latency by using graph operation and torch_npu op command launcher #143

CyCle1024 · 2024-12-27T07:11:04Z

No description provided.

CyCle1024 · 2025-01-13T06:53:47Z

GQA models hit an issue in the aclnn op SplitV; this requires debugging.

CyCle1024 requested a review from jinminxi104 as a code owner December 27, 2024 07:11

CyCle1024 added the ascend platform ascend label Dec 27, 2024

CyCle1024 changed the title ~~optimize tp > 1 latency by using graph operation and torch_npu op command launcher~~ [ascend] optimize tp > 1 latency by using graph operation and torch_npu op command launcher Dec 27, 2024

add graph operation fusion with env option

9062c6b

CyCle1024 force-pushed the ccy/tp_opt branch from edbb747 to 9062c6b December 27, 2024 07:25

CyCle1024 marked this pull request as draft January 13, 2025 06:52
