
optimize com.microsoft.MatMulNbits operator #28504

Open: wants to merge 1 commit into base: master
Conversation

bopeng1234

This PR optimizes the ONNX frontend com.microsoft.MatMulNbits operator.

With these changes:

  1. Constant folding is disabled; previously it used 75 GB of memory for the phi3 INT4 model and 200+ GB for the llama3 INT4 model.
  2. The oneDNN matmul primitives are triggered, which significantly benefits GPU performance.

We tested these changes along with PR #28163 and confirmed that the phi3/llama3 INT4 models run well on LNL.

    ### Details:
        - Use the Convert op instead of ConvertLike; this disables constant
          folding, so int2/4/8 dequantization runs online at inference time
          rather than being constant-folded at compile time, which benefits
          compile-time memory usage and inference latency.
        - Use the zero point as uint2/4/8; this triggers the oneDNN kernel,
          which significantly benefits GPU performance.
@bopeng1234 bopeng1234 requested a review from a team as a code owner January 17, 2025 06:47
@github-actions github-actions bot added the category: ONNX FE OpenVINO ONNX FrontEnd label Jan 17, 2025
@sys-openvino-ci sys-openvino-ci added the ExternalIntelPR External contributor from Intel label Jan 17, 2025
@ilya-lavrenov ilya-lavrenov added this to the 2025.1 milestone Jan 17, 2025
@ilya-lavrenov
Contributor

build_jenkins

Labels
category: ONNX FE OpenVINO ONNX FrontEnd ExternalIntelPR External contributor from Intel
4 participants