Ability to fuse non-square (pruned) attention weights for BERT-like models #6850

Merged: 3 commits merged into microsoft:master from mfuntowicz:hf_fuse_pruned_attention on Mar 5, 2021

Conversation

@mfuntowicz (Contributor) commented Mar 1, 2021

Following @tianleiwu's implementation for non-square (i.e. pruned) attention layers, this PR introduces the machinery needed to fuse the Attention layer when optimizing a BERT model through the onnxruntime_tools optimizer.

What this PR does:

  1. Do not assume square matrices of shape (self.hidden_size, self.hidden_size) for the QKV weights.
  2. Split the weight shape into a more generic in_size and out_size, mapping respectively to the number of incoming features and the number of projected features for the MatMul operator.
  3. Infer the hidden size of each individual head in order to correctly derive the dynamic number of heads in each Attention layer (see the sketch after this list).
  4. Correctly specify the num_heads attribute according to point 3.
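
To illustrate points 1–3, here is a minimal editorial sketch (hypothetical helper name, not the optimizer's actual code) of how a per-layer head count can be derived from a possibly pruned, non-square projection weight:

```python
import numpy as np

def infer_layer_num_heads(q_weight: np.ndarray, head_size: int) -> int:
    """Hypothetical helper: derive the per-layer head count from a (possibly
    pruned) Q/K/V projection weight of shape (in_size, out_size).

    in_size  -- number of incoming features (the model hidden size)
    out_size -- number of projected features; smaller than in_size when
                some attention heads have been pruned in this layer
    """
    in_size, out_size = q_weight.shape
    assert out_size % head_size == 0, "out_size must be a multiple of head_size"
    return out_size // head_size

# Example: BERT-base (hidden_size=768, 12 heads, so head_size=64) with 4 heads
# pruned in one layer leaves a (768, 512) query projection -> 8 heads remain.
pruned_q_weight = np.zeros((768, 512), dtype=np.float32)
print(infer_layer_num_heads(pruned_q_weight, head_size=64))  # 8
```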

@mfuntowicz mfuntowicz requested a review from a team as a code owner March 1, 2021 14:38
@mfuntowicz mfuntowicz marked this pull request as draft March 1, 2021 14:39
@mfuntowicz mfuntowicz marked this pull request as ready for review March 1, 2021 22:48
@tianleiwu (Contributor) previously approved these changes Mar 2, 2021 and left a comment:

Looks good to me.

This change supports attention head pruning (hidden size per head remains the same).
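
For context, a minimal usage sketch of the head-pruning scenario this enables (an editorial illustration assuming the HuggingFace transformers prune_heads API and the onnxruntime_tools optimizer entry point; the ONNX file names are hypothetical and the export step is assumed to have been done separately):

```python
from transformers import BertModel
from onnxruntime_tools import optimizer

# Prune 4 of the 12 heads in layer 0; each remaining head keeps head_size=64,
# so that layer's QKV projections become non-square (768 x 512).
model = BertModel.from_pretrained("bert-base-uncased")
model.prune_heads({0: [0, 1, 2, 3]})

# Assuming the pruned model has already been exported to ONNX as
# "bert_pruned.onnx", the optimizer can now fuse the non-square attention.
optimized = optimizer.optimize_model("bert_pruned.onnx", model_type="bert")
optimized.save_model_to_file("bert_pruned_optimized.onnx")
```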

@snnn (Member) commented Mar 2, 2021

/azp run Windows CPU CI Pipeline

@azure-pipelines: Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu (Contributor) commented Mar 3, 2021

@mfuntowicz, I triggered the wrong "Windows CPU CI Pipeline" by mistake, which blocks merging this pull request into master. Could you add a dummy commit (e.g. update a comment) and check whether two "Windows CPU CI Pipeline" entries appear in the Required Statuses? If two are still required after the commit, you will need to create a new pull request instead. Sorry for the inconvenience.

@tianleiwu (Contributor) commented:

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline

@tianleiwu (Contributor) commented:

/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux CPU Minimal Build E2E CI Pipeline,Linux Nuphar CI Pipeline,MacOS NoContribops CI Pipeline,Linux OpenVINO CI Pipeline,orttraining-distributed,orttraining-amd-gpu-ci-pipeline

@azure-pipelines: Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines: Azure Pipelines successfully started running 10 pipeline(s).

@mfuntowicz (Contributor, Author) commented Mar 3, 2021

@tianleiwu Is there anything I can do about all these failing pipelines? Many of them seem to be unrelated to the changes in this PR, but I'm not sure.

Let me know 👌🏻

@tianleiwu (Contributor) commented Mar 4, 2021

@mfuntowicz, you will need to merge master into your branch so that it no longer shows "14 commits behind microsoft:master". The branch is missing a commit that fixes the CI pipeline: ed1883a

@mfuntowicz (Contributor, Author) commented Mar 4, 2021

@tianleiwu I just rebased the PR on master, let me know 👍🏻

@tianleiwu tianleiwu merged commit 9126faa into microsoft:master Mar 5, 2021
@mfuntowicz mfuntowicz deleted the hf_fuse_pruned_attention branch April 1, 2021 12:53