Ability to fuse non-square (pruned) attention weights for BERT-like models #6850
Looks good to me.
This change supports attention head pruning (hidden size per head remains the same).
/azp run Windows CPU CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@mfuntowicz, I triggered the wrong "Windows CPU CI Pipeline" by mistake, and it blocks merging this pull request into master. Could you add a dummy commit (for example, updating some comments) and check whether "Windows CPU CI Pipeline" still appears twice in Required Statuses? If two are still required after the commit, you will need to create a new pull request instead. Sorry for the inconvenience.
/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline
/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux CPU Minimal Build E2E CI Pipeline,Linux Nuphar CI Pipeline,MacOS NoContribops CI Pipeline,Linux OpenVINO CI Pipeline,orttraining-distributed,orttraining-amd-gpu-ci-pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
@tianleiwu Is there anything I can do about all these failing pipelines? Many of them seem to be unrelated to the changes in this PR, but I'm not sure. Let me know 👌🏻
@mfuntowicz, you will need to merge master into your branch so that it no longer shows "14 commits behind microsoft:master". The branch is missing a commit that fixes the CI pipeline: ed1883a
8d906d3 to 93593ae
@tianleiwu I just rebased the PR on master, let me know 👍🏻
Following @tianleiwu's implementation for non-square (i.e. pruned) attention layers, this PR introduces the necessary machinery to fuse the Attention layer when optimizing a BERT model through the `onnxruntime_tools` optimizer.

What this PR does:
- … `(self.hidden_size, self.hidden_size)` for QKV weights
- … `in_size` & `out_size`, mapping respectively the incoming number of features and the projected number of features for the matmul operator
- … `num_heads` meta attribute according to point 3
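
To make the shape bookkeeping above concrete, here is a minimal sketch in plain Python. It is not the `onnxruntime_tools` code: `fuse_qkv`, `zeros`, and the `head_size` argument are hypothetical names introduced for illustration, and the column-wise concatenation is an assumption about how fused weights are laid out. It only shows that Q, K and V weights of shape `(in_size, out_size)` no longer need to be square, and that the number of heads can be derived from `out_size` when the per-head size is unchanged by pruning.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fuse_qkv(q: Matrix, k: Matrix, v: Matrix, head_size: int) -> Tuple[Matrix, int]:
    """Hypothetical helper: merge Q, K, V weights and derive num_heads.

    Each weight has shape (in_size, out_size). After head pruning,
    out_size may be smaller than in_size, so the matrices are not square.
    """
    in_size, out_size = len(q), len(q[0])
    for w in (k, v):
        if len(w) != in_size or len(w[0]) != out_size:
            raise ValueError("Q, K and V weights must share the same shape")
    if out_size % head_size != 0:
        raise ValueError("out_size must be a multiple of head_size")
    # Pruning removes whole heads, so the head count follows from out_size.
    num_heads = out_size // head_size
    # Assumed layout: concatenate along the output axis -> (in_size, 3 * out_size).
    fused = [q[i] + k[i] + v[i] for i in range(in_size)]
    return fused, num_heads


def zeros(rows: int, cols: int) -> Matrix:
    """Build a rows x cols matrix of zeros (stand-in for real weights)."""
    return [[0.0] * cols for _ in range(rows)]


# Example: a 768-wide model whose 12 heads of size 64 were pruned down to 8.
in_size, head_size = 768, 64
out_size = 8 * head_size
fused, num_heads = fuse_qkv(
    zeros(in_size, out_size),
    zeros(in_size, out_size),
    zeros(in_size, out_size),
    head_size,
)
print(len(fused), len(fused[0]), num_heads)  # 768 1536 8
```

With unpruned weights (`out_size == in_size`) the same helper reduces to the square case, which is why tracking `in_size` and `out_size` separately is enough to cover both situations.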