Ability to fuse non-square (pruned) attention weights for BERT-like models #6850

Merged: 3 commits merged into microsoft:master from mfuntowicz:hf_fuse_pruned_attention on Mar 5, 2021

Conversation

@mfuntowicz (Contributor) commented Mar 1, 2021

Following @tianleiwu's implementation for non-square (i.e. pruned) attention layers, this PR introduces the machinery needed to fuse the Attention layer when optimizing a BERT model through the onnxruntime_tools optimizer.

What this PR does:

  1. Do not assume square matrices of shape (self.hidden_size, self.hidden_size) for the QKV weights.
  2. Split the weight shape into a more generic in_size and out_size, mapping respectively to the number of incoming features and the number of projected features for the MatMul operator.
  3. Infer the hidden size of each individual head in order to correctly derive the dynamic number of heads in each Attention layer (see the sketch after this list).
  4. Correctly specify the num_heads attribute according to point 3.
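
To illustrate points 1–3, here is a minimal editorial sketch (hypothetical helper name, not the optimizer's actual code) of how a per-layer head count can be derived from a possibly pruned, non-square projection weight:

```python
import numpy as np

def infer_layer_num_heads(q_weight: np.ndarray, head_size: int) -> int:
    """Hypothetical helper: derive the per-layer head count from a (possibly
    pruned) Q/K/V projection weight of shape (in_size, out_size).

    in_size  -- number of incoming features (the model hidden size)
    out_size -- number of projected features; smaller than in_size when
                some attention heads have been pruned in this layer
    """
    in_size, out_size = q_weight.shape
    assert out_size % head_size == 0, "out_size must be a multiple of head_size"
    return out_size // head_size

# Example: BERT-base (hidden_size=768, 12 heads, so head_size=64) with 4 heads
# pruned in one layer leaves a (768, 512) query projection -> 8 heads remain.
pruned_q_weight = np.zeros((768, 512), dtype=np.float32)
print(infer_layer_num_heads(pruned_q_weight, head_size=64))  # 8
```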

@mfuntowicz mfuntowicz requested a review from a team as a code owner March 1, 2021 14:38
@mfuntowicz mfuntowicz marked this pull request as draft March 1, 2021 14:39
@mfuntowicz mfuntowicz marked this pull request as ready for review March 1, 2021 22:48
@tianleiwu (Contributor) previously approved these changes Mar 2, 2021 and left a comment:

Looks good to me.

This change supports attention head pruning (hidden size per head remains the same).
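
For context, a minimal usage sketch of the head-pruning scenario this enables (an editorial illustration assuming the HuggingFace transformers prune_heads API and the onnxruntime_tools optimizer entry point; the ONNX file names are hypothetical and the export step is assumed to have been done separately):

```python
from transformers import BertModel
from onnxruntime_tools import optimizer

# Prune 4 of the 12 heads in layer 0; each remaining head keeps head_size=64,
# so that layer's QKV projections become non-square (768 x 512).
model = BertModel.from_pretrained("bert-base-uncased")
model.prune_heads({0: [0, 1, 2, 3]})

# Assuming the pruned model has already been exported to ONNX as
# "bert_pruned.onnx", the optimizer can now fuse the non-square attention.
optimized = optimizer.optimize_model("bert_pruned.onnx", model_type="bert")
optimized.save_model_to_file("bert_pruned_optimized.onnx")
```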

@snnn (Member) commented Mar 2, 2021

/azp run Windows CPU CI Pipeline

@azure-pipelines: Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu (Contributor) commented Mar 3, 2021

@mfuntowicz, I triggered the wrong "Windows CPU CI Pipeline" by mistake, which blocks merging this pull request into master. Could you add a dummy commit (e.g. update a comment) and check whether two "Windows CPU CI Pipeline" entries appear in the Required Statuses? If two are still required after the commit, you will need to create a new pull request instead. Sorry for the inconvenience.

@tianleiwu (Contributor) commented:

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline

@tianleiwu (Contributor) commented:

/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux CPU Minimal Build E2E CI Pipeline,Linux Nuphar CI Pipeline,MacOS NoContribops CI Pipeline,Linux OpenVINO CI Pipeline,orttraining-distributed,orttraining-amd-gpu-ci-pipeline

@azure-pipelines: Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines: Azure Pipelines successfully started running 10 pipeline(s).

@mfuntowicz (Contributor, Author) commented Mar 3, 2021

@tianleiwu Is there anything I can do about all these failing pipelines? Many of them seem to be unrelated to the changes in this PR, but I'm not sure.

Let me know 👌🏻

@tianleiwu (Contributor) commented Mar 4, 2021

@mfuntowicz, you will need to merge master into your branch so that it no longer shows "14 commits behind microsoft:master". The branch is missing a commit that fixes the CI pipeline: ed1883a

@mfuntowicz (Contributor, Author) commented Mar 4, 2021

@tianleiwu I just rebased the PR on master, let me know 👍🏻

@tianleiwu tianleiwu merged commit 9126faa into microsoft:master Mar 5, 2021
@mfuntowicz mfuntowicz deleted the hf_fuse_pruned_attention branch April 1, 2021 12:53