Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Refactoring model compression doc #2919

Merged
merged 100 commits into from
Oct 10, 2020
Merged
Show file tree
Hide file tree
Changes from 95 commits
Commits
Show all changes
100 commits
Select commit Hold shift + click to select a range
3a45961
Merge pull request #31 from microsoft/master
chicm-ms Aug 6, 2019
633db43
Merge pull request #32 from microsoft/master
chicm-ms Sep 9, 2019
3e926f1
Merge pull request #33 from microsoft/master
chicm-ms Oct 8, 2019
f173789
Merge pull request #34 from microsoft/master
chicm-ms Oct 9, 2019
508850a
Merge pull request #35 from microsoft/master
chicm-ms Oct 9, 2019
5a0e9c9
Merge pull request #36 from microsoft/master
chicm-ms Oct 10, 2019
e7df061
Merge pull request #37 from microsoft/master
chicm-ms Oct 23, 2019
2175cef
Merge pull request #38 from microsoft/master
chicm-ms Oct 29, 2019
2ccbfbb
Merge pull request #39 from microsoft/master
chicm-ms Oct 30, 2019
b29cb0b
Merge pull request #40 from microsoft/master
chicm-ms Oct 30, 2019
4a3ba83
Merge pull request #41 from microsoft/master
chicm-ms Nov 4, 2019
c8a1148
Merge pull request #42 from microsoft/master
chicm-ms Nov 4, 2019
73c6101
Merge pull request #43 from microsoft/master
chicm-ms Nov 5, 2019
6a518a9
Merge pull request #44 from microsoft/master
chicm-ms Nov 11, 2019
a0d587f
Merge pull request #45 from microsoft/master
chicm-ms Nov 12, 2019
e905bfe
Merge pull request #46 from microsoft/master
chicm-ms Nov 14, 2019
4b266f3
Merge pull request #47 from microsoft/master
chicm-ms Nov 15, 2019
237ff4b
Merge pull request #48 from microsoft/master
chicm-ms Nov 21, 2019
682be01
Merge pull request #49 from microsoft/master
chicm-ms Nov 25, 2019
133af82
Merge pull request #50 from microsoft/master
chicm-ms Nov 25, 2019
71a8a25
Merge pull request #51 from microsoft/master
chicm-ms Nov 26, 2019
d2a73bc
Merge pull request #52 from microsoft/master
chicm-ms Nov 26, 2019
198cf5e
Merge pull request #53 from microsoft/master
chicm-ms Dec 5, 2019
cdbfaf9
Merge pull request #54 from microsoft/master
chicm-ms Dec 6, 2019
7e9b29e
Merge pull request #55 from microsoft/master
chicm-ms Dec 10, 2019
d00c46d
Merge pull request #56 from microsoft/master
chicm-ms Dec 10, 2019
de7d1fa
Merge pull request #57 from microsoft/master
chicm-ms Dec 11, 2019
1835ab0
Merge pull request #58 from microsoft/master
chicm-ms Dec 12, 2019
24fead6
Merge pull request #59 from microsoft/master
chicm-ms Dec 20, 2019
0b7321e
Merge pull request #60 from microsoft/master
chicm-ms Dec 23, 2019
60058d4
Merge pull request #61 from microsoft/master
chicm-ms Dec 23, 2019
b111a55
Merge pull request #62 from microsoft/master
chicm-ms Dec 24, 2019
611c337
Merge pull request #63 from microsoft/master
chicm-ms Dec 30, 2019
4a1f14a
Merge pull request #64 from microsoft/master
chicm-ms Jan 10, 2020
7a9e604
Merge pull request #65 from microsoft/master
chicm-ms Jan 14, 2020
b8035b0
Merge pull request #66 from microsoft/master
chicm-ms Feb 4, 2020
47567d3
Merge pull request #67 from microsoft/master
chicm-ms Feb 10, 2020
614d427
Merge pull request #68 from microsoft/master
chicm-ms Feb 10, 2020
a0d9ed6
Merge pull request #69 from microsoft/master
chicm-ms Feb 11, 2020
22dc1ad
Merge pull request #70 from microsoft/master
chicm-ms Feb 19, 2020
0856813
Merge pull request #71 from microsoft/master
chicm-ms Feb 22, 2020
9e97bed
Merge pull request #72 from microsoft/master
chicm-ms Feb 25, 2020
16a1b27
Merge pull request #73 from microsoft/master
chicm-ms Mar 3, 2020
e246633
Merge pull request #74 from microsoft/master
chicm-ms Mar 4, 2020
0439bc1
Merge pull request #75 from microsoft/master
chicm-ms Mar 17, 2020
8b5613a
Merge pull request #76 from microsoft/master
chicm-ms Mar 18, 2020
43e8d31
Merge pull request #77 from microsoft/master
chicm-ms Mar 22, 2020
aae448e
Merge pull request #78 from microsoft/master
chicm-ms Mar 25, 2020
7095716
Merge pull request #79 from microsoft/master
chicm-ms Mar 25, 2020
c51263a
Merge pull request #80 from microsoft/master
chicm-ms Apr 11, 2020
9953c70
Merge pull request #81 from microsoft/master
chicm-ms Apr 14, 2020
f9136c4
Merge pull request #82 from microsoft/master
chicm-ms Apr 16, 2020
b384ad2
Merge pull request #83 from microsoft/master
chicm-ms Apr 20, 2020
ff592dd
Merge pull request #84 from microsoft/master
chicm-ms May 12, 2020
0b5378f
Merge pull request #85 from microsoft/master
chicm-ms May 18, 2020
a53e0b0
Merge pull request #86 from microsoft/master
chicm-ms May 25, 2020
3ea0b89
Merge pull request #87 from microsoft/master
chicm-ms May 28, 2020
cf3fb20
Merge pull request #88 from microsoft/master
chicm-ms May 28, 2020
7f4cdcd
Merge pull request #89 from microsoft/master
chicm-ms Jun 4, 2020
574db2c
Merge pull request #90 from microsoft/master
chicm-ms Jun 15, 2020
32bedcc
Merge pull request #91 from microsoft/master
chicm-ms Jun 21, 2020
6155aa4
Merge pull request #92 from microsoft/master
chicm-ms Jun 22, 2020
8139c9c
Merge pull request #93 from microsoft/master
chicm-ms Jun 23, 2020
43419d7
Merge pull request #94 from microsoft/master
chicm-ms Jun 28, 2020
6b6ee55
Merge pull request #95 from microsoft/master
chicm-ms Jun 28, 2020
1b975e0
Merge pull request #96 from microsoft/master
chicm-ms Jun 28, 2020
c8f3c5d
Merge pull request #97 from microsoft/master
chicm-ms Jun 29, 2020
4c306f0
Merge pull request #98 from microsoft/master
chicm-ms Jun 30, 2020
64de4c2
Merge pull request #99 from microsoft/master
chicm-ms Jun 30, 2020
0e5d3ac
Merge pull request #100 from microsoft/master
chicm-ms Jul 1, 2020
4a52608
Merge pull request #101 from microsoft/master
chicm-ms Jul 3, 2020
208b1ee
Merge pull request #102 from microsoft/master
chicm-ms Jul 8, 2020
e7b1a2e
Merge pull request #103 from microsoft/master
chicm-ms Jul 10, 2020
57bcc85
Merge pull request #104 from microsoft/master
chicm-ms Jul 22, 2020
030f5ef
Merge pull request #105 from microsoft/master
chicm-ms Jul 29, 2020
058c8b7
Merge pull request #106 from microsoft/master
chicm-ms Aug 2, 2020
9abd8c8
Merge pull request #107 from microsoft/master
chicm-ms Aug 10, 2020
13c6623
Merge pull request #108 from microsoft/master
chicm-ms Aug 11, 2020
b50b41e
Merge pull request #109 from microsoft/master
chicm-ms Aug 12, 2020
78f1418
Merge pull request #110 from microsoft/master
chicm-ms Aug 13, 2020
74acc8b
Merge pull request #111 from microsoft/master
chicm-ms Aug 17, 2020
5bf416a
Merge pull request #112 from microsoft/master
chicm-ms Aug 24, 2020
4a207f9
Merge pull request #113 from microsoft/master
chicm-ms Sep 3, 2020
7be897b
Merge pull request #114 from microsoft/master
chicm-ms Sep 16, 2020
f974b2c
Merge pull request #115 from microsoft/master
chicm-ms Sep 17, 2020
0c2f59b
Merge pull request #116 from microsoft/master
chicm-ms Sep 21, 2020
08fbc6d
refactor model compression doc
chicm-ms Sep 22, 2020
8584134
updates
chicm-ms Sep 22, 2020
43311fc
updates
chicm-ms Sep 22, 2020
3c5cef2
Merge pull request #117 from microsoft/master
chicm-ms Sep 25, 2020
ffb816b
rename compressor to compression
chicm-ms Sep 25, 2020
9fd9b6c
Merge branch 'master' into compression-doc
chicm-ms Sep 25, 2020
08db1df
updates
chicm-ms Sep 25, 2020
c68dd55
updates
chicm-ms Sep 25, 2020
e0de6c9
updates
chicm-ms Sep 25, 2020
34f57ac
updates
chicm-ms Sep 28, 2020
6cc2ca9
updates
chicm-ms Sep 28, 2020
9df00b5
Refactor overview
chicm-ms Sep 30, 2020
8907d25
updates
chicm-ms Oct 9, 2020
cbc4f1c
updates
chicm-ms Oct 10, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Automatic Model Compression on NNI
# Automatic Model Pruning using NNI Tuners

It's convenient to implement auto model compression with NNI compression and NNI tuners
It's convenient to implement auto model pruning with NNI compression and NNI tuners

## First, model compression with NNI

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

In order to simplify the process of writing new compression algorithms, we have designed simple and flexible programming interface, which covers pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then demonstrate how to customize a new quantization algorithm.

**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. Reference [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compressor/Framework.html)
**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. Reference [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compression/Framework.html)


## Customize a new pruning algorithm
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,20 +27,20 @@ Pruning algorithms compress the original network by removing redundant weights o

|Name|Brief Introduction of Algorithm|
|---|---|
| [Level Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [FPGM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
| [L1Filter Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [L2Filter Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
| [ActivationAPoZRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationapozrankfilterpruner) | Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first order taylor expansion on weights(Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively call SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [Level Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [FPGM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
| [L1Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [L2Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
| [ActivationAPoZRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationapozrankfilterpruner) | Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first order taylor expansion on weights(Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively call SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |

You can refer to this [benchmark](https://github.com/microsoft/nni/tree/master/docs/en_US/CommunitySharings/ModelCompressionComparison.md) for the performance of these pruners on some benchmark problems.

Expand All @@ -50,14 +50,14 @@ Quantization algorithms compress the original network by reducing the number of

|Name|Brief Introduction of Algorithm|
|---|---|
| [Naive Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
| [BNN Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#bnn-quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|
| [Naive Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
| [BNN Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#bnn-quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|

## Automatic Model Compression

Given targeted compression ratio, it is pretty hard to obtain the best compressed ratio in a one shot manner. An automatic model compression algorithm usually need to explore the compression space by compressing different layers with different sparsities. NNI provides such algorithms to free users from specifying sparsity of each layer in a model. Moreover, users could leverage NNI's auto tuning power to automatically compress a model. Detailed document can be found [here](./AutoCompression.md).
Given targeted compression ratio, it is pretty hard to obtain the best compressed ratio in a one shot manner. An automatic model compression algorithm usually need to explore the compression space by compressing different layers with different sparsities. NNI provides such algorithms to free users from specifying sparsity of each layer in a model. Moreover, users could leverage NNI's auto tuning power to automatically compress a model. Detailed document can be found [here](./AutoPruningUsingTuners.md).

## Model Speedup

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ In this tutorial, we use the [first section](#quick-start-to-compress-a-model) t

## Quick Start to Compress a Model

NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. The usage of them are the same, thus, here we use [slim pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) as an example to show the usage.
NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. The usage of them are the same, thus, here we use [slim pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) as an example to show the usage.

### Write configuration

Expand Down Expand Up @@ -175,7 +175,7 @@ In this example, 'op_names' is the name of layer and four layers will be quantiz

### APIs for Updating Fine Tuning Status

Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: `pruner.update_epoch(epoch)` and `pruner.step()`.
Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: `pruner.update_epoch(epoch)` and `pruner.step()`.

`update_epoch` should be invoked in every epoch, while `step` should be invoked after each minibatch. Note that most algorithms do not require calling the two APIs. Please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.

Expand Down
26 changes: 26 additions & 0 deletions docs/en_US/Compression/pruning.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#################
Pruning
#################

A common technique to increase sparsity in neural network model weights and activations is pruning.
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved
The pruning methods explore the redundancy in the model weights(parameters) and try to remove/prune the redundant and uncritical weights.
The redundant elements are pruned from the model, their values are zeroed and we make sure they don't take part in the back-propagation process.
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved

From pruning granularity perspective, fine-grained pruning refers to pruning each individual weights seperately.
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved
Coase-grained pruning or structured pruning is pruning entire group of elements, such as convolutional filter.
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved

NNI provides multiple fine-grained pruning and structured pruning algorithms.
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved
It supports Tensorflow and PyTorch with unified interface.
For users to prune their models, they only need to add several lines in their code.
For the structured filter pruning, NNI also provides a dependency-aware mode. In the dependency-aware mode, the
filter pruner will get better speed gain after the speedup.

For details, please refer to the following tutorials:

.. toctree::
:maxdepth: 2

Pruners <Pruner>
Dependency Aware Mode <DependencyAware>
Model Speedup <ModelSpeedup>
Automatic Model Pruning with NNI Tuners <AutoPruningUsingTuners>
17 changes: 17 additions & 0 deletions docs/en_US/Compression/quantization.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
#################
Quantization
#################

Quantization refers to compressing models by reducing the number of bits required to represent weights or activations,
which can reduce the computations and the inference time. In the context of deep neural networks, the major numerical
format for model weights is 32-bit float, or FP32. Many research works have demonstrated that weights and activations
can be represented using 8-bit integers without significant loss in accuracy. Even lower bit-widths, such as 4/2/1 bits,
is an active field of research.

A quntizer is a quantization algorithm implementation in NNI, NNI provides multiple quntizers as below. You can also
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved
create you own quntizer using NNI model compression inteface.
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved

.. toctree::
:maxdepth: 2

Quntizers <Quantizer>
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved
26 changes: 16 additions & 10 deletions docs/en_US/model_compression.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,15 @@
Model Compression
#################

NNI provides an easy-to-use toolkit to help user design and use compression algorithms.
Deep neural networks (DNNs) have achieved great success in many tasks. However, typical neural networks are both
computationally expensive and energy intensive, can be difficult to be deployed on devices with low computation
resources or with strict latency requirements. Therefore, a natural thought is to perform model compression to
reduce model size and accelarete model training/inference without losing performance significantly. Model compression
chicm-ms marked this conversation as resolved.
Show resolved Hide resolved
techniques can be divided into two categories: pruning and quantization. The pruning methods explore the redundancy
in the model weights and try to remove/prune the redundant and uncritical weights. Quantization refers to compressing
models by reducing the number of bits required to represent weights or activations.

NNI provides an easy-to-use toolkit to help user design and use model pruning and quantization algorithms.
It supports Tensorflow and PyTorch with unified interface.
For users to compress their models, they only need to add several lines in their code.
There are some popular model compression algorithms built-in in NNI.
Expand All @@ -15,12 +23,10 @@ For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2

Overview <Compressor/Overview>
Quick Start <Compressor/QuickStart>
Pruning <pruning>
Quantizers <Compressor/Quantizer>
Automatic Model Compression <Compressor/AutoCompression>
Model Speedup <Compressor/ModelSpeedup>
Compression Utilities <Compressor/CompressionUtils>
Compression Framework <Compressor/Framework>
Customize Compression Algorithms <Compressor/CustomizeCompressor>
Overview <Compression/Overview>
Quick Start <Compression/QuickStart>
Pruning <Compression/pruning>
Quantization <Compression/quantization>
Utilities <Compression/CompressionUtils>
Framework <Compression/Framework>
Customize Model Compression Algorithms <Compression/CustomizeCompressor>
17 changes: 0 additions & 17 deletions docs/en_US/pruning.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/en_US/sdk_reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,5 +8,5 @@ Python API Reference

Auto Tune <autotune_ref>
NAS <NAS/NasReference>
Compression Utilities <Compressor/CompressionReference>
Compression Utilities <Compression/CompressionReference>
NNI Client <nnicli_ref>