Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Doc for quantization #1881

Merged
merged 11 commits into from
Dec 29, 2019
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 66 additions & 14 deletions docs/en_US/Compressor/Overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ We have provided several compression algorithms, including several pruning and q
| [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
| [BNN Quantizer](./Quantizer.md#BNN-Quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to have a brief description of the concept Quantization before the table, same for Pruning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a reference link to recent survey of model compression and brief introduction of Quantization and Pruning.


## Usage of built-in compression algorithms

Expand Down Expand Up @@ -61,17 +62,34 @@ The function call `pruner.compress()` modifies user defined model (in Tensorflow
When instantiate a compression algorithm, there is `config_list` passed in. We describe how to write this config below.

### User configuration for a compression algorithm
When compressing a model, users may want to specify the ratio for sparsity, to specify different ratios for different types of operations, to exclude certain types of operations, or to compress only a certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a python `list` object, where each element is a `dict` object.

When compressing a model, users may want to specify the ratio for sparsity, to specify different ratios for different types of operations, to exclude certain types of operations, or to compress only a certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a python `list` object, where each element is a `dict` object. In each `dict`, there are some keys commonly supported by NNI compression:
The `dict`s in the `list` are applied one by one, that is, the configurations in latter `dict` will overwrite the configurations in former ones on the operations that are within the scope of both of them.

#### Common keys
In each `dict`, there are some keys commonly supported by NNI compression:

* __op_types__: This is to specify what types of operations to be compressed. 'default' means following the algorithm's default setting.
* __op_names__: This is to specify by name what operations to be compressed. If this field is omitted, operations will not be filtered by it.
* __exclude__: Default is False. If this field is True, it means the operations with specified types and names will be excluded from the compression.

There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, some , some.
#### Keys for quantization algorithms
**If you use quantization algorithms, you need to especify more keys. If you use pruning algorithms, you can safely skip these keys**
Cjkkkk marked this conversation as resolved.
Show resolved Hide resolved

The `dict`s in the `list` are applied one by one, that is, the configurations in latter `dict` will overwrite the configurations in former ones on the operations that are within the scope of both of them.
* __quant_types__ : list of string.

Type of quantization you want to apply, currently support 'weight', 'input', 'output'.
Cjkkkk marked this conversation as resolved.
Show resolved Hide resolved

* __quant_bits__ : int or dict of {str : int}

bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8},
when the type is int, all quantization types share same bits length.
Cjkkkk marked this conversation as resolved.
Show resolved Hide resolved

#### Other keys specified for every compression algorithm
There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, [Level Pruner](./Pruner.md#level-pruner) requires `sparsity` key to specify how much a model should be pruned.


#### example
A simple example of configuration is shown below:

```python
Expand Down Expand Up @@ -183,11 +201,9 @@ Some algorithms may want global information for generating masks, for example, a
The interface for customizing quantization algorithm is similar to that of pruning algorithms. The only difference is that `calc_mask` is replaced with `quantize_weight`. `quantize_weight` directly returns the quantized weights rather than mask, because for quantization the quantized weights cannot be obtained by applying mask.

```python
# This is writing a Quantizer in tensorflow.
# For writing a Quantizer in PyTorch, you can simply replace
# nni.compression.tensorflow.Quantizer with
# nni.compression.torch.Quantizer
class YourQuantizer(nni.compression.tensorflow.Quantizer):
from nni.compression.torch.compressor import Quantizer

class YourQuantizer(Quantizer):
def __init__(self, model, config_list):
"""
Suggest you to use the NNI defined spec for config
Expand Down Expand Up @@ -245,19 +261,55 @@ class YourQuantizer(nni.compression.tensorflow.Quantizer):

return new_input

# note for pytorch version, there is no sess in input arguments
def update_epoch(self, epoch_num, sess):
def update_epoch(self, epoch_num):
pass

# note for pytorch version, there is no sess in input arguments
def step(self, sess):
def step(self):
"""
Can do some processing based on the model or weights binded
in the func bind_model
"""
pass
```
#### Customize backward function
Sometimes it's necessary for a quantization operation to have a customized backward function, such as [Straight-Through Estimator](https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste), user can customize a backward function as follow:

### Usage of user customized compression algorithm
```python
from nni.compression.torch.compressor import Quantizer, QuantGrad, QuantType

class ClipGrad(QuantGrad):
@staticmethod
def quant_backward(tensor, grad_output, quant_type):
"""
This method should be overrided by subclass to provide customized backward function,
default implementation is Straight-Through Estimator
Parameters
----------
tensor : Tensor
input of quantization operation
grad_output : Tensor
gradient of the output of quantization operation
quant_type : QuantType
the type of quantization, it can be `QuantType.QUANT_INPUT`, `QuantType.QUANT_WEIGHT`, `QuantType.QUANT_OUTPUT`,
you can define different behavior for different types.
Returns
-------
tensor
gradient of the input of quantization operation
"""

# for quant_output function, set grad to zero if the absolute value of tensor is larger than 1
if quant_type == QuantType.QUANT_OUTPUT:
grad_output[torch.abs(tensor) > 1] = 0
return grad_output


class YourQuantizer(Quantizer):
def __init__(self, model, config_list):
super().__init__(model, config_list)
# set your customized backward function to overwrite default backward function
self.quant_grad = ClipGrad

```

__[TODO]__ ...
If you do not customize `QuantGrad`, the default backward is Straight-Through Estimator.
55 changes: 8 additions & 47 deletions docs/en_US/Compressor/Quantizer.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,22 +51,9 @@ quantizer.compress()
You can view example for more information

#### User configuration for QAT Quantizer
* **quant_types:** : list of string
common configuration needed by compression algorithms can be found at : [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm)

type of quantization you want to apply, currently support 'weight', 'input', 'output'.

* **op_types:** list of string

specify the type of modules that will be quantized. eg. 'Conv2D'

* **op_names:** list of string

specify the name of modules that will be quantized. eg. 'conv1'

* **quant_bits:** int or dict of {str : int}

bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8},
when the type is int, all quantization types share same bits length.
configuration needed by this algorithm :

* **quant_start_step:** int

Expand Down Expand Up @@ -98,22 +85,9 @@ quantizer.compress()
You can view example for more information

#### User configuration for DoReFa Quantizer
* **quant_types:** : list of string

type of quantization you want to apply, currently support 'weight', 'input', 'output'.

* **op_types:** list of string

specify the type of modules that will be quantized. eg. 'Conv2D'

* **op_names:** list of string

specify the name of modules that will be quantized. eg. 'conv1'
common configuration needed by compression algorithms can be found at : [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm)

* **quant_bits:** int or dict of {str : int}

bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8},
when the type is int, all quantization types share same bits length.
configuration needed by this algorithm :


## BNN Quantizer
Expand All @@ -130,13 +104,13 @@ from nni.compression.torch import BNNQuantizer
model = VGG_Cifar10(num_classes=10)

configure_list = [{
'quant_types': ['weight'],
'quant_bits': 1,
'quant_types': ['weight'],
'op_types': ['Conv2d', 'Linear'],
'op_names': ['features.0', 'features.3', 'features.7', 'features.10', 'features.14', 'features.17', 'classifier.0', 'classifier.3']
}, {
'quant_types': ['output'],
'quant_bits': 1,
'quant_types': ['output'],
'op_types': ['Hardtanh'],
'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
}]
Expand All @@ -148,22 +122,9 @@ model = quantizer.compress()
You can view example [examples/model_compress/BNN_quantizer_cifar10.py]( https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py) for more information.

#### User configuration for BNN Quantizer
* **quant_types:** : list of string

type of quantization you want to apply, currently support 'weight', 'input', 'output'.

* **op_types:** list of string

specify the type of modules that will be quantized. eg. 'Conv2D'

* **op_names:** list of string

specify the name of modules that will be quantized. eg. 'conv1'

* **quant_bits:** int or dict of {str : int}
common configuration needed by compression algorithms can be found at : [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm)

bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8},
when the type is int, all quantization types share same bits length.
configuration needed by this algorithm :

### Experiment
We implemented one of the experiments in [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830), we quantized the **VGGNet** for CIFAR-10 in the paper. Our experiments results are as follows:
Expand Down