update quantization doc (#2783)
* update documentation for quantization script

* plus some spell corrections
askhade authored Jan 13, 2020
1 parent c4e4abc commit cc75e5a
Showing 1 changed file with 26 additions and 1 deletion.
27 changes: 26 additions & 1 deletion onnxruntime/python/tools/quantization/README.md
# Quantization Tool Overview
This tool supports 8-bit linear quantization of an ONNX model. quantize() takes a model in ModelProto format and returns the quantized model in ModelProto format.
Today ORT does not guarantee support for E2E model quantization: not all ONNX ops support 8-bit data types, so only the supported ops in a model are quantized. Inputs to the remaining ops are converted back to FP32.

List of Supported Quantized Ops:
The following ops were chosen as phase 1 ops because, in most CNN models, they consume the most compute and power, so quantizing them yields the largest performance benefit.
* Convolution
* MatMul
* Data-type-agnostic ops such as Transpose, Identity, etc. (Note: no special quantization is done for these ops.)

## Quantization specifics
ONNX implements 8-bit linear quantization. During quantization, the floating point real values are mapped to an 8-bit quantization space of the form:
VAL_fp32 = Scale * (VAL_quantized - Zero_point)
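
For example (a hypothetical scale and zero point, not values from the source), the mapping in Python:

```python
scale = 0.05       # Scale: a positive real number
zero_point = 128   # Zero_point: the quantized value that maps to FP32 0.0

val_quantized = 200
val_fp32 = scale * (val_quantized - zero_point)
print(val_fp32)    # 3.6, i.e. 0.05 * (200 - 128)
```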

Scale is a positive real number used to map the floating point numbers to the quantization space. It is calculated as follows:
For unsigned 8-bit:
```
scale = (data_range_max - data_range_min) / (quantization_range_max - quantization_range_min)
```

For signed 8-bit:
```
scale = max(abs(data_range_max), abs(data_range_min)) * 2 / (quantization_range_max - quantization_range_min)
```
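
A minimal sketch of both scale computations, given a tensor's observed FP32 data range (the helper below is illustrative only, not part of the tool's API):

```python
def compute_scale(data_range_min, data_range_max, signed=False):
    """Compute the 8-bit linear quantization scale for a data range."""
    if signed:
        # Signed int8: quantization range is [-128, 127]; the range is
        # centered on zero, so use twice the largest absolute value.
        quantization_range_min, quantization_range_max = -128, 127
        return max(abs(data_range_max), abs(data_range_min)) * 2 / (
            quantization_range_max - quantization_range_min)
    # Unsigned uint8: quantization range is [0, 255].
    quantization_range_min, quantization_range_max = 0, 255
    return (data_range_max - data_range_min) / (
        quantization_range_max - quantization_range_min)

print(compute_scale(-1.0, 3.0))               # unsigned: 4/255 ~ 0.0157
print(compute_scale(-1.0, 3.0, signed=True))  # signed: 6/255 ~ 0.0235
```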

Zero point represents zero in the quantization space. It is important that the floating point zero value be exactly representable in the quantization space: zero padding is widely used in CNNs, and if 0 cannot be represented exactly after quantization, the padding introduces accuracy errors.
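
A sketch of the corresponding zero point computation for the unsigned case (again an illustrative helper, not the tool's API):

```python
def compute_zero_point(data_range_min, scale, quantization_range_min=0):
    """Zero point: the quantized integer onto which FP32 0.0 maps exactly."""
    # Shift so data_range_min maps to quantization_range_min, then round so
    # that 0.0 lands exactly on an integer in the quantized space.
    zero_point = round(quantization_range_min - data_range_min / scale)
    return min(max(zero_point, 0), 255)  # clamp into the uint8 range

scale = 4.0 / 255  # unsigned scale for data range [-1.0, 3.0] (see above)
zp = compute_zero_point(-1.0, scale)
print(zp)                 # 64
print(scale * (64 - zp))  # 0.0 -- FP32 zero is exactly representable
```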


## Quantize an ONNX model
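The original usage snippet is truncated in this diff view. Below is a minimal sketch of calling quantize() as described above; the import path and the QuantizationMode.IntegerOps argument are assumptions for illustration, not confirmed by this excerpt:

```python
import onnx

# Import path is an assumption; at the time of this commit the script
# lived in onnxruntime/python/tools/quantization.
from quantize import quantize, QuantizationMode

# Load the FP32 model as a ModelProto.
model = onnx.load('path/to/model.onnx')

# quantize() returns a new ModelProto with the supported ops
# (Convolution, MatMul, ...) converted to 8 bit.
quantized_model = quantize(model, quantization_mode=QuantizationMode.IntegerOps)

onnx.save(quantized_model, 'path/to/model_quant.onnx')
```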
