mdigits/quantization/readme.md at master · AntonOkhotnikov/mdigits · GitHub

Quantization

Here .nemo -> .onnx -> .quant.onnx transform is used
Original .nemo model could be converted to .onnx as follows:

git clone https://github.com/NVIDIA/NeMo.git
cd NeMo/scripts/export
python3 convasr_to_single_onnx.py --nemo_file <path_to>/QuartzNet5x1Aug.nemo --onnx_file <path_to>/QuartzNet5x1Aug.onnx --model_type asr

.onnx -> .quant.onnx transform could be performed using script in this folder as follows:

python3 onnx_quantize.py <fp32_model_path>.onnx <int8_model_path>.quant.onnx>

Both int8 and fp32 models could be evaluated using the scripts in ../evaluation
ONNX-quantized and PyTorch fp32 models outputs currently don't match, e.g.: 897*868 vs 89*47. Quantization still requires further work to match outputs