Quantization

  • The .nemo -> .onnx -> .quant.onnx conversion pipeline is used here

  • The original .nemo model can be converted to .onnx as follows:

git clone https://github.com/NVIDIA/NeMo.git
cd NeMo/scripts/export
python3 convasr_to_single_onnx.py --nemo_file <path_to>/QuartzNet5x1Aug.nemo --onnx_file <path_to>/QuartzNet5x1Aug.onnx --model_type asr
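  • A quick sanity check on the exported model (a minimal sketch; the file path below is a placeholder matching the export command above):

import onnx

# Load the exported model and run ONNX's structural checker (placeholder path)
model = onnx.load("QuartzNet5x1Aug.onnx")
onnx.checker.check_model(model)
# Optionally inspect the graph to confirm the expected inputs/outputs
print(onnx.helper.printable_graph(model.graph))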
  • The .onnx -> .quant.onnx transform can be performed using the script in this folder as follows:
python3 onnx_quantize.py <fp32_model_path>.onnx <int8_model_path>.quant.onnx
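  • For reference, a minimal dynamic-quantization pass with onnxruntime looks roughly like this (a sketch only; onnx_quantize.py in this folder may use different settings, e.g. static quantization with a calibration set):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize weights to int8; paths are placeholders
quantize_dynamic(
    model_input="QuartzNet5x1Aug.onnx",
    model_output="QuartzNet5x1Aug.quant.onnx",
    weight_type=QuantType.QInt8,
)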
  • Both the int8 and fp32 models can be evaluated using the scripts in ../evaluation

  • The outputs of the ONNX-quantized and PyTorch fp32 models currently don't match (e.g. 897*868 vs 89*47); quantization still requires further work to match outputs. A quick way to inspect the mismatch is sketched below.
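  • The sketch runs both ONNX models on the same dummy input and compares raw outputs (assumptions: placeholder model paths and a placeholder feature shape; the input name is read from the model rather than assumed):

import numpy as np
import onnxruntime as ort

def run(model_path, features):
    # Run a single forward pass and return the first output tensor
    sess = ort.InferenceSession(model_path)
    input_name = sess.get_inputs()[0].name
    return sess.run(None, {input_name: features})[0]

# Dummy input; shape (batch, features, time) is a placeholder, not the verified QuartzNet shape
dummy = np.random.randn(1, 64, 256).astype(np.float32)
fp32_logits = run("QuartzNet5x1Aug.onnx", dummy)
int8_logits = run("QuartzNet5x1Aug.quant.onnx", dummy)
print(np.abs(fp32_logits - int8_logits).max())  # large values indicate divergence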