Quantize Inception V3 by Intel® Extension for Tensorflow* on Intel® Xeon®

Background

Intel® Extension for Tensorflow* provides a quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph. The combination delivers better performance while keeping accuracy loss under control.

Intel® Neural Compressor executes the calibration process and outputs a QDQ quantization model, which inserts Quantize and Dequantize layers that carry the information needed for quantization.
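
To make the role of the Quantize and Dequantize layers concrete, here is a minimal sketch of what one QDQ pair computes. It assumes symmetric per-tensor INT8 quantization with a scale derived from the calibration range. The function names and the symmetric scheme are illustrative assumptions, not necessarily what Intel® Neural Compressor produces for a given layer.

```python
def compute_scale(calib_values, num_bits=8):
    """Derive a symmetric per-tensor scale from calibration data (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in calib_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(x, scale, num_bits=8):
    """Map a float to a signed integer, clamped to the representable range."""
    qmax = 2 ** (num_bits - 1) - 1
    q = round(x / scale)
    return max(-qmax, min(qmax, q))


def dequantize(q, scale):
    """Map the integer back to an approximate float."""
    return q * scale


def qdq(x, scale):
    """A Quantize layer followed by a Dequantize layer."""
    return dequantize(quantize(x, scale), scale)


# Calibration observes values in [-1.27, 1.27], so the scale is about 0.01.
scale = compute_scale([-1.27, 0.3, 1.0])
print(quantize(1.0, scale))   # in-range value
print(quantize(5.0, scale))   # out-of-range value is clamped
print(qdq(0.3, scale))        # round trip stays within scale / 2 of the input
```

The QDQ model still computes in floating point. The pairs only record where quantization should happen and with which scale, so a later pass can replace them with real INT8 kernels.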

When Intel® Extension for Tensorflow* executes inference on this model, it calls oneDNN Graph to quantize and optimize the model. The quantized model is then executed by Intel® Extension for Tensorflow* and accelerated by Intel® Deep Learning Boost or Intel® Advanced Matrix Extensions on Intel® Xeon®.

Introduction

The example shows an end-to-end pipeline:

Train an Inception V3 model with a flower photo dataset by transfer learning.
Execute the calibration by Intel® Neural Compressor.
Quantize and accelerate the inference by Intel® Extension for Tensorflow* for CPU.

Configuration

Intel® Extension for Tensorflow* Version

Install a version of Intel® Extension for Tensorflow* newer than 1.1.0 to use this feature.

Enable oneDNN Graph

By default, oneDNN Graph is enabled in Intel® Extension for Tensorflow* on CPU for INT8 models.

Enable it explicitly by:

  import os
  os.environ["ITEX_ONEDNN_GRAPH"] = "1"

Disable Constant Folding Function

The Constant Folding function must be disabled in two stages:

Intel® Neural Compressor creates QDQ quantization model.
Intel® Extension for Tensorflow* executes the oneDNN Graph quantization path.

There are two ways to configure this:

a. Environment Variable

export ITEX_TF_CONSTANT_FOLDING=0

b. Python API

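import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

# Turn off the constant folding rewrite for sessions created with this config.
infer_config = tf.compat.v1.ConfigProto()
infer_config.graph_options.rewrite_options.constant_folding = rewriter_config_pb2.RewriterConfig.OFF

# Make Keras use a session created with this configuration.
session = tf.compat.v1.Session(config=infer_config)
tf.compat.v1.keras.backend.set_session(session)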
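
In scripts or notebooks where exporting a shell variable is inconvenient, the environment-variable method can also be applied from Python. The sketch below assumes the variable has to be set before TensorFlow and the extension are imported. That ordering is an assumption, and `disable_itex_constant_folding` is a hypothetical helper name, not part of any library.

```python
import os


def disable_itex_constant_folding(env=os.environ):
    """Set ITEX_TF_CONSTANT_FOLDING=0 in the given environment mapping."""
    env["ITEX_TF_CONSTANT_FOLDING"] = "0"
    return env


# Call this before `import tensorflow` (assumed to be needed so that the
# extension sees the setting when it loads).
disable_itex_constant_folding()
```

Passing the mapping as a parameter keeps the helper easy to test with a plain dictionary instead of the real process environment.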

Hardware Environment

CPU

Run the example on an Intel® Xeon® processor that supports Intel® Deep Learning Boost or Intel® Advanced Matrix Extensions.

Without these hardware features for AI workloads, the speedup of the quantized model over FP32 is small, for example only 1.x times.

Check Intel® Deep Learning Boost

On Linux, run:

lscpu | grep vnni

Check Intel® Advanced Matrix Extensions

On Linux, run:

lscpu | grep amx
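
The same two checks can be run from Python, for example at the top of a notebook. This is a minimal sketch that assumes a Linux system exposing CPU flags in /proc/cpuinfo. It mirrors the substring matching of the grep commands, and the helper names are hypothetical.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def detect_int8_features(flags):
    """Mirror `grep vnni` and `grep amx` on the flag names."""
    return {
        "vnni": any("vnni" in flag for flag in flags),
        "amx": any("amx" in flag for flag in flags),
    }


def check_this_machine(path="/proc/cpuinfo"):
    """Read the local CPU flags (Linux only) and report both features."""
    with open(path) as f:
        return detect_int8_features(parse_cpu_flags(f.read()))
```

On a Linux host, `check_this_machine()` returns a dictionary with one boolean per feature. If both are false, expect only a small speedup from the quantized model.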

Intel® DevCloud

If you do not have a CPU that supports Intel® Deep Learning Boost or Intel® Advanced Matrix Extensions, you can register for Intel® DevCloud and try this example for free on a newer Xeon with Intel® Deep Learning Boost. To learn more about working with Intel® DevCloud, please refer to Intel® DevCloud.

Running Environment

Install Python 3.7~3.10 supported by Intel® Extension for Tensorflow*.
Create the running environment env_itex.

bash pip_set_env.sh

Activate

source env_itex/bin/activate
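
After activating the environment, it can help to confirm that its interpreter falls in the supported range. This is a small sketch, assuming the 3.7 to 3.10 range stated above is inclusive. `is_supported_python` is a hypothetical helper, not part of the extension.

```python
import sys

SUPPORTED_MIN = (3, 7)   # assumed inclusive lower bound
SUPPORTED_MAX = (3, 10)  # assumed inclusive upper bound


def is_supported_python(version=None):
    """Check a (major, minor, ...) tuple; defaults to the running interpreter."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported_python():
    print("Warning: Python %d.%d is outside the 3.7-3.10 range" % sys.version_info[:2])
```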

Startup Jupyter Notebook

Startup

bash run_jupyter.sh

...
http://xxx.yyy.com:8888/xxxxxxxx

Open the link printed by Jupyter Notebook in Chrome.
Choose and open the quantize_inception_v3.ipynb in Jupyter Notebook.

Set the kernel to "env_itex".

Execute the code by following the guide in the notebook.

License

Code samples are licensed under the MIT license.
