Wide & Deep

This document has instructions for running the Wide & Deep benchmark for the following modes/precisions:

  • INT8 inference
  • FP32 inference

Benchmarking instructions and scripts for model training are coming later.

INT8 Inference Instructions

  1. Download the large Kaggle Display Advertising Challenge dataset from http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/

  2. Pre-process the downloaded dataset into TFRecords using preprocess_csv_tfrecords.py:

    $ python3.6 preprocess_csv_tfrecords.py --csv-datafile eval.csv
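
    The actual conversion is done by preprocess_csv_tfrecords.py. As a rough illustration of a CSV-to-TFRecords step, the sketch below writes one tf.train.Example per row; the feature names and column layout are assumptions, not the script's real schema.

    import csv
    import tensorflow as tf

    def row_to_example(row):
        # Assumed Criteo layout: label, 13 integer features, 26 categorical features.
        label = int(row[0] or 0)
        ints = [int(v or 0) for v in row[1:14]]
        cats = [v.encode() for v in row[14:40]]
        return tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            "int_features": tf.train.Feature(int64_list=tf.train.Int64List(value=ints)),
            "cat_features": tf.train.Feature(bytes_list=tf.train.BytesList(value=cats)),
        }))

    with open("eval.csv") as f, tf.io.TFRecordWriter("preprocessed_eval.tfrecords") as writer:
        for row in csv.reader(f):
            writer.write(row_to_example(row).SerializeToString())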
    
  3. Download the pre-trained model:

    $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/wide_deep_int8_pretrained_model.pb
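
    As an optional sanity check (a minimal sketch, not part of the repo), the frozen graph can be loaded with TensorFlow to confirm the .pb file downloaded intact:

    import tensorflow as tf

    # Parse the frozen GraphDef and report how many nodes it contains.
    graph_def = tf.compat.v1.GraphDef()
    with open("wide_deep_int8_pretrained_model.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    print(len(graph_def.node), "nodes in the frozen graph")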
    
  4. Clone the intelai/models repo.

    This repo has the launch script for running benchmarks, which we will use in the next step.

    $ git clone https://github.com/IntelAI/models.git
    
  5. Run the benchmarks.

    • To run benchmarks in latency mode, set --batch-size 1:
      $ cd /home/myuser/models/benchmarks
      
      $ python launch_benchmark.py \
           --model-name wide_deep_large_ds \
           --precision int8 \
           --mode inference \
           --framework tensorflow \
           --benchmark-only \
           --batch-size 1 \
           --socket-id 0 \
           --docker-image tensorflow/tensorflow:latest-mkl \
           --in-graph /root/user/wide_deep_files/wide_deep_int8_pretrained_model.pb \
           --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords 
      
    • To run benchmarks in throughput mode, set --batch-size 1024:
      $ cd /home/myuser/models/benchmarks
      
      $ python launch_benchmark.py \
           --model-name wide_deep_large_ds \
           --precision int8 \
           --mode inference \
           --framework tensorflow \
           --benchmark-only \
           --batch-size 1024 \
           --socket-id 0 \
           --docker-image tensorflow/tensorflow:latest-mkl \
           --in-graph /root/user/wide_deep_files/wide_deep_int8_pretrained_model.pb \
           --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords 
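
    Both runs can also be scripted. The sketch below is illustrative only; it drives launch_benchmark.py for each batch size via subprocess, reusing the flags and paths from the commands above.

      import subprocess

      # Launch the INT8 benchmark once in latency mode (1) and once in throughput mode (1024).
      for batch_size in (1, 1024):
          subprocess.run([
              "python", "launch_benchmark.py",
              "--model-name", "wide_deep_large_ds",
              "--precision", "int8",
              "--mode", "inference",
              "--framework", "tensorflow",
              "--benchmark-only",
              "--batch-size", str(batch_size),
              "--socket-id", "0",
              "--docker-image", "tensorflow/tensorflow:latest-mkl",
              "--in-graph", "/root/user/wide_deep_files/wide_deep_int8_pretrained_model.pb",
              "--data-location", "/root/user/wide_deep_files/preprocessed_eval.tfrecords",
          ], check=True)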
      
  6. The log file is saved to the location specified by --output-dir.

    The tail of the log output when the benchmarking completes should look something like this:

    --------------------------------------------------
    Total test records           :  2000000
    No of correct predicitons    :  1549508
    Batch size is                :  1024
    Number of batches            :  1954
    Classification accuracy (%)  :  77.4754
    Inference duration (seconds) :  1.9765
    Latency (millisecond/batch)  :  0.000988
    Throughput is (records/sec)  :  1151892.25
    --------------------------------------------------
    numactl --cpunodebind=0 --membind=0 python /workspace/intelai_models/int8/inference.py --input-graph=/in_graph/wide_deep_int8_pretrained_model.pb --inter-op-parallelism-threads=28 --intra-op-parallelism-threads=1 --omp-num-threads=1 --batch-size=1024 --kmp-blocktime=0 --datafile-path=/dataset
    Ran inference with batch size 1024
    Log location outside container:  {--output-dir value}/benchmark_wide_deep_large_ds_inference_int8_20190225_061815.log
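
    Since the summary metrics always appear as "name : value" lines like the ones above, they can be pulled out of a log programmatically. The helper below is illustrative only and not part of the repo:

    import re

    def parse_metrics(log_path):
        # Collect every "label : numeric value" line from the benchmark log.
        pattern = re.compile(r"^(.+?)\s*:\s*([\d.]+)$")
        metrics = {}
        with open(log_path) as f:
            for line in f:
                m = pattern.match(line.strip())
                if m:
                    metrics[m.group(1)] = float(m.group(2))
        return metrics

    # e.g. parse_metrics("<log file from --output-dir>")["Throughput is (records/sec)"]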
    

FP32 Inference Instructions

  1. Download the large Kaggle Display Advertising Challenge dataset from http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/

  2. Pre-process the downloaded dataset into TFRecords using preprocess_csv_tfrecords.py:

    $ python3.6 preprocess_csv_tfrecords.py --csv-datafile eval.csv
    
  3. Download the pre-trained model:

    $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/wide_deep_fp32_pretrained_model.pb
    
  4. Clone the intelai/models repo.

    This repo has the launch script for running benchmarks, which we will use in the next step.

    $ git clone https://github.com/IntelAI/models.git
    
  5. Run the benchmarks.

    • To run benchmarks in latency mode, set --batch-size 1:
      $ cd /home/myuser/models/benchmarks
      
      $ python launch_benchmark.py \
           --model-name wide_deep_large_ds \
           --precision fp32 \
           --mode inference \
           --framework tensorflow \
           --benchmark-only \
           --batch-size 1 \
           --socket-id 0 \
           --docker-image tensorflow/tensorflow:latest-mkl \
           --in-graph /root/user/wide_deep_files/wide_deep_fp32_pretrained_model.pb \
           --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords 
      
    • To run benchmarks in throughput mode, set --batch-size 1024:
      $ cd /home/myuser/models/benchmarks
      
      $ python launch_benchmark.py \
           --model-name wide_deep_large_ds \
           --precision fp32 \
           --mode inference \
           --framework tensorflow \
           --benchmark-only \
           --batch-size 1024 \
           --socket-id 0 \
           --docker-image tensorflow/tensorflow:latest-mkl \
           --in-graph /root/user/wide_deep_files/wide_deep_fp32_pretrained_model.pb \
           --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords 
      
  6. The log file is saved to the location specified by --output-dir.

    The tail of the log output when the benchmarking completes should look something like this:

    
    --------------------------------------------------
    Total test records           :  2000000
    No of correct predicitons    :  1550447
    Batch size is                :  1024
    Number of batches            :  1954
    Classification accuracy (%)  :  77.5223
    Inference duration (seconds) :  3.4977
    Latency (millisecond/batch)  :  0.001749
    Throughput is (records/sec)  :  571802.228
    --------------------------------------------------
    numactl --cpunodebind=0 --membind=0 python /workspace/intelai_models/int8/inference.py --input-graph=/in_graph/wide_deep_fp32_pretrained_model.pb --inter-op-parallelism-threads=28 --intra-op-parallelism-threads=1 --omp-num-threads=1 --batch-size=1024 --kmp-blocktime=0 --datafile-path=/dataset
    Ran inference with batch size 1024
    Log location outside container: {--output-dir value}/benchmark_wide_deep_large_ds_inference_fp32_20190225_062206.log
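
    With both logs in hand, the same parse_metrics helper sketched in the INT8 section can be used to compare the two precisions; the log file names below are placeholders for the files written to --output-dir.

    int8 = parse_metrics("benchmark_wide_deep_large_ds_inference_int8_<timestamp>.log")
    fp32 = parse_metrics("benchmark_wide_deep_large_ds_inference_fp32_<timestamp>.log")
    speedup = fp32["Inference duration (seconds)"] / int8["Inference duration (seconds)"]
    print("INT8 speedup over FP32: %.2fx" % speedup)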