This document has instructions for how to run the Wide & Deep benchmark for the following modes/precisions:

* Int8 inference
* FP32 inference

Benchmarking instructions and scripts for model training are coming later.
## Int8 Inference Instructions

1. Download the large Kaggle Display Advertising Challenge Dataset from http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/
2. Pre-process the downloaded dataset to TFRecords using `preprocess_csv_tfrecords.py`:

   ```
   $ python3.6 preprocess_csv_tfrecords.py --csv-datafile eval.csv
   ```
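Before preprocessing, it can help to sanity-check the input layout. The sketch below is stdlib-only and assumes the standard Criteo Display Advertising Challenge layout (1 label, 13 integer features, 26 categorical features per record); the inline sample row is hypothetical, standing in for a real line of `eval.csv`:

```python
import csv
import io

# Hypothetical sample row in the standard Criteo layout:
# label, 13 integer features, 26 hex-hashed categorical features.
sample = "1," + ",".join(str(i) for i in range(13)) + "," + ",".join(26 * ["68fd1e64"])

row = next(csv.reader(io.StringIO(sample)))
label, int_feats, cat_feats = row[0], row[1:14], row[14:40]

print(len(row))                               # 40 columns per record
print(label, len(int_feats), len(cat_feats))  # 1 13 26
```

If a row of your CSV does not split into 40 columns, the preprocessing step will not produce valid TFRecords.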
3. Download the pre-trained model:

   ```
   $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/wide_deep_int8_pretrained_model.pb
   ```
4. Clone the intelai/models repo. This repo has the launch script for running benchmarks, which we will use in the next step.

   ```
   $ git clone https://github.com/IntelAI/models.git
   ```
5. How to run benchmarks

   Running benchmarks in latency mode, set `--batch-size 1`:

   ```
   $ cd /home/myuser/models/benchmarks

   $ python launch_benchmark.py \
       --model-name wide_deep_large_ds \
       --precision int8 \
       --mode inference \
       --framework tensorflow \
       --benchmark-only \
       --batch-size 1 \
       --socket-id 0 \
       --docker-image tensorflow/tensorflow:latest-mkl \
       --in-graph /root/user/wide_deep_files/wide_deep_int8_pretrained_model.pb \
       --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords
   ```
   Running benchmarks in throughput mode, set `--batch-size 1024`:

   ```
   $ cd /home/myuser/models/benchmarks

   $ python launch_benchmark.py \
       --model-name wide_deep_large_ds \
       --precision int8 \
       --mode inference \
       --framework tensorflow \
       --benchmark-only \
       --batch-size 1024 \
       --socket-id 0 \
       --docker-image tensorflow/tensorflow:latest-mkl \
       --in-graph /root/user/wide_deep_files/wide_deep_int8_pretrained_model.pb \
       --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords
   ```
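Batch size 1 measures per-request responsiveness, while a large batch amortizes per-batch overhead to maximize records per second. A small sketch of how the two metrics relate, using made-up timings (illustrative only, not measured results):

```python
def records_per_sec(batch_size: int, batch_latency_sec: float) -> float:
    """Throughput implied by a batch size and a per-batch latency."""
    return batch_size / batch_latency_sec

# Hypothetical timings: a large batch takes longer per batch,
# but processes far more records per second overall.
latency_mode = records_per_sec(batch_size=1, batch_latency_sec=0.0005)
throughput_mode = records_per_sec(batch_size=1024, batch_latency_sec=0.010)

print(latency_mode, throughput_mode)
```

This is why the two modes differ only in `--batch-size`: the launch script is identical, and the metric you care about determines the batch size you choose.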
6. The log file is saved to the value of `--output-dir`. The tail of the log output when the benchmarking completes should look something like this:
   ```
   --------------------------------------------------
   Total test records           :  2000000
   No of correct predicitons    :  1549508
   Batch size is                :  1024
   Number of batches            :  1954
   Classification accuracy (%)  :  77.4754
   Inference duration (seconds) :  1.9765
   Latency (millisecond/batch)  :  0.000988
   Throughput is (records/sec)  :  1151892.25
   --------------------------------------------------
   numactl --cpunodebind=0 --membind=0 python /workspace/intelai_models/int8/inference.py --input-graph=/in_graph/wide_deep_int8_pretrained_model.pb --inter-op-parallelism-threads=28 --intra-op-parallelism-threads=1 --omp-num-threads=1 --batch-size=1024 --kmp-blocktime=0 --datafile-path=/dataset
   Ran inference with batch size 1024
   Log location outside container: {--output-dir value}/benchmark_wide_deep_large_ds_inference_int8_20190225_061815.log
   ```
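When collecting results across many runs, the summary block can be scraped with a regex rather than read by hand. A minimal sketch; the field names match the sample output above (adjust the patterns if the log format changes):

```python
import re

# Trimmed copy of the summary block from a sample run.
log_tail = """
Classification accuracy (%) : 77.4754
Inference duration (seconds) : 1.9765
Throughput is (records/sec) : 1151892.25
"""

def parse_metric(name: str, text: str) -> float:
    """Pull the numeric value following 'name ... :' from the log tail."""
    match = re.search(re.escape(name) + r"[^:]*:\s*([\d.]+)", text)
    if match is None:
        raise ValueError(f"metric {name!r} not found")
    return float(match.group(1))

print(parse_metric("Classification accuracy", log_tail))  # 77.4754
print(parse_metric("Throughput", log_tail))               # 1151892.25
```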
## FP32 Inference Instructions

1. Download the large Kaggle Display Advertising Challenge Dataset from http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/
2. Pre-process the downloaded dataset to TFRecords using `preprocess_csv_tfrecords.py`:

   ```
   $ python3.6 preprocess_csv_tfrecords.py --csv-datafile eval.csv
   ```
3. Download the pre-trained model:

   ```
   $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/wide_deep_fp32_pretrained_model.pb
   ```
4. Clone the intelai/models repo. This repo has the launch script for running benchmarks, which we will use in the next step.

   ```
   $ git clone https://github.com/IntelAI/models.git
   ```
5. How to run benchmarks

   Running benchmarks in latency mode, set `--batch-size 1`:

   ```
   $ cd /home/myuser/models/benchmarks

   $ python launch_benchmark.py \
       --model-name wide_deep_large_ds \
       --precision fp32 \
       --mode inference \
       --framework tensorflow \
       --benchmark-only \
       --batch-size 1 \
       --socket-id 0 \
       --docker-image tensorflow/tensorflow:latest-mkl \
       --in-graph /root/user/wide_deep_files/wide_deep_fp32_pretrained_model.pb \
       --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords
   ```
   Running benchmarks in throughput mode, set `--batch-size 1024`:

   ```
   $ cd /home/myuser/models/benchmarks

   $ python launch_benchmark.py \
       --model-name wide_deep_large_ds \
       --precision fp32 \
       --mode inference \
       --framework tensorflow \
       --benchmark-only \
       --batch-size 1024 \
       --socket-id 0 \
       --docker-image tensorflow/tensorflow:latest-mkl \
       --in-graph /root/user/wide_deep_files/wide_deep_fp32_pretrained_model.pb \
       --data-location /root/user/wide_deep_files/preprocessed_eval.tfrecords
   ```
6. The log file is saved to the value of `--output-dir`. The tail of the log output when the benchmarking completes should look something like this:
   ```
   --------------------------------------------------
   Total test records           :  2000000
   No of correct predicitons    :  1550447
   Batch size is                :  1024
   Number of batches            :  1954
   Classification accuracy (%)  :  77.5223
   Inference duration (seconds) :  3.4977
   Latency (millisecond/batch)  :  0.001749
   Throughput is (records/sec)  :  571802.228
   --------------------------------------------------
   numactl --cpunodebind=0 --membind=0 python /workspace/intelai_models/int8/inference.py --input-graph=/in_graph/wide_deep_fp32_pretrained_model.pb --inter-op-parallelism-threads=28 --intra-op-parallelism-threads=1 --omp-num-threads=1 --batch-size=1024 --kmp-blocktime=0 --datafile-path=/dataset
   Ran inference with batch size 1024
   Log location outside container: {--output-dir value}/benchmark_wide_deep_large_ds_inference_fp32_20190225_062206.log
   ```
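Comparing the two sample logs in this document, the relative speedup of the Int8 model over FP32 can be computed directly. The throughput numbers below are taken from the example outputs above; actual results will vary by hardware:

```python
# Throughput figures from the sample log outputs above (records/sec).
int8_throughput = 1151892.25
fp32_throughput = 571802.228

speedup = int8_throughput / fp32_throughput
print(f"{speedup:.2f}x")  # ~2.01x with these sample numbers
```

Accuracy in the samples is nearly unchanged (77.4754% for Int8 vs 77.5223% for FP32), which is the usual trade-off targeted by Int8 quantization: roughly 2x throughput for a small accuracy cost.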