The aim of this laboratory is to check how the Apache TVM model fine-tuning feature works.

- Go through the "Auto-tuning a Convolutional Network for x86 CPU" tutorial.
- Go through the `fine_tuning_experiments` script and check what it does.
- The class `TVMFineTunedModel` inherits the following methods from `TVMModel` from the `l04_tvm` assignment: `preprocess_input`, `postprocess_outputs`, `prepare_model`, `run_inference`.
- Use the `TVMModel` implemented in the L04 assignment, or implement the above-mentioned methods yourself.
- [5pt] Implement the `tune_kernels` method:
    - Use the `get_tuner` method for each task from `tasks`,
    - Use `len(task.config_space)` as `n_trial` for `tuner.tune`,
    - Use `measure_option` in the tuning method,
    - Use the `autotvm.callback.progress_bar` and `autotvm.callback.log_to_file` callbacks (use `self.optlogpath` as the log path),
    - Add early stopping after 20% of `n_trial` trials with no improvement.
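The steps above can be sketched as follows. This is a minimal sketch under assumptions, not the reference solution: the `get_tuner` helper, the surrounding class, and the exact argument list come from the assignment scaffolding and may differ there. The 20%-of-`n_trial` early-stopping rule is factored into a small helper:

```python
def early_stopping_for(n_trial: int) -> int:
    """Stop after 20% of n_trial trials with no improvement (at least 1)."""
    return max(1, int(0.2 * n_trial))


def tune_kernels(self, tasks, measure_option):
    """Sketch of tune_kernels; assumes the assignment's scaffolding."""
    from tvm import autotvm  # imported lazily so the sketch parses without TVM

    for i, task in enumerate(tasks):
        tuner = self.get_tuner(task)          # tuner selection from the scaffold
        n_trial = len(task.config_space)      # one trial per configuration entry
        tuner.tune(
            n_trial=n_trial,
            early_stopping=early_stopping_for(n_trial),
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(
                    n_trial, prefix=f"[Task {i + 1:2d}/{len(tasks):2d}]"
                ),
                autotvm.callback.log_to_file(str(self.optlogpath)),
            ],
        )
```

Both callbacks run on every measured batch, so the progress bar and the log file stay in sync with the actual number of trials executed.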
- [6pt] Implement the `tune_graph` method:
    - Focus on the `nn.conv2d` operator only (use `relay.op.get`),
    - Use `PBQPTuner` as the tuning executor,
    - Use the `mod`, `self.input_name`, `self.input_shape`, `self.optlogpath` and `self.target` variables from `tune_kernels` and `optimize_model` to set up the executor,
    - Use `benchmark_layout_transform` to set up benchmarks (with `min_exec_num` set to 5),
    - Run the executor,
    - Save the tuning results to the `self.graphoptlogpath` file.
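Following TVM's x86 auto-tuning tutorial, `tune_graph` could look roughly like the sketch below. The attribute names (`self.input_name`, `self.input_shape`, `self.optlogpath`, `self.graphoptlogpath`, `self.target`) are taken from the task description; everything else is an assumption, not the expected solution:

```python
def tune_graph(self, mod):
    """Sketch of graph-level layout tuning with PBQPTuner."""
    from tvm import relay
    from tvm.autotvm.graph_tuner import PBQPTuner  # lazy: sketch parses without TVM

    # Tune layout transforms for the nn.conv2d operator only
    target_ops = [relay.op.get("nn.conv2d")]
    executor = PBQPTuner(
        mod["main"],                            # the Relay function to tune
        {self.input_name: self.input_shape},    # input name -> shape mapping
        str(self.optlogpath),                   # kernel tuning records from tune_kernels
        target_ops,
        self.target,
    )
    # Benchmark layout transformations (5 measurements per candidate)
    executor.benchmark_layout_transform(min_exec_num=5)
    executor.run()
    # Persist the best graph-level schedules
    executor.write_opt_sch2record_file(str(self.graphoptlogpath))
```

`PBQPTuner` solves the layout-selection problem approximately, which is why it finishes faster than the dynamic-programming variant on larger graphs.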
- [4pt] Finish the implementation of model fine-tuning:
    - Extract tasks from the initially optimized module using `autotvm.task.extract_from_program` (focus on the `nn.conv2d` operator only),
    - Tune kernels using the `tune_kernels` method,
    - Tune the whole graph using the `tune_graph` method,
    - Compile and save the model library - use the `autotvm.apply_graph_best` method to load the log from `self.graphoptlogpath`.
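Put together, the bullets above might look like the sketch below. The method name, the `params` argument, the `self.measure_option` and `self.modelpath` attributes are illustrative assumptions; only the TVM calls themselves come from the task description:

```python
def fine_tune_model(self, mod, params):
    """Sketch of the overall fine-tuning flow (names partly assumed)."""
    import tvm
    from tvm import autotvm, relay  # lazy: sketch parses without TVM

    # 1. Extract nn.conv2d tuning tasks from the initially optimized module
    tasks = autotvm.task.extract_from_program(
        mod["main"],
        target=self.target,
        params=params,
        ops=(relay.op.get("nn.conv2d"),),
    )
    # 2. Kernel-level tuning, then graph-level (layout) tuning
    self.tune_kernels(tasks, self.measure_option)  # measure_option attr is assumed
    self.tune_graph(mod)

    # 3. Compile with the best graph-level records and save the library
    with autotvm.apply_graph_best(str(self.graphoptlogpath)):
        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build(mod, target=self.target, params=params)
    lib.export_library(str(self.modelpath))        # assumed output path attribute
```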
- Run benchmarks using:

        python3 -m dl_in_iot_course.l06_tvm_fine_tuning.fine_tuning_experiments \
            --fp32-model-path models/pet-dataset-tensorflow.fp32.tflite \
            --dataset-root build/pet-dataset/ \
            --results-path build/fine-tuning-results

    NOTE: The fine-tuning takes quite a long time.
- [2pt] In the directory for the assignment's summary, include:
    - the kernel log file (`pet-dataset-tensorflow.fp32.tvm-tune.kernellog`),
    - the graph log file (`pet-dataset-tensorflow.fp32.tvm-tune.graphlog`).
- Write a very brief summary:
    - [1pt] Compare the inference time between the fine-tuned model and the FP32 model with NCHW layout, opt level 3 and the `llvm -mcpu=core-avx2` target. At least a very slight improvement should be observed.
There should be no need for additional imports. The blocks of code to implement (3 blocks) should take at most around 20 lines.
Additional factors:

- [2pt] Git history quality.