Skip to content

Latest commit

 

History

History
138 lines (101 loc) · 9.11 KB

File metadata and controls

138 lines (101 loc) · 9.11 KB

Optimization Algorithms - Pruning and Clustering

The aim of this laboratory is to demonstrate the pruning and clustering of deep learning models. For this task, we will use TensorFlow Model Optimization Toolkit and Neural Network Intelligence (NNI) Toolkit.

Unstructured pruning and clustering with TFMOT

Tasks

  • Go over the pruning_clustering_experiments script and check what it does (especially check the optimize_model method that prepares structures for model optimization and fine-tuning).

  • In the end of the afore-mentioned optimize_model function there is a compress_and_fine_tune method that will need to be completed in the following tasks - it will handle model optimization and fine-tuning

  • [2pt] Finish the implementation of the TFMOTOptimizedModel class:

    • This class is supposed to firstly optimize and fine-tune the Keras model using TensorFlow Optimization Toolkit, and then compile it to TensorFlow Lite in FP32 mode for inference purposes.

    • Implement the prepare_model, preprocess_input, run_inference and postprocess_outputs methods.

      NOTE: If you completed the l02_quantization tasks, just set the parent class of the TFMOTOptimizedModel to FP32Model from dl_in_iot_course.l02_quantization.quantization_experiments - all methods for preparing a model, running inference and processing inputs and outputs will be reused.

  • [4pt] Finish the implementation of the model clustering in ClusteredModel class:

    • Load the Keras model.
    • Create a clustered model - use a proper tfmot method for adding clustering to model, use self.num_clusters to set the number of clusters, use linear centroid initialization.
    • Fine-tune the model - compile and fit the model using self.optimizer, self.loss, self.metrics, self.traindataset, self.epochs, self.validdataset objects created in optimize_model method.
    • Convert the model to FP32 TFLite model and save to self.modelpath.
  • [4pt] Finish the implementation of the model clustering in PrunedModel class:

    • The compress_and_fine_tune method comes with number of epochs and a simple pruning schedule.
    • Load the Keras model.
    • Create a pruned model - use the prune_low_magnitude pruning along with the sched schedule.
    • Fine-tune the model - compile and fit the model using self.optimizer, self.loss, self.metrics, self.traindataset, self.epochs, self.validdataset objects created in optimize_model method.
    • Remember to set up proper callbacks for pruning.
  • In the main script, uncomment all already supported classes and run it (it will definitely take some time):

    python3 -m dl_in_iot_course.l03_pruning_clustering.pruning_clustering_experiments \
          --model-path models/pet-dataset-tensorflow.h5 \
          --dataset-root build/pet-dataset/ \
          --results-path build/results
    

    In the build/results directory, the script will create:

    • <prefix>-metrics.md file - contains basic metrics, such as accuracy, precision, sensitivity or G-Mean, along with inference time

    • <prefix>-confusion-matrix.png file - contains visualization of confusion matrix for the model evaluation. Those files will be created for:

    • clustered-<num_clusters>-fp32 - the clustered model with num_clusters clusters.

    • pruned-<sparsity>-fp32 - the pruned model.

    NOTE: To download the dataset, add --download-dataset flag. The dataset downloaded from the previous task can be reused.

    NOTE: If the evaluation takes too long, reduce the test dataset size by setting --test-dataset-fraction to some lower value, but inform about this in the Summary note.

  • Write a small summary for experiments containing:

    • Performance and quality data for each experiment:

      • The computed metrics,
      • Confusion matrix.
    • [2pt] Write the deflation percentage along with the size of the compressed models with ZIP tool. In the directory containing .tflite files, you can run (check sharkdp/fd for details):

      fd -t f -e tflite -x zip {.}.zip {}
      

      NOTE: in Singularity/Docker environments the fd tool is named fdfind.

    • Answers for the questions:

      • [1pt] In their best variants, how do quantization, pruning and clustering compare to each other (both performance- and quality-wise)?
      • [1pt] How do the sizes of compressed models compare to the size of the TFLite FP32 model (how many times is each of the models smaller than the uncompressed FP32 solution).
      • [1pt] What does deflation means in ZIP tool and how does it correspond to the observed model optimizations?
      • [1pt] Which of the compression methods gives the smallest and best-performing model (is there a dominating solution on both factors)?

    The summary should be put in the project's summaries directory - follow the README.md in this directory for details.

NOTE: each of the required code snippets should take around 30 lines, but in general less

Additional factors:

  • [2pt] Git history quality

Structured pruning with NNI

In this task we will work on a much simpler model trained for Fashion MNIST dataset. Since NNI supports structured pruning mainly in PyTorch, we will switch to this framework for this task.

Tasks

  • Read the structured_pruning script thoroughly.

  • NOTE: You DO NOT NEED to train the model, use the existing model fashion-mnist-classifier.pth from models directory.

  • [1pt] Create a traced optimizer (check nni.trace method), use Adam optimizer with TRAINING_LEARNING_RATE (as in training optimizer).

  • [1pt] Formulate the configuration list for the ActivationAPoZRank pruner - we want to prune both 2D convolutions and linear layers.

  • [1pt] What is more, NNI by default prunes ALL layers of given type, even the output ones - EXCLUDE the final linear layer from pruning schedule.

  • [1pt] Define ActivationAPoZRank pruner using the model, defined configuration list, trainer method, traced optimizer and criterion. Set training_batches to 1000. I highly recommend using legacy version of the pruner (documented in ActivationAPoZRank v2.8

  • [1pt] The pruner.compress() method will compute the pruning mask, and the additional prints will show the pruning status - Please include logs from terminal for those printouts

  • [2pt] Speedup the model using ModelSpeedup.speedup_model function. The expected dummy input for the network should have 1x1x28x28 shape (use torch.randn(shape).to(model.device)).

  • [3pt] Fine-tune the model:

    • Define the optimizer (use Adam with FINETUNE_LEARNING_RATE)
    • Train the model using model.train_model for FINE_TUNE_EPOCHS epochs.
  • [2pt] Implement convert_to_onnx method to save the model to ONNX format.

  • [2pt] Implement convert_onnx_to_tflite method to convert the ONNX model to FP32 TensorFlow Lite model.

  • [1pt] Use Netron tool to visualize the network

  • In summary:

    • [1pt] Include the shape of the model before pruning (script logs provide those, as well as other data), along with its accuracy and inference time
    • [1pt] Include the pruning logs collected around pruning.compress()
    • [1pt] Include the shape of the model after pruning, along with its accuracy and inference time before fine-tuning
    • [1pt] Include the accuracy and inference time of the model after fine-tuning
    • [1pt] Include the fine-tuned PyTorch model in the summary directory (use Git LFS, check git lfs install, git lfs track commands, you need to apply them before adding the file and committing it)
    • [1pt] Include the ONNX file with pruned PyTorch model (use Git LFS)
    • [1pt] Include the TFLite file with pruned PyTorch model (use Git LFS)
    • [1pt] In the summary, include the visualization of the graph using Netron

The command should be executed more or less as follows:

python3 -m dl_in_iot_course.l03_pruning_clustering.structured_pruning_experiments \
    --input-model models/fashion-mnist-classifier.pth \
    --backup-model backup-model.pth \
    --final-model final-model.pth \
    --dataset-path fashion-dataset \
    --onnx-model model.onnx \
    --tflite-model model.tflite

Additional factors:

  • [2pt] Git history quality

Resources