This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Refactoring model compression doc #2919

Merged: 100 commits, Oct 10, 2020
Commits (100 total; changes shown from 98 commits)
3a45961
Merge pull request #31 from microsoft/master
chicm-ms Aug 6, 2019
633db43
Merge pull request #32 from microsoft/master
chicm-ms Sep 9, 2019
3e926f1
Merge pull request #33 from microsoft/master
chicm-ms Oct 8, 2019
f173789
Merge pull request #34 from microsoft/master
chicm-ms Oct 9, 2019
508850a
Merge pull request #35 from microsoft/master
chicm-ms Oct 9, 2019
5a0e9c9
Merge pull request #36 from microsoft/master
chicm-ms Oct 10, 2019
e7df061
Merge pull request #37 from microsoft/master
chicm-ms Oct 23, 2019
2175cef
Merge pull request #38 from microsoft/master
chicm-ms Oct 29, 2019
2ccbfbb
Merge pull request #39 from microsoft/master
chicm-ms Oct 30, 2019
b29cb0b
Merge pull request #40 from microsoft/master
chicm-ms Oct 30, 2019
4a3ba83
Merge pull request #41 from microsoft/master
chicm-ms Nov 4, 2019
c8a1148
Merge pull request #42 from microsoft/master
chicm-ms Nov 4, 2019
73c6101
Merge pull request #43 from microsoft/master
chicm-ms Nov 5, 2019
6a518a9
Merge pull request #44 from microsoft/master
chicm-ms Nov 11, 2019
a0d587f
Merge pull request #45 from microsoft/master
chicm-ms Nov 12, 2019
e905bfe
Merge pull request #46 from microsoft/master
chicm-ms Nov 14, 2019
4b266f3
Merge pull request #47 from microsoft/master
chicm-ms Nov 15, 2019
237ff4b
Merge pull request #48 from microsoft/master
chicm-ms Nov 21, 2019
682be01
Merge pull request #49 from microsoft/master
chicm-ms Nov 25, 2019
133af82
Merge pull request #50 from microsoft/master
chicm-ms Nov 25, 2019
71a8a25
Merge pull request #51 from microsoft/master
chicm-ms Nov 26, 2019
d2a73bc
Merge pull request #52 from microsoft/master
chicm-ms Nov 26, 2019
198cf5e
Merge pull request #53 from microsoft/master
chicm-ms Dec 5, 2019
cdbfaf9
Merge pull request #54 from microsoft/master
chicm-ms Dec 6, 2019
7e9b29e
Merge pull request #55 from microsoft/master
chicm-ms Dec 10, 2019
d00c46d
Merge pull request #56 from microsoft/master
chicm-ms Dec 10, 2019
de7d1fa
Merge pull request #57 from microsoft/master
chicm-ms Dec 11, 2019
1835ab0
Merge pull request #58 from microsoft/master
chicm-ms Dec 12, 2019
24fead6
Merge pull request #59 from microsoft/master
chicm-ms Dec 20, 2019
0b7321e
Merge pull request #60 from microsoft/master
chicm-ms Dec 23, 2019
60058d4
Merge pull request #61 from microsoft/master
chicm-ms Dec 23, 2019
b111a55
Merge pull request #62 from microsoft/master
chicm-ms Dec 24, 2019
611c337
Merge pull request #63 from microsoft/master
chicm-ms Dec 30, 2019
4a1f14a
Merge pull request #64 from microsoft/master
chicm-ms Jan 10, 2020
7a9e604
Merge pull request #65 from microsoft/master
chicm-ms Jan 14, 2020
b8035b0
Merge pull request #66 from microsoft/master
chicm-ms Feb 4, 2020
47567d3
Merge pull request #67 from microsoft/master
chicm-ms Feb 10, 2020
614d427
Merge pull request #68 from microsoft/master
chicm-ms Feb 10, 2020
a0d9ed6
Merge pull request #69 from microsoft/master
chicm-ms Feb 11, 2020
22dc1ad
Merge pull request #70 from microsoft/master
chicm-ms Feb 19, 2020
0856813
Merge pull request #71 from microsoft/master
chicm-ms Feb 22, 2020
9e97bed
Merge pull request #72 from microsoft/master
chicm-ms Feb 25, 2020
16a1b27
Merge pull request #73 from microsoft/master
chicm-ms Mar 3, 2020
e246633
Merge pull request #74 from microsoft/master
chicm-ms Mar 4, 2020
0439bc1
Merge pull request #75 from microsoft/master
chicm-ms Mar 17, 2020
8b5613a
Merge pull request #76 from microsoft/master
chicm-ms Mar 18, 2020
43e8d31
Merge pull request #77 from microsoft/master
chicm-ms Mar 22, 2020
aae448e
Merge pull request #78 from microsoft/master
chicm-ms Mar 25, 2020
7095716
Merge pull request #79 from microsoft/master
chicm-ms Mar 25, 2020
c51263a
Merge pull request #80 from microsoft/master
chicm-ms Apr 11, 2020
9953c70
Merge pull request #81 from microsoft/master
chicm-ms Apr 14, 2020
f9136c4
Merge pull request #82 from microsoft/master
chicm-ms Apr 16, 2020
b384ad2
Merge pull request #83 from microsoft/master
chicm-ms Apr 20, 2020
ff592dd
Merge pull request #84 from microsoft/master
chicm-ms May 12, 2020
0b5378f
Merge pull request #85 from microsoft/master
chicm-ms May 18, 2020
a53e0b0
Merge pull request #86 from microsoft/master
chicm-ms May 25, 2020
3ea0b89
Merge pull request #87 from microsoft/master
chicm-ms May 28, 2020
cf3fb20
Merge pull request #88 from microsoft/master
chicm-ms May 28, 2020
7f4cdcd
Merge pull request #89 from microsoft/master
chicm-ms Jun 4, 2020
574db2c
Merge pull request #90 from microsoft/master
chicm-ms Jun 15, 2020
32bedcc
Merge pull request #91 from microsoft/master
chicm-ms Jun 21, 2020
6155aa4
Merge pull request #92 from microsoft/master
chicm-ms Jun 22, 2020
8139c9c
Merge pull request #93 from microsoft/master
chicm-ms Jun 23, 2020
43419d7
Merge pull request #94 from microsoft/master
chicm-ms Jun 28, 2020
6b6ee55
Merge pull request #95 from microsoft/master
chicm-ms Jun 28, 2020
1b975e0
Merge pull request #96 from microsoft/master
chicm-ms Jun 28, 2020
c8f3c5d
Merge pull request #97 from microsoft/master
chicm-ms Jun 29, 2020
4c306f0
Merge pull request #98 from microsoft/master
chicm-ms Jun 30, 2020
64de4c2
Merge pull request #99 from microsoft/master
chicm-ms Jun 30, 2020
0e5d3ac
Merge pull request #100 from microsoft/master
chicm-ms Jul 1, 2020
4a52608
Merge pull request #101 from microsoft/master
chicm-ms Jul 3, 2020
208b1ee
Merge pull request #102 from microsoft/master
chicm-ms Jul 8, 2020
e7b1a2e
Merge pull request #103 from microsoft/master
chicm-ms Jul 10, 2020
57bcc85
Merge pull request #104 from microsoft/master
chicm-ms Jul 22, 2020
030f5ef
Merge pull request #105 from microsoft/master
chicm-ms Jul 29, 2020
058c8b7
Merge pull request #106 from microsoft/master
chicm-ms Aug 2, 2020
9abd8c8
Merge pull request #107 from microsoft/master
chicm-ms Aug 10, 2020
13c6623
Merge pull request #108 from microsoft/master
chicm-ms Aug 11, 2020
b50b41e
Merge pull request #109 from microsoft/master
chicm-ms Aug 12, 2020
78f1418
Merge pull request #110 from microsoft/master
chicm-ms Aug 13, 2020
74acc8b
Merge pull request #111 from microsoft/master
chicm-ms Aug 17, 2020
5bf416a
Merge pull request #112 from microsoft/master
chicm-ms Aug 24, 2020
4a207f9
Merge pull request #113 from microsoft/master
chicm-ms Sep 3, 2020
7be897b
Merge pull request #114 from microsoft/master
chicm-ms Sep 16, 2020
f974b2c
Merge pull request #115 from microsoft/master
chicm-ms Sep 17, 2020
0c2f59b
Merge pull request #116 from microsoft/master
chicm-ms Sep 21, 2020
08fbc6d
refactor model compression doc
chicm-ms Sep 22, 2020
8584134
updates
chicm-ms Sep 22, 2020
43311fc
updates
chicm-ms Sep 22, 2020
3c5cef2
Merge pull request #117 from microsoft/master
chicm-ms Sep 25, 2020
ffb816b
rename compressor to compression
chicm-ms Sep 25, 2020
9fd9b6c
Merge branch 'master' into compression-doc
chicm-ms Sep 25, 2020
08db1df
updates
chicm-ms Sep 25, 2020
c68dd55
updates
chicm-ms Sep 25, 2020
e0de6c9
updates
chicm-ms Sep 25, 2020
34f57ac
updates
chicm-ms Sep 28, 2020
6cc2ca9
updates
chicm-ms Sep 28, 2020
9df00b5
Refactor overview
chicm-ms Sep 30, 2020
8907d25
updates
chicm-ms Oct 9, 2020
cbc4f1c
updates
chicm-ms Oct 10, 2020
4 changes: 2 additions & 2 deletions docs/en_US/CommunitySharings/ModelCompressionComparison.md
@@ -23,7 +23,7 @@ The experiments are performed with the following pruners/datasets/models:

For the pruners with scheduling, `L1Filter Pruner` is used as the base algorithm. That is to say, after the sparsity distribution is decided by the scheduling algorithm, `L1Filter Pruner` is used to perform the actual pruning.

- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/Overview.md).
- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/Overview.md).

## Experiment Result

@@ -60,7 +60,7 @@ From the experiment result, we get the following conclusions:

* The experiment results are all collected with the default configuration of the pruners in nni, which means that when we call a pruner class in nni, we don't change any default class arguments.

* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/ModelSpeedup.md).
* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/ModelSpeedup.md).
This avoids the potential issue of counting them on masked models (a counter usage sketch follows this list).

* The experiment code can be found [here]( https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py).
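
As a minimal sketch of how the FLOPs/parameters counter mentioned above might be used: this assumes the `count_flops_params` helper exposed under `nni.compression.torch.utils.counter` in NNI releases of this period and a tiny inline model for illustration; the module path and the return values have changed across NNI versions.

```python
import torch
import torch.nn as nn
from nni.compression.torch.utils.counter import count_flops_params

# A tiny stand-in model so the sketch is self-contained.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Count FLOPs and parameters for a CIFAR-style 32x32 RGB input.
flops, params = count_flops_params(model, (1, 3, 32, 32))
print(f'FLOPs: {flops}, parameters: {params}')
```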
@@ -1,6 +1,6 @@
# Automatic Model Compression on NNI
# Automatic Model Pruning using NNI Tuners

It's convenient to implement auto model compression with NNI compression and NNI tuners
It's convenient to implement auto model pruning with NNI compression and NNI tuners
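
To preview the end-to-end idea, here is a rough, hedged sketch of a trial script that combines an NNI tuner with a pruner. It assumes the `nni.compression.torch` import path of NNI releases from this period, a search space that defines a `conv_sparsity` parameter, and user-defined `build_model`, `finetune`, and `evaluate` helpers; none of these names come from this document.

```python
import nni
from nni.compression.torch import L1FilterPruner

# The tuner proposes hyperparameters (here, a pruning sparsity) for each trial.
params = nni.get_next_parameter()          # e.g. {'conv_sparsity': 0.4}

model = build_model()                      # user-defined model constructor (assumed)

# Turn the tuner's suggestion into a pruner configuration.
config_list = [{'sparsity': params['conv_sparsity'], 'op_types': ['Conv2d']}]
pruner = L1FilterPruner(model, config_list)
model = pruner.compress()

finetune(model)                            # user-defined fine-tuning (assumed)
accuracy = evaluate(model)                 # user-defined evaluation (assumed)

# Report the metric so the tuner can propose better sparsities next time.
nni.report_final_result(accuracy)
```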

## First, model compression with NNI

@@ -6,7 +6,7 @@

In order to simplify the process of writing new compression algorithms, we have designed a simple and flexible programming interface that covers both pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then how to customize a new quantization algorithm.

**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. Reference [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compressor/Framework.html)
**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. Reference [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compression/Framework.html)


## Customize a new pruning algorithm
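
As a rough illustration of the kind of pruner this interface supports (not the exact example from the linked framework overview), here is a hedged sketch of a magnitude-based pruner. It assumes the `Pruner` base class is importable from `nni.compression.torch` and that a custom pruner implements `calc_mask(wrapper)` returning a `weight_mask`, as in NNI releases of this period; names and hooks may differ in other versions.

```python
import torch
from nni.compression.torch import Pruner


class MagnitudePruner(Pruner):
    """Toy pruner: mask out the smallest-magnitude weights in each layer."""

    def __init__(self, model, config_list):
        super().__init__(model, config_list)

    def calc_mask(self, wrapper, **kwargs):
        weight = wrapper.module.weight.data
        sparsity = wrapper.config['sparsity']   # sparsity requested in the config entry
        num_prune = int(weight.numel() * sparsity)
        if num_prune == 0:
            return {'weight_mask': torch.ones_like(weight)}
        # Pick a threshold so that `num_prune` weights fall at or below it.
        threshold = weight.abs().view(-1).kthvalue(num_prune).values
        mask = (weight.abs() > threshold).type_as(weight)
        return {'weight_mask': mask}
```

A usage sketch, under the same assumptions, would be `MagnitudePruner(model, [{'sparsity': 0.5, 'op_types': ['Conv2d']}]).compress()`.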
@@ -27,20 +27,20 @@ Pruning algorithms compress the original network by removing redundant weights o

|Name|Brief Introduction of Algorithm|
|---|---|
| [Level Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [FPGM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
| [L1Filter Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [L2Filter Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
| [ActivationAPoZRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationapozrankfilterpruner) | Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first order taylor expansion on weights(Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively call SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [Level Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [FPGM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
| [L1Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [L2Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
| [ActivationAPoZRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationapozrankfilterpruner) | Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first-order Taylor expansion on weights (Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |

You can refer to this [benchmark](https://github.com/microsoft/nni/tree/master/docs/en_US/CommunitySharings/ModelCompressionComparison.md) for the performance of these pruners on some benchmark problems.

@@ -50,14 +50,14 @@ Quantization algorithms compress the original network by reducing the number of

|Name|Brief Introduction of Algorithm|
|---|---|
| [Naive Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
| [BNN Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#bnn-quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|
| [Naive Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
| [BNN Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#bnn-quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|

## Automatic Model Compression

Given targeted compression ratio, it is pretty hard to obtain the best compressed ratio in a one shot manner. An automatic model compression algorithm usually need to explore the compression space by compressing different layers with different sparsities. NNI provides such algorithms to free users from specifying sparsity of each layer in a model. Moreover, users could leverage NNI's auto tuning power to automatically compress a model. Detailed document can be found [here](./AutoCompression.md).
Given a targeted compression ratio, it is hard to obtain the best compressed ratio in a one-shot manner. An automatic model compression algorithm usually needs to explore the compression space by compressing different layers with different sparsities. NNI provides such algorithms to free users from specifying the sparsity of each layer in a model. Moreover, users can leverage NNI's auto-tuning power to automatically compress a model. A detailed document can be found [here](./AutoPruningUsingTuners.md).

## Model Speedup

@@ -8,7 +8,7 @@ In this tutorial, we use the [first section](#quick-start-to-compress-a-model) t

## Quick Start to Compress a Model

NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. The usage of them are the same, thus, here we use [slim pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) as an example to show the usage.
NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. Their usage is the same, so here we use [slim pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) as an example to show the usage.
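
For orientation before the detailed steps below, a hedged, minimal sketch of the slim-pruner usage; it assumes the `nni.compression.torch` import path of NNI releases from this period and a torchvision VGG model purely for illustration.

```python
import torchvision.models as models
from nni.compression.torch import SlimPruner

model = models.vgg19_bn(pretrained=False)   # example model, assumption of this sketch

# Slim pruner prunes channels based on the scaling factors of BatchNorm layers.
config_list = [{
    'sparsity': 0.5,              # prune 50% of the channels
    'op_types': ['BatchNorm2d'],  # apply to all BatchNorm2d layers
}]

pruner = SlimPruner(model, config_list)
model = pruner.compress()   # wraps the target layers and applies the masks

# Fine-tune the masked model as usual, then export the weights and masks.
pruner.export_model(model_path='pruned_vgg19.pth', mask_path='mask_vgg19.pth')
```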

### Write configuration
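
The detailed configuration rules are elided in this diff view, so here is only a hedged sketch of what a configuration list can look like, based on the `sparsity`/`op_types`/`op_names` keys used in these compression docs; the `exclude` key and the layer name are assumptions of this sketch.

```python
# A configuration is a list of dicts; each dict selects some layers and
# states how they should be compressed.
config_list = [
    {
        'sparsity': 0.8,
        'op_types': ['Conv2d'],       # apply to all Conv2d layers ...
    },
    {
        'exclude': True,              # ... except the ones named here (assumed key)
        'op_names': ['features.0'],   # hypothetical layer name for illustration
    },
]
```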

@@ -175,7 +175,7 @@ In this example, 'op_names' is the name of layer and four layers will be quantiz

### APIs for Updating Fine Tuning Status

Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: `pruner.update_epoch(epoch)` and `pruner.step()`.
Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: `pruner.update_epoch(epoch)` and `pruner.step()`.

`update_epoch` should be invoked in every epoch, while `step` should be invoked after each minibatch. Note that most algorithms do not require calling the two APIs. Please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
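
A hedged sketch of where the two calls fit in an ordinary PyTorch training loop; `pruner`, `model`, `optimizer`, `criterion`, `train_loader`, and `num_epochs` are assumed to be defined as in the surrounding examples.

```python
for epoch in range(num_epochs):
    pruner.update_epoch(epoch)   # lets epoch-driven algorithms (e.g. AGP) advance their sparsity schedule
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
        pruner.step()            # per-minibatch hook; a no-op for most algorithms
```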

26 changes: 26 additions & 0 deletions docs/en_US/Compression/pruning.rst
@@ -0,0 +1,26 @@
#################
Pruning
#################

A common technique to increase sparsity in neural network model weights and activations is pruning.
The pruning methods explore the redundancy in the model weights (parameters) and try to remove/prune the redundant and uncritical weights.
The redundant elements are pruned from the model, their values are zeroed, and we make sure they don't take part in the back-propagation process.

From the pruning-granularity perspective, fine-grained pruning or unstructured pruning refers to pruning each individual weight separately.
Coarse-grained pruning or structured pruning prunes an entire group of weights, such as a convolutional filter.

NNI provides multiple unstructured pruning and structured pruning algorithms.
It supports TensorFlow and PyTorch with a unified interface.
To prune their models, users only need to add a few lines of code.
For structured filter pruning, NNI also provides a dependency-aware mode, in which the
filter pruner achieves a better speed gain after the speedup.
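
A hedged sketch of those few lines, including the dependency-aware mode and the follow-up speedup step; it assumes the `nni.compression.torch` import path and the `dependency_aware`/`dummy_input` arguments documented for filter pruners in NNI releases of this period, plus a user-defined `MyConvNet`.

```python
import torch
from nni.compression.torch import L1FilterPruner, ModelSpeedup

model = MyConvNet()                        # user-defined PyTorch model (assumed)
dummy_input = torch.rand(1, 3, 224, 224)   # input shape assumed for this sketch

config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

# Dependency-aware mode makes the pruner respect channel dependencies
# (e.g. layers joined by residual additions), which helps the later speedup.
pruner = L1FilterPruner(model, config_list,
                        dependency_aware=True, dummy_input=dummy_input)
pruner.compress()
pruner.export_model(model_path='pruned.pth', mask_path='mask.pth')

# Speedup replaces masked channels with genuinely smaller layers,
# applied here to a fresh copy of the model.
new_model = MyConvNet()
new_model.load_state_dict(torch.load('pruned.pth'))
ModelSpeedup(new_model, dummy_input, 'mask.pth').speedup_model()
```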

For details, please refer to the following tutorials:

.. toctree::
:maxdepth: 2

Pruners <Pruner>
Dependency Aware Mode <DependencyAware>
Model Speedup <ModelSpeedup>
Automatic Model Pruning with NNI Tuners <AutoPruningUsingTuners>
17 changes: 17 additions & 0 deletions docs/en_US/Compression/quantization.rst
@@ -0,0 +1,17 @@
#################
Quantization
#################

Quantization refers to compressing models by reducing the number of bits required to represent weights or activations,
which can reduce the computations and the inference time. In the context of deep neural networks, the major numerical
format for model weights is 32-bit float, or FP32. Many research works have demonstrated that weights and activations
can be represented using 8-bit integers without significant loss in accuracy. Even lower bit-widths, such as 4/2/1 bits,
are an active field of research.

A quantizer is a quantization algorithm implementation in NNI. NNI provides multiple quantizers, listed below; you can also
create your own quantizer using NNI's model compression interface.
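
A hedged, minimal sketch of applying one of these quantizers; it assumes the `nni.compression.torch` import path and the `quant_types`/`quant_bits`/`op_types` configuration keys used by NNI quantizers of this period, with a tiny inline model for illustration. Newer NNI versions may additionally require passing the training optimizer to the quantizer constructor.

```python
import torch.nn as nn
from nni.compression.torch import QAT_Quantizer

# A tiny stand-in model so the sketch is self-contained.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

# Quantize the weights of Conv2d and Linear layers to 8 bits.
config_list = [{
    'quant_types': ['weight'],
    'quant_bits': {'weight': 8},
    'op_types': ['Conv2d', 'Linear'],
}]

quantizer = QAT_Quantizer(model, config_list)
quantizer.compress()

# Continue training as usual; the quantizer simulates low-precision
# arithmetic during the forward pass (quantization-aware training).
```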

.. toctree::
:maxdepth: 2

Quantizers <Quantizer>
2 changes: 1 addition & 1 deletion docs/en_US/FeatureEngineering/Overview.md
@@ -267,5 +267,5 @@ The code can be referenced at `/examples/feature_engineering/gradient_feature_sel
* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
* To know more about [Neural Architecture Search with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/Overview.md);
* To know more about [Model Compression with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/Overview.md);
* To know more about [Model Compression with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/Compression/Overview.md);
* To know more about [Hyperparameter Tuning with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/Tuner/BuiltinTuner.md);