This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

support lottery ticket hypothesis #1685

Merged 28 commits on Nov 20, 2019
1 change: 1 addition & 0 deletions docs/en_US/Compressor/Overview.md
@@ -12,6 +12,7 @@ We have provided two naive compression algorithms and three popular ones for use
|---|---|
| [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](./Pruner.md#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
41 changes: 41 additions & 0 deletions docs/en_US/Compressor/Pruner.md
@@ -92,3 +92,44 @@ You can view example for more information

***

## Lottery Ticket Hypothesis
[The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), by Jonathan Frankle and Michael Carbin, provides comprehensive measurements and analysis, and articulates the *lottery ticket hypothesis*: dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.

In this paper, the authors use the following process to prune a model, called *iterative pruning*:
>1. Randomly initialize a neural network f(x;theta_0) (where theta_0 follows D_{theta}).
>2. Train the network for j iterations, arriving at parameters theta_j.
>3. Prune p% of the parameters in theta_j, creating a mask m.
>4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
>5. Repeat steps 2, 3, and 4.

If the configured final sparsity is P (e.g., 0.8) and pruning runs for n iterations, then each iteration prunes 1-(1-P)^(1/n) of the weights that survived the previous round.
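
For example, with P = 0.8 and n = 5, each round prunes about 27.5% of the surviving weights, reaching 80% overall sparsity after the last round. The following is a minimal, framework-agnostic sketch, not the NNI implementation (`LotteryTicketPruner` automates all of this): it computes the per-round fraction and walks through the prune-reset cycle from the steps above. `train_fn` is a hypothetical placeholder for an ordinary training loop that keeps masked weights at zero.

```python
import copy
import torch

final_sparsity = 0.8    # P
prune_iterations = 5    # n
per_round = 1 - (1 - final_sparsity) ** (1 / prune_iterations)
print('each round prunes {:.1%} of the surviving weights'.format(per_round))  # ~27.5%

def magnitude_mask(weight, mask, fraction):
    """Extend `mask` by zeroing the smallest-magnitude weights that still survive."""
    surviving = weight[mask.bool()].abs()
    k = int(fraction * surviving.numel())
    if k == 0:
        return mask
    threshold = surviving.kthvalue(k).values
    return mask * (weight.abs() > threshold).float()

def iterative_prune(model, train_fn):
    theta_0 = copy.deepcopy(model.state_dict())            # step 1: remember the random init
    masks = {name: torch.ones_like(p)                      # prune weight tensors only, not biases
             for name, p in model.named_parameters() if p.dim() > 1}
    for _ in range(prune_iterations):                      # step 5: repeat the cycle n times
        train_fn(model, masks)                             # step 2: train, arriving at theta_j
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in masks:                          # step 3: prune p% of surviving weights
                    masks[name] = magnitude_mask(param, masks[name], per_round)
            model.load_state_dict(theta_0)                 # step 4: reset survivors to theta_0
            for name, param in model.named_parameters():
                if name in masks:
                    param.mul_(masks[name])                # winning ticket f(x; m * theta_0)
    return model, masks
```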

### Usage

PyTorch code
```python
from nni.compression.torch import LotteryTicketPruner

config_list = [{
    'prune_iterations': 5,
    'sparsity': 0.8,
    'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, config_list, optimizer)
pruner.compress()

# Run the configured number of pruning iterations; each iteration trains the
# masked model for `epoch_num` epochs starting from the reset weights.
for _ in pruner.get_prune_iterations():
    pruner.prune_iteration_start()
    for epoch in range(epoch_num):
        ...
```

The above configuration means that there are 5 pruning iterations. Because all 5 iterations are executed within the same run, LotteryTicketPruner needs `model` and `optimizer` (**and the `lr_scheduler`, if one is used**) so that it can reset their states every time a new pruning iteration starts. Use `get_prune_iterations` to obtain the pruning iterations, and invoke `prune_iteration_start` at the beginning of each iteration. `epoch_num` should be large enough for the model to converge, because the hypothesis is that the accuracy obtained in later rounds, with high sparsity, can be comparable to that obtained in the first round.
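
If training uses a learning-rate scheduler, the sketch below passes it to the pruner as well so that its state can be reset together with the model and optimizer at each pruning iteration. The exact keyword name (`lr_scheduler`) and the surrounding scaffolding are assumptions for illustration, not a confirmed signature; `model` and `epoch_num` are assumed to be defined as in the snippet above.

```python
import torch
from nni.compression.torch import LotteryTicketPruner

config_list = [{
    'prune_iterations': 5,
    'sparsity': 0.8,
    'op_types': ['default']
}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Hand the scheduler to the pruner so it can be reset whenever a new pruning
# iteration starts (keyword name assumed; check your NNI version's signature).
pruner = LotteryTicketPruner(model, config_list, optimizer, lr_scheduler=scheduler)
pruner.compress()

for _ in pruner.get_prune_iterations():
    pruner.prune_iteration_start()
    for epoch in range(epoch_num):
        # ... train one epoch with `optimizer`, then advance the schedule ...
        scheduler.step()
```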


*A TensorFlow version will be supported later.*

#### User configuration for LotteryTicketPruner

* **prune_iterations:** The number of rounds of iterative pruning, i.e., how many prune-reset cycles are run.
* **sparsity:** The final sparsity when the compression is done.

***
95 changes: 95 additions & 0 deletions examples/model_compress/lottery_torch_mnist.py
@@ -0,0 +1,95 @@
from nni.compression.torch import LotteryTicketPruner
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms


class Mnist(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)

def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4 * 4 * 50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)


def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))


def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)

    print('Loss: {}  Accuracy: {}%\n'.format(
        test_loss, 100 * correct / len(test_loader.dataset)))

def apply_pruner(model, optimizer):
configure_list = [{
'prune_iterations': 5,
'sparsity': 0.8,
'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, configure_list, optimizer)
pruner.compress()
return pruner

def main():
torch.manual_seed(0)
device = torch.device('cpu')

trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)

model = Mnist()
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

pruner = apply_pruner(model, optimizer)

for _ in pruner.get_prune_iterations():
pruner.prune_iteration_start()
for epoch in range(5):
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
pruner.export_model('model.pth', 'mask.pth', 'model.onnx', [1, 1, 28, 28])


if __name__ == '__main__':
main()
83 changes: 83 additions & 0 deletions examples/model_compress/lottery_torch_mnist_fc.py
@@ -0,0 +1,83 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from nni.compression.torch import LotteryTicketPruner

class fc1(nn.Module):

def __init__(self, num_classes=10):
super(fc1, self).__init__()
self.classifier = nn.Sequential(
nn.Linear(28*28, 300),
nn.ReLU(inplace=True),
nn.Linear(300, 100),
nn.ReLU(inplace=True),
nn.Linear(100, num_classes),
)

def forward(self, x):
x = torch.flatten(x, 1)
x = self.classifier(x)
return x

def train(model, train_loader, optimizer, criterion):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.train()
for batch_idx, (imgs, targets) in enumerate(train_loader):
optimizer.zero_grad()
imgs, targets = imgs.to(device), targets.to(device)
output = model(imgs)
train_loss = criterion(output, targets)
train_loss.backward()
optimizer.step()
return train_loss.item()

def test(model, test_loader, criterion):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
            test_loss += criterion(output, target).item() * data.size(0)  # sum up batch loss
pred = output.data.max(1, keepdim=True)[1] # get the index of the max log-probability
correct += pred.eq(target.data.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
accuracy = 100. * correct / len(test_loader.dataset)
return accuracy


if __name__ == '__main__':
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
traindataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
testdataset = datasets.MNIST('./data', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(traindataset, batch_size=60, shuffle=True, num_workers=0, drop_last=False)
test_loader = torch.utils.data.DataLoader(testdataset, batch_size=60, shuffle=False, num_workers=0, drop_last=True)

model = fc1().to("cuda" if torch.cuda.is_available() else "cpu")
optimizer = torch.optim.Adam(model.parameters(), lr=1.2e-3)
criterion = nn.CrossEntropyLoss()

configure_list = [{
'prune_iterations': 5,
'sparsity': 0.8,
'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, configure_list, optimizer)
pruner.compress()

for i in pruner.get_prune_iterations():
pruner.prune_iteration_start()
loss = 0
accuracy = 0
for epoch in range(50):
loss = train(model, train_loader, optimizer, criterion)
accuracy = test(model, test_loader, criterion)
print('current epoch: {0}, loss: {1}, accuracy: {2}'.format(epoch, loss, accuracy))
print('prune iteration: {0}, loss: {1}, accuracy: {2}'.format(i, loss, accuracy))
pruner.export_model('model.pth', 'mask.pth')
1 change: 1 addition & 0 deletions src/sdk/pynni/nni/compression/torch/__init__.py
@@ -1,3 +1,4 @@
from .compressor import LayerInfo, Compressor, Pruner, Quantizer
from .builtin_pruners import *
from .builtin_quantizers import *
from .lottery_ticket import LotteryTicketPruner
41 changes: 21 additions & 20 deletions src/sdk/pynni/nni/compression/torch/compressor.py
@@ -13,7 +13,6 @@ def __init__(self, name, module):

self._forward = None


class Compressor:
"""
Abstract base PyTorch compressor
@@ -32,7 +31,21 @@ def __init__(self, model, config_list):
"""
self.bound_model = model
self.config_list = config_list
self.modules_to_compress = []
self.modules_to_compress = None

def detect_modules_to_compress(self):
"""
        Detect all modules that should be compressed, and save the result in `self.modules_to_compress`.
The model will be instrumented and user should never edit it after calling this method.
"""
if self.modules_to_compress is None:
self.modules_to_compress = []
for name, module in self.bound_model.named_modules():
layer = LayerInfo(name, module)
config = self.select_config(layer)
if config is not None:
self.modules_to_compress.append((layer, config))
return self.modules_to_compress

def compress(self):
"""
@@ -41,26 +54,11 @@ def compress(self):
The model will be instrumented and user should never edit it after calling this method.
`self.modules_to_compress` records all the to-be-compressed layers
"""
for name, module in self.bound_model.named_modules():
layer = LayerInfo(name, module)
config = self.select_config(layer)
if config is not None:
self._instrument_layer(layer, config)
self.modules_to_compress.append((layer, config))
modules_to_compress = self.detect_modules_to_compress()
for layer, config in modules_to_compress:
self._instrument_layer(layer, config)
return self.bound_model

def get_modules_to_compress(self):
"""
To obtain all the to-be-compressed layers.

Returns
-------
self.modules_to_compress : list
a list of the layers, each of which is a tuple (`layer`, `config`),
`layer` is `LayerInfo`, `config` is a `dict`
"""
return self.modules_to_compress

def select_config(self, layer):
"""
Find the configuration for `layer` by parsing `self.config_list`
@@ -204,6 +202,9 @@ def export_model(self, model_path, mask_path=None, onnx_path=None, input_shape=N
input_shape : list or tuple
input shape to onnx model
"""
if self.detect_modules_to_compress() and not self.mask_dict:
_logger.warning('You may not use self.mask_dict in base Pruner class to record masks')
print('Warning: You may not use self.mask_dict in base Pruner class to record masks')
assert model_path is not None, 'model_path must be specified'
for name, m in self.bound_model.named_modules():
mask = self.mask_dict.get(name)