This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Dev weight sharing #568

Merged
Changes from 92 commits
Commits
93 commits
fb8bc25
add pycharm project files to .gitignore list
leckie-chn Oct 9, 2018
0bf454c
update pylintrc to conform vscode settings
leckie-chn Oct 11, 2018
7f0a418
Merge remote-tracking branch 'upstream/master'
leckie-chn Oct 11, 2018
9fce9bb
Merge remote-tracking branch 'upstream/master'
leckie-chn Oct 17, 2018
73b9b58
Merge remote-tracking branch 'upstream/master'
Oct 18, 2018
d83aa70
Merge remote-tracking branch 'upstream/master'
leckie-chn Oct 19, 2018
16be159
Merge remote-tracking branch 'upstream/master'
leckie-chn Oct 25, 2018
69466b8
fix RemoteMachineMode for wrong trainingServicePlatform
leckie-chn Oct 25, 2018
f1f8339
Merge branch 'master' of https://github.com/leckie-chn/nni
leckie-chn Oct 25, 2018
7a1be57
simple weight sharing
leckie-chn Oct 25, 2018
99b1402
update gitignore file
leckie-chn Oct 25, 2018
253068d
change tuner codedir to relative path
leckie-chn Oct 25, 2018
2faac01
add python cache files to gitignore list
leckie-chn Oct 26, 2018
e412cb5
move extract scalar reward logic from dispatcher to tuner
leckie-chn Oct 26, 2018
e159afb
update tuner code corresponding to last commit
leckie-chn Oct 26, 2018
d664f4f
update doc for receive_trial_result api change
leckie-chn Oct 26, 2018
b5d5f2e
add numpy to package whitelist of pylint
leckie-chn Oct 26, 2018
3b280cd
distinguish param value from return reward for tuner.extract_scalar_r…
leckie-chn Oct 26, 2018
a384da0
update pylintrc
leckie-chn Oct 29, 2018
cd50908
Merge remote-tracking branch 'upstream/master' into dev-mac-support
Oct 29, 2018
5b1320a
add comments to dispatcher.handle_report_metric_data
leckie-chn Oct 29, 2018
3aee412
update install for mac support
Oct 29, 2018
60db733
fix root mode bug on Makefile
triflame92 Oct 29, 2018
094436d
Quick fix bug: nnictl port value error (#245)
SparkSnail Oct 18, 2018
8fca02e
Dev exp stop more (#221)
QuanluZhang Oct 18, 2018
cbc808b
update Makefile (#246)
Crysple Oct 18, 2018
ce17fa3
quick fix for ci (#248)
Crysple Oct 18, 2018
e337541
add update trialNum and fix bugs (#261)
Crysple Oct 23, 2018
07d51dd
Add builtin tuner to CI (#247)
Crysple Oct 23, 2018
0fbe564
Doc refactor (#258)
scarlett2018 Oct 23, 2018
e35f96d
Refactor nnictl to support listing stopped experiments. (#256)
SparkSnail Oct 23, 2018
71dc1ca
Show experiment parameters more beautifully (#262)
lvybriage Oct 24, 2018
5c65cef
fix error on example of RemoteMachineMode (#269)
leckie-chn Oct 25, 2018
07fe4ef
Update docker file to use latest nni release (#263)
chicm-ms Oct 25, 2018
dc688b8
fix bug about execDuration and endTime (#270)
QuanluZhang Oct 26, 2018
f8b131c
Refactor dockerfile (#264)
SparkSnail Oct 26, 2018
ec0c1d5
Support nnictl tensorboard (#268)
SparkSnail Oct 26, 2018
95d8666
Sdk update (#272)
chicm-ms Oct 26, 2018
a3b60cc
add experiment log path to experiment profile (#276)
chicm-ms Oct 30, 2018
a550686
Merge branch 'dev-mac-support' into dev-mac-support
leckie-chn Oct 30, 2018
d4c383a
refactor extract reward from dict by tuner
leckie-chn Oct 31, 2018
4ba1916
Merge remote-tracking branch 'upstream/master'
leckie-chn Nov 1, 2018
641f5d7
Merge remote-tracking branch 'upstream/master'
leckie-chn Nov 1, 2018
78e4209
Merge remote-tracking branch 'upstream/master'
leckie-chn Nov 8, 2018
909d9d9
Merge remote-tracking branch 'upstream/master'
leckie-chn Nov 12, 2018
221e951
Merge remote-tracking branch 'upstream/master'
leckie-chn Nov 13, 2018
cca83a2
Merge branch 'master' into dev-mac-support
leckie-chn Nov 13, 2018
5d001f7
Merge pull request #1 from leckie-chn/dev-mac-support
leckie-chn Nov 13, 2018
dd12229
Merge pull request #1 from Microsoft/master
chicm-ms Nov 13, 2018
8f696ac
update Makefile for mac support, wait for aka.ms support
triflame92 Nov 13, 2018
df36c1f
Merge remote-tracking branch 'upstream/master'
leckie-chn Nov 14, 2018
bedc6fd
Merge branch 'master' into dev-weight-sharing
leckie-chn Nov 14, 2018
3583b52
refix Makefile for colorful echo
leckie-chn Nov 14, 2018
8aeea2e
Merge branch 'master' into dev-weight-sharing
leckie-chn Nov 14, 2018
b9cdde5
unversion config.yml with machine information
leckie-chn Nov 14, 2018
ae979c9
sync graph.py between tuners & trial of ga_squad
leckie-chn Nov 16, 2018
001ecd5
sync graph.py between tuners & trial of ga_squad
leckie-chn Nov 16, 2018
a67a6b8
Merge pull request #2 from Microsoft/master
chicm-ms Nov 16, 2018
75fd2f1
Merge pull request #3 from Microsoft/master
chicm-ms Nov 19, 2018
10e998f
Merge pull request #4 from Microsoft/master
chicm-ms Nov 27, 2018
a0f361c
Merge pull request #5 from Microsoft/master
chicm-ms Nov 27, 2018
76d7142
Merge pull request #6 from Microsoft/master
chicm-ms Nov 28, 2018
bc10bf7
Merge pull request #7 from Microsoft/master
chicm-ms Nov 29, 2018
d76deb3
Merge pull request #8 from Microsoft/master
chicm-ms Dec 4, 2018
af1137c
copy weight shared ga_squad under weight_sharing folder
leckie-chn Dec 10, 2018
7086cb5
mv ga_squad code back to master
leckie-chn Dec 10, 2018
c20abca
Merge remote-tracking branch 'upstream/dev-weight-sharing_1' into dev…
leckie-chn Dec 10, 2018
a0fd7ed
simple tuner & trial ready
leckie-chn Dec 10, 2018
0ffc4b4
Fix nnictl multiThread option
chicm-ms Dec 11, 2018
b3a1131
Merge remote-tracking branch 'chicm/fix_multithread' into dev-weight-…
leckie-chn Dec 11, 2018
4788ad5
weight sharing with async dispatcher simple example ready
leckie-chn Dec 11, 2018
efac915
update for ga_squad
leckie-chn Dec 17, 2018
56105e0
fix bug
leckie-chn Dec 21, 2018
93b8afe
modify multihead attention name
leckie-chn Dec 24, 2018
c9c64e5
add min_layer_num to Graph
leckie-chn Dec 24, 2018
adb9b40
fix bug
leckie-chn Dec 24, 2018
9c15d1a
update share id calc
leckie-chn Dec 25, 2018
041099b
fix bug
leckie-chn Dec 25, 2018
e3ee26f
add save logging
leckie-chn Dec 25, 2018
a9b7e58
fix ga_squad tuner bug
leckie-chn Dec 26, 2018
b5675fa
sync bug fix for ga_squad tuner
leckie-chn Dec 26, 2018
db1d96f
fix same hash_id bug
leckie-chn Dec 27, 2018
20586e3
add lock to simple tuner in weight sharing
leckie-chn Jan 2, 2019
5228657
Add readme to simple weight sharing
leckie-chn Jan 2, 2019
3a0176e
update
leckie-chn Jan 2, 2019
a9f5d62
update
leckie-chn Jan 2, 2019
c6098ed
add paper link
leckie-chn Jan 2, 2019
2469f48
update
leckie-chn Jan 2, 2019
61bc21c
reformat with autopep8
leckie-chn Jan 4, 2019
85d3076
add documentation for weight sharing
leckie-chn Jan 4, 2019
9fc984e
test for weight sharing
leckie-chn Jan 4, 2019
ddbcead
delete irrelevant files
leckie-chn Jan 4, 2019
500852e
move details of weight sharing in to code comments
leckie-chn Jan 7, 2019
111 changes: 111 additions & 0 deletions docs/AdvancedNAS.md
@@ -0,0 +1,111 @@
# Tutorial for Advanced Neural Architecture Search
Currently many NAS algorithms leverage **weight sharing** among trials to accelerate training. For example, [ENAS][1] delivers a 1000x efficiency improvement via '_parameter sharing between child models_', compared with the earlier [NASNet][2] algorithm. Other NAS algorithms, such as [DARTS][3], [Network Morphism][4], and [Evolution][5], also leverage, or have the potential to leverage, weight sharing.
This is a tutorial on how to enable weight sharing in NNI. The example is based on [Neural Architecture Search for Reading Comprehension](../examples/trials/ga_squad/) and is placed [here](../examples/trials/weight_sharing/ga_squad).

## Weight Sharing among trials
Currently we recommend sharing weights through NFS (Network File System), which supports sharing files across machines and is lightweight and (relatively) efficient. We also welcome contributions from the community on more efficient techniques.

### NFS Setup
In NFS, files are physically stored on a server machine, and trials on the client machine can read/write those files in the same way that they access local files.

#### Install NFS on server machine
First, install NFS server:
```bash
sudo apt-get install nfs-kernel-server
```
Suppose `/tmp/nni/shared` is used as the physical storage; then run:
```bash
sudo mkdir -p /tmp/nni/shared
echo "/tmp/nni/shared *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo service nfs-kernel-server restart
```
You can check whether the directory is exported successfully with `sudo showmount -e localhost`.

#### Install NFS on client machine
First, install NFS client:
```bash
sudo apt-get install nfs-common
```
Then create the mount point and mount the shared directory:
```bash
sudo mkdir -p /mnt/nfs/nni/
sudo mount -t nfs 10.10.10.10:/tmp/nni/shared /mnt/nfs/nni
```
where `10.10.10.10` should be replaced by the real IP of the NFS server machine in practice.

### Example code for trial
In our example, we assign each layer a `hash_id` that identifies whether a previously trained model weight is sharable, and construct the TensorFlow graph using `hash_id` as the variable scope name:
```python
with tf.variable_scope(p_graph.layers[i].hash_id, reuse=tf.AUTO_REUSE):
# generate tensorflow operators for p_graph.layers[i]
...
```
With the hashes of all sharable layers fed in as the `shared_id` hyper-parameter, we can automatically initialize every sharable layer from the previously trained model:
```python
tf.train.init_from_checkpoint(param['restore_path'], dict(zip(param['shared_id'], param['shared_id'])))
```
Here `param` is retrieved from the customized tuner with `nni.get_next_parameter()`. An example configuration is shown as follows:
```json
{
"shared_id": [
"4a11b2ef9cb7211590dfe81039b27670",
"370af04de24985e5ea5b3d72b12644c9",
"11f646e9f650f5f3fedc12b6349ec60f",
"0604e5350b9c734dd2d770ee877cfb26",
"6dbeb8b022083396acb721267335f228",
"ba55380d6c84f5caeb87155d1c5fa654"
],
"graph": {
"layers": [
...
{
"hash_id": "ba55380d6c84f5caeb87155d1c5fa654",
"is_delete": false,
"size": "x",
"graph_type": 0,
"output": [
6
],
"output_size": 1,
"input": [
7,
1
],
"input_size": 2
},
...
]
},
"restore_dir": "/mnt/nfs/nni/ga_squad/87",
"save_dir": "/mnt/nfs/nni/ga_squad/95"
}
```
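Tying the trial-side pieces together, a hedged sketch is shown below. The key names follow the example configuration above, and `build_graph_from_config` is a hypothetical helper standing in for the example's graph construction code:
```python
import os

import tensorflow as tf
import nni

param = nni.get_next_parameter()

# build the TensorFlow graph; sharable layers use their hash_id as variable scope
build_graph_from_config(param['graph'])  # hypothetical helper, not part of the example

# initialize sharable layers from the parent trial's checkpoint on NFS, if any
if param.get('shared_id'):
    assignment_map = dict(zip(param['shared_id'], param['shared_id']))
    tf.train.init_from_checkpoint(param['restore_dir'], assignment_map)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop ...
    # save this trial's weights where child trials can restore them
    saver = tf.train.Saver()
    saver.save(sess, os.path.join(param['save_dir'], 'model'))
```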

### Tuner customization for sharing policy
We recommend implementing the sharing policy in a customized tuner through the calculation of `Layer.hash_id`. In our example, a layer is sharable if and only if the configurations of the layer itself and of all its preceding layers are unchanged. For details, see the `Layer.update_hash` and `Graph.update_hash` functions in [graph.py](../examples/tuners/weight_sharing/ga_customer_tuner/graph.py).
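As a rough illustration of such a policy (a sketch only, not the actual `graph.py` implementation), a layer's hash can be derived from its own configuration chained with the hashes of its input layers, so that any upstream change yields a new `hash_id` and disables sharing:
```python
import hashlib
import json

def layer_hash(layer_config, input_hashes):
    """Sketch: hash a layer's own configuration together with the hashes
    of the layers feeding into it, so the id changes whenever the layer
    or anything upstream changes."""
    payload = json.dumps(
        {"config": layer_config, "inputs": sorted(input_hashes)},
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# Example: the second layer's hash depends on the first layer's hash,
# so changing layer 0 also invalidates weight sharing for layer 1.
h0 = layer_hash({"graph_type": 0, "size": "x"}, [])
h1 = layer_hash({"graph_type": 1, "size": "y"}, [h0])
```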


## Asynchronous Dispatcher Mode for trial dependency control
Weight sharing coordinates trials that may run on different machines, and most of the time **read-after-write** consistency must be assured: a child model should not load the parent model's weights before the parent trial finishes training. To deal with this, users can enable **asynchronous dispatcher mode** by setting `multiThread: true` in NNI's `config.yml`. In this mode the dispatcher assigns a tuner thread each time a `NEW_TRIAL` request comes in, and the tuner thread can decide when to submit a new trial by blocking and unblocking itself. For example:
```python
def generate_parameters(self, parameter_id):
    self.thread_lock.acquire()
    indiv = ...  # generate the configuration for a new trial (details omitted)
    self.events[parameter_id] = threading.Event()
    self.thread_lock.release()
    # block until the parent trial, if any, has reported its result
    if indiv.parent_id is not None:
        self.events[indiv.parent_id].wait()

def receive_trial_result(self, parameter_id, parameters, reward):
    self.thread_lock.acquire()
    # code for processing trial results
    self.thread_lock.release()
    # unblock any child trial waiting for this one to finish
    self.events[parameter_id].set()
```
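For context, here is a minimal sketch of the tuner state that the two methods above assume (the lock and the per-trial events). The class name is illustrative, not the actual `CustomerTuner`:
```python
import threading

class WeightSharingTunerSketch:
    """Illustrative skeleton only; the real tuner lives in ga_customer_tuner."""

    def __init__(self):
        # guards tuner state that is accessed from multiple dispatcher threads
        self.thread_lock = threading.Lock()
        # maps parameter_id -> threading.Event, set once that trial reports its result
        self.events = {}
```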


Contributor (review comment): shall we add the link to the simple example in the test here?

[1]: https://arxiv.org/abs/1802.03268
[2]: https://arxiv.org/abs/1707.07012
[3]: https://arxiv.org/abs/1806.09055
[4]: https://arxiv.org/abs/1806.10282
[5]: https://arxiv.org/abs/1703.01041
6 changes: 3 additions & 3 deletions examples/trials/ga_squad/trial.py
@@ -338,7 +338,7 @@ def train_with_graph(graph, qp_pairs, dev_qp_pairs):
answers = generate_predict_json(
position1, position2, ids, contexts)
if save_path is not None:
with open(save_path + 'epoch%d.prediction' % epoch, 'w') as file:
with open(os.path.join(save_path, 'epoch%d.prediction' % epoch), 'w') as file:
json.dump(answers, file)
else:
answers = json.dumps(answers)
@@ -359,8 +359,8 @@ def train_with_graph(graph, qp_pairs, dev_qp_pairs):
bestacc = acc

if save_path is not None:
saver.save(sess, save_path + 'epoch%d.model' % epoch)
with open(save_path + 'epoch%d.score' % epoch, 'wb') as file:
saver.save(sess, os.path.join(save_path, 'epoch%d.model' % epoch))
with open(os.path.join(save_path, 'epoch%d.score' % epoch), 'wb') as file:
pickle.dump(
(position1, position2, ids, contexts), file)
logger.debug('epoch %d acc %g bestacc %g' %
171 changes: 171 additions & 0 deletions examples/trials/weight_sharing/ga_squad/attention.py
@@ -0,0 +1,171 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge,
# to any person obtaining a copy of this software and associated
# documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

import math

import tensorflow as tf
from tensorflow.python.ops.rnn_cell_impl import RNNCell


def _get_variable(variable_dict, name, shape, initializer=None, dtype=tf.float32):
if name not in variable_dict:
variable_dict[name] = tf.get_variable(
name=name, shape=shape, initializer=initializer, dtype=dtype)
return variable_dict[name]


class DotAttention:
'''
DotAttention
'''

def __init__(self, name,
hidden_dim,
is_vanilla=True,
is_identity_transform=False,
need_padding=False):
self._name = '/'.join([name, 'dot_att'])
self._hidden_dim = hidden_dim
self._is_identity_transform = is_identity_transform
self._need_padding = need_padding
self._is_vanilla = is_vanilla
self._var = {}

@property
def is_identity_transform(self):
return self._is_identity_transform

@property
def is_vanilla(self):
return self._is_vanilla

@property
def need_padding(self):
return self._need_padding

@property
def hidden_dim(self):
return self._hidden_dim

@property
def name(self):
return self._name

@property
def var(self):
return self._var

def _get_var(self, name, shape, initializer=None):
with tf.variable_scope(self.name):
return _get_variable(self.var, name, shape, initializer)

def _define_params(self, src_dim, tgt_dim):
hidden_dim = self.hidden_dim
self._get_var('W', [src_dim, hidden_dim])
if not self.is_vanilla:
self._get_var('V', [src_dim, hidden_dim])
if self.need_padding:
self._get_var('V_s', [src_dim, src_dim])
self._get_var('V_t', [tgt_dim, tgt_dim])
if not self.is_identity_transform:
self._get_var('T', [tgt_dim, src_dim])
self._get_var('U', [tgt_dim, hidden_dim])
self._get_var('b', [1, hidden_dim])
self._get_var('v', [hidden_dim, 1])

def get_pre_compute(self, s):
'''
:param s: [src_sequence_length, batch_size, src_dim]
:return: [src_sequence_length, batch_size, hidden_dim]
'''
hidden_dim = self.hidden_dim
src_dim = s.get_shape().as_list()[-1]
assert src_dim is not None, 'src dim must be defined'
W = self._get_var('W', shape=[src_dim, hidden_dim])
b = self._get_var('b', shape=[1, hidden_dim])
return tf.tensordot(s, W, [[2], [0]]) + b

def get_prob(self, src, tgt, mask, pre_compute, return_logits=False):
'''
:param src: [src_sequence_length, batch_size, src_dim]
:param tgt: [batch_size, tgt_dim] or [tgt_sequence_length, batch_size, tgt_dim]
:param mask: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
:param pre_compute: [src_sequence_length, batch_size, hidden_dim]
:return: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
'''
s_shape = src.get_shape().as_list()
h_shape = tgt.get_shape().as_list()
src_dim = s_shape[-1]
tgt_dim = h_shape[-1]
assert src_dim is not None, 'src dimension must be defined'
assert tgt_dim is not None, 'tgt dimension must be defined'

self._define_params(src_dim, tgt_dim)

if len(h_shape) == 2:
tgt = tf.expand_dims(tgt, 0)
if pre_compute is None:
pre_compute = self.get_pre_compute(src)

buf0 = pre_compute
buf1 = tf.tensordot(tgt, self.var['U'], axes=[[2], [0]])
buf2 = tf.tanh(tf.expand_dims(buf0, 0) + tf.expand_dims(buf1, 1))

if not self.is_vanilla:
xh1 = tgt
xh2 = tgt
s1 = src
if self.need_padding:
xh1 = tf.tensordot(xh1, self.var['V_t'], 1)
xh2 = tf.tensordot(xh2, self.var['S_t'], 1)
s1 = tf.tensordot(s1, self.var['V_s'], 1)
if not self.is_identity_transform:
xh1 = tf.tensordot(xh1, self.var['T'], 1)
xh2 = tf.tensordot(xh2, self.var['T'], 1)
buf3 = tf.expand_dims(s1, 0) * tf.expand_dims(xh1, 1)
buf3 = tf.tanh(tf.tensordot(buf3, self.var['V'], axes=[[3], [0]]))
buf = tf.reshape(tf.tanh(buf2 + buf3), shape=tf.shape(buf3))
else:
buf = buf2
v = self.var['v']
e = tf.tensordot(buf, v, [[3], [0]])
e = tf.squeeze(e, axis=[3])
tmp = tf.reshape(e + (mask - 1) * 10000.0, shape=tf.shape(e))
prob = tf.nn.softmax(tmp, 1)
if len(h_shape) == 2:
prob = tf.squeeze(prob, axis=[0])
tmp = tf.squeeze(tmp, axis=[0])
if return_logits:
return prob, tmp
return prob

def get_att(self, s, prob):
'''
:param s: [src_sequence_length, batch_size, src_dim]
:param prob: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
:return: [batch_size, src_dim] or [tgt_sequence_length, batch_size, src_dim]
'''
buf = s * tf.expand_dims(prob, axis=-1)
att = tf.reduce_sum(buf, axis=-3)
return att
31 changes: 31 additions & 0 deletions examples/trials/weight_sharing/ga_squad/config_remote.yml
@@ -0,0 +1,31 @@
authorName: default
experimentName: ga_squad_weight_sharing
trialConcurrency: 2
maxExecDuration: 1h
maxTrialNum: 200
#choice: local, remote, pai
trainingServicePlatform: remote
#choice: true, false
useAnnotation: false
multiThread: true
tuner:
codeDir: ../../../tuners/weight_sharing/ga_customer_tuner
classFileName: customer_tuner.py
className: CustomerTuner
classArgs:
optimize_mode: maximize
population_size: 32
save_dir_root: /mnt/nfs/nni/ga_squad
trial:
command: python3 trial.py --input_file /mnt/nfs/nni/train-v1.1.json --dev_file /mnt/nfs/nni/dev-v1.1.json --max_epoch 1 --embedding_file /mnt/nfs/nni/glove.6B.300d.txt
codeDir: .
gpuNum: 1
machineList:
- ip: remote-ip-0
port: 8022
username: root
passwd: screencast
- ip: remote-ip-1
port: 8022
username: root
passwd: screencast