[docs] add favicon and fix index html title

[docs] imported all from doc/
[docs] rename doc/conf.py
[docs] don't build mxnet for generating the docs
[docs] fix readthedocs
[docs] add build requirement
[docs] fix readthedocs
[docs] fix
[docs] fix
[docs]
Showing 10 changed files with 170 additions and 30 deletions.
@@ -0,0 +1,40 @@
Frequently Asked Questions
==========================
This document contains frequently asked questions about MXNet.

How to Copy Part of Parameters to Another Model
-----------------------------------------------
Most MXNet models consist of two parts: the argument arrays and the symbol. You can simply copy the argument arrays to the argument arrays of another model. For example, in the Python model API, you can do:
```python
# new_symbol, old_arg_params and old_aux_params come from an existing,
# already-trained model; allow_extra_params tolerates parameters that
# the new symbol does not use.
copied_model = mx.model.FeedForward(ctx=mx.gpu(), symbol=new_symbol,
                                    arg_params=old_arg_params,
                                    aux_params=old_aux_params,
                                    allow_extra_params=True)
```
To copy model parameters from an existing `old_arg_params`, see also this [notebook](https://github.com/dmlc/mxnet/blob/master/example/notebooks/predict-with-pretrained-model.ipynb).

How to Extract the Feature Map of a Certain Layer
-------------------------------------------------
See this [notebook](https://github.com/dmlc/mxnet/blob/master/example/notebooks/predict-with-pretrained-model.ipynb).
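
For illustration, here is a minimal sketch of the idea using `Symbol.get_internals()`. The layer name `'flatten_output'` and the variables `sym`, `arg_params`, `aux_params` and `data_iter` are assumptions standing in for a loaded pretrained model and its data iterator:

```python
import mxnet as mx

# Assumed: (sym, arg_params, aux_params) from a loaded pretrained model.
internals = sym.get_internals()
print(internals.list_outputs())   # e.g. [..., 'flatten_output', ...]

# Pick the output of the layer whose feature map we want (name is hypothetical).
feature_sym = internals['flatten_output']

# Reuse the trained weights; parameters not used by feature_sym are allowed.
feature_extractor = mx.model.FeedForward(ctx=mx.gpu(), symbol=feature_sym,
                                         arg_params=arg_params,
                                         aux_params=aux_params,
                                         allow_extra_params=True)
features = feature_extractor.predict(data_iter)
```
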
What is the Relation between MXNet and CXXNet, Minerva, Purine2
---------------------------------------------------------------
MXNet was created in collaboration by authors from the three projects, and it reflects what we have learned from them. It combines the important flavours of the existing projects, aiming to be efficient, flexible and memory efficient.

It also contains new ideas that allow users to combine different ways of programming, and to write CPU/GPU applications that are more memory efficient than CXXNet and Purine, and more flexible than Minerva.

What is the Relation to TensorFlow
----------------------------------
Both MXNet and TensorFlow use a computation graph abstraction, which was initially used by Theano and later adopted by other packages such as CGT, Caffe2 and Purine. TensorFlow currently adopts an optimized symbolic API, while MXNet supports a more [mixed flavor](https://mxnet.readthedocs.org/en/latest/program_model.html), with a dynamic dependency scheduler that combines symbolic and imperative programming.

In short, MXNet is lightweight and "mixed": it offers the flexibility of imperative programming, while using a computation graph to stay fast and memory efficient. That being said, most systems will evolve, and we expect both systems to learn and benefit from each other.
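
As a small illustration of the mixed flavor, here is a sketch against the Python NDArray and Symbol APIs (shapes and names are made up):

```python
import mxnet as mx

# Imperative: NDArray operations execute eagerly, like numpy.
a = mx.nd.ones((2, 3))
b = a * 2 + 1                      # computed immediately

# Symbolic: declare a graph first, then bind and run it.
x = mx.symbol.Variable('x')
y = x * 2 + 1
exe = y.simple_bind(ctx=mx.cpu(), x=(2, 3))
exe.arg_dict['x'][:] = a           # feed the imperative array into the graph
exe.forward()
print(exe.outputs[0].asnumpy())
```
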
How to Build the Project
------------------------
See the [build instructions](build.md).
@@ -0,0 +1,7 @@
Pretrained Model Gallery
========================
This document lists the pretrained models available in MXNet.

* [89.9% Top-5 Validation Accuracy for ImageNet 1,000 Classes Challenge](https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-1k-inception-bn.md)
* [37.2% Top-1 Training Accuracy for Full ImageNet 21,841 Classes](https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-21k-inception.md)
@@ -0,0 +1 @@
breathe
@@ -0,0 +1,87 @@
# Survey of existing interfaces and implementations

Some commonly used deep learning libraries with good RNN / LSTM support include [Theano](http://deeplearning.net/software/theano/library/scan.html) and its wrappers [Lasagne](http://lasagne.readthedocs.org/en/latest/modules/layers/recurrent.html) and [Keras](http://keras.io/layers/recurrent/); [CNTK](https://cntk.codeplex.com/); [TensorFlow](https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html); and various implementations in Torch such as [char-rnn](https://github.com/karpathy/char-rnn), [Element-Research/rnn](https://github.com/Element-Research/rnn) and [wojzaremba/lstm](https://github.com/wojzaremba/lstm).

## Theano

RNN support in Theano is provided by its [scan operator](http://deeplearning.net/software/theano/library/scan.html), which allows construction of a loop where the number of iterations is specified via the runtime value of a symbolic variable. An official example of an LSTM implementation with scan can be found [here](http://deeplearning.net/tutorial/lstm.html).
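
To make the interface concrete, here is a minimal scan sketch of a toy RNN step (all shapes and weight names are made up):

```python
import theano
import theano.tensor as T

X = T.matrix('X')    # (sequence_length, feature_dim): one time step per row
W = T.matrix('W')
U = T.matrix('U')
h0 = T.vector('h0')  # initial hidden state

def step(x_t, h_prev, W, U):
    # One RNN time step: h_t = tanh(x_t W + h_{t-1} U)
    return T.tanh(T.dot(x_t, W) + T.dot(h_prev, U))

# scan loops over the first dimension of X; the number of iterations is the
# runtime length of X, not a compile-time constant.
hs, _ = theano.scan(step, sequences=X, outputs_info=h0, non_sequences=[W, U])
rnn = theano.function([X, h0, W, U], hs)
```
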
### Implementation

I'm not very familiar with the Theano internals, but it seems from [theano/scan_module/scan_op.py#execute](https://github.com/Theano/Theano/blob/master/theano/scan_module/scan_op.py#L1225) that the scan operator is implemented with a loop in Python that performs one iteration at a time:

```python
fn = self.fn.fn

while (i < n_steps) and cond:
    # ...
    fn()
```

The `grad` function in Theano constructs a symbolic graph for computing gradients. So the `grad` for the scan operator is actually implemented by [constructing another scan operator](https://github.com/Theano/Theano/blob/master/theano/scan_module/scan_op.py#L2527):

```python
local_op = Scan(inner_gfn_ins, inner_gfn_outs, info)
outputs = local_op(*outer_inputs)
```

The [performance guide](http://deeplearning.net/software/theano/library/scan.html#optimizing-scan-s-performance) for Theano's scan operator suggests minimizing the use of scan. This is probably because the loop is executed in Python, which can be slow (due to context switching and the performance of Python itself). Moreover, since no unrolling is performed, the graph optimizer cannot see the big picture.

If I understand correctly, when multiple RNN/LSTM layers are stacked, instead of a single loop where each iteration computes the feedforward operation of the whole network, the computation runs a separate loop for each layer that uses the scan operator, sequentially. This is fine if all the intermediate values are stored to support computing the gradients, but otherwise a single loop could be more memory efficient.

### Lasagne

The documentation for RNNs in Lasagne can be found [here](http://lasagne.readthedocs.org/en/latest/modules/layers/recurrent.html). In Lasagne, a recurrent layer is just like any standard layer, except that the input shape is expected to be `(batch_size, sequence_length, feature_dimension)`. The output shape is then `(batch_size, sequence_length, output_dimension)`.

Both `batch_size` and `sequence_length` can be specified as `None` and inferred from the data. Alternatively, when memory is sufficient and the (maximum) sequence length is known a priori, the user can set `unroll_scan` to `True`; Lasagne will then unroll the graph explicitly instead of using Theano's `scan` operator. Explicit unrolling is implemented in [utils.py#unroll_scan](https://github.com/Lasagne/Lasagne/blob/master/lasagne/utils.py#L340).

The recurrent layer also accepts a `mask_input` to support variable-length sequences (e.g. sequences of different lengths within a mini-batch). The mask has shape `(batch_size, sequence_length)`.
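
A minimal sketch of this interface (the sizes are made up; `mask_input` and `unroll_scan` are the arguments described above):

```python
import lasagne

# (batch_size, sequence_length, feature_dimension); None = inferred from data.
l_in = lasagne.layers.InputLayer(shape=(None, None, 50))
l_mask = lasagne.layers.InputLayer(shape=(None, None))

# Output shape: (batch_size, sequence_length, 128).
l_lstm = lasagne.layers.LSTMLayer(l_in, num_units=128, mask_input=l_mask)
```
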
### Keras

The documentation for RNNs in Keras can be found [here](http://keras.io/layers/recurrent/). The interface in Keras is similar to Lasagne's. The input is expected to have shape `(batch_size, sequence_length, feature_dimension)`, and the output (if `return_sequences` is `True`) has shape `(batch_size, sequence_length, output_dimension)`.

Keras currently supports both Theano and TensorFlow backends. RNNs for the Theano backend are [implemented with the scan operator](https://github.com/fchollet/keras/blob/master/keras/backend/theano_backend.py#L432). For TensorFlow, they seem to be [implemented via explicit unrolling](https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L396). The documentation says that for the TensorFlow backend the sequence length must be specified a priori, and that masking is currently not working (because `tf.reduce_any` is not functioning yet).
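
A minimal sketch against this interface (a sketch only: the sizes are made up, and the exact argument names may differ across Keras versions):

```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM

model = Sequential()
# Input: (batch_size, sequence_length=20, feature_dimension=50).
# return_sequences=True keeps one 128-dimensional output per time step.
model.add(LSTM(128, input_shape=(20, 50), return_sequences=True))
```
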
## Torch

[karpathy/char-rnn](https://github.com/karpathy/char-rnn) is implemented via [explicit unrolling](https://github.com/karpathy/char-rnn/blob/master/model/RNN.lua#L15). [Element-Research/rnn](https://github.com/Element-Research/rnn), on the contrary, runs the sequence iteration in Lua. It has a very modular design:

* The basic RNN/LSTM modules run only *one* time step per call to `forward` (and accumulate / store the information needed to support the backward computation). Users therefore have fine-grained control when using this API directly.
* A collection of `Sequencer` modules is defined to model common scenarios like forward sequences, bi-directional sequences, attention models, etc.
* Other utility modules, such as masking, support variable-length sequences.

## CNTK

CNTK looks quite different from other common deep learning libraries, and I cannot understand it very well yet. I will talk with Yu to get more details.

It seems the basic data types are matrices (although there is also a `TensorView` utility class). The mini-batch data for sequences is packed into a matrix whose number of rows is `feature_dimension` and whose number of columns is `sequence_length * batch_size` (see Figure 2.9 on page 50 of the [CNTKBook](http://research.microsoft.com/pubs/226641/CNTKBook-20151201.pdf)).

Recurrent networks are first-class citizens in CNTK. In section 5.2.1.8 of the CNTKBook, we can see an example of a customized computation node. The node needs to explicitly define functions for the standard forward pass and for a forward pass with a time index, which is used for RNN evaluation:

```cpp
virtual void EvaluateThisNode()
{
    EvaluateThisNodeS(FunctionValues(), Inputs(0)->FunctionValues(),
                      Inputs(1)->FunctionValues());
}

virtual void EvaluateThisNode(const size_t timeIdxInSeq)
{
    // Slice out the columns packed for this time step;
    // m_samplesInRecurrentStep is the mini-batch size.
    Matrix<ElemType> sliceInputValue = Inputs(1)->FunctionValues().ColumnSlice(
        timeIdxInSeq * m_samplesInRecurrentStep, m_samplesInRecurrentStep);
    Matrix<ElemType> sliceOutputValue = m_functionValues.ColumnSlice(
        timeIdxInSeq * m_samplesInRecurrentStep, m_samplesInRecurrentStep);
    EvaluateThisNodeS(sliceOutputValue, Inputs(0)->FunctionValues(),
                      sliceInputValue);
}
```
The function `ColumnSlice(start_col, num_col)` takes out the packed data for the given time index as described above (here `m_samplesInRecurrentStep` must be the mini-batch size).
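
A small numpy illustration of this packing and slicing (the shapes are made up; `column_slice` is a stand-in for the `ColumnSlice` member above):

```python
import numpy as np

feature_dim, seq_len, batch_size = 4, 5, 3

# CNTK-style packing: the columns for time step t occupy
# [t * batch_size, (t + 1) * batch_size).
packed = np.random.randn(feature_dim, seq_len * batch_size)

def column_slice(matrix, start_col, num_col):
    # Mirrors ColumnSlice(start_col, num_col) from the C++ snippet above.
    return matrix[:, start_col:start_col + num_col]

t = 2
slice_t = column_slice(packed, t * batch_size, batch_size)
assert slice_t.shape == (feature_dim, batch_size)
```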

The low-level API for recurrent connections seems to be a *delay node*, but I'm not sure how to use this low-level API. The [example of a PTB language model](https://cntk.codeplex.com/SourceControl/latest#Examples/Text/PennTreebank/Config/rnn.config) uses a very high-level API (simply setting `recurrentLayer = 1` in the config).

## TensorFlow

The [current example of an RNNLM](https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) in TensorFlow uses explicit unrolling for a predefined number of time steps. The white paper mentions an advanced control-flow API (similar to Theano's scan) coming in the future.
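
For contrast with Theano's scan, here is a framework-agnostic sketch of what explicit unrolling means (the `cell` callable is hypothetical):

```python
def unrolled_rnn(cell, inputs, h0):
    """Build an unrolled RNN: one copy of `cell` per time step.

    `inputs` must have a fixed length when the graph is constructed,
    which is why the sequence length has to be predefined.
    """
    h = h0
    outputs = []
    for x_t in inputs:   # runs at graph-construction time, not at run time
        h = cell(x_t, h)
        outputs.append(h)
    return outputs
```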
@@ -0,0 +1,3 @@
formats:
- none
requirements_file: docs/requirements.txt