
Commit

[docs] add favicon and fix index html title
[docs] imported all from doc/

[docs] rename doc/conf.py

[docs] don't build mxnet for generating the docs

[docs] fix readthedocs

[docs] add build requirement

[docs] fix readthedocs

[docs] fix

[docs] fix

[docs]
mli committed Mar 25, 2016
1 parent c0daa19 commit b89832b
Showing 10 changed files with 170 additions and 30 deletions.
34 changes: 19 additions & 15 deletions docs/_static/mxnet-theme/layout.html
@@ -29,25 +29,16 @@
{%- macro sidebarglobal() %}
<ul class="globaltoc">
{{ toctree(maxdepth=2|toint, collapse=False,includehidden=theme_globaltoc_includehidden|tobool) }}

</ul>
{%- endmacro %}

{%- macro sidebar() %}
{%- if render_sidebar %}
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
{%- if sidebars != None %}
{#- new style sidebar: explicitly include/exclude templates #}
{%- for sidebartemplate in sidebars %}
{%- include sidebartemplate %}
{%- endfor %}
{%- else %}
{#- old style sidebars: using blocks -- should be deprecated #}
{%- block sidebartoc %}
{%- include "localtoc.html" %}
{%- endblock %}
{%- endif %}
{%- block sidebartoc %}
{%- include "localtoc.html" %}
{%- endblock %}
</div>
</div>
{%- endif %}
@@ -64,9 +55,16 @@
HAS_SOURCE: {{ has_source|lower }}
};
</script>
{%- for scriptfile in script_files %}
<script type="text/javascript" src="{{ pathto(scriptfile, 1) }}"></script>
{%- endfor %}

{% for name in ['jquery.js', 'underscore.js', 'doctools.js'] %}
<script type="text/javascript" src="{{ pathto('_static/' + name, 1) }}"></script>
{% endfor %}

<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>

<!-- {%- for scriptfile in script_files %} -->
<!-- <script type="text/javascript" src="{{ pathto(scriptfile, 1) }}"></script> -->
<!-- {%- endfor %} -->
{%- endmacro %}

{%- macro css() %}
@@ -90,7 +88,11 @@
must come *after* these tags. #}
{{ metatags }}
{%- block htmltitle %}
{%- if pagename != 'index' %}
<title>{{ title|striptags|e }}{{ titlesuffix }}</title>
{%- else %}
<title>MXNet Documents</title>
{%- endif %}
{%- endblock %}
{{ css() }}
{%- if not embedded %}
@@ -128,6 +130,8 @@
{%- endif %}
{%- endblock %}
{%- block extrahead %} {% endblock %}

<link rel="icon" type="image/png" href="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet-icon.png">
</head>
<body role="document">
{%- include "navbar.html" %}
7 changes: 1 addition & 6 deletions docs/conf.py
@@ -36,9 +36,6 @@
# If your documentation needs a minimal Sphinx version, state it here.
needs_sphinx = '1.2'

if os.environ.get('READTHEDOCS', None) == 'True':
subprocess.call('doxygen')

# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.ifconfig', 'breathe']
@@ -307,17 +304,15 @@ def run_doxygen(folder):
"""Run the doxygen make command in the designated folder."""
try:
retcode = subprocess.call("cd %s; make doxygen" % folder, shell=True)
retcode = subprocess.call("cp -rf doxygen/html _build/html/doxygen", shell=True)
if retcode < 0:
sys.stderr.write("doxygen terminated by signal %s" % (-retcode))
except OSError as e:
sys.stderr.write("doxygen execution failed: %s" % e)


def generate_doxygen_xml(app):
"""Run the doxygen make commands if we're on the ReadTheDocs server"""
"""Run the doxygen make commands"""
run_doxygen('..')
sys.stderr.write('The Lib path: %s\n' % str(os.listdir('../lib')))

def setup(app):
# Add hook for building doxygen xml when needed
40 changes: 40 additions & 0 deletions docs/how_to/faq.md
@@ -0,0 +1,40 @@
Frequently Asked Questions
==========================
This document collects frequently asked questions about MXNet.

How to Copy Part of Parameters to Another Model
-----------------------------------------------
Most MXNet models consist of two parts: the argument arrays and the symbol. You can simply copy the argument arrays into the argument arrays of another model. For example, with the Python model API, you can do
```python
copied_model = mx.model.FeedForward(ctx=mx.gpu(), symbol=new_symbol,
                                    arg_params=old_arg_params, aux_params=old_aux_params,
                                    allow_extra_params=True)
```
To copy model parameters from an existing ```old_arg_params```, see also this [notebook](https://github.com/dmlc/mxnet/blob/master/example/notebooks/predict-with-pretrained-model.ipynb)
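
As a rough sketch (the `'old-model'` prefix, the epoch number, and `new_symbol` are placeholders here, not part of the official example), you might load a saved model and reuse its parameters like this:
```python
import mxnet as mx

# Load a previously saved model; prefix and epoch are hypothetical placeholders.
old_model = mx.model.FeedForward.load('old-model', 10)

# Reuse its parameter arrays in a model built around a (possibly modified) symbol.
copied_model = mx.model.FeedForward(ctx=mx.gpu(), symbol=new_symbol,
                                    arg_params=old_model.arg_params,
                                    aux_params=old_model.aux_params,
                                    allow_extra_params=True)
```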

How to Extract the Feature Map of a Certain Layer
------------------------------------------
See this [notebook](https://github.com/dmlc/mxnet/blob/master/example/notebooks/predict-with-pretrained-model.ipynb)
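
Roughly (a hedged sketch of the approach in that notebook; the model prefix, epoch, and layer name are placeholders), you can take an internal output of the symbol as the new output:
```python
import mxnet as mx

# Load a pretrained model; prefix and epoch are hypothetical placeholders.
model = mx.model.FeedForward.load('Inception', 9)

# Pick an internal layer's output as the new head of the network.
internals = model.symbol.get_internals()
fea_symbol = internals['global_pool_output']  # layer name is an assumption

feature_extractor = mx.model.FeedForward(ctx=mx.gpu(), symbol=fea_symbol,
                                         arg_params=model.arg_params,
                                         aux_params=model.aux_params,
                                         allow_extra_params=True)
# features = feature_extractor.predict(data_iter)  # data_iter: any mx.io data iterator
```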


What is the Relation between MXNet and CXXNet, Minerva, Purine2
---------------------------------------------------------------
MXNet was created in collaboration by authors from the three projects.
It reflects what we have learned from those projects, and combines
their important flavours: being efficient, flexible, and memory efficient.

It also contains new ideas that allow users to combine different
ways of programming, and to write CPU/GPU applications that are more
memory efficient than cxxnet and purine, and more flexible than minerva.


What is the Relation to Tensorflow
----------------------------------
Both MXNet and TensorFlow use a computation graph abstraction, which was initially used by Theano and then adopted by other packages such as CGT, caffe2, and purine. Currently TensorFlow adopts an optimized symbolic API, while MXNet supports a more [mixed flavor](https://mxnet.readthedocs.org/en/latest/program_model.html), with a dynamic dependency scheduler that combines symbolic and imperative programming.
In short, MXNet is lightweight and “mixed”, gaining flexibility from imperative programming while still using a computation graph to stay fast and memory efficient. That being said, most systems will keep evolving, and we expect both systems to learn and benefit from each other.
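
As a small illustration of this mix (a minimal sketch, not a complete training example), NDArray operations run imperatively while symbols only declare a graph to be bound and executed later:
```python
import mxnet as mx

# Imperative: NDArray operations execute eagerly.
a = mx.nd.ones((2, 3))
b = a * 2 + 1

# Symbolic: declare a computation graph first, bind and run it later.
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=10)
net = mx.sym.SoftmaxOutput(data=fc, name='softmax')
```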


How to Build the Project
------------------------
See the [build instructions](build.md)
4 changes: 4 additions & 0 deletions docs/how_to/index.md
@@ -24,6 +24,8 @@ Install and run MXNet on smart devices such as mobile phones

Run MXNet on a cloud such as Amazon AWS

### [Pretrained Models](./pretrained.html)

## Develop with MXNet

### [Create New Operators](./new_op.html)
@@ -33,3 +35,5 @@
### [Make a Contribution](./contribute.html)

### [Bucketing](./bucketing.html)

### [FAQ](./faq.html)
7 changes: 7 additions & 0 deletions docs/how_to/pretrained.md
@@ -0,0 +1,7 @@
Pretrained Model Gallery
========================
This document lists the pretrained models available for MXNet.

* [89.9% Top-5 Validation Accuracy for ImageNet 1,000 Classes Challenge](https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-1k-inception-bn.md)
* [37.2% Top-1 Training Accuracy for Full ImageNet 21,841 Classes](https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-21k-inception.md)

1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -0,0 +1 @@
breathe
16 changes: 7 additions & 9 deletions docs/sphinx_util.py
@@ -19,25 +19,23 @@ def run_build_mxnet(folder):
except OSError as e:
sys.stderr.write("build execution failed: %s" % e)

if not os.path.exists('../recommonmark'):
subprocess.call('cd ..; rm -rf recommonmark;' +
'git clone https://github.com/tqchen/recommonmark', shell = True)
else:
subprocess.call('cd ../recommonmark/; git pull', shell=True)
# run_build_mxnet("..")
# sys.stderr.write('READTHEDOCS=%s\n' % (READTHEDOCS_BUILD))

# if not os.path.exists('web-data'):
# subprocess.call('rm -rf web-data;' +
# 'git clone https://github.com/dmlc/web-data', shell = True)
# else:
# subprocess.call('cd web-data; git pull', shell=True)

if not os.path.exists('../recommonmark'):
subprocess.call('cd ..; rm -rf recommonmark;' +
'git clone https://github.com/tqchen/recommonmark', shell = True)
else:
subprocess.call('cd ../recommonmark/; git pull', shell=True)

run_build_mxnet("..")
sys.path.insert(0, os.path.abspath('../recommonmark/'))


sys.stderr.write('READTHEDOCS=%s\n' % (READTHEDOCS_BUILD))

from recommonmark import parser, transform

MarkdownParser = parser.CommonMarkParser
1 change: 1 addition & 0 deletions docs/system/index.md
@@ -38,6 +38,7 @@ other. The modules are
- Introduces how we can reduce memory consumption of deep nets
* [Efficient Data Loading Module for Deep Learning](note_data_loading.html)
- Pushes the efficiency of offline data preparation and online data loading.
* [Survey of RNN Interface](rnn_interface.html)

## How to Read the Code
- All the module interfaces are listed in [include](../../include), these
87 changes: 87 additions & 0 deletions docs/system/rnn_interface.md
@@ -0,0 +1,87 @@
# Survey of existing interfaces and implementations

Some commonly used deep learning libraries with good RNN / LSTM support include [Theano](http://deeplearning.net/software/theano/library/scan.html) and its wrappers [Lasagne](http://lasagne.readthedocs.org/en/latest/modules/layers/recurrent.html) and [Keras](http://keras.io/layers/recurrent/); [CNTK](https://cntk.codeplex.com/); [TensorFlow](https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html); and various implementations in Torch like [char-rnn](https://github.com/karpathy/char-rnn), [this](https://github.com/Element-Research/rnn) and [this](https://github.com/wojzaremba/lstm).

## Theano

RNN support in Theano is provided by its [scan operator](http://deeplearning.net/software/theano/library/scan.html), which allows construction of a loop where the number of iterations is specified via a runtime value of a symbolic variable. An official example of LSTM implementation with scan can be found [here](http://deeplearning.net/tutorial/lstm.html).
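
As a minimal sketch of the interface (made-up shapes, not taken from the official LSTM example), `scan` iterates a step function over the leading dimension of its `sequences` argument:

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')    # input sequence, shape (sequence_length, dim)
h0 = T.vector('h0')  # initial hidden state, shape (dim,)
W = T.matrix('W')    # recurrent weights, shape (dim, dim)

def step(x_t, h_prev, W):
    # one recurrence: h_t = tanh(h_{t-1} W + x_t)
    return T.tanh(T.dot(h_prev, W) + x_t)

hs, updates = theano.scan(fn=step, sequences=x, outputs_info=h0, non_sequences=W)
last_hidden = theano.function([x, h0, W], hs[-1], updates=updates)

xv = np.ones((5, 4)).astype(theano.config.floatX)
print(last_hidden(xv, np.zeros(4, dtype=xv.dtype), np.eye(4, dtype=xv.dtype)))
```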

### Implementation

I'm not very familiar with the Theano internals, but it seems from [theano/scan_module/scan_op.py#execute](https://github.com/Theano/Theano/blob/master/theano/scan_module/scan_op.py#L1225) that the scan operator is implemented with a loop in Python that performs one iteration at a time:

```python
fn = self.fn.fn

while (i < n_steps) and cond:
# ...
fn()
```

The `grad` function in Theano constructs a symbolic graph for computing gradients. So the `grad` for the scan operator is actually implemented by [constructing another scan operator](https://github.com/Theano/Theano/blob/master/theano/scan_module/scan_op.py#L2527):

```python
local_op = Scan(inner_gfn_ins, inner_gfn_outs, info)
outputs = local_op(*outer_inputs)
```

The [performance guide](http://deeplearning.net/software/theano/library/scan.html#optimizing-scan-s-performance) for Theano's scan operator suggests minimizing the use of scan. This might be because the loop is executed in Python, which can be slow (due to context switching and the performance of Python itself). Moreover, since no unrolling is performed, the graph optimizer cannot see the big picture.

If I understand correctly, when multiple RNN/LSTM layers are stacked, instead of a single loop in which each iteration computes the whole feedforward network, the computation runs a separate loop for each layer that uses the scan operator, sequentially. This is fine if all the intermediate values are stored anyway to support computing the gradients; otherwise, using a single loop could be more memory efficient.

### Lasagne

The documentation for RNN in Lasagne can be found [here](http://lasagne.readthedocs.org/en/latest/modules/layers/recurrent.html). In Lasagne, a recurrent layer behaves just like a standard layer, except that the input shape is expected to be `(batch_size, sequence_length, feature_dimension)`. The output shape is then `(batch_size, sequence_length, output_dimension)`.

Both `batch_size` and `sequence_length` can be specified as `None` and inferred from the data. Alternatively, when memory is sufficient and the (maximum) sequence length is known a priori, the user can set `unroll_scan` to `True`. Then Lasagne will unroll the graph explicitly instead of using the Theano `scan` operator. Explicit unrolling is implemented in [utils.py#unroll_scan](https://github.com/Lasagne/Lasagne/blob/master/lasagne/utils.py#L340).

The recurrent layer also accepts a `mask_input` to support variable-length sequences (e.g. sequences may have different lengths even within a mini-batch). The mask is of shape `(batch_size, sequence_length)`.
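
A minimal sketch of this interface (the feature dimension and number of units here are arbitrary) could look like:

```python
from lasagne.layers import InputLayer, LSTMLayer

# batch_size and sequence_length are left as None and inferred from the data.
l_in = InputLayer(shape=(None, None, 16))   # (batch, time, feature_dimension)
l_mask = InputLayer(shape=(None, None))     # 1 for valid steps, 0 for padding
l_lstm = LSTMLayer(l_in, num_units=32,
                   mask_input=l_mask,
                   unroll_scan=False)        # True unrolls explicitly (needs a fixed length)
```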

### Keras

The documentation for RNN in Keras can be found [here](http://keras.io/layers/recurrent/). The interface in Keras is similar to Lasagne's. The input is expected to be of shape `(batch_size, sequence_length, feature_dimension)`, and the output shape (if `return_sequences` is `True`) is `(batch_size, sequence_length, output_dimension)`.

Keras currently supports both Theano and TensorFlow backends. RNN for the Theano backend is [implemented with the scan operator](https://github.com/fchollet/keras/blob/master/keras/backend/theano_backend.py#L432). For TensorFlow, it seems to be [implemented via explicit unrolling](https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L396). The documentation says that for the TensorFlow backend, the sequence length must be specified a priori, and that masking is currently not working (because `tf.reduce_any` is not functioning yet).
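
A minimal sketch against the Keras API of that era (the layer sizes and fixed `input_shape` are arbitrary choices) could look like:

```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM

model = Sequential()
# Input: (batch_size, sequence_length, feature_dimension); here 20 steps of 16 features.
# With return_sequences=True the output keeps the time axis.
model.add(LSTM(output_dim=32, return_sequences=True, input_shape=(20, 16)))
model.compile(loss='mse', optimizer='rmsprop')
```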

## Torch

[karpathy/char-rnn](https://github.com/karpathy/char-rnn) is implemented via [explicit unrolling](https://github.com/karpathy/char-rnn/blob/master/model/RNN.lua#L15). [Element-Research/rnn](https://github.com/Element-Research/rnn), on the contrary, runs the sequence iteration in Lua. It actually has a very modular design:

* The basic RNN/LSTM modules run only *one* time step per call to `forward` (and accumulate / store the information needed to support the backward computation). So users have detailed control when using this API directly.
* A collection of `Sequencer` modules is defined to model common scenarios like forward sequences, bi-directional sequences, attention models, etc.
* Other utility modules, such as masking, support variable-length sequences, etc.

## CNTK

CNTK looks quite different from other common deep learning libraries. I cannot understand it very well yet; I will talk with Yu to get more details.

It seems the basic data types are matrices (although there is also a `TensorView` utility class). A mini-batch of sequence data is packed into a matrix whose number of rows is `feature_dimension` and whose number of columns is `sequence_length * batch_size` (see Figure 2.9 on page 50 of the [CNTKBook](http://research.microsoft.com/pubs/226641/CNTKBook-20151201.pdf)).

Recurrent networks are first-class citizens in CNTK. In section 5.2.1.8 of the CNTKBook, we can see an example of a customized computation node. The node needs to explicitly define one function for the standard forward pass and one for the forward pass with a time index, which is used for RNN evaluation:

```cpp
// Forward evaluation over the whole mini-batch.
virtual void EvaluateThisNode()
{
    EvaluateThisNodeS(FunctionValues(), Inputs(0)->FunctionValues(),
                      Inputs(1)->FunctionValues());
}

// Forward evaluation for a single time index within the packed mini-batch.
virtual void EvaluateThisNode(const size_t timeIdxInSeq)
{
    Matrix<ElemType> sliceInput1Value = Inputs(1)->FunctionValues().ColumnSlice(
        timeIdxInSeq * m_samplesInRecurrentStep, m_samplesInRecurrentStep);
    Matrix<ElemType> sliceOutputValue = m_functionValues.ColumnSlice(
        timeIdxInSeq * m_samplesInRecurrentStep, m_samplesInRecurrentStep);
    EvaluateThisNodeS(sliceOutputValue, Inputs(0)->FunctionValues(),
                      sliceInput1Value);
}
```
The function `ColumnSlice(start_col, num_col)` takes out the packed data for that time index as described above (here `m_samplesInRecurrentStep` must be the mini-batch size).
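
To illustrate this packing and slicing with concrete (made-up) sizes, the equivalent NumPy indexing would be:

```python
import numpy as np

feature_dimension, batch_size, sequence_length = 3, 4, 5

# One column per (time step, sample) pair, grouped by time step, as in Figure 2.9.
packed = np.random.randn(feature_dimension, sequence_length * batch_size)

t = 2
# Equivalent of ColumnSlice(t * m_samplesInRecurrentStep, m_samplesInRecurrentStep):
slice_t = packed[:, t * batch_size:(t + 1) * batch_size]  # all samples at time step t
```
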
The low-level API for recurrent connections seems to be a *delay node*, but I'm not sure how to use this low-level API. The [example of a PTB language model](https://cntk.codeplex.com/SourceControl/latest#Examples/Text/PennTreebank/Config/rnn.config) uses a very high-level API (simply setting `recurrentLayer = 1` in the config).

## TensorFlow

The [current example of an RNNLM](https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) in TensorFlow uses explicit unrolling for a predefined number of time steps. The white paper mentions an advanced control-flow API (similar to Theano's scan) coming in the future.
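
Schematically (a framework-agnostic NumPy sketch with a made-up cell, not TensorFlow code), explicit unrolling just repeats the cell for a fixed number of time steps when the graph is built:

```python
import numpy as np

def cell(x_t, h_prev, W_xh, W_hh):
    # A made-up RNN cell: h_t = tanh(x_t W_xh + h_{t-1} W_hh)
    return np.tanh(x_t @ W_xh + h_prev @ W_hh)

num_steps, batch, in_dim, hid = 5, 4, 3, 8
W_xh = np.random.randn(in_dim, hid)
W_hh = np.random.randn(hid, hid)
xs = np.random.randn(num_steps, batch, in_dim)

h = np.zeros((batch, hid))
outputs = []
for t in range(num_steps):          # a fixed, predefined number of time steps
    h = cell(xs[t], h, W_xh, W_hh)
    outputs.append(h)               # in a graph framework, each iteration adds nodes
```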
3 changes: 3 additions & 0 deletions readthedocs.yml
@@ -0,0 +1,3 @@
formats:
- none
requirements_file: docs/requirements.txt
