[Numpy] [Fix] Update README.md (dmlc#1306)
* Update README.md

Update README.md

Update ubuntu18.04-devel-gpu.Dockerfile

Update README.md

update

Update README.md

Update README.md

Update README.md

use python3 -m

Update benchmark_utils.py

Update benchmark_utils.py

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* update

* Update README.md

* Update README.md

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update README.md
sxjscience authored Aug 23, 2020
1 parent d93356f commit 210dd0c
Showing 8 changed files with 82 additions and 17 deletions.
44 changes: 35 additions & 9 deletions README.md
@@ -1,15 +1,29 @@
# GluonNLP + Numpy
<h3 align="center">
GluonNLP: Your Choice of Deep Learning for NLP
</h3>

Implementing NLP algorithms using the new numpy-like interface of MXNet. It's also a testbed for the next-generation release of GluonNLP.

This is a work-in-progress.
<p align="center">
<a href="https://github.com/dmlc/gluon-nlp/actions"><img src="https://github.com/dmlc/gluon-nlp/workflows/continuous%20build/badge.svg"></a>
<a href="https://codecov.io/gh/dmlc/gluon-nlp"><img src="https://codecov.io/gh/dmlc/gluon-nlp/branch/master/graph/badge.svg"></a>
<a href="https://github.com/dmlc/gluonnlp/actions"><img src="https://img.shields.io/badge/python-3.6%2C3.8-blue.svg"></a>
<a href="https://pypi.org/project/gluonnlp/#history"><img src="https://img.shields.io/pypi/v/gluonnlp.svg"></a>
</p>

GluonNLP is a toolkit that enables easy text preprocessing, dataset loading,
and neural model building, to help you speed up your Natural Language
Processing (NLP) research.

# Features

- Data Pipeline for NLP
- AutoML support (TODO)
For NLP Practitioners
- Easy-to-use Data Pipeline
- Automatically Train Models via AutoNLP (TODO)

For Researchers
- Pretrained Model Zoo
- Programming with numpy-like API

For Engineers
- Fast Deployment
- [TVM](https://tvm.apache.org/) (TODO)
- AWS Integration
@@ -70,6 +84,18 @@ python3 -m gluonnlp.cli.preprocess help

```

### Frequently Asked Questions
- **Question**: I cannot access the command-line toolkits. Running `nlp_data` reports `nlp_data: command not found`.

  This usually happens because gluonnlp has been installed into the user folder, so the
  executables end up in `~/.local/bin`. You can change the `PATH` variable to
  also include `~/.local/bin`:

```
export PATH=${PATH}:~/.local/bin
```
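
If you are unsure where the user-level scripts were placed, the following commands locate them and make the `PATH` change persistent (a minimal sketch, assuming a bash shell):

```bash
# Show the per-user base; the console scripts live under <user-base>/bin
python3 -m site --user-base

# Make the PATH change persistent for future shells
echo 'export PATH="${PATH}:${HOME}/.local/bin"' >> ~/.bashrc
source ~/.bashrc

# Verify that the toolkit is now found
which nlp_data
```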


# Run Unittests
You may go to [tests](tests) to see how to run the unit tests.

@@ -78,8 +104,8 @@ You may go to [tests](tests) to see how to run the unit tests.
You can use Docker to launch a JupyterLab development environment with GluonNLP installed.

```
docker pull gluonai/gluon-nlp:v1.0.0
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 gluonai/gluon-nlp:v1.0.0
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g gluonai/gluon-nlp:gpu-latest
```

For more details, you can refer to the guidance in [tools/docker].
For more details, you can refer to the guidance in [tools/docker](tools/docker).
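
Once the container is running, JupyterLab prints a tokenized URL to its log. A hedged way to retrieve it, assuming the image's default command launches JupyterLab (as the port mappings above suggest) and using an illustrative container name:

```bash
# Start the container detached with an explicit name (name is illustrative)
docker run --gpus all --rm -d --name gluon-nlp-dev \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g \
    gluonai/gluon-nlp:gpu-latest

# Grab the JupyterLab URL with its access token from the container log
docker logs gluon-nlp-dev 2>&1 | grep -m1 "token="
```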
2 changes: 1 addition & 1 deletion scripts/benchmarks/benchmark_utils.py
@@ -91,7 +91,7 @@ def is_mxnet_available():


logger = logging.getLogger(__name__) # pylint: disable=invalid-name
logging_config(logger=logger)
logging_config(folder='gluonnlp_benchmark', name='benchmark', logger=logger)


_is_memory_tracing_enabled = False
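
For context on the `logging_config(folder='gluonnlp_benchmark', name='benchmark', logger=logger)` call above: the sketch below is not GluonNLP's actual implementation, only an illustration of what a helper with that signature typically does, namely writing records to `<folder>/<name>.log` while echoing them to the console.

```python
# Illustrative sketch only -- not GluonNLP's actual logging_config implementation.
import logging
import os


def logging_config(folder='.', name='run', logger=None, level=logging.INFO):
    logger = logger if logger is not None else logging.getLogger(__name__)
    logger.setLevel(level)
    os.makedirs(folder, exist_ok=True)
    fmt = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    file_handler = logging.FileHandler(os.path.join(folder, name + '.log'))
    file_handler.setFormatter(fmt)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(fmt)
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger


logger = logging_config(folder='gluonnlp_benchmark', name='benchmark',
                        logger=logging.getLogger('benchmark_demo'))
logger.info('Benchmark logging initialized.')
```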
1 change: 0 additions & 1 deletion scripts/machine_translation/train_transformer.py
@@ -526,7 +526,6 @@ def train(args):

if __name__ == '__main__':
os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Round'
os.environ['MXNET_USE_FUSION'] = '0' # Manually disable pointwise fusion
args = parse_args()
np.random.seed(args.seed)
mx.random.seed(args.seed)
1 change: 1 addition & 0 deletions setup.py
@@ -39,6 +39,7 @@ def find_version(*file_paths):
'protobuf',
'pandas',
'tokenizers>=0.7.0',
'click>=7.0', # Dependency of youtokentome
'youtokentome>=1.0.6',
'fasttext>=0.9.2'
]
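
To confirm that the updated pins (including the `click>=7.0` entry added for youtokentome) resolve without conflicts, a quick sanity check from the repository root is (assuming pip is available for `python3`):

```bash
# Editable install from the repository root, then verify dependency consistency
python3 -m pip install --user -e .
python3 -m pip check
```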
1 change: 0 additions & 1 deletion src/gluonnlp/data/tokenizers.py
@@ -30,7 +30,6 @@
from typing import List, Tuple, Union, NewType, Optional
from collections import OrderedDict

import jieba
import sacremoses

from .vocab import Vocab
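
Dropping the top-level `import jieba` suggests the dependency is now loaded lazily or imported elsewhere. A minimal sketch of the lazy-import pattern, not GluonNLP's actual tokenizer code, looks like this:

```python
class JiebaTokenizer:
    """Sketch only: defer the optional jieba dependency until it is needed."""

    def __init__(self):
        try:
            import jieba  # imported lazily so the package loads without jieba installed
        except ImportError as err:
            raise ImportError('jieba is required for JiebaTokenizer; install it with '
                              '"python3 -m pip install jieba"') from err
        self._jieba = jieba

    def encode(self, sentence):
        # jieba.cut returns a generator of tokens
        return list(self._jieba.cut(sentence))
```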
4 changes: 2 additions & 2 deletions tests/README.md
@@ -3,13 +3,13 @@
To run the unittests, use the following command

```bash
pytest .
python3 -m pytest .
```

To test a specific file, e.g., `test_models_transformer.py`, use the following command

```bash
pytest test_models_transformer
python3 -m pytest test_models_transformer
```
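
To narrow the run down to a single test function, pytest's `::` node-id syntax or the `-k` keyword filter also works (the test name below is hypothetical):

```bash
python3 -m pytest test_models_transformer.py::test_transformer_encoder_decoder
# or select by keyword expression
python3 -m pytest -k "transformer and not slow" .
```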

Refer to the [official guide of pytest](https://docs.pytest.org/en/latest/) for more details.
25 changes: 23 additions & 2 deletions tools/docker/README.md
@@ -9,14 +9,35 @@ You can run the docker with the following command.

```
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:gpu-latest
```

Here, we open ports 8888, 8787, and 8786, which are used to connect to JupyterLab.
Also, we set `--shm-size` to `4g`. This sets the shared memory storage to 4GB. Since NCCL will
Also, we set `--shm-size` to `2g`. This sets the shared memory storage to 2GB. Since NCCL will
create shared memory segments, this argument is essential for the Jupyter notebook to work with NCCL.
(See also https://github.com/NVIDIA/nccl/issues/290).
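
If you suspect the shared-memory size is the culprit (for example, NCCL errors mentioning `/dev/shm`), you can verify the allocation from inside the running container:

```bash
# Inside the container: /dev/shm should report the size passed via --shm-size
df -h /dev/shm
```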

The folder structure of the docker image will be
```
/workspace/
├── gluonnlp
├── horovod
├── mxnet
├── notebooks
├── data
```

If you have a multi-GPU instance, e.g., [g4dn.12xlarge](https://aws.amazon.com/ec2/instance-types/g4/),
[p2.8xlarge](https://aws.amazon.com/ec2/instance-types/p2/), or
[p3.8xlarge](https://aws.amazon.com/ec2/instance-types/p3/), you can run the following
command to verify the Horovod + MXNet installation:

```
docker run --gpus all --rm -it --shm-size=4g gluonai/gluon-nlp:gpu-latest \
horovodrun -np 2 python3 -m pytest /workspace/horovod/horovod/test/test_mxnet.py
```


## Build your own Docker Image
To build a docker image from the Dockerfile, you may use the following command:

21 changes: 20 additions & 1 deletion tools/docker/ubuntu18.04-devel-gpu.Dockerfile
@@ -74,7 +74,7 @@ RUN echo "hwloc_base_binding_policy = none" >> /usr/local/etc/openmpi-mca-params
ENV LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
ENV PATH=/usr/local/openmpi/bin/:/usr/local/bin:/root/.local/bin:$PATH

RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
RUN ln -s $(which python3) /usr/local/bin/python

RUN mkdir -p ${WORKDIR}

@@ -144,6 +144,25 @@ WORKDIR ${WORKDIR}
# Debug horovod by default
RUN echo NCCL_DEBUG=INFO >> /etc/nccl.conf

# Install NodeJS + Tensorboard + TensorboardX
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash - \
&& apt-get install -y nodejs

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsndfile1-dev

RUN pip3 install --no-cache --upgrade \
soundfile==0.10.2 \
ipywidgets==7.5.1 \
jupyter_tensorboard==0.2.0 \
widgetsnbextension==3.5.1 \
tensorboard==2.1.1 \
tensorboardX==2.1
RUN jupyter labextension install jupyterlab_tensorboard \
&& jupyter nbextension enable --py widgetsnbextension \
&& jupyter labextension install @jupyter-widgets/jupyterlab-manager

# Revise default shell to /bin/bash
RUN jupyter notebook --generate-config \
&& echo "c.NotebookApp.terminado_settings = { 'shell_command': ['/bin/bash'] }" >> /root/.jupyter/jupyter_notebook_config.py
