[Numpy] [Fix] Update README.md (dmlc#1306)
* Update README.md

Update README.md

Update ubuntu18.04-devel-gpu.Dockerfile

Update README.md

update

Update README.md

Update README.md

Update README.md

use python3 -m

Update benchmark_utils.py

Update benchmark_utils.py

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* update

* Update README.md

* Update README.md

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update README.md
sxjscience authored Aug 23, 2020
1 parent d93356f commit 210dd0c
Showing 8 changed files with 82 additions and 17 deletions.
44 changes: 35 additions & 9 deletions README.md
@@ -1,15 +1,29 @@
# GluonNLP + Numpy
<h3 align="center">
GluonNLP: Your Choice of Deep Learning for NLP
</h3>

Implementing NLP algorithms using the new numpy-like interface of MXNet. It's also a testbed for the next-generation release of GluonNLP.

This is a work-in-progress.
<p align="center">
<a href="https://github.com/dmlc/gluon-nlp/actions"><img src="https://github.com/dmlc/gluon-nlp/workflows/continuous%20build/badge.svg"></a>
<a href="https://codecov.io/gh/dmlc/gluon-nlp"><img src="https://codecov.io/gh/dmlc/gluon-nlp/branch/master/graph/badge.svg"></a>
<a href="https://github.com/dmlc/gluonnlp/actions"><img src="https://img.shields.io/badge/python-3.6%2C3.8-blue.svg"></a>
<a href="https://pypi.org/project/gluonnlp/#history"><img src="https://img.shields.io/pypi/v/gluonnlp.svg"></a>
</p>

GluonNLP is a toolkit that enables easy text preprocessing, dataset loading,
and neural model building, to help you speed up your Natural Language
Processing (NLP) research.

# Features

- Data Pipeline for NLP
- AutoML support (TODO)
For NLP Practitioners
- Easy-to-use Data Pipeline
- Automatically Train Models via AutoNLP (TODO)

For Researchers
- Pretrained Model Zoo
- Programming with numpy-like API

For Engineers
- Fast Deployment
- [TVM](https://tvm.apache.org/) (TODO)
- AWS Integration
@@ -70,6 +84,18 @@ python3 -m gluonnlp.cli.preprocess help

```

### Frequently Asked Questions
- **Question**: I cannot access the command-line toolkits. Running `nlp_data` reports `nlp_data: command not found`.

  This usually happens because gluonnlp has been installed into the user folder, so the
  executables end up in `~/.local/bin`. You can change the `PATH` variable to
  also include `~/.local/bin`:

```
export PATH=${PATH}:~/.local/bin
```
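
If you are unsure where the user-level scripts were placed, the following commands locate them and make the `PATH` change persistent (a minimal sketch, assuming a bash shell):

```bash
# Show the per-user base; the console scripts live under <user-base>/bin
python3 -m site --user-base

# Make the PATH change persistent for future shells
echo 'export PATH="${PATH}:${HOME}/.local/bin"' >> ~/.bashrc
source ~/.bashrc

# Verify that the toolkit is now found
which nlp_data
```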


# Run Unittests
You may go to [tests](tests) to see how to run the unit tests.

@@ -78,8 +104,8 @@ You may go to [tests](tests) to see how to run the unit tests.
You can use Docker to launch a JupyterLab development environment with GluonNLP installed.

```
docker pull gluonai/gluon-nlp:v1.0.0
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 gluonai/gluon-nlp:v1.0.0
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g gluonai/gluon-nlp:gpu-latest
```

For more details, you can refer to the guidance in [tools/docker].
For more details, you can refer to the guidance in [tools/docker](tools/docker).
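
Once the container is running, JupyterLab prints a tokenized URL to its log. A hedged way to retrieve it, assuming the image's default command launches JupyterLab (as the port mappings above suggest) and using an illustrative container name:

```bash
# Start the container detached with an explicit name (name is illustrative)
docker run --gpus all --rm -d --name gluon-nlp-dev \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g \
    gluonai/gluon-nlp:gpu-latest

# Grab the JupyterLab URL with its access token from the container log
docker logs gluon-nlp-dev 2>&1 | grep -m1 "token="
```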
2 changes: 1 addition & 1 deletion scripts/benchmarks/benchmark_utils.py
@@ -91,7 +91,7 @@ def is_mxnet_available():


logger = logging.getLogger(__name__) # pylint: disable=invalid-name
logging_config(logger=logger)
logging_config(folder='gluonnlp_benchmark', name='benchmark', logger=logger)


_is_memory_tracing_enabled = False
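
For context on the `logging_config(folder='gluonnlp_benchmark', name='benchmark', logger=logger)` call above: the sketch below is not GluonNLP's actual implementation, only an illustration of what a helper with that signature typically does, namely writing records to `<folder>/<name>.log` while echoing them to the console.

```python
# Illustrative sketch only -- not GluonNLP's actual logging_config implementation.
import logging
import os


def logging_config(folder='.', name='run', logger=None, level=logging.INFO):
    logger = logger if logger is not None else logging.getLogger(__name__)
    logger.setLevel(level)
    os.makedirs(folder, exist_ok=True)
    fmt = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    file_handler = logging.FileHandler(os.path.join(folder, name + '.log'))
    file_handler.setFormatter(fmt)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(fmt)
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger


logger = logging_config(folder='gluonnlp_benchmark', name='benchmark',
                        logger=logging.getLogger('benchmark_demo'))
logger.info('Benchmark logging initialized.')
```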
1 change: 0 additions & 1 deletion scripts/machine_translation/train_transformer.py
@@ -526,7 +526,6 @@ def train(args):

if __name__ == '__main__':
os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Round'
os.environ['MXNET_USE_FUSION'] = '0' # Manually disable pointwise fusion
args = parse_args()
np.random.seed(args.seed)
mx.random.seed(args.seed)
1 change: 1 addition & 0 deletions setup.py
@@ -39,6 +39,7 @@ def find_version(*file_paths):
'protobuf',
'pandas',
'tokenizers>=0.7.0',
'click>=7.0', # Dependency of youtokentome
'youtokentome>=1.0.6',
'fasttext>=0.9.2'
]
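
To confirm that the updated pins (including the `click>=7.0` entry added for youtokentome) resolve without conflicts, a quick sanity check from the repository root is (assuming pip is available for `python3`):

```bash
# Editable install from the repository root, then verify dependency consistency
python3 -m pip install --user -e .
python3 -m pip check
```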
1 change: 0 additions & 1 deletion src/gluonnlp/data/tokenizers.py
@@ -30,7 +30,6 @@
from typing import List, Tuple, Union, NewType, Optional
from collections import OrderedDict

import jieba
import sacremoses

from .vocab import Vocab
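
Dropping the top-level `import jieba` suggests the dependency is now loaded lazily or imported elsewhere. A minimal sketch of the lazy-import pattern, not GluonNLP's actual tokenizer code, looks like this:

```python
class JiebaTokenizer:
    """Sketch only: defer the optional jieba dependency until it is needed."""

    def __init__(self):
        try:
            import jieba  # imported lazily so the package loads without jieba installed
        except ImportError as err:
            raise ImportError('jieba is required for JiebaTokenizer; install it with '
                              '"python3 -m pip install jieba"') from err
        self._jieba = jieba

    def encode(self, sentence):
        # jieba.cut returns a generator of tokens
        return list(self._jieba.cut(sentence))
```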
4 changes: 2 additions & 2 deletions tests/README.md
@@ -3,13 +3,13 @@
To run the unittests, use the following command

```bash
pytest .
python3 -m pytest .
```

To test a specific file, e.g., `test_models_transformer.py`, use the following command

```bash
pytest test_models_transformer
python3 -m pytest test_models_transformer
```
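
To narrow the run down to a single test function, pytest's `::` node-id syntax or the `-k` keyword filter also works (the test name below is hypothetical):

```bash
python3 -m pytest test_models_transformer.py::test_transformer_encoder_decoder
# or select by keyword expression
python3 -m pytest -k "transformer and not slow" .
```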

Refer to the [official guide of pytest](https://docs.pytest.org/en/latest/) for more details.
25 changes: 23 additions & 2 deletions tools/docker/README.md
@@ -9,14 +9,35 @@ You can run the docker with the following command.

```
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:gpu-latest
```

Here, we open ports 8888, 8787, and 8786, which are used to connect to JupyterLab.
Also, we set `--shm-size` to `4g`. This sets the shared memory storage to 4GB. Since NCCL will
Also, we set `--shm-size` to `2g`. This sets the shared memory storage to 2GB. Since NCCL will
create shared memory segments, this argument is essential for the Jupyter notebook to work with NCCL.
(See also https://github.com/NVIDIA/nccl/issues/290).
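
If you suspect the shared-memory size is the culprit (for example, NCCL errors mentioning `/dev/shm`), you can verify the allocation from inside the running container:

```bash
# Inside the container: /dev/shm should report the size passed via --shm-size
df -h /dev/shm
```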

The folder structure of the docker image will be
```
/workspace/
├── gluonnlp
├── horovod
├── mxnet
├── notebooks
├── data
```

If you have a multi-GPU instance, e.g., [g4dn.12xlarge](https://aws.amazon.com/ec2/instance-types/g4/),
[p2.8xlarge](https://aws.amazon.com/ec2/instance-types/p2/), or
[p3.8xlarge](https://aws.amazon.com/ec2/instance-types/p3/), you can run the following
command to verify the Horovod + MXNet installation:

```
docker run --gpus all --rm -it --shm-size=4g gluonai/gluon-nlp:gpu-latest \
horovodrun -np 2 python3 -m pytest /workspace/horovod/horovod/test/test_mxnet.py
```


## Build your own Docker Image
To build a docker image from the Dockerfile, you may use the following command:

21 changes: 20 additions & 1 deletion tools/docker/ubuntu18.04-devel-gpu.Dockerfile
@@ -74,7 +74,7 @@ RUN echo "hwloc_base_binding_policy = none" >> /usr/local/etc/openmpi-mca-params
ENV LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
ENV PATH=/usr/local/openmpi/bin/:/usr/local/bin:/root/.local/bin:$PATH

RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
RUN ln -s $(which python3) /usr/local/bin/python

RUN mkdir -p ${WORKDIR}

@@ -144,6 +144,25 @@ WORKDIR ${WORKDIR}
# Debug horovod by default
RUN echo NCCL_DEBUG=INFO >> /etc/nccl.conf

# Install NodeJS + Tensorboard + TensorboardX
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash - \
&& apt-get install -y nodejs

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsndfile1-dev

RUN pip3 install --no-cache --upgrade \
soundfile==0.10.2 \
ipywidgets==7.5.1 \
jupyter_tensorboard==0.2.0 \
widgetsnbextension==3.5.1 \
tensorboard==2.1.1 \
tensorboardX==2.1
RUN jupyter labextension install jupyterlab_tensorboard \
&& jupyter nbextension enable --py widgetsnbextension \
&& jupyter labextension install @jupyter-widgets/jupyterlab-manager

# Revise default shell to /bin/bash
RUN jupyter notebook --generate-config \
&& echo "c.NotebookApp.terminado_settings = { 'shell_command': ['/bin/bash'] }" >> /root/.jupyter/jupyter_notebook_config.py
