Docker container for UDPipe (https://github.com/ufal/udpipe) REST server.
UDPipe is a trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files.
To use the UDPipe REST server Docker image, you need to:
- Find a language model.
  - Train it or get it from somewhere.
  - To train a model:
    - Get a training file, for example from Universal Dependencies.
    - Train UDPipe for the language.
- Build a Docker image with the language model.
- Run the Docker image.
- Use it.
The example below shows how to train UDPipe for Finnish and create a UDPipe REST server Docker image for Finnish.
Training needs some manual steps.
- Download or clone this repository to your computer.
- Create a directory named training_files under the training directory.
- Download the Finnish training file fi_tdt-ud-train.conllu to the training_files directory.
  - The GitHub repo of the file is UD_Finnish-TDT.
- Copy training/training_template.dockerfile to training/training_fi.dockerfile.
- Find the ENV entries in training_fi.dockerfile:
  - Set the training file name (the file you put in the training_files directory).
    - Finnish training file: fi_tdt-ud-train.conllu.
  - Set the model name, for example: fi_20180111.model.
- Start training by executing docker build:
  - Change to the training directory.
  - docker build -t training_fi -f training_fi.dockerfile .
- Wait… wait… wait for it…
- When the build eventually finishes, start the Docker container:
  - docker run -it --rm -p 8000:8000 training_fi
- Use a browser and go to http://127.0.0.1:8000.
- Download the model file to the training/models directory.
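Because the training build takes a long time, it can be worth sanity-checking the downloaded training file before starting it. The Python sketch below is not part of this repository; it only checks the basic CoNLL-U layout (word lines with 10 tab-separated columns, "#" comment lines, blank lines between sentences) and counts sentences and words. It is not a substitute for the official Universal Dependencies validation tools, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def conllu_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, words) for CoNLL-U lines; raise ValueError on malformed word lines."""
    sentences = 0
    words = 0
    in_sentence = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Comment lines such as "# sent_id = ..." or "# text = ...".
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            raise ValueError(f"line {lineno}: expected 10 columns, got {len(columns)}")
        in_sentence = True
        token_id = columns[0]
        # Multiword-token ranges ("1-2") and empty nodes ("5.1") are not counted as words.
        if "-" not in token_id and "." not in token_id:
            words += 1
    if in_sentence:
        # Tolerate a file that does not end with a blank line.
        sentences += 1
    return sentences, words


def check_file(path: str) -> str:
    """Read a CoNLL-U file and return a one-line summary."""
    with open(path, encoding="utf-8") as handle:
        n_sentences, n_words = conllu_stats(handle)
    return f"{path}: {n_sentences} sentences, {n_words} words"
```

For example, run from the training directory: print(check_file("training_files/fi_tdt-ud-train.conllu")). A ValueError points at the first line that does not look like CoNLL-U.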
The next step is to build the REST server Docker image using the model file you just downloaded; this is the file the UDPipe REST server will load.
Follow these instructions to build the actual REST server image:
- Copy rest_server_template.dockerfile to rest_server_fi.dockerfile.
- Open rest_server_fi.dockerfile and find the ENV entries:
  - Change MODEL_FILE_NAME to the model file name from the previous section.
    - For example: fi_20180111.model
  - Change MODEL_NAME and MODEL_DESC to some descriptive name.
    - For example: finnish_model_20180112
- Build the Docker image:
  - docker build -t udpipe-rest-server-fi -f rest_server_fi.dockerfile .
- Run the Docker image:
  - docker run -it --rm -p 8080:8080 udpipe-rest-server-fi
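Editing the ENV entries by hand works fine. As an alternative, the Python sketch below (not part of this repository) copies the template and rewrites the entries programmatically. It assumes each variable is set on its own line as `ENV NAME value` or `ENV NAME=value`, so check the template before relying on it. The values are the examples from this README: fi_20180111.model is the model file name set during training, and finnish_model_20180112 is the descriptive name.

```python
import re
from pathlib import Path
from typing import Dict

# Example values from this README; adjust them to your own model.
FI_VALUES = {
    "MODEL_FILE_NAME": "fi_20180111.model",
    "MODEL_NAME": "finnish_model_20180112",
    "MODEL_DESC": "finnish_model_20180112",
}


def set_env_entries(dockerfile_text: str, values: Dict[str, str]) -> str:
    """Return dockerfile_text with single-variable ENV lines set to the given values.

    Assumes lines like "ENV NAME value" or "ENV NAME=value". Values are inserted
    verbatim, so quote them yourself if they contain spaces.
    """
    result = dockerfile_text
    for name, value in values.items():
        # Group 1 keeps "ENV NAME", group 2 keeps the separator (spaces/tabs or "=").
        pattern = re.compile(rf"^(ENV[ \t]+{re.escape(name)})([ \t]+|=).*$", re.MULTILINE)
        result, count = pattern.subn(lambda m: m.group(1) + m.group(2) + value, result)
        if count == 0:
            raise KeyError(f"no ENV entry for {name}")
    return result


def make_dockerfile(template: Path, target: Path, values: Dict[str, str]) -> None:
    """Copy the template to target, rewriting the requested ENV entries."""
    text = template.read_text(encoding="utf-8")
    target.write_text(set_env_entries(text, values), encoding="utf-8")
```

From the directory that contains the template, make_dockerfile(Path("rest_server_template.dockerfile"), Path("rest_server_fi.dockerfile"), FI_VALUES) produces the file used by the docker build command. A missing variable raises KeyError rather than silently producing an unchanged Dockerfile.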
Access and test the server using a browser on port 8080, or use curl to test it:
- curl -F data=@data/text.txt -F tokenizer= -F tagger= -F parser= http://127.0.0.1:8080/process
The server responds with JSON. To get just the CoNLL-U result back, use this:
- curl -F data=@data/text.txt -F tokenizer= -F tagger= -F parser= http://127.0.0.1:8080/process | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
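The same request can be made from Python using only the standard library. This sketch is not part of this repository. It sends the fields form-encoded (application/x-www-form-urlencoded) instead of as a multipart file upload like the curl example. The UDPipe REST API documentation describes both encodings for POST, but verify this against your server. As in the curl commands, an empty value enables the tokenizer, tagger and parser with their default options. The CoNLL-U output is read from the "result" field of the JSON response.

```python
import json
import urllib.parse
import urllib.request

# Port published by "docker run -p 8080:8080".
SERVER_URL = "http://127.0.0.1:8080/process"


def build_form(text: str) -> bytes:
    """URL-encode the request; empty values enable a component with default options."""
    fields = {"data": text, "tokenizer": "", "tagger": "", "parser": ""}
    return urllib.parse.urlencode(fields).encode("utf-8")


def extract_result(body: bytes) -> str:
    """Return the CoNLL-U text from the server's JSON response."""
    return json.loads(body.decode("utf-8"))["result"]


def process(text: str, url: str = SERVER_URL, timeout: float = 60.0) -> str:
    """POST text to the UDPipe REST server and return the CoNLL-U result."""
    # Passing bytes as data makes urllib send a form-encoded POST request.
    request = urllib.request.Request(url, data=build_form(text), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_result(response.read())
```

With the container running, print(process("Tämä on testilause.")) should print the same CoNLL-U as the second curl command.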
Universal Dependencies covers many languages, and their treebanks' training files can be used in the same way to build models for the UDPipe REST server.
A single REST server Docker image can include many models. See the UDPipe documentation on how to start the server with multiple models, and change the Dockerfile accordingly.
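As a starting point for that Dockerfile change, the Python sketch below (not part of this repository) assembles a multi-model udpipe_server command line. It assumes the positional syntax given in the UDPipe 1 documentation, `udpipe_server [options] port default_model (model_name model_file acknowledgements)+`, so check it against the UDPipe version in your image. The model names "fi" and "en", the /models paths and the acknowledgement strings are hypothetical.

```python
import json
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str              # value clients pass in the REST "model" parameter
    file: str              # path to the .model file inside the image
    acknowledgements: str  # text the server reports for this model


def server_argv(port: int, models: List[Model], default: str = "") -> List[str]:
    """Build the udpipe_server argument list (assumed syntax, see the lead-in)."""
    if not models:
        raise ValueError("at least one model is required")
    default_name = default or models[0].name
    if default_name not in {m.name for m in models}:
        raise ValueError(f"default model {default_name!r} is not in the model list")
    argv = ["udpipe_server", str(port), default_name]
    for model in models:
        argv += [model.name, model.file, model.acknowledgements]
    return argv


# Hypothetical models; json.dumps gives the JSON-array (exec) form a Dockerfile CMD accepts.
MODELS = [
    Model("fi", "/models/fi_20180111.model", "finnish_model_20180112"),
    Model("en", "/models/en.model", "english_model"),
]
```

print("CMD " + json.dumps(server_argv(8080, MODELS))) prints a line you could adapt for the Dockerfile. Requests without a model parameter are then handled by the default model, which is the first one unless another is named.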
Pre-existing models can also be used. Some are available from the UDPipe web site, licensed under CC BY-NC-SA.
Everything in this repo, including all code, is provided "AS IS". No support, no warranty, no fitness for any purpose: nothing is expressed or implied, neither by me nor by my employer.