- Download IntelliJ IDEA Community Edition (for diffing folders): https://www.jetbrains.com/idea/download/
- Slides https://docs.google.com/presentation/d/e/2PACX-1vRShMiEX4u9yawt4kA7Nmq2E1o3eCI4yYtV4WRq8Wg2qGH_RJYP3PxqPbEjTcJ8PifCLtI0I8lxmqJw/pub?start=true&loop=true&delayms=10000
- Slack https://join.slack.com/t/ninja-pirate-wizard/shared_invite/enQtODYzOTc1NjI4MTk5LTA2MWFjNzcyMGE2OTZkZjllOTBiODFhODQ5MzNjM2JkNDAyNmVkYTQyYjZmYmMzMWExYjUzYTIzZmEyMzc2MDQ
- Explore BERT Repo to understand steps
- Download data and inspect format
- Modify BERT Repo for training needs
- Create google cloud bucket for model checkpoint saving
- Give account read and write access to bucket
- Add export flags
- Write serving_input_fn
- Train model
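The bucket steps above can be sketched with gsutil. This is a minimal sketch, assuming the bucket name used in this tutorial; the service-account address is a placeholder (substitute the TPU/Compute service account shown in your GCP console):

```shell
# create a regional bucket for checkpoints and exported models
gsutil mb -l us-central1 gs://bert_model_demo

# grant the training service account read/write on the bucket
# (service-account@project.iam.gserviceaccount.com is a placeholder)
gsutil iam ch \
  serviceAccount:service-account@project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://bert_model_demo
```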
import sys
# clone the tutorial repo (once) and add its BERT code to the import path
!test -d bert_repo || git clone https://github.com/lapolonio/text_classification_tutorial bert_repo
if 'bert_repo/step_3/bert' not in sys.path:
    sys.path += ['bert_repo/step_3/bert']
%%bash
export BERT_BASE_DIR=gs://bert_model_demo/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=grpc://10.92.118.162:8470
export OUTPUT_DIR=gs://bert_model_demo/imdb_v1/output/
export EXPORT_DIR=gs://bert_model_demo/imdb_v1/export/
python bert_repo/step_3/bert/run_classifier.py \
--task_name=IMDB \
--do_train=true \
--do_predict=true \
--data_dir=$IMDB_DIR \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=$OUTPUT_DIR \
--use_tpu=True \
--tpu_name=$TPU_NAME \
--do_serve=true \
--export_dir=$EXPORT_DIR
- Export built model to bucket
- Test exported model
!saved_model_cli show --dir gs://bert_model_demo/imdb_v1/export/1567569486 --tag_set serve --signature_def serving_default
# Execute model to understand model input format
# Batch 1
!saved_model_cli run --dir gs://bert_model_demo/imdb_v1/export/1567569486 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'
# Batch 3
!saved_model_cli run --dir gs://bert_model_demo/imdb_v1/export/1567569486 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'
## Step 4: Create serving image and test with Python client
- Create k8s cluster
- make_cluster.sh
- Copy model to persistent volume claim
- copy_model_to_k8s.sh
- Create tensorflow_serving deployment with model
- model-chart.yml
- Deploy to k8s
- deploy_model.sh
- Test Server Service
- kubectl run curler -it --rm --image=pstauffer/curl --restart=Never -- sh
- curl imdb-sentiment-service:8501/v1/models/bert
- Create python client to call tf serving image
- run_app.py
- Download needed assets
- prepare_bert_client.sh
- Run code locally
- Login
- gcloud auth login
- Set kube context
- gcloud container clusters get-credentials bert-cluster --zone us-central1-b --project analog-subset-256304
- Port forward service
- kubectl port-forward agnews-server-5cc95c9456-fmmx5 8501:8501 8500:8500
- Run app locally
mkdir ~/models
gsutil cp -r $MODEL_LOCATION ~/models
export MODEL_NAME=bert
docker run -p 8500:8500 -p 8501:8501 \
--mount type=bind,source=/home/leo/models,target=/models/$MODEL_NAME \
-e MODEL_NAME=$MODEL_NAME --name bert_serving -t tensorflow/serving:latest &
cd bert
python3 -m pip install --user pipenv
python3 -m pipenv install
APP_CONFIG_FILE=config/development.py python3 -m pipenv run python run_app.py
curl -X POST \
http://localhost:5000/ \
-H 'Content-Type: application/json' \
-d '{"sentences":["Tainted look at kibbutz life<br /><br />This film is less a cultural story about a boy'\''s life in a kibbutz, but the deliberate demonization of kibbutz life in general. In the first two minutes of the movie, the milk man in charge of the cows rapes one of his calves. And it'\''s all downhill from there in terms of the characters representing typical '\''kibbutznikim'\''. Besides the two main characters, a clinically depressed woman and her young son, every one else in the kibbutz is a gross caricature of well…evil.",
"A great story a young Aussie bloke travels to england to claim his inheritance and meets up with his mates, who are just as loveable and innocent as he is.",
"i hate the movie it was racist",
"i loved the movie it was inspiring."]}'
## Step 5: Create client image and deploy
- Create container for client
- make_client_container.sh
- Create k8s deployment
- client-chart.yml
- Deploy to k8s cluster
- deploy_app.sh
- Test deployment
Results on Text Classification from https://github.com/zihangdai/xlnet
Model | IMDB | Yelp-2 | Yelp-5 | DBpedia | Amazon-2 | Amazon-5 |
---|---|---|---|---|---|---|
BERT-Large | 4.51 | 1.89 | 29.32 | 0.64 | 2.63 | 34.17 |
XLNet-Large | 3.79 | 1.55 | 27.80 | 0.62 | 2.40 | 32.26 |
The above numbers are error rates.
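Error rate and accuracy are complementary, so the IMDB column converts directly (figures taken from the table above):

```python
# IMDB error rates (%) from the table above
errors = {"BERT-Large": 4.51, "XLNet-Large": 3.79}

# accuracy (%) = 100 - error rate
accuracies = {model: round(100 - err, 2) for model, err in errors.items()}
print(accuracies)  # {'BERT-Large': 95.49, 'XLNet-Large': 96.21}
```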
- Evolution of Representations in the Transformer https://lena-voita.github.io/posts/emnlp19_evolution.html
- A survey of pre-trained language model literature https://github.com/thunlp/PLMpapers
- The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) http://jalammar.github.io/illustrated-bert/
- Has BERT Been Cheating? Researchers Say it Exploits ‘Spurious Statistical Cues’ https://medium.com/syncedreview/has-bert-been-cheating-researchers-say-it-exploits-spurious-statistical-cues-b256760ded57
- Original BERT repo https://github.com/google-research/bert
- Multiple Transformer model architecture implementations in TensorFlow and PyTorch https://github.com/huggingface/transformers
- The Illustrated Children's Guide to Kubernetes https://www.youtube.com/watch?v=4ht22ReBjno
- Kubernetes Basics https://kubernetes.io/docs/tutorials/kubernetes-basics/
- Kubernetes NodePort vs LoadBalancer vs Ingress? When should I use what? https://medium.com/google-cloud/kubernetes-nodeport-vs-loadbalancer-vs-ingress-when-should-i-use-what-922f010849e0
- Learn Production-Level Deep Learning from Top Practitioners: https://fullstackdeeplearning.com/
- Nuts and Bolts of Applying Deep Learning (Andrew Ng): https://www.youtube.com/watch?v=F1ka6a13S9I
- Benchmarking Transformers: PyTorch and TensorFlow https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2
- Benchmarking Transformers (results spreadsheet) https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit#gid=0
- NVIDIA Announces TensorRT 6; Breaks 10 Millisecond Barrier for BERT-Large https://news.developer.nvidia.com/tensorrt6-breaks-bert-record/
- BERT repo classification example: https://github.com/google-research/bert#sentence-and-sentence-pair-classification-tasks
- Predicting Movie Reviews with BERT on TF Hub.ipynb: https://colab.research.google.com/github/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb
- BERT End to End (Fine-tuning + Predicting) in 5 minutes with Cloud TPU: https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb
- Saved Models Tutorial: https://colab.research.google.com/github/tensorflow/docs/blob/r2.0rc/site/en/r2/guide/saved_model.ipynb#scrollTo=Dk5wWyuMpuHx
- Use TensorFlow Serving with Kubernetes: https://www.tensorflow.org/tfx/serving/serving_kubernetes#query_the_server
- Example client communicating with TensorFlow Serving: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/resnet_client_grpc.py#L53
- TensorFlow Serving with a variable batch size: https://www.damienpontifex.com/2018/05/10/tensorflow-serving-with-a-variable-batch-size/
- Optimizing TensorFlow Serving performance with NVIDIA TensorRT https://medium.com/tensorflow/optimizing-tensorflow-serving-performance-with-nvidia-tensorrt-6d8a2347869a
- TFX, an end-to-end platform for deploying production ML pipelines https://github.com/tensorflow/tfx
- Kubeflow https://www.kubeflow.org/
- Seldon https://docs.seldon.io/projects/seldon-core/en/latest/