Enterprise Text Classification using BERT Tutorial

Setup:

Step 0:

  • Explore the BERT repo to understand the steps that follow

Step 1: Add code to process data

  • Download the data and inspect its format
  • Modify the BERT repo for training on the new dataset (a processor sketch follows this list)
  • Create a Google Cloud Storage bucket for saving model checkpoints
  • Give the training account read and write access to the bucket
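
The "modify the BERT repo" bullet boils down to adding a DataProcessor subclass for the new dataset inside run_classifier.py (which already imports os, tokenization, DataProcessor and InputExample at module level). The sketch below is illustrative, not the exact code in this repo: the file names train.tsv/test.tsv and the column order (text first, label second) are assumptions about how the downloaded data was prepared.

class ImdbProcessor(DataProcessor):
  """Sketch of a processor for the IMDB sentiment data (binary labels)."""

  def get_train_examples(self, data_dir):
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.tsv")), "dev")

  def get_labels(self):
    return ["0", "1"]  # negative / positive review

  def _create_examples(self, lines, set_type):
    examples = []
    for i, line in enumerate(lines):
      guid = "%s-%d" % (set_type, i)
      text = tokenization.convert_to_unicode(line[0])    # assumed: review text in column 0
      label = tokenization.convert_to_unicode(line[1])   # assumed: "0"/"1" label in column 1
      # Single-sentence classification, so text_b stays None.
      examples.append(InputExample(guid=guid, text_a=text, text_b=None, label=label))
    return examples

The new class also has to be registered under its task name in run_classifier.py's processors dictionary (e.g. "imdb": ImdbProcessor) so that --task_name=IMDB in Step 3 resolves to it.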

Step 2: Add code to support export

  • Add export flags (--do_serve, --export_dir)
  • Write a serving_input_fn (a sketch follows this list)
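
For reference, below is a minimal serving_input_fn sketch that matches the serving_default signature exercised in Step 3: a batch of serialized tf.Example protos fed under the key examples, each carrying input_ids, input_mask, segment_ids and label_ids. It uses the TensorFlow 1.x Estimator export API that the BERT code is written against; the code actually committed in step_2 may differ in detail.

def serving_input_fn():
  """Receives serialized tf.Examples and parses them into model features (sketch)."""
  seq_length = FLAGS.max_seq_length
  feature_spec = {
      "input_ids": tf.FixedLenFeature([seq_length], tf.int64),
      "input_mask": tf.FixedLenFeature([seq_length], tf.int64),
      "segment_ids": tf.FixedLenFeature([seq_length], tf.int64),
      "label_ids": tf.FixedLenFeature([], tf.int64),
  }
  serialized_tf_example = tf.placeholder(dtype=tf.string, shape=[None],
                                         name="input_example_tensor")
  receiver_tensors = {"examples": serialized_tf_example}
  features = tf.parse_example(serialized_tf_example, feature_spec)
  return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

With the new --do_serve and --export_dir flags wired up, the export itself is then a call along the lines of estimator.export_savedmodel(FLAGS.export_dir, serving_input_fn) after training finishes.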

Step 3: Build, Examine, Export Model

  • Train model
import sys
# Clone the tutorial repo (which vendors the BERT code) and put it on the import path.
!test -d bert_repo || git clone https://github.com/lapolonio/text_classification_tutorial bert_repo
if 'bert_repo/step_3/bert' not in sys.path:
  sys.path += ['bert_repo/step_3/bert']


%%bash

export BERT_BASE_DIR=gs://bert_model_demo/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=grpc://10.92.118.162:8470
export OUTPUT_DIR=gs://bert_model_demo/imdb_v1/output/
export EXPORT_DIR=gs://bert_model_demo/imdb_v1/export/

python bert_repo/step_3/bert/run_classifier.py \
  --task_name=IMDB \
  --do_train=true \
  --do_predict=true \
  --data_dir=$IMDB_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=$OUTPUT_DIR \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --do_serve=true \
  --export_dir=$EXPORT_DIR

  • Export the trained model to the bucket
  • Test the exported model with saved_model_cli and from Python (both shown below)
!saved_model_cli show --dir gs://bert_model_demo/imdb_v1/export/1567569486 --tag_set serve --signature_def serving_default

# Execute model to understand model input format
# Batch 1
!saved_model_cli run --dir gs://bert_model_demo/imdb_v1/export/1567569486 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'

# Batch 3
!saved_model_cli run --dir gs://bert_model_demo/imdb_v1/export/1567569486 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'

Step 4: Create serving image and test with a Python client

  • Create k8s cluster
    • make_cluster.sh
  • Copy model to persistent volume claim
    • copy_model_to_k8s.sh
  • Create tensorflow_serving deployment with model
    • model-chart.yml
  • Deploy to k8s
    • deploy_model.sh
  • Test the serving service
    • kubectl run curler -it --rm --image=pstauffer/curl --restart=Never -- sh
      • curl imdb-sentiment-service:8501/v1/models/bert
  • Create a Python client to call the TF Serving image (a preprocessing sketch follows the curl example below)
    • run_app.py
  • Download needed assets
    • prepare_bert_client.sh
  • Run code locally
    • Login
      • gcloud auth login
    • Set kube context
      • gcloud container clusters get-credentials bert-cluster --zone us-central1-b --project analog-subset-256304
    • Port forward service
      • kubectl port-forward agnews-server-5cc95c9456-fmmx5 8501:8501 8500:8500
    • Run app locally
mkdir ~/models
gsutil cp -r  $MODEL_LOCATION ~/models
export MODEL_NAME=bert
docker run -p 8500:8500 -p 8501:8501 \
--mount type=bind,source=/home/leo/models,target=/models/$MODEL_NAME \
-e MODEL_NAME=$MODEL_NAME --name bert_serving -t tensorflow/serving:latest &
cd bert
python3 -m pip install --user pipenv
python3 -m pipenv install
APP_CONFIG_FILE=config/development.py python3 -m pipenv run python run_app.py

curl -X POST \
  http://localhost:5000/ \
  -H 'Content-Type: application/json' \
  -d '{"sentences":["Tainted look at kibbutz life<br /><br />This film is less a cultural story about a boy'\''s life in a kibbutz, but the deliberate demonization of kibbutz life in general. In the first two minutes of the movie, the milk man in charge of the cows rapes one of his calves. And it'\''s all downhill from there in terms of the characters representing typical '\''kibbutznikim'\''. Besides the two main characters, a clinically depressed woman and her young son, every one else in the kibbutz is a gross caricature of well�evil.", 
"A great story a young Aussie bloke travels to england to claim his inheritance and meets up with his mates, who are just as loveable and innocent as he is.",
"i hate the movie it was racist",
"i loved the movie it was inspiring."]}'

Step 5: Create client image and deploy

  • Create container for client
    • make_client_container.sh
  • Create k8s deployment
    • client-chart.yml
  • Deploy to k8s cluster
    • deploy_app.sh
  • Test the deployment (a quick Python smoke test follows this list)
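
A quick Python smoke test of the deployed client, assuming its service has been made reachable on localhost:5000 (for example via kubectl port-forward, as in Step 4); the payload mirrors the curl call above.

import requests

resp = requests.post(
    "http://localhost:5000/",
    json={"sentences": ["i loved the movie it was inspiring.",
                        "i hate the movie it was racist"]})
print(resp.status_code)
print(resp.json())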

Resources

Results on Text Classification from https://github.com/zihangdai/xlnet

Model        IMDB   Yelp-2   Yelp-5   DBpedia   Amazon-2   Amazon-5
BERT-Large   4.51   1.89     29.32    0.64      2.63       34.17
XLNet-Large  3.79   1.55     27.80    0.62      2.40       32.26

The above numbers are error rates.

Links

Bert Model Theory, Explanation and Research

Bert/Transformer Model Implementations

Kubernetes Resources

Machine Learning Engineer Resources

Performance

Tutorials

Next Steps
