MORpheme SEgment-er

A full, multi-language pipeline for MORSE (MORpheme SEgment-er), based on [1].

Requirements

System Requirements

A Linux environment with:

  1. A CUDA-enabled GPU
  2. 32 GB or more of RAM (recommended)

Dependencies

How to use

Training

Run python train.py with the following arguments. If an argument is not given, its value is taken from default-config.json.
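The fallback to default-config.json can be sketched with Python's argparse. This is a minimal illustration of the pattern, not the actual code in train.py: it assumes the config file is a flat JSON object whose keys (here the hypothetical names language, batch_size and partition_size) match the argument destinations, and it covers only three of the options.

```python
import argparse
import json


def load_defaults(path="default-config.json"):
    """Read the default values from a JSON file (assumed to be a flat object)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_args(argv, defaults):
    """Parse command-line args; any option not given falls back to `defaults`.

    The key names (language, batch_size, partition_size) are hypothetical.
    """
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("-l", dest="language")
    parser.add_argument("-b", dest="batch_size", type=int)
    parser.add_argument("-p", dest="partition_size", type=int)
    # Parser-level defaults override the per-argument default of None,
    # and values given on the command line override both.
    parser.set_defaults(**defaults)
    return parser.parse_args(argv)


# Example: only -l is given, so -b and -p come from the defaults.
args = parse_args(["-l", "korean"],
                  {"language": "english", "batch_size": -1, "partition_size": -1})
print(args.language, args.batch_size, args.partition_size)
```

In a real run, the dictionary would come from load_defaults() rather than being written inline.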

train.py
  FASTTEXT MODEL mode (default mode):
  -l <input language>       Language on which to run MORSE segmentation
  
  EXTERNAL MODEL mode (when <external mode> is True):
  -e <external model dir>   Directory of the external model
  -t <model type>           Type of the external model to load (<fasttext> or <word2vec>)
  
  General Configurations:
  -b <batch size>           Number of words to segment from the model (-1 for the full dataset)
  -p <partition size>       Number of words to group into a partition (-1 for no partition)
  
  -m <external mode>        <True> or <False>: whether an external model is used
  
  Output Directories:
  -s <ss, score directory>  Output directory for the support sets and scores
  -o <model output dir>     Output directory for the model
  
  PREFIX Rules:
  --pw <base word>          Minimum length of a word
  --pe <edit distance>      Maximum edit distance allowed between a word and another word in a support set (SS)
  
  SUFFIX Rules:
  --sw <base word>          Minimum length of a word
  --se <edit distance>      Maximum edit distance allowed between a word and another word in a support set (SS)
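The prefix and suffix rules combine a minimum word length with a maximum edit distance. A minimal sketch of such a check follows, assuming standard Levenshtein distance and reading <base word> as the shorter word of a pair; passes_rule is a hypothetical helper, and the exact comparison MORSE performs may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def passes_rule(word: str, base: str, min_base_len: int, max_edit: int) -> bool:
    """Hypothetical filter mirroring --pw/--pe (or --sw/--se): keep a
    (word, base) pair only if the base word is long enough and the two
    words are within the edit-distance threshold."""
    return len(base) >= min_base_len and edit_distance(word, base) <= max_edit


print(edit_distance("unhappy", "happy"))       # 2
print(passes_rule("unhappy", "happy", 1, 2))   # True
print(passes_rule("unhappy", "happy", 1, 1))   # False
```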

example

python train.py -l korean -b 500000 -s korean_500k -o korean_model --pw 1 --pe 2 --sw 1 --se 3
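In this example, -b 500000 limits segmentation to 500,000 words, and -p is not given, so its value comes from default-config.json. The sketch below illustrates one plausible reading of the -b and -p values, including the -1 sentinels; select_and_partition is hypothetical, and it assumes words are taken in their input order.

```python
from typing import List


def select_and_partition(words: List[str], batch_size: int,
                         partition_size: int) -> List[List[str]]:
    """Hypothetical sketch of -b / -p: take the first batch_size words
    (-1 = all of them), then split them into groups of partition_size
    words (-1 = keep everything in a single group)."""
    selected = words if batch_size == -1 else words[:batch_size]
    if partition_size == -1:
        return [selected]
    return [selected[i:i + partition_size]
            for i in range(0, len(selected), partition_size)]


words = [f"w{i}" for i in range(10)]
print(select_and_partition(words, 7, 3))
# [['w0', 'w1', 'w2'], ['w3', 'w4', 'w5'], ['w6']]
```

The last partition is simply shorter when the batch size is not a multiple of the partition size.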

Output

  • The trained model, in the model output directory (-o)

  • 10 types of files in the support set/score directory (-s):

    • [PRE + SUF]_ss[0-9] - contain the support sets
    • [PRE + SUF]_w_sem[0-9], [PRE + SUF]_loc_sem[0-9], [PRE + SUF]_r_sem[0-9] and [PRE + SUF]_r_orth[0-9] - contain the scores described in [1]
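The five file kinds, each produced for prefixes and suffixes, give the 10 types listed above. The sketch below groups the files in the -s directory by type. It assumes the names look like PRE_ss0 or SUF_r_orth3, optionally followed by a file extension; this pattern is inferred from the [PRE + SUF]_<kind>[0-9] description and may not match the real names exactly.

```python
import re
from collections import defaultdict

# Assumed naming: "PRE" or "SUF", "_", the file kind, a number, optional extension.
FILE_RE = re.compile(r"^(PRE|SUF)_(ss|w_sem|loc_sem|r_sem|r_orth)(\d+)(?:\.\w+)?$")


def group_outputs(filenames):
    """Map (affix, kind) -> sorted list of indices for matching file names."""
    groups = defaultdict(list)
    for name in filenames:
        m = FILE_RE.match(name)
        if m:  # files that do not follow the assumed pattern are ignored
            affix, kind, idx = m.groups()
            groups[(affix, kind)].append(int(idx))
    return {key: sorted(idxs) for key, idxs in groups.items()}


print(group_outputs(["PRE_ss1", "PRE_ss0", "SUF_r_orth3", "notes.txt"]))
# {('PRE', 'ss'): [0, 1], ('SUF', 'r_orth'): [3]}
```

With the training example, the input list could be os.listdir("korean_500k").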

Notes


Inference

Run python MORSE.py with three positional arguments: the model directory, the input file and the output file (model_dir input.txt output.txt).

example

python MORSE.py ../english_output input.txt output.txt
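To run inference from another Python script, one option is to call MORSE.py as a subprocess with the same three positional arguments. This is a minimal sketch assuming the working directory contains MORSE.py; morse_command and run_morse are hypothetical helpers, not part of this repository.

```python
import subprocess
import sys


def morse_command(model_dir: str, input_path: str, output_path: str) -> list:
    """Build the argument list for MORSE.py (positional: model_dir, input, output)."""
    return [sys.executable, "MORSE.py", model_dir, input_path, output_path]


def run_morse(model_dir: str, input_path: str, output_path: str) -> None:
    # check=True raises CalledProcessError if MORSE.py exits with a non-zero status.
    subprocess.run(morse_command(model_dir, input_path, output_path), check=True)


# Equivalent to: python MORSE.py ../english_output input.txt output.txt
# run_morse("../english_output", "input.txt", "output.txt")
```

Using sys.executable runs MORSE.py with the same interpreter (and virtual environment) as the calling script.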

Reference

[1] Tarek Sakakini, Suma Bhat, Pramod Viswanath, MORSE: Semantic-ally Drive-n MORpheme SEgment-er

Authors

Supervised By