GitHub - musatarar/HMMTagger: An implementation of the Natural Language Processing Viterbi algorithm that tags parts of speech (noun, adjective, verb, etc.) of words in text files using probability. Currently in progress

Repository files:
Veterbi HMM POS Tagger.py
WSJ_02-21.pos
WSJ_23.words
readme.txt

This is a very simple program to run:
1. Ensure “WSJ_02-21.pos” is in the same folder; it is used as the training corpus.
2. Run “mn2332_HMMTrainerAndTagger_HW3.py” (in this repository the script is named “Veterbi HMM POS Tagger.py”).
3. Provide the [FILENAME].words file you want tagged as input.
4. The tagged results are written to a file named [FILENAME].pos.
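The steps above amount to reading a .words file and writing a matching .pos file. The readme does not describe either file format, so this minimal sketch assumes one token per line with a blank line between sentences, and writes word<TAB>TAG lines in the same layout. `read_sentences`, `tag_file`, and the `tagger` callable are hypothetical names, not functions from the repository.

```python
import os
from typing import Callable, List

# Hypothetical helpers. The file layout (one token per line, blank line
# between sentences) is an assumption, not taken from the repository.

def read_sentences(path: str) -> List[List[str]]:
    """Read a .words file into a list of sentences (lists of tokens)."""
    sentences: List[List[str]] = []
    current: List[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            token = line.strip()
            if token:
                current.append(token)
            elif current:  # a blank line closes the current sentence
                sentences.append(current)
                current = []
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def tag_file(words_path: str, tagger: Callable[[List[str]], List[str]]) -> str:
    """Tag [FILENAME].words, write [FILENAME].pos, and return the output path."""
    pos_path = os.path.splitext(words_path)[0] + ".pos"
    with open(pos_path, "w", encoding="utf-8") as out:
        for sentence in read_sentences(words_path):
            for word, tag in zip(sentence, tagger(sentence)):
                out.write(f"{word}\t{tag}\n")
            out.write("\n")  # keep the blank line between sentences
    return pos_path
```

Any function that maps a list of words to a list of tags can be passed as `tagger`, for example a Viterbi decoder built on the WordList and POSList tables.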


I used a very simple approach for out-of-vocabulary (OOV) words.
They are tagged as OOV, and the previous-tag likelihood used for the next word is automatically set to 1/1000.
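One reading of the OOV rule above, as a minimal sketch: an unknown word receives the single tag OOV, and the likelihood of OOV as the previous tag of the following word is fixed at 1/1000. The readme does not say how the transition *into* OOV is scored; the sketch assumes 1.0 so that the unknown word alone does not zero out the path. `emission`, `transition`, and the table shapes (`word_list[word][tag]`, `pos_list[tag][prev_tag]`, following the WordList and POSList descriptions) are illustrative, not the repository's code.

```python
from typing import Dict

OOV_TAG = "OOV"
OOV_NEXT_PROB = 1 / 1000  # the fixed value stated in the readme

def emission(word: str, word_list: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Tag likelihoods for a word; an unknown word gets the single tag OOV."""
    return word_list.get(word, {OOV_TAG: 1.0})

def transition(prev_tag: str, tag: str, pos_list: Dict[str, Dict[str, float]]) -> float:
    """Likelihood that prev_tag precedes tag, with the OOV special cases."""
    if prev_tag == OOV_TAG:
        return OOV_NEXT_PROB  # word after an OOV word: fixed 1/1000
    if tag == OOV_TAG:
        return 1.0  # assumption: entering OOV is not penalized
    return pos_list.get(tag, {}).get(prev_tag, 0.0)
```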


Everything else is self-explanatory.
This implementation uses two-dimensional (nested) dictionaries as likelihood tables.

WordList is a dictionary of words;
each word key’s value is a dictionary of POS tags whose values are the likelihood of that tag for that word.
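WordList can be built from the training corpus by counting how often each tag occurs with each word and normalizing per word, which gives the likelihood of a tag for that word as described above. (Textbook HMM taggers usually store the reverse, the likelihood of the word given the tag; this sketch follows the readme's description.) It assumes the .pos corpus has one word<TAB>TAG pair per line with blank lines between sentences; `parse_tagged` and `build_word_list` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

def parse_tagged(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Split word<TAB>TAG lines into sentences (assumed format: blank line = sentence break)."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag = line.split("\t", 1)
        current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences

def build_word_list(sentences: List[List[Tuple[str, str]]]) -> Dict[str, Dict[str, float]]:
    """WordList[word][tag] = share of the word's occurrences that carry tag."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    word_list: Dict[str, Dict[str, float]] = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        word_list[word] = {tag: n / total for tag, n in tag_counts.items()}
    return word_list
```

On the real corpus, the lines would come from an open file handle for WSJ_02-21.pos instead of an in-memory list.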

POSList is a dictionary of POS tags;
each tag key’s value is a dictionary of POS tags whose values are the likelihood of each being the previous tag
for that specific tag key.
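POSList can be built the same way from tag bigrams: for each tag, count which tag came immediately before it and normalize per tag. This minimal sketch assumes a hypothetical "<S>" pseudo-tag as the previous tag of each sentence's first word, since the readme does not say how sentence starts are handled. Sentences are lists of (word, tag) pairs; `build_pos_list` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

START = "<S>"  # hypothetical pseudo-tag standing in for "start of sentence"

def build_pos_list(sentences: List[List[Tuple[str, str]]]) -> Dict[str, Dict[str, float]]:
    """POSList[tag][prev_tag] = share of tag's occurrences preceded by prev_tag."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        prev = START
        for _word, tag in sentence:
            counts[tag][prev] += 1
            prev = tag
    pos_list: Dict[str, Dict[str, float]] = {}
    for tag, prev_counts in counts.items():
        total = sum(prev_counts.values())
        pos_list[tag] = {prev: n / total for prev, n in prev_counts.items()}
    return pos_list
```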

These two tables are used by the Viterbi algorithm to find the highest-likelihood
sequence of POS tags, which assigns a tag to each word.
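A minimal Viterbi sketch over the two tables, under stated assumptions: `word_list[word][tag]` and `pos_list[tag][prev_tag]` as described above, a hypothetical "<S>" start tag, and the readme's OOV rule (with entry into OOV scored as 1.0, which is an assumption). Scores are summed as logarithms rather than multiplied, a choice made here to avoid floating-point underflow on long sentences; it is not known whether the original script does this. All names are illustrative.

```python
import math
from typing import Dict, List

START = "<S>"        # hypothetical sentence-start pseudo-tag
OOV = "OOV"          # tag given to out-of-vocabulary words, as in the readme
OOV_NEXT = 1 / 1000  # fixed previous-tag likelihood after an OOV word

def transition(prev: str, tag: str, pos_list: Dict[str, Dict[str, float]]) -> float:
    if prev == OOV:
        return OOV_NEXT
    if tag == OOV:
        return 1.0  # assumption: entering OOV is not penalized
    return pos_list.get(tag, {}).get(prev, 0.0)

def viterbi(words: List[str], word_list: Dict[str, Dict[str, float]],
            pos_list: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-likelihood tag sequence for words."""
    if not words:
        return []
    prev_col: Dict[str, float] = {START: 0.0}  # tag -> best log score so far
    backpointers: List[Dict[str, str]] = []    # per word: tag -> best previous tag
    for word in words:
        emissions = word_list.get(word, {OOV: 1.0})
        col: Dict[str, float] = {}
        back: Dict[str, str] = {}
        for tag, emit in emissions.items():
            if emit <= 0:
                continue
            for prev, score in prev_col.items():
                trans = transition(prev, tag, pos_list)
                if trans <= 0:
                    continue  # impossible transition, skip instead of log(0)
                s = score + math.log(trans) + math.log(emit)
                if tag not in col or s > col[tag]:
                    col[tag], back[tag] = s, prev
        if not col:
            raise ValueError(f"no nonzero-likelihood tag sequence reaches {word!r}")
        backpointers.append(back)
        prev_col = col
    # Follow backpointers from the best final tag to recover the sequence.
    tag = max(prev_col, key=prev_col.get)
    tags = [tag]
    for back in reversed(backpointers[1:]):
        tag = back[tag]
        tags.append(tag)
    return tags[::-1]
```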
