Word Resources

Cooper Pellaton edited this page Jul 1, 2016 · 2 revisions

The following is a list of resources that can be used to build the database of words that EngLang will use to parse human-readable text:

General Sources

dictionary (https://github.com/adambom/dictionary): A JSON representation of Webster's Unabridged Dictionary.

Roget's Thesaurus (http://www.nzdl.org/ELKB/index.html#download): An Electronic Lexical Knowledge Base (ELKB) for synonyms and semantic distance. Although most of the parsers and AI processors are written in Java, the site includes many files and resources that may be of use. It also includes stopwords, morphological processing rules, and alternative spellings.

WordNet (ftp://www.ai.mit.edu/people/naha/WordNet/WordNet.html): A large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. WordNet's structure makes it a useful tool for computational linguistics and natural language processing. There is also a Common Lisp interface for this library.

Aspell (http://app.aspell.net/create): Generates a list of words from settings such as region, frequency, etc. Includes a useful Hunspell generator that gives words and their variations.

Chat Acronyms (http://www.netlingo.com/acronyms.php): A list of chat acronyms that may be useful.

UNIX Words: On Unix-based systems, you can copy the system word list into a file with the command cat /usr/share/dict/words > words.txt. This list is already included in the assets folder, so you do not need to generate it yourself.
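
As a rough illustration of how a plain word list such as words.txt could be turned into a lookup structure, here is a minimal Python sketch. It is not EngLang's actual loader: the function names load_words and load_words_file are hypothetical, and the choices to lowercase every entry and to skip entries containing non-letters (for example possessives) are assumptions made for the example.

```python
from typing import Iterable, Set


def load_words(lines: Iterable[str]) -> Set[str]:
    """Build a set of lowercase words from lines holding one word each.

    Hypothetical helper, not part of EngLang. Blank lines and entries
    containing anything other than letters (e.g. "cat's") are skipped.
    """
    words: Set[str] = set()
    for line in lines:
        word = line.strip().lower()
        if word and word.isalpha():
            words.add(word)
    return words


def load_words_file(path: str = "words.txt") -> Set[str]:
    """Read a word list from disk; the default path is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return load_words(handle)
```

Calling load_words_file("words.txt") on the file produced by the command above would return a set that supports fast membership checks, such as "apple" in words. A set is used here because the example only needs lookups; a different structure would be needed if word order or frequency mattered.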