vatsan/text_analytics_on_mpp

Text analytics on MPP

Collection of tutorials on text analytics/NLP, including vector space models, neural language models and topic models on the Pivotal MPP platform (Greenplum/HAWQ).

Vector space models

1. Tokenization, stemming, and generation of unigrams, bigrams, trigrams and skip-grams.
2. Bag-of-words model for classification on 20-news-groups dataset.
3. tf-idf weighting for classification on 20-news-groups dataset.
4. Feature hashing for classification on 20-news-groups dataset.
5. Grid search on model parameters for Elastic Net on the tf-idf representation.
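
As a rough illustration of the first few items above, here is a minimal pure-Python sketch of tokenization, n-gram and skip-gram generation, tf-idf weighting and feature hashing. It is not taken from the notebooks: the function names (`tokenize`, `ngrams`, `skipgrams`, `tf_idf`, `hash_features`) are hypothetical, stemming is left out, the tf-idf variant shown (term frequency times `log(N / df)`) is only one of several common definitions, and the hashing step uses an unsigned CRC32 bucket count rather than any particular library's scheme.

```python
import math
import re
import zlib
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples (n=1 gives unigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, k):
    """Return k-skip-n-grams: ordered n-token subsequences skipping at most k tokens in total."""
    grams = []
    for start in range(len(tokens)):
        # The remaining n-1 tokens are drawn from the next n-1+k positions.
        window = tokens[start + 1:start + n + k]
        for rest in combinations(range(len(window)), n - 1):
            grams.append((tokens[start],) + tuple(window[i] for i in rest))
    return grams


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def hash_features(tokens, n_buckets=16):
    """Map tokens to a fixed-length count vector by hashing each token to a bucket."""
    vec = [0] * n_buckets
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
    return vec


# Example usage on two toy documents (illustrative data only).
docs = [tokenize("The cat sat on the mat"), tokenize("The dog sat on the log")]
bigrams = ngrams(docs[0], 2)
one_skip_bigrams = skipgrams(docs[0], 2, 1)
weights = tf_idf(docs)
hashed = hash_features(docs[0], n_buckets=8)
```

Terms that occur in every document (such as "the" here) get a tf-idf weight of zero under this definition, which is why the weighting tends to help classification more than raw counts. The hashed vector has a fixed length regardless of vocabulary size, at the cost of occasional bucket collisions.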

Topic models

1. LDA topic models on 20-news-groups dataset.
2. Grid search for LDA hyperparameters, on the 20-news-groups dataset.
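
The grid search idea can be sketched generically as below. This is an assumption-laden outline, not the notebooks' code: `grid_search`, `lda_grid` and `toy_score` are hypothetical names, the hyperparameter names and values are illustrative, and `toy_score` is only a stand-in for fitting an LDA model and measuring something like held-out perplexity (lower is better).

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every parameter combination.

    Returns (best_params, best_score), where lower scores are treated as
    better, as with held-out perplexity.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the names and values are illustrative only.
lda_grid = {
    "num_topics": [10, 20, 50],
    "alpha": [0.1, 1.0],
    "beta": [0.01, 0.1],
}


def toy_score(params):
    """Stand-in for training LDA with these settings and scoring it."""
    return abs(params["num_topics"] - 20) + params["alpha"] + params["beta"]


best_params, best_score = grid_search(lda_grid, toy_score)
```

In practice each call to the scoring function would train a model, so the combinations are independent of one another and can be evaluated in parallel, which is where an MPP database is a natural fit.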

Neural language models (Paragraph Vectors, Word2Vec, etc.)

1. Classification models on paragraph vector representations of the 20-news-groups dataset, built with the `doc2vec` module in `gensim`.

Dependencies

These exercises have the following client and server side dependencies:

  1. Client side: We encourage you to install Anaconda Python for your Jupyter Notebooks. The notebooks in these exercises use matplotlib and seaborn for data visualization, and pandas and psycopg2 to query the backend database.
  2. Server side: You'll need to install scikit-learn (`sklearn`) and its dependencies.

Note

These notebooks have been uploaded only to show code snippets. They are not meant to be a complete tutorial on their own, as there is a narration that accompanies these exercises.
