##The DIY Guide to pyLDAvis
Please make Pull Requests for good resources, or create Issues for any feedback! Thanks!
###Table Of Contents
- One Minute Guide
- Hello World
- Theory
- Super Short Feedback Survey
###One Minute Guide
LDAvis helps you interpret LDA results by answering 3 questions:
- What is the meaning of each topic?
- How prevalent is each topic?
- How do topics relate to each other?
```
pip install pyLDAvis
```
###Hello World
Just a simple code-based intro; theory is covered in the next section.
#####Firing up a notebook
- The UI is available as a notebook or as standalone HTML; let's fire up a notebook
- Install Jupyter, then run a notebook (see the commands below)
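Assuming you already have pip, something like this gets a notebook running:

```
pip install jupyter
jupyter notebook
```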
#####Train a quick LDA model
- LDAvis is framework agnostic: it works with LDA models trained with several libraries in Python or in R
- Gensim - Setup, then run `LdaModel(corpus, num_topics=3)`, Docs (a fuller training sketch follows this list)
- Scikit-learn - Example, Docs
- GraphLab - Example
- LDA in R - Example
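A minimal gensim training sketch (the toy documents here are made up purely for illustration; use your real corpus):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy tokenized documents (made up for illustration only)
docs = [
    ["topic", "model", "lda", "corpus", "word"],
    ["python", "notebook", "jupyter", "code", "cell"],
    ["topic", "word", "distribution", "probability", "lda"],
]

dictionary = Dictionary(docs)                        # token <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]   # bag-of-words vectors

model = LdaModel(corpus, num_topics=3, id2word=dictionary)
```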
#####Enable Notebook
```python
import pyLDAvis

pyLDAvis.enable_notebook()  # route pyLDAvis output into the notebook
```
#####Prepare LDAvis
- pyLDAvis uses the `prepare` method to load the LDA models
- Different libraries require different variations of the `prepare` method:
  - Gensim - `prepare(model, corpus, dictionary)`, Source
  - Scikit-learn - `prepare(model, documents, vectorizer)`, Source
  - GraphLab - `prepare(model, documents)`, Source
  - LDA in R - `prepare(*args)`, Example
- And voila! A beautiful dashboard! (see the sketch below)
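For gensim, the end-to-end call might look like this (a minimal sketch, assuming the `model`, `corpus`, and `dictionary` from the training step above; recent pyLDAvis releases renamed the helper module to `pyLDAvis.gensim_models`):

```python
import pyLDAvis
import pyLDAvis.gensim  # pyLDAvis.gensim_models in pyLDAvis >= 3.x

pyLDAvis.enable_notebook()

vis = pyLDAvis.gensim.prepare(model, corpus, dictionary)
vis  # the last expression in a notebook cell renders the dashboard inline

# Or export standalone HTML instead of using a notebook
pyLDAvis.save_html(vis, "lda.html")
```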
#####Interpreting LDAvis
- LDAvis tries to answer 3 important questions
- What is the meaning of each topic?
- The blue bar denotes the overall term frequency and the red bar denotes the term frequency within the selected topic
- To understand the lambda knob, see Topic Composition
- How prevalent is each topic? The larger the area, the more prevalent the topic
- How do topics relate to each other? The larger the overlap between two circles, the closer the topics
###Theory
#####LDA Intro
#####Topic Composition
- Paper (see right module)
- Left: `lambda = 0` means that you value how exclusive a word is to a topic; words are purely ranked on lift, `P(word | topic) / P(word)`
- Right: `lambda = 1` means that you value how probable a word is to appear in a topic; words are purely ranked on `P(word | topic)`
- The ranking formula is `lambda * log(P(word | topic)) + (1 - lambda) * log(lift)` (see paper section 3.1)
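To make the knob concrete, here is a tiny sketch of that ranking in Python (the probabilities are made-up numbers, not from a real model):

```python
import math

# Made-up probabilities for three words in one topic (illustration only)
p_word_given_topic = {"the": 0.20, "model": 0.10, "gibbs": 0.05}
p_word             = {"the": 0.20, "model": 0.02, "gibbs": 0.001}

def relevance(word, lam):
    """lambda * log P(word|topic) + (1 - lambda) * log lift (paper, section 3.1)."""
    pwt = p_word_given_topic[word]
    lift = pwt / p_word[word]
    return lam * math.log(pwt) + (1 - lam) * math.log(lift)

for lam in (0.0, 1.0):
    ranking = sorted(p_word_given_topic, key=lambda w: relevance(w, lam), reverse=True)
    print(f"lambda={lam}: {ranking}")
# lambda=0.0 favors exclusive words: ['gibbs', 'model', 'the']
# lambda=1.0 favors frequent words:  ['the', 'model', 'gibbs']
```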
###Super Short Feedback Survey (Pretty please!)