## The DIY Guide to pyLDAvis

Please make Pull Requests for good resources, or create Issues for any feedback! Thanks!



### Table of Contents

  • One Minute Guide
  • Hello World
  • Theory
  • Super Short Feedback Survey


### One Minute Guide

LDAvis helps you interpret LDA results by answering 3 questions:

  1. What is the meaning of each topic?
  2. How prevalent is each topic?
  3. How do topics relate to each other?


##### Installation

```bash
pip install pyLDAvis
```

### Hello World

Just a simple code-based intro; the theory is covered in the next section.

##### Firing up a notebook
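Assuming Jupyter is installed alongside pyLDAvis, start it from a shell and open a new Python notebook:

```python
# From a shell:
#   jupyter notebook
# Then, in the first cell of the new notebook:
import pyLDAvis
```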

##### Train a quick LDA model

  • LDAvis is framework agnostic: you can train the model with any LDA library in Python or R (a Gensim sketch follows this list)
  • Gensim - Setup, then run `LdaModel(corpus, num_topics=3)`, Docs
  • Scikit-learn - Example, Docs
  • GraphLab - Example
  • LDA in R - Example
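Fleshing out the Gensim bullet above, here is a minimal training sketch; the toy `texts` corpus is invented purely for illustration:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy tokenized documents (illustrative only).
texts = [
    ["cat", "dog", "pet", "food"],
    ["stock", "market", "trade", "price"],
    ["cat", "pet", "vet", "food"],
    ["price", "trade", "profit", "market"],
]

# Map each token to an integer id, then express each document as a bag-of-words.
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train the quick 3-topic model from the bullet above.
model = LdaModel(corpus, num_topics=3, id2word=dictionary)
```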

##### Enable Notebook

  • Call `pyLDAvis.enable_notebook()` once, near the top of the notebook (snippet below)
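In a notebook cell:

```python
import pyLDAvis

# Render every subsequently prepared visualization inline in the notebook.
pyLDAvis.enable_notebook()
```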

##### Prepare LDAvis

  • pyLDAvis uses the `prepare` method to load a fitted LDA model
  • Different libraries require different variations of the `prepare` method (a full Gensim example follows this list):
  • Gensim - `prepare(model, corpus, dictionary)`, Source
  • Scikit-learn - `prepare(model, doc_term_matrix, vectorizer)`, Source
  • GraphLab - `prepare(model, documents)`, Source
  • LDA in R - `prepare(*args)`, Example
  • And voilà! A beautiful dashboard!
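Continuing the Gensim sketch from above (note: pyLDAvis >= 3.x renamed the module, so use `pyLDAvis.gensim_models` there; older releases use `pyLDAvis.gensim`):

```python
import pyLDAvis
import pyLDAvis.gensim  # pyLDAvis.gensim_models in pyLDAvis >= 3.x

pyLDAvis.enable_notebook()

# Convert the fitted model into LDAvis's data format; in a notebook,
# the returned object renders as the interactive dashboard.
vis = pyLDAvis.gensim.prepare(model, corpus, dictionary)
vis
```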

##### Interpreting LDAvis

  • LDAvis tries to answer 3 important questions:
    • What is the meaning of each topic?
      • In the term barchart, blue denotes a term's overall frequency in the corpus and red denotes its frequency within the selected topic
      • To understand the lambda knob, see Topic Composition
    • How prevalent is each topic? The larger a topic's circle, the more prevalent the topic
    • How do topics relate to each other? The more two circles overlap, the closer the topics

### Theory

##### LDA Intro

##### Topic Composition

  • The relevance formula comes from the pyLDAvis paper (Sievert & Shirley, 2014); the lambda knob appears in the right-hand term panel
  • Sliding lambda to 0 (left) means you value how exclusive a word is to a topic
    • words are ranked purely on lift, P(word | topic) / P(word)
  • Sliding lambda to 1 (right) means you value how probable a word is within a topic
    • words are ranked purely on P(word | topic)
  • The full ranking formula is lambda * log(P(word | topic)) + (1 - lambda) * log(lift) (see section 3.1 of the paper), sketched in code below
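In code, the paper's relevance ranking looks roughly like this (a sketch of the formula itself, not of pyLDAvis internals; the two probabilities are assumed to come from the fitted model):

```python
import math

def relevance(p_word_given_topic, p_word, lam):
    """Relevance of a word to a topic (Sievert & Shirley 2014, section 3.1).

    p_word_given_topic: P(word | topic), the word's probability within the topic.
    p_word:             P(word), the word's overall probability in the corpus.
    lam:                the lambda slider; 1 ranks by probability, 0 by lift.
    """
    lift = p_word_given_topic / p_word
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(lift)
```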

### Super Short Feedback Survey (Pretty please!)