Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 18, 2014

Topics and xkcd Comics

Filed under: Latent Dirichlet Allocation (LDA),Statistics,Topic Models (LDA) — Patrick Durusau @ 9:01 am

Finding structure in xkcd comics with Latent Dirichlet Allocation by Carson Sievert.

From the post:

xkcd is self-proclaimed as “a webcomic of romance, sarcasm, math, and language”. There was a recent effort to quantify whether or not these “topics” agree with topics derived from the xkcd text corpus using Latent Dirichlet Allocation (LDA). That analysis makes the all too common folly of choosing an arbitrary number of topics. Maybe xkcd’s tagline does provide a strong prior belief of a small number of topics, but here we take a more objective approach and let the data choose the number of topics. An “optimal” number of topics is found using the Bayesian model selection approach (with uniform prior belief on the number of topics) suggested by Griffiths and Steyvers (2004). After an optimal number is decided, topic interpretations and trends over time are explored.

The post offers a great interactive visualization, code for extracting data from xkcd comics, and a way of exploring “keywords that are most ‘relevant’ or ‘informative’ to a given topic’s meaning.”

Easy to see this post forming the basis for several sessions on LDA, starting with extracting the data, exploring the choices that influence the results and then visualizing the results of analysis.
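One of those choices is the number of topics. As a rough illustration of the model selection step mentioned in the quoted passage, here is a minimal Python sketch, assuming you already have log-likelihood values log P(w | z, T) from Gibbs samples for each candidate number of topics T. It uses the harmonic mean estimator of log P(w | T) associated with Griffiths and Steyvers (2004); with a uniform prior on T, picking the largest estimate amounts to picking the most probable T. The function names and the toy numbers are invented for illustration and do not come from Sievert's post, which may compute this differently.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """Harmonic mean estimate of log P(w | T) from sampled log-likelihoods."""
    m = len(log_likelihoods)
    # We need -log( (1/m) * sum(exp(-ll)) ); shift by the max for stability.
    neg = [-ll for ll in log_likelihoods]
    shift = max(neg)
    log_mean_inv = shift + math.log(sum(math.exp(x - shift) for x in neg)) - math.log(m)
    return -log_mean_inv


def choose_num_topics(samples_by_t):
    """Return the T with the highest estimated log P(w | T), plus all scores.

    samples_by_t maps a candidate number of topics to a list of sampled
    log-likelihoods (hypothetical input format).
    """
    scores = {t: log_harmonic_mean(lls) for t, lls in samples_by_t.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up log-likelihoods, purely for illustration (not real xkcd results).
toy = {
    5: [-1050.0, -1048.0, -1052.0],
    10: [-1000.0, -1002.0, -998.0],
    20: [-1020.0, -1019.0, -1023.0],
}
best, scores = choose_num_topics(toy)
print(best)  # 10 for this toy data
```

The harmonic mean estimator is known to be unstable in practice, so treat a result like this as a starting point for the discussion rather than a final answer.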

Enjoy!

I first saw this in a tweet by Zoltan Varju.
