Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 16, 2011

Latent Dirichlet Allocation in C

Filed under: Latent Dirichlet Allocation (LDA) — Patrick Durusau @ 3:13 pm

Latent Dirichlet Allocation in C

From the website:

This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, click here to see the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).

This code contains:

  • an implementation of variational inference for the per-document topic proportions and per-word topic assignments
  • a variational EM procedure for estimating the topics and exchangeable Dirichlet hyperparameter
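
To make the first item a bit more concrete, here is a minimal sketch of the per-document variational updates from Blei et al. (2003), written in Python for brevity rather than C. It is not taken from the lda-c source. The names `digamma`, `infer_document`, `word_counts`, `beta` and `alpha` are my own choices for illustration, and the sketch assumes a single scalar `alpha` (the exchangeable Dirichlet case) and strictly positive topic-word probabilities in `beta`. It covers only the inference step for one document, not the EM procedure that estimates the topics.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0.

    Uses the recurrence psi(x) = psi(x + 1) - 1/x to shift x up to at
    least 6, then an asymptotic series.
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def infer_document(word_counts, beta, alpha, max_iter=100, tol=1e-6):
    """Coordinate-ascent variational inference for one document.

    word_counts: dict mapping word id -> count in this document
    beta:        beta[k][w] = probability of word w under topic k
                 (assumed strictly positive here)
    alpha:       scalar Dirichlet hyperparameter (assumed symmetric)

    Returns (gamma, phi): gamma[k] is the variational Dirichlet parameter
    for the topic proportions; phi[w][k] is the topic assignment
    probability shared by every occurrence of word w.
    """
    num_topics = len(beta)
    total = sum(word_counts.values())
    gamma = [alpha + total / num_topics] * num_topics
    phi = {}
    for _ in range(max_iter):
        # exp(digamma(sum(gamma))) is constant across topics, so it
        # cancels when phi is normalized and is omitted here.
        weights = [math.exp(digamma(g)) for g in gamma]
        new_gamma = [alpha] * num_topics
        for w, count in word_counts.items():
            unnorm = [beta[k][w] * weights[k] for k in range(num_topics)]
            norm = sum(unnorm)
            phi[w] = [u / norm for u in unnorm]
            for k in range(num_topics):
                new_gamma[k] += count * phi[w][k]
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:
            break
    return gamma, phi


if __name__ == "__main__":
    # Toy example: two topics over a two-word vocabulary.
    toy_beta = [[0.9, 0.1], [0.1, 0.9]]
    gamma, phi = infer_document({0: 5, 1: 1}, toy_beta, alpha=0.1)
    print("gamma:", gamma)
    print("phi:", phi)
```

Dividing `gamma` by its sum gives a rough estimate of the document's topic proportions. Keeping one `phi` vector per distinct word rather than per token is a shortcut: every occurrence of the same word in a document receives the same update, so the counts can simply be used as weights.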

Do be aware that "topic," as used in this technique and in the papers discussing it, is not the same thing as a topic as defined by ISO 13250-2.

It comes closer to the notion of a subject as defined in ISO 13250-2.

Update:

I was sent a pointer to David M. Blei’s topic modeling page, http://www.cs.princeton.edu/~blei/topicmodeling.html, which has more code and other goodies.
