Latent Dirichlet Allocation in C
From the website:
This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, the site links to the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).
This code contains:
- an implementation of variational inference for the per-document topic proportions and per-word topic assignments
- a variational EM procedure for estimating the topics and exchangeable Dirichlet hyperparameter
Be aware that "topic" as used in this technique, and in the papers discussing it, does not mean the same thing as "topic" as defined by ISO 13250-2.
It comes closer to the notion of subject as defined in ISO 13250-2.
Update:
I was sent a pointer to David M. Blei’s topic modeling page, http://www.cs.princeton.edu/~blei/topicmodeling.html, which has more code and other goodies.