“Learning Topic Models by Belief Propagation” by Jia Zeng, William K. Cheung, and Jiming Liu.
Abstract:
Latent Dirichlet allocation (LDA) is an important class of hierarchical Bayesian models for probabilistic topic modeling, which attracts worldwide interest and touches on many important applications in text mining, computer vision and computational biology. This paper proposes a novel tree-structured factor graph representation for LDA within the Markov random field (MRF) framework, which enables the classic belief propagation (BP) algorithm for exact inference and parameter estimation. Although two commonly used approximate inference methods, variational Bayes (VB) and collapsed Gibbs sampling (GS), have achieved great success in learning LDA, the proposed BP is competitive in both speed and accuracy, as validated by encouraging experimental results on four large-scale document data sets. Furthermore, the BP algorithm has the potential to become a generic learning scheme for variants of LDA-based topic models. To this end, we show how to learn two typical variants of LDA-based topic models, author-topic models (ATM) and relational topic models (RTM), using belief propagation based on the factor graph representation.
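The abstract does not spell out the BP updates, so here is a rough illustration of the message-passing idea only. It assumes the collapsed, synchronous message update often written for BP on LDA: each message μ_{w,d}(k) is proportional to [θ̂_{-w,d}(k) + α] · [φ̂_{w,-d}(k) + β] / [Σ_w φ̂_{w,-d}(k) + Wβ], where θ̂ and φ̂ are count-weighted sums of all the *other* messages in document d and for word w. That update form, the function name `bp_lda`, the hyperparameter defaults and the fixed iteration count are my assumptions, not details taken from the abstract, so check them against the paper before relying on this sketch.

```python
import random


def _sufficient_stats(docs, mu, num_topics, vocab_size):
    """Accumulate theta_hat[d][k] = sum_w x_{w,d} * mu_{w,d}(k) and
    phi_hat[w][k] = sum_d x_{w,d} * mu_{w,d}(k) from the current messages."""
    K, W = num_topics, vocab_size
    theta_hat = [[0.0] * K for _ in docs]
    phi_hat = [[0.0] * K for _ in range(W)]
    for d, doc in enumerate(docs):
        for w, x in doc.items():
            for k in range(K):
                c = x * mu[d][w][k]
                theta_hat[d][k] += c
                phi_hat[w][k] += c
    phi_tot = [sum(phi_hat[w][k] for w in range(W)) for k in range(K)]
    return theta_hat, phi_hat, phi_tot


def bp_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Hypothetical BP-style sketch for LDA (assumed update form, see lead-in).
    docs: list of {word_id: count} dicts, i.e. the sparse matrix x_{w,d}.
    Returns (theta, phi) with theta[d][k] and phi[k][w], each row normalized."""
    rng = random.Random(seed)
    K, W = num_topics, vocab_size

    # One message mu[d][w] (a distribution over K topics) per nonzero x_{w,d};
    # random initialization breaks the symmetry between topics.
    mu = []
    for doc in docs:
        msgs = {}
        for w in doc:
            raw = [rng.random() + 1e-3 for _ in range(K)]
            total = sum(raw)
            msgs[w] = [r / total for r in raw]
        mu.append(msgs)

    for _ in range(iters):
        theta_hat, phi_hat, phi_tot = _sufficient_stats(docs, mu, K, W)
        # Synchronous schedule: every message is recomputed from the same old
        # statistics, each excluding its own contribution (the -w and -d terms).
        new_mu = []
        for d, doc in enumerate(docs):
            msgs = {}
            for w, x in doc.items():
                vals = []
                for k in range(K):
                    own = x * mu[d][w][k]
                    t = theta_hat[d][k] - own + alpha
                    p = (phi_hat[w][k] - own + beta) / (phi_tot[k] - own + W * beta)
                    vals.append(t * p)
                total = sum(vals)
                msgs[w] = [v / total for v in vals]
            new_mu.append(msgs)
        mu = new_mu

    # Smoothed point estimates of the topic proportions and topic-word distributions.
    theta_hat, phi_hat, phi_tot = _sufficient_stats(docs, mu, K, W)
    theta = []
    for row in theta_hat:
        denom = sum(row) + K * alpha
        theta.append([(v + alpha) / denom for v in row])
    phi = [[(phi_hat[w][k] + beta) / (phi_tot[k] + W * beta) for w in range(W)]
           for k in range(K)]
    return theta, phi


if __name__ == "__main__":
    # Toy corpus: two documents over words {0, 1}, two over words {2, 3}.
    docs = [{0: 5, 1: 3}, {0: 4, 1: 6}, {2: 5, 3: 4}, {2: 3, 3: 7}]
    theta, phi = bp_lda(docs, num_topics=2, vocab_size=4)
    for d, row in enumerate(theta):
        print(d, [round(v, 3) for v in row])
```

Each message is tied to a document-word pair rather than to an individual token, so the cost per sweep scales with the number of nonzero entries in the document-word matrix; that is one plausible reason BP could be competitive in speed with token-level Gibbs sampling, though this toy code makes no performance claims.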
I have just started reading this paper but wanted to bring it to your attention. I peeked at the results, and they look quite promising.
This work was tested on the following data sets:
1) CORA [30] contains abstracts from the CORA research paper search engine in the area of machine learning, where the documents can be classified into 7 major categories.
2) MEDL [31] contains abstracts from the MEDLINE biomedical paper search engine, where the documents fall broadly into 4 categories.
3) NIPS [32] includes papers from the conference “Neural Information Processing Systems”, where all papers are grouped into 13 categories. NIPS has no citation link information.
4) BLOG [33] contains a collection of political blogs on the subject of American politics in 2008, where all blogs can be broadly classified into 6 categories. BLOG has no author information.
The results on all four data sets were positive.