Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 1, 2013

Topic Discovery With Apache Pig and Mallet

Filed under: Latent Dirichlet Allocation (LDA),MALLET,Pig — Patrick Durusau @ 8:07 pm

The original post is one of only two that its blog published in 2012, but it is a useful one.

From the post:

A common desire when working with natural language is topic discovery. That is, given a set of documents (e.g., tweets, blog posts, emails) you would like to discover the topics inherent in those documents. Often this method is used to summarize a large corpus of text so it can be quickly understood what that text is ‘about’. You can go further and use topic discovery as a way to classify new documents or to group and organize the documents you’ve done topic discovery on.

The post walks through using Pig and Mallet on a newsgroup data set.
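If you want a feel for what topic discovery does before setting up Pig and Mallet, here is a minimal sketch of LDA by collapsed Gibbs sampling in plain Python. It is not the pipeline from the post and not Mallet's implementation. The function name `lda_gibbs`, the toy documents, and the hyperparameter values are all my own assumptions for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    docs is a list of token lists. Returns (doc_topic, topic_word):
    per-document topic counts and per-topic word counts.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total tokens assigned to topic k
    assignments = []                                     # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Hypothetical toy corpus, standing in for real newsgroup posts.
docs = [
    "pig hadoop cluster job script hadoop".split(),
    "hadoop pig script cluster data".split(),
    "hockey team game season player".split(),
    "game player team hockey goal".split(),
]

doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print("topic", k, top_words)
```

The most frequent words per topic are the usual quick summary of what a corpus is "about", and the per-document counts in `doc_topic` are what you would use to group or classify documents. On a corpus of any real size you would want Mallet (or similar) rather than a loop like this.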

I have been thinking about getting one of those unlimited download newsgroup accounts.

Maybe I need to go ahead and start building some newsgroup data sets.
