Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 6, 2012

Using your Lucene index as input to your Mahout job – Part I

Filed under: Clustering,Collocation,Lucene,Mahout — Patrick Durusau @ 8:08 pm

Using your Lucene index as input to your Mahout job – Part I

From the post:

This blog shows you how to use an upcoming Mahout feature, the lucene2seq program or https://issues.apache.org/jira/browse/MAHOUT-944. This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, to be used by a Mahout text clustering job. The tool contains both a sequential and MapReduce implementation and can be run from the command line or from Java using a bean configuration object. In this blog I demonstrate how to use the sequential version on an index of Wikipedia.

Access to the original text can help improve clustering results. See the blog post for details.
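To make the quoted description concrete, here is a rough sketch of the conversion lucene2seq is described as performing: take the stored fields of each document and emit one key/text pair per document. It is not the Mahout code and calls no Lucene or Hadoop API. Plain dictionaries stand in for indexed documents, a Python list stands in for the sequence file, and the function name, field names, and sample data are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def stored_fields_to_pairs(
    docs: Iterable[Dict[str, str]],
    id_field: str,
    fields: List[str],
) -> Iterator[Tuple[str, str]]:
    """Yield one (key, text) pair per document that has an id and some text.

    Assumption: each document is a dict of stored field name -> string value.
    A real Lucene index and a real Hadoop sequence file are not involved.
    """
    for doc in docs:
        key = doc.get(id_field)
        if not key:
            continue  # no usable key, so the document is skipped
        # Keep only the requested fields that are present and non-empty.
        parts = [doc[name] for name in fields if doc.get(name)]
        if not parts:
            continue  # nothing to cluster on
        yield key, " ".join(parts)


# Hypothetical stand-in for documents read from an index of Wikipedia.
SAMPLE_DOCS = [
    {"id": "1", "title": "Apache Lucene", "body": "A search library."},
    {"id": "2", "title": "Apache Mahout"},
    {"title": "No id here", "body": "Skipped."},
]

PAIRS = list(stored_fields_to_pairs(SAMPLE_DOCS, "id", ["title", "body"]))
```

Only stored fields can be recovered this way. A field that was indexed but not stored has no original text to hand to the clustering job.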
