Using your Lucene index as input to your Mahout job – Part I
From the post:
This blog shows you how to use an upcoming Mahout feature, the lucene2seq program (https://issues.apache.org/jira/browse/MAHOUT-944). This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, to be used by a Mahout text clustering job. The tool contains both a sequential and a MapReduce implementation and can be run from the command line or from Java using a bean configuration object. In this blog I demonstrate how to use the sequential version on an index of Wikipedia.
Access to the original text can help improve clustering results. See the blog post for details.
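To make the idea in the quoted passage concrete, here is a minimal sketch in plain Python of the general conversion it describes: take the stored fields of each document and emit (key, text) pairs of the kind a text sequence file would hold. It is not the lucene2seq tool and uses neither the Lucene nor the Hadoop API. The in-memory list of dicts stands in for an index, and the function name, field names, and sample documents are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def stored_fields_to_pairs(
    docs: Iterable[Dict[str, str]],
    id_field: str,
    text_fields: List[str],
) -> Iterator[Tuple[str, str]]:
    """Yield (key, text) pairs built from each document's stored fields.

    Hypothetical stand-in for the conversion step: documents without an
    id, or without any of the requested text fields, are skipped.
    """
    for doc in docs:
        key = doc.get(id_field)
        if key is None:
            continue  # no usable key for this document
        # Keep only the requested fields that are present and non-empty.
        parts = [doc[f] for f in text_fields if doc.get(f)]
        if not parts:
            continue  # nothing to cluster on
        yield key, " ".join(parts)


# Made-up documents standing in for stored fields read from an index.
docs = [
    {"id": "1", "title": "Apache Lucene", "body": "A search library."},
    {"id": "2", "title": "Apache Mahout"},
    {"title": "No id field"},
    {"id": "4"},
]

pairs = list(stored_fields_to_pairs(docs, "id", ["title", "body"]))
for key, text in pairs:
    print(key, "->", text)
```

In a real pipeline the pairs would be written to a sequence file rather than printed; the sketch only shows why the fields need to be stored: the original text has to be retrievable from the index before it can be handed to a clustering job.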