Beginners Guide To Enhancing Solr/Lucene Search With Mahout’s Machine Learning by Doug Turnbull.
From the post:
Yesterday, John and I gave a talk to the DC Hadoop Users Group about using Mahout with Solr to perform Latent Semantic Indexing: calculating and exploiting the semantic relationships between keywords. While we were there, I realized a lot of people could benefit from a bigger-picture, less in-depth point of view outside of our specific story. In general, where do Mahout and Solr fit together? What does that relationship look like, and how does one exploit Mahout to make search even more awesome? So I thought I’d blog about how you too can get started putting these pieces together to simultaneously exploit Solr’s search and Mahout’s machine learning capabilities.
The root of how this all works is a slightly obscure feature of Lucene-based search: term vectors. Lucene-based search applications give you the ability to generate term vectors from documents in the search index. It’s a feature often turned on for specific search features, but other than that it can appear to be a weird, opaque feature to beginners. What is a term vector, you might ask? And why would you want to get one?
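To make the idea of a term vector concrete, here is a minimal sketch in plain Python. It is not Lucene's or Solr's API: the `term_vector` helper, the tokenizer, and the two sample documents are all assumptions made up for illustration. It treats a term vector as a map from each term in a document to how often it occurs, then stacks those vectors into a document-by-term matrix, which is roughly the kind of input a technique like Latent Semantic Indexing works from.

```python
import re
from collections import Counter


def term_vector(text):
    """Hypothetical helper: map each lowercased term to its count in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


# Illustrative documents only; a real index would supply these.
docs = {
    "doc1": "Solr search uses Lucene",
    "doc2": "Mahout learns from Lucene term vectors",
}

# One term vector per document.
vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}

# Shared vocabulary across all documents, in a stable order.
vocabulary = sorted(set().union(*vectors.values()))

# Document-by-term matrix: one row per document, one column per term.
# A Counter returns 0 for terms a document does not contain.
matrix = [[vectors[doc_id][term] for term in vocabulary] for doc_id in sorted(docs)]

for doc_id, row in zip(sorted(docs), matrix):
    print(doc_id, row)
```

In a real deployment the search engine builds and stores these vectors itself, with its own analysis chain; the sketch only shows the shape of the data.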
You know my misgivings about metric approaches to non-metric data (such as semantics), but there is no denying that Latent Semantic Indexing can be useful.
Think of Latent Semantic Indexing as a useful tool.
A saw is a tool too, but not every cut made with a saw is a correct one.
Yes?