From the website:
Matrix algebra underpins the way many Big Data algorithms and data structures are composed: full-text search can be viewed as multiplying the term-document matrix by the query vector (giving a vector over documents whose components are the relevance scores); computing co-occurrences in a collaborative filtering context (people who viewed X also viewed Y, or ratings-based CF like the Netflix Prize contest) amounts to squaring the user-item interaction matrix; users who are k degrees separated from each other in a social network or web graph can be found by looking at the k-fold product of the graph adjacency matrix; and the list goes on (and these are all cases where the linear structure of the matrix is preserved!)
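As a minimal sketch of those three views (the toy matrices and names below are made up for illustration, not anything from the project):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents,
# entries = term weights (e.g. tf-idf).
term_doc = np.array([
    [1.0, 0.0, 2.0],   # term "matrix"
    [0.0, 1.0, 1.0],   # term "hadoop"
    [1.0, 1.0, 0.0],   # term "search"
])

# Full-text search: multiply the (transposed) term-document matrix by a
# query vector over terms to get a relevance score per document.
query = np.array([1.0, 0.0, 1.0])      # query mentions "matrix" and "search"
doc_scores = term_doc.T @ query        # one relevance score per document

# Collaborative filtering: rows = users, columns = items.
user_item = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
])
# "Squaring" the interaction matrix gives item-item co-occurrence counts
# (people who viewed X also viewed Y).
item_cooccur = user_item.T @ user_item

# k-degree separation: adjacency matrix of a small graph.
adj = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
k = 2
# Entry (i, j) of the k-fold product counts walks of length k from i to j;
# a nonzero entry means j is reachable from i in exactly k steps.
paths_k = np.linalg.matrix_power(adj, k)
```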
….
Currently implemented: Singular Value Decomposition using the Asymmetric Generalized Hebbian Algorithm outlined in Genevieve Gorrell & Brandyn Webb’s paper, as well as a Lanczos implementation (both single-threaded), plus a Hadoop map-reduce (series of) job(s) in the contrib/hadoop subdirectory. Coming soon: stochastic decomposition. This code is in the process of being absorbed into the Apache Mahout Machine Learning Project.
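For a rough sense of what extracting the leading singular triple involves, here is a plain power-iteration sketch on a dense numpy array; it is not the AGHA or Lanczos code in the project, just an illustration of the quantity being computed:

```python
import numpy as np

def top_singular_triple(A, n_iter=100, seed=0):
    """Approximate the leading singular value/vectors of A by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = A @ v                     # pull v through A ...
        u /= np.linalg.norm(u)
        v = A.T @ u                   # ... and back, renormalizing each time
        v /= np.linalg.norm(v)
    sigma = u @ A @ v                 # leading singular value
    return sigma, u, v

A = np.random.default_rng(1).standard_normal((20, 10))
sigma, u, v = top_singular_triple(A)
# Should closely match the leading value from a full SVD.
print(sigma, np.linalg.svd(A, compute_uv=False)[0])
```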
Useful for learning to use search technology, but also for recognizing, at a very fundamental level, the limitations of that technology.
Document and query vectors are constructed without regard to the semantics of their components.
Using co-occurrence, for example, doesn’t give a search engine greater access to the semantics of the terms in question.
It simply makes the vector longer, so matches are less frequent, and hopefully less frequent means more precise.
That may or may not be the case. It also doesn’t account for cases where the vectors are different but the subject in question is the same.
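A tiny illustration of that last point, with hypothetical term vectors: two documents can be about the same subject yet share no terms, so one of them scores zero against the query no matter how the weights are tuned.

```python
import numpy as np

# Vocabulary: ["car", "engine", "automobile", "motor"]
query = np.array([1, 0, 0, 0])   # user searches for "car"
doc_a = np.array([1, 1, 0, 0])   # talks about "car" and "engine"
doc_b = np.array([0, 0, 1, 1])   # same subject, but says "automobile" and "motor"

def cosine(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

print(cosine(query, doc_a))   # > 0: matched
print(cosine(query, doc_b))   # 0.0: same subject, different vector, no match
```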