Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 11, 2010

Decomposer

Filed under: Matrix,Search Engines,Vectors — Patrick Durusau @ 1:19 pm


From the website:

Matrix algebra underpins the way many Big Data algorithms and data structures are composed: full-text search can be viewed as doing matrix multiplication of the term-document matrix by the query vector (giving a vector over documents where the components are the relevance scores), computing co-occurrences in a collaborative filtering context (people who viewed X also viewed Y, or ratings-based CF like the Netflix Prize contest) is squaring the user-item interaction matrix, users who are k degrees separated from each other in a social network or web-graph can be found by looking at the k-fold product of the graph adjacency matrix, and the list goes on (and these are all cases where the linear structure of the matrix is preserved!)
….
Currently implemented: Singular Value Decomposition using the Asymmetric Generalized Hebbian Algorithm outlined in Genevieve Gorrell & Brandyn Webb’s paper, and a Lanczos implementation, both single-threaded and, in the contrib/hadoop subdirectory, as a (series of) Hadoop map-reduce job(s). Coming soon: stochastic decomposition.

This code is in the process of being absorbed into the Apache Mahout Machine Learning Project.
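To make the three matrix views in the quoted passage concrete, here is a minimal sketch in plain Python. It is not Decomposer's or Mahout's code or API. The helper names (`matvec`, `matmul`, `transpose`, `matpow`) and all of the toy data are my own assumptions for illustration. The corpus is stored with documents as rows, that is, the transpose of a term-document matrix, so a single matrix-vector product yields one score per document.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Swap the rows and columns of matrix m."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def matpow(m, k):
    """Return the k-fold product of square matrix m with itself (k >= 1)."""
    result = m
    for _ in range(k - 1):
        result = matmul(result, m)
    return result

# 1. Full-text search as a matrix-vector product.
# Hypothetical corpus: rows are documents, columns are the terms
# "matrix", "search", "graph"; entries are raw term counts.
doc_term = [
    [2, 1, 0],  # document 0
    [0, 1, 1],  # document 1
    [1, 0, 3],  # document 2
]
query = [1, 1, 0]  # the query "matrix search"
scores = matvec(doc_term, query)  # one relevance score per document

# 2. Co-occurrence as the "square" of the user-item matrix.
# Hypothetical views: rows are users, columns are items, 1 = viewed.
user_item = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
# Entry [i][j] counts the users who viewed both item i and item j.
cooccurrence = matmul(transpose(user_item), user_item)

# 3. Degrees of separation as powers of the adjacency matrix.
# Hypothetical path graph 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Entry [i][j] of the k-th power counts the walks of length k from i to j.
walks_2 = matpow(adjacency, 2)
walks_3 = matpow(adjacency, 3)

print(scores)          # [3, 1, 1]
print(cooccurrence)    # [[3, 2, 1], [2, 3, 1], [1, 1, 2]]
print(walks_2[0][3])   # 0: node 3 is not two steps from node 0
print(walks_3[0][3])   # 1: node 3 is three steps from node 0
```

Nothing in any of these products knows what a term, an item, or a node means. The arithmetic only counts and sums.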

Useful for learning to use search technology, but also for recognizing, at a very fundamental level, the limitations of that technology.

Document and query vectors are constructed without regard to the semantics of their components.

Using co-occurrence, for example, doesn’t give a search engine greater access to the semantics of the terms in question.

It simply makes the vector longer, so matches are less frequent and, hopefully, less frequent means more precise.

That may or may not be the case. It also doesn’t account for cases where the vectors are different but the subject in question is the same.
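A small sketch of that last point, under my own assumptions: the three phrases are invented, the vectors are plain bag-of-words counts, and cosine similarity stands in for whatever scoring a real engine uses. The helper names `bag_of_words` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Build a term-count vector from whitespace-separated, lowercased text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented phrases: the first two are about the same subject but share
# no terms; the first and third share terms but not a subject.
doc_a = bag_of_words("used car prices")
doc_b = bag_of_words("secondhand automobile cost")
doc_c = bag_of_words("car wash prices")

same_subject = cosine(doc_a, doc_b)
shared_terms = cosine(doc_a, doc_c)

print(same_subject)            # 0.0
print(round(shared_terms, 3))  # 0.667
```

The two phrases about the same subject score zero, while the phrase about a different subject scores well. The vectors record which strings occur, not what they identify.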
