Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 18, 2015

Open Source Tensor Libraries For Data Science

Filed under: Data Science,Mathematics,Open Source,Programming — Patrick Durusau @ 5:20 pm

Let’s build open source tensor libraries for data science by Ben Lorica.

From the post:

Data scientists frequently find themselves dealing with high-dimensional feature spaces. As an example, text mining usually involves vocabularies comprised of 10,000+ different words. Many analytic problems involve linear algebra, particularly 2D matrix factorization techniques, for which several open source implementations are available. Anyone working on implementing machine learning algorithms ends up needing a good library for matrix analysis and operations.

But why stop at 2D representations? In a recent Strata + Hadoop World San Jose presentation, UC Irvine professor Anima Anandkumar described how techniques developed for higher-dimensional arrays can be applied to machine learning. Tensors are generalizations of matrices that let you look beyond pairwise relationships to higher-dimensional models (a matrix is a second-order tensor). For instance, one can examine patterns between any three (or more) dimensions in data sets. In a text mining application, this leads to models that incorporate the co-occurrence of three or more words, and in social networks, you can use tensors to encode arbitrary degrees of influence (e.g., “friend of friend of friend” of a user).
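To make the step from pairwise to three-way co-occurrence concrete, here is a minimal sketch in plain Python. It is not taken from Lorica's post or Anandkumar's talk. The function name `cooccurrence_tensor`, the toy documents, and the choice to store the counts sparsely as a dictionary keyed by sorted word tuples are all assumptions made for illustration. Setting `order=2` gives the familiar word-pair matrix; setting `order=3` gives the entries of a symmetric third-order tensor.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_tensor(docs, order=3):
    """Count how often each set of `order` distinct words shares a document.

    Hypothetical helper for illustration. The result is a sparse, symmetric
    tensor: a Counter keyed by alphabetically sorted word tuples, so each
    unordered combination of words is stored exactly once.
    """
    counts = Counter()
    for doc in docs:
        # Each distinct word counts once per document.
        words = sorted(set(doc.lower().split()))
        for combo in combinations(words, order):
            counts[combo] += 1
    return counts


# Toy corpus, invented for this example.
docs = [
    "tensor matrix factorization",
    "tensor matrix factorization library",
    "matrix library",
]

pairs = cooccurrence_tensor(docs, order=2)    # second order: a sparse matrix
triples = cooccurrence_tensor(docs, order=3)  # third order: a sparse tensor

print(pairs[("library", "matrix")])                    # 2
print(triples[("factorization", "matrix", "tensor")])  # 2
```

A dense array would need one cell for every possible word triple, which is out of reach for a 10,000+ word vocabulary, so the sketch keeps only the combinations that actually occur.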

In case you are interested, Wikipedia has a list of software packages for tensor analysis.

Not mentioned by Wikipedia: Facebook open sourced TH++, a library for tensor analysis, last year, along with fblualib, which includes a bridge between Python and Lua (for running tensor analysis).

Uni10 wasn’t mentioned by Wikipedia either.

Good starting place: Big Tensor Mining, Carnegie Mellon Database Group.

Suggest you join an existing effort before you start duplicating existing work.
