Archive for the ‘Feature Spaces’ Category

“Almost there….” (Computing Homology)

Friday, April 12th, 2013

We all remember the pilot in Star Wars who kept saying, “Almost there….” Jeremy Kun has us “almost there…” in his latest installment: Computing Homology.

To give you some encouragement, Jeremy concludes the post saying:

The reader may be curious as to why we didn’t come up with a more full-bodied representation of a simplicial complex and write an algorithm which accepts a simplicial complex and computes all of its homology groups. We’ll leave this direct approach as a (potentially long) exercise to the reader, because coming up in this series we are going to do one better. Instead of computing the homology groups of just one simplicial complex by repeating one algorithm many times, we’re going to compute all the homology groups of a whole family of simplicial complexes in a single bound. This family of simplicial complexes will be constructed from a data set, and so, in grandiose words, we will compute the topological features of data.

If it sounds exciting, that’s because it is! We’ll be exploring a cutting-edge research field known as persistent homology, and we’ll see some of the applications of this theory to data analysis. (bold emphasis added)
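For a taste of the “direct approach” Jeremy leaves as an exercise, here is a minimal sketch of my own, not code from his post. It assumes a complex is given as a list of its maximal simplices and computes only Betti numbers with coefficients in Z/2, which is a simplification of computing the full homology groups. The function names (closure, rank_gf2, betti_numbers) are hypothetical.

```python
from itertools import combinations

def closure(maximal_simplices):
    """Return all faces of the given simplices, grouped by dimension."""
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(simplex)
        for k in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, k))
    top = max(len(f) for f in faces)
    return [sorted(f for f in faces if len(f) == k) for k in range(1, top + 1)]

def rank_gf2(rows):
    """Rank of a 0/1 matrix over GF(2); each row is an int used as a bitmask."""
    basis = {}  # leading-bit position -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]
    return len(basis)

def betti_numbers(maximal_simplices):
    """Betti numbers over Z/2: b_k = dim C_k - rank d_k - rank d_(k+1)."""
    by_dim = closure(maximal_simplices)
    ranks = [0]  # the boundary map out of the vertices is zero
    for k in range(1, len(by_dim)):
        index = {face: i for i, face in enumerate(by_dim[k - 1])}
        rows = []
        for simplex in by_dim[k]:
            mask = 0
            # Each k-simplex maps to the sum of its (k-1)-dimensional faces.
            for face in combinations(simplex, k):
                mask |= 1 << index[face]
            rows.append(mask)
        ranks.append(rank_gf2(rows))
    ranks.append(0)  # nothing above the top dimension
    return [len(by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(len(by_dim))]

circle = [(0, 1), (1, 2), (0, 2)]                    # hollow triangle
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # hollow tetrahedron
print(betti_numbers(circle), betti_numbers(sphere))
```

The hollow triangle has one connected component and one loop; the hollow tetrahedron has one component, no loops, and one enclosed void.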

Data analysts are needed at all levels.

Do you want to be a spreadsheet data analyst or something a bit harder to find?

Collaborative Filtering via Group-Structured Dictionary Learning

Wednesday, January 30th, 2013

Collaborative Filtering via Group-Structured Dictionary Learning by Zoltan Szabo, Barnabas Poczos, and Andras Lorincz.

Abstract:

Structured sparse coding and the related structured dictionary learning problems are novel research areas in machine learning. In this paper we present a new application of structured dictionary learning for collaborative filtering based recommender systems. Our extensive numerical experiments demonstrate that the presented method outperforms its state-of-the-art competitors and has several advantages over approaches that do not put structured constraints on the dictionary elements.

From the paper:

Novel advances on CF show that dictionary learning based approaches can be efficient for making predictions about users’ preferences [2]. The dictionary learning based approach assumes that (i) there is a latent, unstructured feature space (hidden representation/code) behind the users’ ratings, and (ii) a rating of an item is equal to the product of the item and the user’s feature.
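Assumption (ii) in the quote is easy to make concrete. The sketch below is mine, not the authors' code: it treats each item as a row of a dictionary over latent features and each user as a (sparse) code, and predicts a rating as their inner product. The dictionary values, item names, and user code are made up for illustration.

```python
def predict_rating(item_atoms, user_code):
    """Predicted rating = inner product of the item's dictionary row and the user's code."""
    return sum(d * a for d, a in zip(item_atoms, user_code))

# Hypothetical dictionary: one row per item, one column per latent feature.
dictionary = {
    "item_a": [1.0, 0.0, 2.0],
    "item_b": [0.5, 1.5, 0.0],
}

# Hypothetical sparse hidden representation of one user.
user_code = [2.0, 0.0, 1.0]

predictions = {
    item: predict_rating(row, user_code) for item, row in dictionary.items()
}
print(predictions)
```

Learning the dictionary and the codes from observed ratings, and imposing group structure on them, is the actual subject of the paper and is not attempted here.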

Is a “preference” actually a form of subject identification?

I ask because the notion of a “real time” system is incompatible with users researching the proper canonical subject identifier and/or waiting for a response from an inter-departmental committee to agree on correct terminology.

Perhaps subject identification in some systems must be on the basis of “…latent, unstructured feature space[s]…” that are known (and disclosed) imperfectly at best.

Zoltán Szabó’s Home Page, numerous publications and the source code for this article.

The TV-tree — an index structure for high-dimensional data (1994)

Saturday, September 18th, 2010

The TV-tree — an index structure for high-dimensional data (1994) Authors: King-ip Lin, H. V. Jagadish, Christos Faloutsos. Keywords: Spatial Index, Similarity Retrieval, Query by Context, R*-Tree, High-Dimensionality Feature Spaces.

Abstract:

We propose a file structure to index high-dimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such ‘varying length’ feature vectors. Finally we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses.

The notion of “…utilizing additional features whenever the additional discriminatory power is absolutely necessary…” is an important one.

Compare that to fixed, simplistic discrimination or to fixed, complex, high-overhead discrimination between subject representatives.

Either one represents a failure of imagination.
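To illustrate the “only as many features as necessary” idea, here is a toy sketch of my own. It is not the TV-tree algorithm from the paper; it only finds the shortest prefix of features that tells every vector in a group apart. The function name min_discriminating_prefix and the sample vectors are hypothetical.

```python
def min_discriminating_prefix(vectors):
    """Smallest number of leading features whose prefixes are pairwise distinct.

    Returns None when two vectors agree on every feature.
    """
    dims = len(vectors[0])
    for k in range(1, dims + 1):
        prefixes = {tuple(v[:k]) for v in vectors}
        if len(prefixes) == len(vectors):
            return k
    return None

easy_group = [(1, 5, 9), (2, 5, 9), (3, 0, 0)]   # first feature suffices
hard_group = [(1, 5, 9), (1, 6, 9), (2, 0, 0)]   # needs a second feature
print(min_discriminating_prefix(easy_group))
print(min_discriminating_prefix(hard_group))
```

The first group is separated by one feature and the second needs two, so a fixed choice of either length would be too costly for one group or too weak for the other.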