
Fusion and inference from multiple data sources in a commensurate space

Friday, June 29th, 2012

Fusion and inference from multiple data sources in a commensurate space by Zhiliang Ma, David J. Marchette and Carey E. Priebe. (Ma, Z., Marchette, D. J. and Priebe, C. E. (2012), Fusion and inference from multiple data sources in a commensurate space. Statistical Analysis and Data Mining, 5: 187–193. doi: 10.1002/sam.11142)

Abstract:

Given objects measured under multiple conditions—for example, indoor lighting versus outdoor lighting for face recognition, multiple language translation for document matching, etc.—the challenging task is to perform data fusion and utilize all the available information for inferential purposes. We consider two exploitation tasks: (i) how to determine whether a set of feature vectors represent a single object measured under different conditions; and (ii) how to create a classifier based on training data from one condition in order to classify objects measured under other conditions. The key to both problems is to transform data from multiple conditions into one commensurate space, where the (transformed) feature vectors are comparable and would be treated as if they were collected under the same condition. Toward this end, we studied Procrustes analysis and developed a new approach, which uses the interpoint dissimilarities for each condition. We impute the dissimilarities between measurements of different conditions to create one omnibus dissimilarity matrix, which is then embedded into Euclidean space. We illustrate our methodology on English and French documents collected from Wikipedia, demonstrating superior performance compared to that obtained via standard Procrustes transformation.
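For readers who have not met the standard Procrustes transformation that the abstract uses as its baseline, here is a minimal sketch in Python with NumPy. It is not the authors' code and it does not implement their omnibus dissimilarity approach. The synthetic data, the function name procrustes_rotation, and the choice of a purely orthogonal fit (no scaling or translation) are my assumptions for illustration. The idea is that matched objects measured under two conditions give two point sets, and we look for the rotation that best maps one set onto the other so that both live in one commensurate space.

```python
import numpy as np


def procrustes_rotation(X, Y):
    """Return the orthogonal matrix Q that minimizes ||X @ Q - Y||_F.

    X and Y are (n, d) arrays whose rows are matched measurements of the
    same n objects under two conditions (hypothetical setup).
    """
    # Classic orthogonal Procrustes solution: SVD of the cross-product matrix.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Synthetic stand-in for "two conditions": condition 1 is a rotated,
# slightly noisy copy of condition 2. Real data would not be this clean.
rng = np.random.default_rng(0)
n, d = 50, 3
Y = rng.normal(size=(n, d))                         # condition 2
Q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal map
X = Y @ Q_true.T + 0.01 * rng.normal(size=(n, d))   # condition 1

Q = procrustes_rotation(X, Y)

# Relative misfit after mapping condition 1 into condition 2's space.
residual = np.linalg.norm(X @ Q - Y) / np.linalg.norm(Y)
print(f"relative residual after alignment: {residual:.4f}")
```

Once Q is estimated from matched training pairs, a new measurement x taken under condition 1 can be mapped to x @ Q and compared directly with condition 2 data. The abstract's claim is that embedding an omnibus dissimilarity matrix does better than this kind of alignment on the English and French Wikipedia documents.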

This paper resonated with me because of an early example of identity issues in topic maps from Steve Newcomb. Steve pointed out that his home has a set of geographic coordinates, a street address and a set of directions for getting there, all of which identify the same subject. Everything that can be said using one identifier can be gathered up with the statements made using the other identifiers.

I still have reservations about using Euclidean space to model non-Euclidean semantics, but I have to admit that the approach can deliver some value.

I had to file an interlibrary loan (ILL) request for a print copy of the article. More to follow when it arrives.