Archive for the ‘Data Fusion’ Category

Fusion and inference from multiple data sources in a commensurate space

Friday, June 29th, 2012

Fusion and inference from multiple data sources in a commensurate space by Zhiliang Ma, David J. Marchette and Carey E. Priebe. (Ma, Z., Marchette, D. J. and Priebe, C. E. (2012), Fusion and inference from multiple data sources in a commensurate space. Statistical Analysis and Data Mining, 5: 187–193. doi: 10.1002/sam.11142)

Abstract:

Given objects measured under multiple conditions—for example, indoor lighting versus outdoor lighting for face recognition, multiple language translation for document matching, etc.—the challenging task is to perform data fusion and utilize all the available information for inferential purposes. We consider two exploitation tasks: (i) how to determine whether a set of feature vectors represent a single object measured under different conditions; and (ii) how to create a classifier based on training data from one condition in order to classify objects measured under other conditions. The key to both problems is to transform data from multiple conditions into one commensurate space, where the (transformed) feature vectors are comparable and would be treated as if they were collected under the same condition. Toward this end, we studied Procrustes analysis and developed a new approach, which uses the interpoint dissimilarities for each condition. We impute the dissimilarities between measurements of different conditions to create one omnibus dissimilarity matrix, which is then embedded into Euclidean space. We illustrate our methodology on English and French documents collected from Wikipedia, demonstrating superior performance compared to that obtained via standard Procrustes transformation.
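To make the "omnibus dissimilarity matrix" idea concrete, here is a minimal sketch in plain Python. The abstract does not say how the cross-condition dissimilarities are imputed, so the rule below is my assumption for illustration only: matched objects (same index in both conditions) get dissimilarity 0, and every other cross-condition entry is the mean of the two within-condition values. The function name `omnibus_dissimilarity` is hypothetical, and the embedding step (for example, multidimensional scaling) is left out.

```python
def omnibus_dissimilarity(d1, d2):
    """Build a 2n x 2n omnibus matrix from two n x n dissimilarity matrices.

    ASSUMPTION (not taken from the paper's abstract): object i in condition 1
    matches object i in condition 2, so their cross-condition dissimilarity is
    0; all other cross-condition entries are imputed as the mean of the two
    within-condition dissimilarities.
    """
    n = len(d1)
    if len(d2) != n or any(len(row) != n for row in d1 + d2):
        raise ValueError("d1 and d2 must both be n x n")

    # Imputed cross-condition block (condition 1 rows, condition 2 columns).
    cross = [
        [0.0 if i == j else (d1[i][j] + d2[i][j]) / 2.0 for j in range(n)]
        for i in range(n)
    ]

    # Top half: [D1 | cross]; bottom half: [cross transposed | D2].
    top = [list(d1[i]) + cross[i] for i in range(n)]
    bottom = [[cross[j][i] for j in range(n)] + list(d2[i]) for i in range(n)]
    return top + bottom


if __name__ == "__main__":
    english = [[0, 2], [2, 0]]   # toy dissimilarities, condition 1
    french = [[0, 4], [4, 0]]    # toy dissimilarities, condition 2
    for row in omnibus_dissimilarity(english, french):
        print(row)
```

The resulting matrix is what would then be handed to an embedding method so that both conditions land in one shared space.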

An early example of identity issues in topic maps from Steve Newcomb made this paper resonate for me. Steve used the example that his home has a set of geographic coordinates, a street address and a set of directions to arrive at his home, all of which identify the same subject. Everything that can be said using one identifier can be gathered up with the statements made using the other identifiers.
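A rough sketch of that gathering step, in plain Python. This is not any topic map engine's API; the function name `merge_by_identifier` and the identifier strings are made up for illustration. Records that share at least one identifier are treated as the same subject and their statements are pooled.

```python
def merge_by_identifier(records):
    """Merge (identifiers, statements) records that share any identifier.

    Returns a list of [identifier_set, statement_list] groups. Hypothetical
    helper for illustration, not part of any topic map standard or library.
    """
    merged = []
    for ids, statements in records:
        ids, statements = set(ids), list(statements)
        remaining = []
        for group_ids, group_statements in merged:
            if group_ids & ids:
                # Shared identifier: same subject, so pool ids and statements.
                ids |= group_ids
                statements = group_statements + statements
            else:
                remaining.append([group_ids, group_statements])
        remaining.append([ids, statements])
        merged = remaining
    return merged


if __name__ == "__main__":
    # All identifiers and statements below are invented examples.
    records = [
        ({"geo:example-coordinates"}, ["has a red door"]),
        ({"addr:example-street", "geo:example-coordinates"}, ["built in 1950"]),
        ({"addr:somewhere-else"}, ["is a library"]),
    ]
    for ids, statements in merge_by_identifier(records):
        print(sorted(ids), statements)
```

Because each new record absorbs every group it overlaps, a record that carries two identifiers can also join two groups that were previously separate.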

While I still have reservations about the use of Euclidean space when dealing with non-Euclidean semantics, one has to admit that it is possible to derive some value from it.

I had to file an ILL for a print copy of the article. More to follow when it arrives.

Working with your Data: Easier and More Fun

Monday, April 16th, 2012

Working with your Data: Easier and More Fun by Rebecca Shapley.

From the post:

The Fusion Tables team has been a little quiet lately, but that’s just because we’ve been working hard on a whole bunch of new stuff that makes it easier to discover, manage and visualize data.

New features from Fusion Tables include:

  • Faceted search
  • Multiple tabs
  • Line charts
  • Graph visualizations
  • New API that returns JSON
  • and more features on the way!

The ability of tools to ease users into data mining, visualization and exploration continues to increase.

Question: How do you counter mis-application of a tool with a sophisticated looking result?

Information Heterogeneity and Fusion

Thursday, May 12th, 2011

2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)

Important Dates:

Paper submission deadline: 25th July 2011
Notification of acceptance: 19th August 2011
Camera-ready version due: 12th September 2011
Workshop: 23rd or 27th October 2011

Datasets are also being made available. Just in case you can’t find any heterogeneous data lying around. ;-)

Looks like a perfect venue for topic map papers. (Not to mention that a re-usable mapping between recommender systems looks like a commercial opportunity.)

From the website:

In recent years, increasing attention has been given to finding ways for combining, integrating and mediating heterogeneous sources of information for the purpose of providing better personalized services in many information seeking and e-commerce applications. Information heterogeneity can indeed be identified in any of the pillars of a recommender system: the modeling of user preferences, the description of resource contents, the modeling and exploitation of the context in which recommendations are made, and the characteristics of the suggested resource lists.

Almost all current recommender systems are designed for specific domains and applications, and thus usually try to make best use of a local user model, using a single kind of personal data, and without explicitly addressing the heterogeneity of the existing personal information that may be freely available (on social networks, homepages, etc.). Recognizing this limitation, among other issues: a) user models could be based on different types of explicit and implicit personal preferences, such as ratings, tags, textual reviews, records of views, queries, and purchases; b) recommended resources may belong to several domains and media, and may be described with multilingual metadata; c) context could be modeled and exploited in multi-dimensional feature spaces; d) and ranked recommendation lists could be diverse according to particular user preferences and resource attributes, oriented to groups of users, and driven by multiple user evaluation criteria.

The aim of HetRec workshop is to bring together students, faculty, researchers and professionals from both academia and industry who are interested in addressing any of the above forms of information heterogeneity and fusion in recommender systems. We would like to raise awareness of the potential of using multiple sources of information, and look for sharing expertise and suitable models and techniques.

Another dire need is for strong datasets, and one of our aims is to establish benchmarks and standard datasets on which the problems could be investigated. In this edition, we make available on-line datasets with heterogeneous information from several social systems. These datasets can be used by participants to experiment and evaluate their recommendation approaches, and be enriched with additional data, which may be published at the workshop website for future use.

Ensemble Based Systems in Decision Making

Friday, November 26th, 2010

Ensemble Based Systems in Decision Making

Authors: Robi Polikar

Keywords: Multiple classifier systems, classifier combination, classifier fusion, classifier selection, classifier diversity, incremental learning, data fusion

Abstract:

In matters of great importance that have financial, medical, social, or other implications, we often seek a second opinion before making a decision, sometimes a third, and sometimes many more. In doing so, we weigh the individual opinions, and combine them through some thought process to reach a final decision that is presumably the most informed one. The process of consulting “several experts” before making a final decision is perhaps second nature to us; yet, the extensive benefits of such a process in automated decision making applications have only recently been discovered by computational intelligence community.

Also known under various other names, such as multiple classifier systems, committee of classifiers, or mixture of experts, ensemble based systems have shown to produce favorable results compared to those of single-expert systems for a broad range of applications and under a variety of scenarios. Design, implementation and application of such systems are the main topics of this article. Specifically, this paper reviews conditions under which ensemble based systems may be more beneficial than their single classifier counterparts, algorithms for generating individual components of the ensemble systems, and various procedures through which the individual classifiers can be combined. We discuss popular ensemble based algorithms, such as bagging, boosting, AdaBoost, stacked generalization, and hierarchical mixture of experts; as well as commonly used combination rules, including algebraic combination of outputs, voting based techniques, behavior knowledge space, and decision templates. Finally, we look at current and future research directions for novel applications of ensemble systems. Such applications include incremental learning, data fusion, feature selection, learning with missing features, confidence estimation, and error correcting output codes; all areas in which ensemble systems have shown great promise.
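For readers who have not met the combination rules the abstract lists, here is a small sketch of two of the simplest: plurality ("majority") voting over predicted labels, and the mean rule as one example of algebraic combination of outputs. The function names and the toy labels are mine, not the article's, and the tie-breaking choices are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the label seen first (an arbitrary choice for this sketch).
    """
    return Counter(predictions).most_common(1)[0][0]


def mean_rule(prob_rows):
    """Average each label's estimated probability across classifiers.

    prob_rows holds one dict per classifier, mapping label -> probability.
    Returns the label with the highest average; ties go to the label that
    sorts first.
    """
    labels = sorted(set().union(*prob_rows))
    average = {
        label: sum(row.get(label, 0.0) for row in prob_rows) / len(prob_rows)
        for label in labels
    }
    return max(labels, key=average.get)


if __name__ == "__main__":
    # Three hypothetical classifiers scoring one document.
    votes = ["spam", "ham", "spam"]
    scores = [
        {"spam": 0.6, "ham": 0.4},
        {"spam": 0.3, "ham": 0.7},
        {"spam": 0.4, "ham": 0.6},
    ]
    print(majority_vote(votes))
    print(mean_rule(scores))
```

Voting only needs the predicted labels, while the mean rule needs each classifier to report a score per label, which is one of the design choices to weigh when assembling an ensemble.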

Ironic that the second paragraph of the abstract starts off with the very semantic diversity that bedevils effective information retrieval and navigation.

Excellent survey article on ensemble systems.

Questions:

  1. Read and summarize this article. (1-2 pages)
  2. Choose a data set (list to be posted for class). Outline the choices or evaluations you would make in assembling an ensemble system. (3-5 pages, no citations)
  3. Build an ensemble system to assist with building a topic map for a specific data set (Project)