On Clustering on Graphs with Multiple Edge Types

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 11, 2011

On Clustering on Graphs with Multiple Edge Types

Filed under: Clustering,Graphs — Patrick Durusau @ 7:12 pm

On Clustering on Graphs with Multiple Edge Types by Matthew Rocklin and Ali Pinar.

Abstract:

We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics. For instance similarity between two papers can be based on common authors, where they are published, keyword similarity, citations, etc. As such, graphs with multiple edges is a more accurate model to describe similarities between objects. Each edge/metric provides only partial information about the data; recovering full information requires aggregation of all the similarity metrics. Clustering becomes much more challenging in this context, since in addition to the difficulties of the traditional clustering problem, we have to deal with a space of clusterings. We generalize the concept of clustering in single-edge graphs to multi-edged graphs and investigate problems such as: Can we find a clustering that remains good, even if we change the relative weights of metrics? How can we describe the space of clusterings efficiently? Can we find unexpected clusterings (a good clustering that is distant from all given clusterings)? If given the ground-truth clustering, can we recover how the weights for edge types were aggregated? %In this paper, we discuss these problems and the underlying algorithmic challenges and propose some solutions. We also present two case studies: one based on papers on Arxiv and one based on CIA World Factbook.

From the introduction:

In many real-world problems, similarities between objects can be defined by many different relationships. For instance, similarity between two scientific articles can be defined based on authors, citations to, citations from, keywords, titles, where they are published, text similarity and many more. Relationships between people can be based on the nature of the relationship (e.g., business, family, friendships) a means of communication (e.g., emails, phone, in person), etc. Electronic files can be grouped by their type (Latex, C, html), names, the time they are created, or the pattern they are accessed. In these examples, there are multiple graphs that define relationships between the subjects. We may choose to reduce this multivariate information to construct a single composite graph. This is convenient as it enables application of many strong results from the literature. However, information being lost during this aggregation may be crucial, and we believe working on graphs with multiple edge types is a more precise representation of the problem, and thus will lead to more accurate analyses. Despite its importance, the literature on clustering graphs with multiple edge types is very rare….

Sounds familiar doesn’t it? A good deal like associations in a topic map. Where implicit information, such as roles and which role is being played becomes explicit.

The clustering described in this paper is, of course, a way to group and identify collective subjects.

You will probably be interested in the Graclus software that the authors use for splitting graphs.

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 11, 2011

On Clustering on Graphs with Multiple Edge Types

No Comments