Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 5, 2014

On Graph Stream Clustering with Side Information

Filed under: Clustering,Graphs — Patrick Durusau @ 7:56 pm

On Graph Stream Clustering with Side Information by Yuchen Zhao and Philip S. Yu.

Abstract:

Graph clustering becomes an important problem due to emerging applications involving the web, social networks and bio-informatics. Recently, many such applications generate data in the form of streams. Clustering massive, dynamic graph streams is significantly challenging because of the complex structures of graphs and computational difficulties of continuous data. Meanwhile, a large volume of side information is associated with graphs, which can be of various types. The examples include the properties of users in social network activities, the meta attributes associated with web click graph streams and the location information in mobile communication networks. Such attributes contain extremely useful information and have the potential to improve the clustering process, but are neglected by most recent graph stream mining techniques. In this paper, we define a unified distance measure on both link structures and side attributes for clustering. In addition, we propose a novel optimization framework DMO, which can dynamically optimize the distance metric and make it adapt to the newly received stream data. We further introduce a carefully designed statistics SGS(C) which consume constant storage spaces with the progression of streams. We demonstrate that the statistics maintained are sufficient for the clustering process as well as the distance optimization and can be scalable to massive graphs with side attributes. We will present experiment results to show the advantages of the approach in graph stream clustering with both links and side information over the baselines.

The authors have a concept of “side attributes,” examples of which are:

  • In social networks, many social activities are generated daily in the form of streams, which can be naturally represented as graphs. In addition to the graph representation, there are tremendous side information associated with social activities, e.g. user profiles, behaviors, activity types and geographical information. These attributes can be quite informative to analyze the social graphs. We illustrate an example of such user interaction graph stream in Figure 1.
  • Web click events are graph object streams generated by users. Each graph object represents a series of web clicks by a specific user within a time frame. Besides the click graph object, the meta data of webpages, users’ IP addresses and time spent on browsing can all provide insights to the subtle correlations of click graph objects.
  • In a large scientific repository (e.g. DBLP), each single article can be modeled as an authorship graph object [1][4]. In Figure 2, we illustrate an example of an authorship graph (paper) which consists of three authors (nodes) and a list of side information. For each article, the side attributes, including paper keywords, published venues and years, may be used to enhance the mining quality since they indicate tremendous meaningful relationships among authorship graphs.

Although the “side attributes” are “second-class citizens,” not part of the graph structure, the authors demonstrate effective clustering of graph streams based upon those “side attributes.”

This illustrates the point that even though you could represent the “side attributes” as part of the graph structure, you don’t necessarily have to represent them that way.

In much the same way, some subjects in a topic map may not be represented by topics. It really depends on your use cases and requirements.
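To make the representation choice concrete, here is a minimal sketch in Python of a graph object that keeps its side attributes outside the link structure while still letting both contribute to a single distance. It is an illustration only and not the paper's DMO framework or its SGS(C) statistics: the `GraphObject` class, the use of Jaccard distance for both parts, and the fixed `weight` parameter are all my assumptions. The paper optimizes its metric dynamically as the stream arrives; this sketch does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphObject:
    """Hypothetical stream object: links plus side attributes held outside the graph."""
    edges: frozenset               # set of (node, node) pairs, e.g. co-author links
    side: frozenset = frozenset()  # attribute tokens, e.g. {"venue:KDD", "year:2013"}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def combined_distance(g, h, weight=0.5):
    """Weighted sum of link distance and side-attribute distance.

    `weight` is an assumed, fixed mixing parameter in [0, 1]; it stands in
    for the adaptive metric described in the abstract.
    """
    link_part = jaccard_distance(g.edges, h.edges)
    side_part = jaccard_distance(g.side, h.side)
    return (1.0 - weight) * link_part + weight * side_part


# Three made-up "papers": p1 and p2 share an edge and a venue, p3 shares nothing.
p1 = GraphObject(frozenset({("a", "b"), ("b", "c")}), frozenset({"venue:KDD"}))
p2 = GraphObject(frozenset({("a", "b")}), frozenset({"venue:KDD"}))
p3 = GraphObject(frozenset({("x", "y")}), frozenset({"venue:VLDB"}))

print(combined_distance(p1, p2))  # smaller: shared link and shared venue
print(combined_distance(p1, p3))  # larger: nothing in common
```

With `weight=0` the side attributes are ignored entirely and the distance depends on links alone, which is the situation the authors say most graph stream techniques are in.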

PS: If you are looking for the Cora data set cited in this paper, it has moved. See: http://people.cs.umass.edu/~mccallum/data.html. The code is now located at: http://people.cs.umass.edu/~mccallum/code.html.
