Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 24, 2015

Summarizing and understanding large graphs

Filed under: Data Mining,Graphs — Patrick Durusau @ 3:26 pm

Summarizing and understanding large graphs by Danai Koutra, U Kang, Jilles Vreeken and Christos Faloutsos. (DOI: 10.1002/sam.11267)

Abstract:

How can we succinctly describe a million-node graph with a few simple sentences? Given a large graph, how can we find its most “important” structures, so that we can summarize it and easily visualize it? How can we measure the “importance” of a set of discovered subgraphs in a large graph? Starting with the observation that real graphs often consist of stars, bipartite cores, cliques, and chains, our main idea is to find the most succinct description of a graph in these “vocabulary” terms. To this end, we first mine candidate subgraphs using one or more graph partitioning algorithms. Next, we identify the optimal summarization using the minimum description length (MDL) principle, picking only those subgraphs from the candidates that together yield the best lossless compression of the graph—or, equivalently, that most succinctly describe its adjacency matrix.

Our contributions are threefold: (i) formulation: we provide a principled encoding scheme to identify the vocabulary type of a given subgraph for six structure types prevalent in real-world graphs, (ii) algorithm: we develop VoG, an efficient method to approximate the MDL-optimal summary of a given graph in terms of local graph structures, and (iii) applicability: we report an extensive empirical evaluation on multimillion-edge real graphs, including Flickr and the Notre Dame web graph.

Use the DOI if you need the official citation, otherwise select the title link and it takes you a non-firewalled copy.

A must read if you are trying to extract useful information out of multimillion-edge graphs.

I first saw this in a tweet by Kirk Borne.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress