Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 4, 2013

Pattern Based Graph Generator

Filed under: Graph Databases,Graph Generator,Graphs,Networks — Patrick Durusau @ 5:35 pm

Pattern Based Graph Generator by Hong-Han Shuai, De-Nian Yang, Philip S. Yu, Chih-Ya Shen, Ming-Syan Chen.

Abstract:

The importance of graph mining has been widely recognized thanks to a large variety of applications in many areas, while real datasets always play important roles to examine the solution quality and efficiency of a graph mining algorithm. Nevertheless, the size of a real dataset is usually fixed and constrained according to the available resources, such as the efforts to crawl an on-line social network. In this case, employing a synthetic graph generator is a possible way to generate a massive graph (e.g., billions nodes) for evaluating the scalability of an algorithm, and current popular statistical graph generators are properly designed to maintain statistical metrics such as total node degree, degree distribution, diameter, and clustering coefficient of the original social graphs. Nevertheless, in addition to the above metrics, recent studies on graph mining point out that graph frequent patterns are also important to provide useful implications for the corresponding social networking applications, but this crucial criterion has not been noticed in the existing graph generators. This paper first manifests that numerous graph patterns, especially large patterns that are crucial with important domain-specific semantic, unfortunately disappear in the graphs created by popular statistic graph generators, even though those graphs enjoy the same statistical metrics with the original real dataset. To address this important need, we make the first attempt to propose a pattern based graph generator (PBGG) to generate a graph including all patterns and satisfy the user-specified parameters on supports, network size, degree distribution, and clustering coefficient. Experimental results show that PBGG is efficient and able to generate a billion-node graph with about 10 minutes, and PBGG is released for free download.

Download at: http://arbor.ee.ntu.edu.tw/~hhshuai/PBGG.html.

OK, this is not a “moderate size database” program.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress