Network motif discovery: A GPU approach by Wenqing Lin, Xiaokui Xiao, Xing Xie, and Xiao-Li Li.
Abstract:
The identification of network motifs has important applications in numerous domains, such as pattern detection in biological networks and graph analysis in digital circuits. However, mining network motifs is computationally challenging, as it requires enumerating subgraphs from a real-life graph, and computing the frequency of each subgraph in a large number of random graphs. In particular, existing solutions often require days to derive network motifs from biological networks with only a few thousand vertices. To address this problem, this paper presents a novel study on network motif discovery using Graphical Processing Units (GPUs). The basic idea is to employ GPUs to parallelize a large number of subgraph matching tasks in computing subgraph frequencies from random graphs, so as to reduce the overall computation time of network motif discovery. We explore the design space of GPU-based subgraph matching algorithms, with careful analysis of several crucial factors that affect the performance of GPU programs. Based on our analysis, we develop a GPU-based solution that (i) considerably differs from existing CPU-based methods, and (ii) exploits the strengths of GPUs in terms of parallelism while mitigating their limitations in terms of the computation power per GPU core. With extensive experiments on a variety of biological networks, we show that our solution is up to two orders of magnitude faster than the best CPU-based approach, and is around 20 times more cost-effective than the latter, when taking into account the monetary costs of the CPU and GPUs used.
For those of you who need to dodge the paywall: Network motif discovery: A GPU approach
From the introduction (topic map comments interspersed):
Given a graph $$G$$, a network motif in $$G$$ is a subgraph $$g$$ of $$G$$, such that $$g$$ appears much more frequently in $$G$$ than in random graphs whose degree distributions are similar to that of $$G$$ [1]. The identification of network motifs finds important applications in numerous domains. For example, network motifs are used (i) in system biology to predict protein interactions in biological networks and discover functional sub-units [2], (ii) in electronic engineering to understand the characteristics of circuits [3], and (iii) in brain science to study the functionalities of brain networks [4].
Whereas a topic map can itself be treated as a graph $$G$$, the network motif problem is to discover subgraphs of $$G$$. In a topic map, the subgraph that constitutes a topic, along with its defined isomorphism with other topics, is declarable.
…Roughly speaking, all existing techniques adopt a common two-phase framework as follows:
• Subgraph Enumeration: Given a graph $$G$$ and a parameter $$k$$, enumerate the subgraphs $$g$$ of $$G$$ with $$k$$ vertices each;
• Frequency Estimation:
…
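To make the enumeration phase concrete, here is a minimal Python sketch. It assumes an undirected graph stored as an adjacency dictionary and treats a "subgraph with $$k$$ vertices" as a connected induced subgraph. The function names and the toy graph are my own illustrations, and the brute-force scan over all $$k$$-vertex subsets is not the paper's method.

```python
from itertools import combinations

def is_connected(vertices, adj):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & vertices:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices

def enumerate_subgraphs(adj, k):
    """Yield each connected induced k-vertex subgraph as (vertices, edges)."""
    for combo in combinations(sorted(adj), k):
        if is_connected(combo, adj):
            edges = frozenset(
                (u, v) for u, v in combinations(combo, 2) if v in adj[u]
            )
            yield combo, edges

# Toy graph (an assumption for illustration): triangle 0-1-2 plus a
# pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
subgraphs = list(enumerate_subgraphs(adj, 3))
for vertices, edges in subgraphs:
    print(vertices, sorted(edges))
```

The scan visits every $$k$$-subset of vertices, which is only workable for toy inputs. It is meant to show what the phase produces, not how to produce it quickly.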
Rather than enumerating every subgraph $$g$$ of $$G$$ (a topic map), we need only collect those subgraphs that are subject to the isomorphism test.
…To compute the frequency of $$g$$ in a random graph $$G'$$, however, we need to derive the number of subgraphs of $$G'$$ that are isomorphic to $$g$$ – this requires a large number of subgraph isomorphism tests [14], which are known to be computationally expensive.
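The cost described in that passage is easier to see in code. The sketch below counts how often a pattern $$g$$ occurs as an induced subgraph, first in a toy graph and then in a randomized copy with the same degree sequence. Everything here is an assumption for illustration: the helper names, the edge-swap randomization, and the brute-force permutation test are stand-ins, not the algorithms used in the paper or in [14].

```python
import random
from collections import Counter
from itertools import combinations, permutations

def edge(u, v):
    """Undirected edge as a frozenset of its two endpoints."""
    return frozenset((u, v))

def isomorphic(verts_a, edges_a, verts_b, edges_b):
    """Brute-force isomorphism test; only sensible for very small graphs."""
    if len(verts_a) != len(verts_b) or len(edges_a) != len(edges_b):
        return False
    verts_a = list(verts_a)
    for perm in permutations(verts_b):
        mapping = dict(zip(verts_a, perm))
        if all(frozenset(mapping[u] for u in e) in edges_b for e in edges_a):
            return True
    return False

def frequency(pattern_vertices, pattern_edges, graph_vertices, graph_edges):
    """Count k-vertex induced subgraphs of the graph isomorphic to the pattern."""
    count = 0
    for combo in combinations(sorted(graph_vertices), len(pattern_vertices)):
        chosen = set(combo)
        induced = {e for e in graph_edges if e <= chosen}
        if isomorphic(pattern_vertices, pattern_edges, combo, induced):
            count += 1  # one isomorphism test per candidate vertex set
    return count

def degree_preserving_shuffle(edges, swaps, seed=0):
    """Randomize by swapping edge endpoints: (a,b),(c,d) -> (a,d),(c,b)."""
    rng = random.Random(seed)
    edge_list = sorted(tuple(sorted(e)) for e in edges)
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop or touch a shared vertex
        new1, new2 = edge(a, d), edge(c, b)
        if new1 in edge_set or new2 in edge_set:
            continue  # swap would create a duplicate edge
        edge_set -= {edge(a, b), edge(c, d)}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set

def degrees(edges):
    """Degree of every vertex that appears in at least one edge."""
    return Counter(v for e in edges for v in e)

# Toy data (assumed): the pattern is a triangle, the graph is two triangles.
triangle_vertices = ["x", "y", "z"]
triangle_edges = {edge("x", "y"), edge("y", "z"), edge("x", "z")}
graph_vertices = range(6)
graph_edges = {edge(0, 1), edge(1, 2), edge(0, 2),
               edge(3, 4), edge(4, 5), edge(3, 5)}

random_edges = degree_preserving_shuffle(graph_edges, swaps=20, seed=1)
freq_real = frequency(triangle_vertices, triangle_edges, graph_vertices, graph_edges)
freq_random = frequency(triangle_vertices, triangle_edges, graph_vertices, random_edges)
print(freq_real, freq_random)
```

Every candidate vertex set triggers its own isomorphism test, and the whole count has to be repeated for each random graph. That repetition is the workload the authors propose to spread across GPU cores.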
…
Footnote 14 is a reference to: Practical graph isomorphism, II by Brendan D. McKay, Adolfo Piperno.
See also: nauty and Traces by Brendan McKay and Adolfo Piperno.
nauty and Traces are programs for computing automorphism groups of graphs and digraphs [*]. They can also produce a canonical label. They are written in a portable subset of C, and run on a considerable number of different systems.
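A canonical label matters because it turns an isomorphism test into an equality check: two graphs are isomorphic exactly when their canonical labels match. The Python sketch below illustrates the idea by taking the smallest relabeled edge list over all vertex orderings. It is a factorial-time toy, the function name is hypothetical, and it is not how nauty or Traces compute labels, nor is it their API.

```python
from itertools import permutations

def canonical_label(vertices, edges):
    """Return (n, smallest sorted edge tuple over all relabelings to 0..n-1)."""
    vertices = list(vertices)
    n = len(vertices)
    best = None
    for perm in permutations(range(n)):
        relabel = dict(zip(vertices, perm))
        form = tuple(sorted(
            tuple(sorted((relabel[u], relabel[v]))) for u, v in edges
        ))
        if best is None or form < best:
            best = form
    return n, best

# Two paths on three vertices with different names and a different middle
# vertex, plus a triangle for contrast (all toy data).
path_a = canonical_label(["a", "b", "c"], [("a", "b"), ("b", "c")])
path_b = canonical_label([1, 2, 3], [(1, 3), (3, 2)])
triangle = canonical_label([7, 8, 9], [(7, 8), (8, 9), (7, 9)])

print(path_a == path_b)    # the two paths share one label
print(path_a == triangle)  # a path and a triangle do not
```

Once each subgraph has been reduced to a label, counting frequencies becomes a matter of grouping identical labels rather than running pairwise tests.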
This is where the topic map subgraph $$g$$ isomorphism issue intersects with the more general subgraph isomorphism case.
Any topic can have properties (née internal occurrences) in addition to those thought to make it “isomorphic” to another topic, or it can play roles in associations that are not played by other topics representing the same subject.
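A small Python sketch may help show what a declared test looks like. It assumes a simplified topic with a set of subject identifiers, a dictionary of occurrences, and a set of association roles, and it declares two topics to represent the same subject when they share at least one subject identifier. The class, the field names, and the example.org identifiers are all hypothetical, and real topic map engines apply further rules.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Simplified topic; the field names are assumptions for illustration."""
    subject_identifiers: frozenset
    occurrences: dict = field(default_factory=dict)
    roles: set = field(default_factory=set)

def same_subject(a, b):
    """Declared test: topics match if they share any subject identifier."""
    return bool(a.subject_identifiers & b.subject_identifiers)

t1 = Topic(
    subject_identifiers=frozenset({"http://example.org/subject/gpu"}),
    occurrences={"description": "Graphics Processing Unit"},
    roles={"hardware-in-experiment"},
)
t2 = Topic(
    subject_identifiers=frozenset({
        "http://example.org/subject/gpu",
        "http://example.org/subject/graphics-card",
    }),
    occurrences={"vendor-note": "many-core accelerator"},
)
t3 = Topic(subject_identifiers=frozenset({"http://example.org/subject/cpu"}))

print(same_subject(t1, t2))  # shared identifier, despite different properties
print(same_subject(t1, t3))  # no shared identifier
```

The test never inspects occurrences or roles, so the extra properties on `t1` and `t2` do not affect the outcome.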
Motivated by the deficiency of existing work, we present an in-depth study on efficient solutions for network motif discovery. Instead of focusing on the efficiency of individual subgraph isomorphism tests, we propose to utilize Graphics Processing Units (GPUs) to parallelize a large number of isomorphism tests, in order to reduce the computation time of the frequency estimation phase. This idea is intuitive, and yet, it presents a research challenge since there is no existing algorithm for testing subgraph isomorphisms on GPUs. Furthermore, as shown in Section III, existing CPU-based algorithms for subgraph isomorphism tests cannot be translated into efficient solutions on GPUs, as the characteristics of GPUs make them inherently unsuitable for several key procedures used in CPU-based algorithms.
The punch line is that the authors present a solution that:
…is up to two orders of magnitude faster than the best CPU-based approach, and is around 20 times more cost-effective than the latter, when taking into account the monetary costs of the CPU and GPUs used.
Since topic isomorphism is defined rather than discovered, this looks like a fruitful avenue to explore for topic map engine performance.