Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 3, 2012

Genome assembly and comparison using de Bruijn graphs

Filed under: Bioinformatics,De Bruijn Graphs,Genome,Graphs,Networks — Patrick Durusau @ 10:41 am

Genome assembly and comparison using de Bruijn graphs by Daniel Robert Zerbino. (thesis)

Abstract:

Recent advances in sequencing technology made it possible to generate vast amounts of sequence data. The fragments produced by these high-throughput methods are, however, far shorter than in traditional Sanger sequencing. Previously, micro-reads of less than 50 base pairs were considered useful only in the presence of an existing assembly. This thesis describes solutions for assembling short read sequencing data de novo, in the absence of a reference genome.

The algorithms developed here are based on the de Bruijn graph. This data structure is highly suitable for the assembly and comparison of genomes for the following reasons. It provides a flexible tool to handle the sequence variants commonly found in genome evolution such as duplications, inversions or transpositions. In addition, it can combine sequences of highly different lengths, from short reads to assembled genomes. Finally, it ensures an effective data compression of highly redundant datasets.

This thesis presents the development of a collection of methods, called Velvet, to convert a de Bruijn graph into a traditional assembly of contiguous sequences. The first step of the process, termed Tour Bus, removes sequencing errors and handles biological variations such as polymorphisms. In its second part, Velvet aims to resolve repeats based on the available information, from low coverage long reads (Rock Band) or paired shotgun reads (Pebble). These methods were tested on various simulations for precision and efficiency, then on control experimental datasets.

De Bruijn graphs can also be used to detect and analyse structural variants from unassembled data. The final chapter of this thesis presents the results of collaborative work on the analysis of several experimental unassembled datasets.

De Bruijn graphs are covered in pages 22-42 if you want to cut to the chase.
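If the data structure is new to you, the core construction is small enough to sketch. A toy version (not Velvet's implementation): each (k-1)-mer becomes a node, and each k-mer becomes an edge from its prefix to its suffix:

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph from short reads.

    Nodes are (k-1)-mers; each k-mer contributes one prefix -> suffix edge.
    A sketch of the idea only, not Velvet's implementation.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge: prefix node -> suffix node
    return graph

g = de_bruijn_graph(["ACGTACG"], 3)
# The k-mer ACG occurs twice in the read, so node "AC" gets two edges to "CG".
```

Walking paths through such a graph is what turns redundant short reads back into contiguous sequence.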

Obviously of interest to the bioinformatics community.

Where else would you use de Bruijn graph structures?

Is “Massive Data” > “Big Data”?

Filed under: BigData,Bioinformatics,Genome,Graphs,Networks — Patrick Durusau @ 8:32 am

Science News announces: New Computational Technique Relieves Logjam from Massive Amounts of Data, which is a better title than: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs by Jason Pell, Arend Hintze, Rosangela Canino-Koning, Adina Howe, James M. Tiedje, and C. Titus Brown.

But I have to wonder about “massive data,” versus “big data,” versus “really big data,” versus “massive data streams,” as informative phrases. True, I have a weakness for an eye-catching headline, but in prose shouldn’t we say what data is under consideration and let readers draw their own conclusions?

The paper abstract reads:

Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.
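The Bloom filter trick is worth seeing in miniature. A toy sketch of the mechanics (not the paper's implementation, which achieves as little as 4 bits per k-mer): a stored k-mer is never missed, at the cost of occasional false positives:

```python
import hashlib

class KmerBloomFilter:
    """Toy Bloom filter for k-mer membership. Not the paper's code.

    Stores set membership inexactly in a fixed bit array: queries on stored
    k-mers always succeed, but unseen k-mers occasionally hit by accident.
    """
    def __init__(self, num_bits, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes independent bit positions from one hash function.
        for seed in range(self.num_hashes):
            h = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, kmer):
        for p in self._positions(kmer):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, kmer):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(kmer))

bf = KmerBloomFilter(num_bits=1024)
for kmer in ["ACGT", "CGTA", "GTAC"]:
    bf.add(kmer)
# Every stored k-mer tests positive; that one-sided error is what makes the
# "inexact but never lossy" assembly graph representation possible.
```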

If “de Bruijn graphs,” sounds familiar, see: Memory Efficient De Bruijn Graph Construction [Attn: Graph Builders, Chess Anyone?].

August 2, 2012

Universal properties of mythological networks

Filed under: Networks,Social Networks — Patrick Durusau @ 2:30 pm

Universal properties of mythological networks by Pádraig Mac Carron and Ralph Kenna (2012 EPL 99 28002 doi:10.1209/0295-5075/99/28002)

Abstract:

As in statistical physics, the concept of universality plays an important, albeit qualitative, role in the field of comparative mythology. Here we apply statistical mechanical tools to analyse the networks underlying three iconic mythological narratives with a view to identifying common and distinguishing quantitative features. Of the three narratives, an Anglo-Saxon and a Greek text are mostly believed by antiquarians to be partly historically based while the third, an Irish epic, is often considered to be fictional. Here we use network analysis in an attempt to discriminate real from imaginary social networks and place mythological narratives on the spectrum between them. This suggests that the perceived artificiality of the Irish narrative can be traced back to anomalous features associated with six characters. Speculating that these are amalgams of several entities or proxies, renders the plausibility of the Irish text comparable to the others from a network-theoretic point of view.

A study that suggests there is more to be learned about networks, social, mythological and otherwise. But three (3) examples out of the many extant accounts, mythological and otherwise, aren’t enough for definitive conclusions.
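The kind of statistic being compared is straightforward to compute. A sketch over a hypothetical (invented) mini-network of character co-appearances:

```python
from collections import Counter

def degree_distribution(edges):
    """Count each character's degree in an undirected co-appearance network."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

# Invented mini-network: an edge links characters who share a scene.
edges = [("Beowulf", "Hrothgar"), ("Beowulf", "Wiglaf"),
         ("Beowulf", "Grendel"), ("Hrothgar", "Wealhtheow")]
deg = degree_distribution(edges)
# Beowulf has degree 3; a handful of such hubs can skew the whole
# distribution, which is the kind of anomaly the paper traces to
# six characters in the Irish narrative.
```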

BTW, if you are interested in the use of social networks with literature, see: Extracting Social Networks from Literary Fiction by David K. Elson, Nicholas Dames, and Kathleen R. McKeown for one approach. (If you know of a recent survey on extraction of social networks, please forward it and I will cite you in a post.)

How to Make an Interactive Network Visualization

Filed under: D3,Graphics,Graphs,Javascript,Music,Networks,Visualization — Patrick Durusau @ 10:10 am

How to Make an Interactive Network Visualization by Jim Vallandingham.

From the post:

Interactive network visualizations make it easy to rearrange, filter, and explore your connected data. Learn how to make one using D3 and JavaScript.

Networks! They are all around us. The universe is filled with systems and structures that can be organized as networks. Recently, we have seen them used to convict criminals, visualize friendships, and even to describe cereal ingredient combinations. We can understand their power to describe our complex world from Manuel Lima’s wonderful talk on organized complexity. Now let’s learn how to create our own.

In this tutorial, we will focus on creating an interactive network visualization that will allow us to get details about the nodes in the network, rearrange the network into different layouts, and sort, filter, and search through our data.

In this example, each node is a song. The nodes are sized based on popularity, and colored by artist. Links indicate two songs are similar to one another.

Try out the visualization on different songs to see how the different layouts and filters look with the different graphs.
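The tutorial itself is in JavaScript; here is a sketch, in Python, of the node/link JSON shape that D3 force layouts typically consume (the field names and song data here are illustrative assumptions, not Vallandingham's):

```python
import json

# Hypothetical songs: "playcount" would size the nodes, "artist" color them.
songs = [
    {"id": "song_a", "artist": "Artist 1", "playcount": 120},
    {"id": "song_b", "artist": "Artist 1", "playcount": 45},
    {"id": "song_c", "artist": "Artist 2", "playcount": 300},
]
# Each link marks a pair of similar songs.
links = [{"source": "song_a", "target": "song_b"},
         {"source": "song_b", "target": "song_c"}]

graph = {"nodes": songs, "links": links}
print(json.dumps(graph, indent=2))  # save this and hand it to the layout code
```

Separating the data file from the layout code is what makes it easy to swap in different songs and see how the layouts and filters behave.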

You know this isn’t a post about politics because it would be visualizing friendships with convicted criminals. 😉

A degree of separation graph between elected public officials and convicted white collar criminals? A topic map for another day.

For today, enjoy learning how to use D3 and JavaScript for impressive network visualizations.

Imagine mapping the cereal visualization to the shelf locations at your local Kroger, where selecting one ingredient identifies the store locations of others.

July 31, 2012

Political Moneyball

Filed under: Government Data,Networks,Politics — Patrick Durusau @ 1:51 pm

Nathan Yau points out the Wall Street Journal’s “Political Moneyball” visualization in Network of political contributions.

You will probably benefit from starting with Nathan’s comments and then navigating the WSJ visualization.

I like the honesty of the Wall Street Journal. They have chosen a side and yet see humor in its excesses.

Nathan mentions the difficulty with unfamiliar names and organizations.

An example of where topic maps could enable knowledgeable users to gather information together for the benefit of subsequent, less knowledgeable users of the map.

Creating the potential for a collaborative, evolutionary information resource that improves with usage.

July 27, 2012

Computational Aspects of Social Networks (CASoN) [Conference]

Filed under: Conferences,Networks,Social Networks — Patrick Durusau @ 7:51 am

Computational Aspects of Social Networks (CASoN)

Important Dates:

Paper submission due: Aug. 15, 2012
Notification of paper acceptance: Sep. 15, 2012
Final manuscript due: Sep. 30, 2012
Registration and full payment due: Sep. 30, 2012
Conference date: Nov. 21-23, 2012

Conference venue: São Carlos, Brazil.

From the Call for Papers:

The International Conference on Computational Aspects of Social Networks (CASoN 2012) brings together an interdisciplinary venue for social scientists, mathematicians, computer scientists, engineers, computer users, and students to exchange and share their experiences, new ideas, and research results about all aspects (theory, applications and tools) of intelligent methods applied to Social Networks, and to discuss the practical challenges encountered and the solutions adopted.

Social networks provide a powerful abstraction of the structure and dynamics of diverse kinds of people or people-to-technology interaction. These social network systems are usually characterized by the complex network structures and rich accompanying contextual information. Recent trends also indicate the usage of complex network as a key feature for next generation usage and exploitation of the Web. This international conference on Computational Aspect of Networks is focused on the foundations of social networks as well as case studies, empirical, and other methodological works related to the computational tools for the automatic discovery of Web-based social networks. This conference provides an opportunity to compare and contrast the ethological approach to social behavior in animals (including the study of animal tracks and learning by members of the same species) with web-based evidence of social interaction, perceptual learning, information granulation, the behavior of humans and affinities between web-based social networks. The main topics cover the design and use of various computational intelligence tools and software, simulations of social networks, representation and analysis of social networks, use of semantic networks in the design and community-based research issues such as knowledge discovery, privacy and protection, and visualization.

We solicit original research and technical papers not published elsewhere. The papers can be theoretical, practical and application, and cover a broad set of intelligent methods, with particular emphasis on Social Network computing.

One of the more interesting aspects of social network study, at least to me, is the existence of social networks of researchers who are studying social networks. Implies, to me at least, that “subjects” of discussion have their origins in social networks.

Some approaches, I won’t name names, take “subjects” as given and never question their origins. That leads directly to fragile systems/ontologies because change isn’t taken into account.

Clearly, saying “stop” to change is insufficient, else the many attempts to fix some standardized language in place would have succeeded long ago.

If you know of approaches that attempt to allow for change, I would appreciate a note.

July 25, 2012

Searching the WWW for found things, a bad solution

Filed under: Algorithms,Graphs,Networks — Patrick Durusau @ 1:53 pm

Searching for something you have found once before is a very bad solution.

It is like catching a fish and then throwing it back into the ocean and attempting to find that same fish, again.

Real example that happened to me this week: I had mentioned this research in a post but did not include a link to it, and I didn’t remember the names of the researchers or their location.

From the news bulletin:

Whether sudoku, a map of Germany or solid bodies – in all of these cases, it’s all about counting possibilities. In the sudoku, it is the permitted solutions; in the solid body, it is the possible arrangements of atoms. In the map, the question is how many ways the map can be colored so that adjacent countries are always shown in a different color. Scientists depict these counting problems as a network of lines and nodes. Consequently, they need to answer just one question: How many different ways are there to color in the nodes with a certain number of colors? The only condition: nodes joined by a line may not have the same color. Depending on the application, the color of a node is given a completely new significance. In the case of the map, “color” actually means color; with sudoku the “colors” represent different figures.

“The existing algorithm copies the whole network for each stage of the calculation and only changes one aspect of it each time,” explains Frank van Bussel of the Max Planck Institute for Dynamics and Self-Organization (MPIDS). Increasing the number of nodes dramatically increases the calculation time. For a square lattice the size of a chess board, this is estimated to be many billions of years. The new algorithm developed by the Göttingen-based scientists is significantly faster. “Our calculation for the chess board lattice only takes seven seconds,” explains Denny Fliegner from MPIDS.
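The counting problem is easy to state and easy to solve slowly. A brute-force sketch that checks every color assignment, which is exactly the exponential approach the new algorithm avoids:

```python
from itertools import product

def count_colorings(nodes, edges, q):
    """Count proper q-colorings by brute force: the chromatic polynomial P(q).

    Checks all q**len(nodes) assignments, so this is the exponential baseline
    that the Goettingen algorithm improves on by orders of magnitude.
    """
    total = 0
    for coloring in product(range(q), repeat=len(nodes)):
        color = dict(zip(nodes, coloring))
        # Proper coloring: no edge joins two same-colored nodes.
        if all(color[a] != color[b] for a, b in edges):
            total += 1
    return total

# Triangle graph: P(q) = q*(q-1)*(q-2), so three colors give 6 proper colorings.
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
count_colorings(*triangle, q=3)  # → 6
```

On a chess-board-sized lattice the same loop would run for the "billions of years" quoted above; that is the gap the symbolic method closes.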

Without naming the search engine, would you believe that:

network +color +node

results in 24 “hits,” none of which are the research in question.

Remembering some of the terms in the actual scholarly article I searched using:

"chromatic polynomials" +network

Which resulted in three (3) scholarly articles and one “hit,” none of which were the research in question.

As you may suspect, variations on these searches resulted in similar “non-helpful” results.

I had not imagined the research in question but searching was unable to recover the reference.

Well, using a search engine was unable to recover the reference.

Knowing that I had bookmarked the site, I had to scan a large bookmark file for the likely entry.

Found it. So that I don’t have to repeat this non-productive behavior, what follows are the citations and some text from each to help with the finding next time.

The general news article:

A new kind of counting

How many different sudokus are there? How many different ways are there to color in the countries on a map? And how do atoms behave in a solid? Researchers at the Max Planck Institute for Dynamics and Self-Organization in Göttingen and at Cornell University (Ithaca, USA) have now developed a new method that quickly provides an answer to these questions. In principle, there has always been a way to solve them. However, computers were unable to find the solution as the calculations took too long. With the new method, the scientists look at separate sections of the problem and work through them one at a time. Up to now, each stage of the calculation has involved the whole map or the whole sudoku. The answers to many problems in physics, mathematics and computer science can be provided in this way for the first time. (New Journal of Physics, February 4, 2009)

The New Journal of Physics article:

Counting complex disordered states by efficient pattern matching: chromatic polynomials and Potts partition functions by Marc Timme, Frank van Bussel, Denny Fliegner, and Sebastian Stolzenberg.

Abstract:

Counting problems, determining the number of possible states of a large system under certain constraints, play an important role in many areas of science. They naturally arise for complex disordered systems in physics and chemistry, in mathematical graph theory, and in computer science. Counting problems, however, are among the hardest problems to access computationally. Here, we suggest a novel method to access a benchmark counting problem, finding chromatic polynomials of graphs. We develop a vertex-oriented symbolic pattern matching algorithm that exploits the equivalence between the chromatic polynomial and the zero-temperature partition function of the Potts antiferromagnet on the same graph. Implementing this bottom-up algorithm using appropriate computer algebra, the new method outperforms standard top-down methods by several orders of magnitude, already for moderately sized graphs. As a first application, we compute chromatic polynomials of samples of the simple cubic lattice, for the first time computationally accessing three-dimensional lattices of physical relevance. The method offers straightforward generalizations to several other counting problems.

GENERAL SCIENTIFIC SUMMARY

Introduction and background. The number of accessible states of a complex physical system fundamentally impacts its static and dynamic properties. For instance, antiferromagnets often exhibit an exponential number of energetically equivalent ground states and thus positive entropy at zero temperature – an exception to the third law of thermodynamics. However, counting the number of ground states, such as for the Potts model antiferromagnet, is computationally very hard (so-called sharp-P hard), i.e. the computation time generally increases exponentially with the size of the system. Standard computational counting methods that use theorems of graph theory are therefore mostly restricted to very simple or very small lattices.

Main results. Here we present a novel general-purpose method for counting. It relies on a symbolic algorithm that is based on the original physical representation of a Potts partition function and is implemented in the computer algebra language FORM that was successfully used before in precision high-energy physics.

Wider implications. The bottom-up nature of the algorithm, together with the purely symbolic implementation make the new method many orders of magnitude faster than standard methods. It now enables exact solutions of various systems that have been thus far computationally inaccessible, including lattices in three dimensions. Through the relation of the Potts partition functions to universal functions in graph theory, this new method may also help to access related counting problems in communication theory, graph theory and computer science.

The language used was FORM.

This search reminded me that maps are composed of found and identified things.

Which explains the difference between searching the WWW versus a map of found things.

July 21, 2012

Update on Assimilation Project: Neo4j Server Schema

Filed under: Neo4j,Networks — Patrick Durusau @ 8:10 pm

Update on Assimilation Project: Neo4j Server Schema

From the post:

In his latest blog post, Alan Robertson makes a bold move by building a server schema for Neo4j – a schemaless graph database – for his Assimilation Monitoring Project, which provides comprehensive monitoring of systems and services for networks of potentially unlimited size.

For the schema, Robertson outlines the basic entities – Servers, NICs and IP addresses – with indexes of server names, MAC addresses and IP addresses. The power of using a graph is the ability to define relationships between the various entities, such as NIC owner, IP Owner, IP Host and Primary IP.

Correct me if I’m wrong but doesn’t that sound a lot like the topic map that Robert Barta was running when he was down under?

I don’t have the reference within easy reach so if you can post it as a comment I would appreciate it.

July 19, 2012

GraphLab 2.1 [New Release]

Filed under: GraphLab,Graphs,Machine Learning,Networks — Patrick Durusau @ 10:43 am

GraphLab 2.1

A new release (July 10, 2012) of GraphLab!

From the webpage:

Overview

Designing and implementing efficient and provably correct parallel machine learning (ML) algorithms can be very challenging. Existing high-level parallel abstractions like MapReduce are often insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance.

The new GraphLab 2.1 features:

  • a new GraphLab 2 abstraction
  • Fully Distributed with HDFS integration
  • New toolkits
    • Collaborative Filtering
    • Clustering
    • Text Modeling
    • Computer Vision
  • Improved Build system and documentation

Go to http://graphlab.org for details.

If you want to get started with GraphLab today, download the source or clone us on Google Code. We recommend cloning to get the latest features and bug fixes.

hg clone https://code.google.com/p/graphlabapi/

If you don't have mercurial (hg) you can get it from http://mercurial.selenic.com/.

I almost didn’t find the download link. It is just larger than anything else on the page: white letters, black background, at http://www.graphlab.org. I kept looking for a drop down menu item, etc.

Shows even the clearest presentation can be missed by a user. 😉

Now to get this puppy running on my local box.

GraphLab Workshop Presentations

Filed under: Conferences,Graphs,Networks — Patrick Durusau @ 10:16 am

GraphLab Workshop Presentations

Just in case you missed the GraphLab workshop, most of the presentations are now available online, including an introduction to GraphLab 2.1!

Very much worth your time to review!

OK, just a sample:

GraphLab Version 2 Overview (60 mins) (GraphLab Keynote Slides) by Carlos Guestrin

Large scale ML challenges by Ted Willke, Intel Labs

See the agenda for more of same.

Web Scale with a Laptop? [GraphChi]

Filed under: GraphChi,Graphs,Networks — Patrick Durusau @ 9:54 am

GraphChi promises in part:

The promise of GraphChi is to bring web-scale graph computation, such as analysis of social networks, available to anyone with a modern laptop.

Well, that certainly draws a line in the sand doesn’t it?

A bit more from the introduction:

GraphChi is a spin-off of the GraphLab ( http://www.graphlab.org ) -project from the Carnegie Mellon University. It is based on research by Aapo Kyrola ( http://www.cs.cmu.edu/~akyrola/) and his advisors.

GraphChi can run very large graph computations on just a single machine, by using a novel algorithm for processing the graph from disk (SSD or hard drive). Programs for GraphChi are written in the vertex-centric model, proposed by GraphLab and Google's Pregel. GraphChi runs vertex-centric programs asynchronously (i.e changes written to edges are immediately visible to subsequent computation), and in parallel. GraphChi also supports streaming graph updates and removal of edges from the graph. Section 'Performance' contains some examples of applications implemented for GraphChi and their running times on GraphChi.

The promise of GraphChi is to bring web-scale graph computation, such as analysis of social networks, available to anyone with a modern laptop. It saves you from the hassle and costs of working with a distributed cluster or cloud services. We find it much easier to debug applications on a single computer than trying to understand how a distributed algorithm is executed.

In some cases GraphChi can solve bigger problems in reasonable time than many other available distributed frameworks. GraphChi also runs efficiently on servers with plenty of memory, and can use multiple disks in parallel by striping the data.

Even if you do require the processing power of high-performance clusters, GraphChi can be an excellent tool for developing and debugging your algorithms prior to deploying them to the cluster. For high-performance graph computation in the distributed setting, we direct you to GraphLab's new version (v2.1), which can now handle large graphs in astonishing speed. GraphChi supports also most of the new GraphLab v2.1 API (with some restrictions), making the transition easy.

GraphChi is implemented in plain C++, and available as open-source under the flexible Apache License 2.0.

Java version

Java-version of GraphChi: http://code.google.com/p/graphchi-java
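The vertex-centric model that GraphChi and Pregel programs are written in is compact enough to sketch. A toy synchronous version (GraphChi itself runs asynchronously) that finds connected components by label propagation:

```python
def connected_components(num_vertices, edges):
    """Vertex-centric label propagation, a toy of the GraphChi/Pregel model.

    Each vertex repeatedly adopts the smallest label among itself and its
    neighbors; when nothing changes, labels identify connected components.
    """
    label = list(range(num_vertices))
    neighbors = [[] for _ in range(num_vertices)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    changed = True
    while changed:
        changed = False
        for v in range(num_vertices):  # the "vertex program", run per vertex
            best = min([label[v]] + [label[n] for n in neighbors[v]])
            if best < label[v]:
                label[v] = best
                changed = True
    return label

connected_components(5, [(0, 1), (1, 2), (3, 4)])  # → [0, 0, 0, 3, 3]
```

GraphChi's contribution is running this kind of per-vertex update over graphs far too big for memory, by streaming edges from disk in a clever order.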

The performance numbers are impressive.

Not sure I would want to run production code on a laptop in any case but performance should be enough for on-the-road experiments.

Good documentation and examples that should ease you into experimenting with GraphChi.

I first saw this at High Scalability.

July 17, 2012

Memory Efficient De Bruijn Graph Construction [Attn: Graph Builders, Chess Anyone?]

Filed under: Bioinformatics,Genome,Graphs,Networks — Patrick Durusau @ 10:44 am

Memory Efficient De Bruijn Graph Construction by Yang Li, Pegah Kamousi, Fangqiu Han, Shengqi Yang, Xifeng Yan, and Subhash Suri.

Abstract:

Massively parallel DNA sequencing technologies are revolutionizing genomics research. Billions of short reads generated at low costs can be assembled for reconstructing the whole genomes. Unfortunately, the large memory footprint of the existing de novo assembly algorithms makes it challenging to get the assembly done for higher eukaryotes like mammals. In this work, we investigate the memory issue of constructing de Bruijn graph, a core task in leading assembly algorithms, which often consumes several hundreds of gigabytes memory for large genomes. We propose a disk-based partition method, called Minimum Substring Partitioning (MSP), to complete the task using less than 10 gigabytes memory, without runtime slowdown. MSP breaks the short reads into multiple small disjoint partitions so that each partition can be loaded into memory, processed individually and later merged with others to form a de Bruijn graph. By leveraging the overlaps among the k-mers (substring of length k), MSP achieves astonishing compression ratio: The total size of partitions is reduced from $\Theta(kn)$ to $\Theta(n)$, where $n$ is the size of the short read database, and $k$ is the length of a $k$-mer. Experimental results show that our method can build de Bruijn graphs using a commodity computer for any large-volume sequence dataset.

A discovery in one area of data processing can have a large impact in a number of others. I suspect that will be the case with the technique described here.

The use of substrings for compression and to determine the creation of partitions was particularly clever.
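The partitioning idea is easy to sketch: consecutive k-mers usually share their lexicographically smallest p-length substring, so routing each k-mer to the partition named by that substring keeps overlapping k-mers together. A simplified sketch, not the paper's code:

```python
def minimum_substring(kmer, p):
    """Lexicographically smallest length-p substring of a k-mer."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))

def partition_kmers(read, k, p):
    """Assign each k-mer of a read to the partition named by its minimum
    p-substring. Adjacent k-mers usually share that substring, which is what
    lets MSP store each partition compactly. A simplified sketch only.
    """
    parts = {}
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        parts.setdefault(minimum_substring(kmer, p), []).append(kmer)
    return parts

parts = partition_kmers("ACGTACGT", k=4, p=2)
# Of the five k-mers ACGT, CGTA, GTAC, TACG, ACGT, four share the minimum
# substring "AC" and land in one partition; only CGTA goes to "CG".
```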

Software and data sets

Questions:

  1. What are the substring characteristics of your data?
  2. How would you use a De Bruijn graph with your data?

If you don’t know the answers to those questions, you might want to find out.

Additional Resources:

De Bruijn Graph (Wikipedia)

De Bruijn Sequence (Wikipedia)

How to apply de Bruijn graphs to genome assembly by Phillip E C Compeau, Pavel A Pevzner, and Glenn Tesler. Nature Biotechnology 29, 987–991 (2011) doi:10.1038/nbt.2023

And De Bruijn graphs/sequences are not just for bioinformatics: from the Chess Programming Wiki: De Bruijn Sequences. (Lots of pointers and additional references.)

July 16, 2012

Graphing every idea in history

Filed under: Graphs,Humor,Networks — Patrick Durusau @ 3:20 pm

Graphing every idea in history by Nathan Yau.

I did a spot check and my idea about …., well, never mind, it wasn’t listed. (good thing)

Then I read that “every” idea meant only those in Wikipedia with an “influenced by” or “influences” field.

Started to breathe a little easier. 😉

Interesting work but think about the number of facts that you know. Facts that influence your opinions and judgements that aren’t captured in any “fact” database.

July 11, 2012

GraphPack

Filed under: Graph Traversal,GraphPack,Graphs,Networks,Traversal — Patrick Durusau @ 2:27 pm

GraphPack

From the webpage:

GraphPack is a network of autonomous services that manage graph structures. Each node in those graphs may refer to a node in another service, effectively forming a distributed graph. GraphPack deals with the processing of such decentralized graphs. GraphPack supports its own traverse/query language (inspired by neo4j::cypher) that can be executed as transparently distributed traverses.

Amit Portnoy wrote about GraphPack on the neo4j mailing list:

The prototype, called GraphPack, has a very lite design; actual persistence and communication aspects can be easily plugged in by injection (using Guice).

GraphPack enables transparently distributed traverses (in a decentralized graph), which can be specified by a Cypher-inspired traverse specification language.

That is, clients of a GraphPack service (which may have a graph that refers to nodes in other GraphPack services) can write cypher-like expressions and simply receive a result, while the actual implementation may make many remote communication steps. This is done by deriving a new traverse specification at every edge along the specified paths and sending this derived specification to the next node (in other words, the computation moves along the nodes that are matched by the traverse specification).

Sounds like there should be lessons here for distributed topic maps. Yes?

Robustness Elasticity in Complex Networks

Filed under: Complex Networks,Graphs,Networks — Patrick Durusau @ 2:27 pm

Robustness Elasticity in Complex Networks by Timothy C. Matisziw, Tony H. Grubesic, and Junyu Guo. (Matisziw TC, Grubesic TH, Guo J (2012) Robustness Elasticity in Complex Networks. PLoS ONE 7(7): e39788. doi:10.1371/journal.pone.0039788)

Abstract:

Network robustness refers to a network’s resilience to stress or damage. Given that most networks are inherently dynamic, with changing topology, loads, and operational states, their robustness is also likely subject to change. However, in most analyses of network structure, it is assumed that interaction among nodes has no effect on robustness. To investigate the hypothesis that network robustness is not sensitive or elastic to the level of interaction (or flow) among network nodes, this paper explores the impacts of network disruption, namely arc deletion, over a temporal sequence of observed nodal interactions for a large Internet backbone system. In particular, a mathematical programming approach is used to identify exact bounds on robustness to arc deletion for each epoch of nodal interaction. Elasticity of the identified bounds relative to the magnitude of arc deletion is assessed. Results indicate that system robustness can be highly elastic to spatial and temporal variations in nodal interactions within complex systems. Further, the presence of this elasticity provides evidence that a failure to account for nodal interaction can confound characterizations of complex networked systems.
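The basic quantity that arc deletion perturbs is easy to compute, even though the paper's exact bounds require mathematical programming. A sketch measuring the largest connected component before and after deleting an arc:

```python
from collections import Counter

def largest_component(num_nodes, edges):
    """Size of the largest connected component, via union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return max(Counter(find(v) for v in range(num_nodes)).values())

# A 5-node path: deleting the middle arc splits the reachable network.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
largest_component(5, path)                              # → 5
largest_component(5, [e for e in path if e != (1, 2)])  # → 3
```

The paper's point is that which deletions hurt most depends on the current flows between nodes, so a purely structural measurement like this one can mislead.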

As you might expect, I am reading this paper from the perspective of connections between nodes of information and not, for example, nodes on the Internet. I suspect it is particularly relevant for models of (or the lack thereof) information sharing between agencies for example.

With only anecdotal evidence about sharing, it isn’t possible to determine what structural, organization or other barriers are blocking the flow of information. Or what changes would result in the largest amount of effective information sharing. If shared information is mired at upper management levels, then it is of little use to actual analysts.

BTW, you can rest assured this work will not be used inappropriately:

Matisziw’s model was documented in the publicly available journal PLoS ONE. Making such a powerful tool widely available won’t be a danger, Matisziw said. To use his model, a network must be understood in detail. Since terrorists and other criminals don’t have access to enough data about the networks, they won’t be able to use the model to develop doomsday scenarios. [From: Cyberwarfare, Conservation and Disease Prevention Could Benefit from New Network Model, where I first saw this story.]

Do terrorists develop anything other than “doomsday scenarios?” Or is that just PR? Like Y2K or the recent DNS issue. Everyone gets worked up, small bump, life goes on.

July 7, 2012

When are graphs coming to ESPN?

Filed under: Graphs,Networks — Patrick Durusau @ 6:16 pm

Sooner than you think.

Read: PageRank Algorithm Reveals Soccer Teams’ Strategies.

From the post:

Many readers will have watched the final of the Euro 2012 soccer championships on Sunday in which Spain demolished a tired Italian team by 4 goals to nil. The result, Spain’s third major championship in a row, confirms the team as the best in the world and one of the greatest in history.

So what makes Spain so good? Fans, pundits and sports journalists all point to Spain’s famous strategy of accurate quick-fire passing, known as the tiki-taka style. It’s easy to spot and fabulous to watch, as the game on Sunday proved. But it’s much harder to describe and define.

That looks set to change. Today, Javier Lopez Pena at University College London and Hugo Touchette at Queen Mary University of London reveal an entirely new way to analyse and characterise the performance of soccer teams and players using network theory.

They say their approach produces a quantifiable representation of a team’s style, identifies key individuals and highlights potential weaknesses.

Their idea is to think of each player as a node in a network and each pass as an edge that connects nodes. They then distribute the nodes in a way that reflects the playing position of each player on the pitch.
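Their construction is easy to prototype. Here is a minimal sketch in plain Python (player names and pass counts are invented for illustration) that builds a weighted, directed pass network and runs PageRank over it to surface the "popular" players the post mentions:

```python
# Toy pass network in the spirit of the paper: nodes are players,
# weighted arrows are completed passes. Names and counts are invented.
passes = {
    "Xavi":     {"Iniesta": 15, "Busquets": 10, "Alonso": 8},
    "Iniesta":  {"Xavi": 12, "Silva": 6},
    "Busquets": {"Xavi": 9, "Alonso": 5},
    "Alonso":   {"Xavi": 7, "Busquets": 4},
    "Silva":    {"Iniesta": 5, "Xavi": 3},
}

def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over a dict-of-dicts graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, outgoing in graph.items():
            out_total = sum(outgoing.values())
            for dst, weight in outgoing.items():
                # Each player passes rank along in proportion to pass counts.
                new[dst] += damping * rank[src] * weight / out_total
        rank = new
    return rank

ranks = pagerank(passes)
hub = max(ranks, key=ranks.get)  # the team's most "popular" passing hub
```

With real passing data, the same ranking identifies a team's passing hub, and rerunning it with a player removed hints at how much the team's style depends on that individual.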

Add links to related resources and analysis to that graph representation, and can you say topic map?

I wonder when they will think about adding the officials and what fouls they usually call on what players?


In full: A network theory analysis of football strategies

Abstract:

We showcase in this paper the use of some tools from network theory to describe the strategy of football teams. Using passing data made available by FIFA during the 2010 World Cup, we construct for each team a weighted and directed network in which nodes correspond to players and arrows to passes. The resulting network or graph provides a direct visual inspection of a team’s strategy, from which we can identify play pattern, determine hot-spots on the play and localize potential weaknesses. Using different centrality measures, we can also determine the relative importance of each player in the game, the `popularity’ of a player, and the effect of removing players from the game.

I first saw this at Four short links: 4 July 2012 by Nat Torkington.

Genome-scale analysis of interaction dynamics reveals organization of biological networks

Filed under: Bioinformatics,Biomedical,Genome,Graphs,Networks — Patrick Durusau @ 5:25 am

Genome-scale analysis of interaction dynamics reveals organization of biological networks by Jishnu Das, Jaaved Mohammed, and Haiyuan Yu. (Bioinformatics (2012) 28 (14): 1873-1878. doi: 10.1093/bioinformatics/bts283)

Summary:

Analyzing large-scale interaction networks has generated numerous insights in systems biology. However, such studies have primarily been focused on highly co-expressed, stable interactions. Most transient interactions that carry out equally important functions, especially in signal transduction pathways, are yet to be elucidated and are often wrongly discarded as false positives. Here, we revisit a previously described Smith–Waterman-like dynamic programming algorithm and use it to distinguish stable and transient interactions on a genomic scale in human and yeast. We find that in biological networks, transient interactions are key links topologically connecting tightly regulated functional modules formed by stable interactions and are essential to maintaining the integrity of cellular networks. We also perform a systematic analysis of interaction dynamics across different technologies and find that high-throughput yeast two-hybrid is the only available technology for detecting transient interactions on a large scale.

Research of obvious importance to anyone investigating biological networks, but I mention it for the problem it raises: how do we represent transient relationships/interactions in a network?

Assuming a graph/network topology, how does a transient relationship impact a path traversal?

Assuming a graph/network topology, do we ignore the transience for graph theoretical properties such as shortest path?
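One way to make the traversal question concrete is to tag each edge as stable or transient and compare shortest paths computed with and without the transient links. A toy sketch (node names and labels are invented):

```python
from collections import deque

# Edges tagged as stable or transient (hypothetical interactions).
edges = [
    ("A", "B", "stable"), ("B", "C", "stable"),
    ("C", "D", "stable"), ("A", "D", "transient"),
]

def shortest_path_len(edges, src, dst, include_transient=True):
    """BFS shortest path length, optionally ignoring transient edges."""
    adj = {}
    for u, v, kind in edges:
        if kind == "transient" and not include_transient:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable without the excluded edges

with_transient = shortest_path_len(edges, "A", "D", include_transient=True)
without_transient = shortest_path_len(edges, "A", "D", include_transient=False)
```

Here the transient edge shortens the A-to-D path from three hops to one, exactly the kind of difference a purely graph-theoretic query would report without noting that the shortcut may not exist at any given moment.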

Do we need graph theoretical queries versus biological network queries? Are the results always the same?

Can transient relationships result in transient properties? How do we record those?

Better yet, how do we ignore transient properties and under what conditions? (Leaving to one side how we would formally/computationally accomplish that ignorance.) What are the theoretical issues?

You can find the full text of this article at Professor Yu’s site: http://yulab.icmb.cornell.edu/PDF/Das_B2012.pdf

July 6, 2012

Tutorial on biological networks [The Heterogeneity of Nature]

Filed under: Bioinformatics,Biomedical,Graphs,Heterogeneous Data,Networks — Patrick Durusau @ 3:54 pm

Tutorial on biological networks by Francisco G. Vital-Lopez, Vesna Memišević, and Bhaskar Dutta. (Vital-Lopez, F. G., Memišević, V. and Dutta, B. (2012), Tutorial on biological networks. WIREs Data Mining Knowl Discov, 2: 298–325. doi: 10.1002/widm.1061)

Abstract:

Understanding how the functioning of a biological system emerges from the interactions among its components is a long-standing goal of network science. Fomented by developments in high-throughput technologies to characterize biomolecules and their interactions, network science has emerged as one of the fastest growing areas in computational and systems biology research. Although the number of research and review articles on different aspects of network science is increasing, updated resources that provide a broad, yet concise, review of this area in the context of systems biology are few. The objective of this article is to provide an overview of the research on biological networks to a general audience, who have some knowledge of biology and statistics, but are not necessarily familiar with this research field. Based on the different aspects of network science research, the article is broadly divided into four sections: (1) network construction, (2) topological analysis, (3) network and data integration, and (4) visualization tools. We specifically focused on the most widely studied types of biological networks, which are, metabolic, gene regulatory, protein–protein interaction, genetic interaction, and signaling networks. In future, with further developments on experimental and computational methods, we expect that the analysis of biological networks will assume a leading role in basic and translational research.

As a frozen artifact in time, I would suggest reading this article before it is too badly out of date. It will be sad to see it ravaged by time and pitted by later research that renders entire sections obsolete. Or of interest only to medical literature spelunkers of some future time.

Developers of homogeneous and “correct” models of biological networks should take warning from the closing lines of this survey article:

Currently different types of networks, such as PPI, GRN, or metabolic networks are analyzed separately. These heterogeneous networks have to be integrated systematically to generate comprehensive network, which creates a realistic representation of biological systems.[cite omitted] The integrated networks have to be combined with different types of molecular profiling data that measures different facades of the biological system. A recent multi institutional collaborative project, named The Cancer Genome Atlas,[cite omitted] has already started generating much multi-‘omics’ data for large cancer patient cohorts. Thus, we can expect to witness an exciting and fast paced growth on biological network research in the coming years.

Interesting.

Nature uses heterogeneous networks, with great success.

We can keep building homogenous networks or we can start building heterogeneous networks (at least to the extent we are capable).

What do you think?

July 5, 2012

Mosaic: making biological sense of complex networks

Filed under: Bioinformatics,Gene Ontology,Genome,Graphs,Networks — Patrick Durusau @ 12:14 pm

Mosaic: making biological sense of complex networks by Chao Zhang, Kristina Hanspers, Allan Kuchinsky, Nathan Salomonis, Dong Xu, and Alexander R. Pico. (Bioinformatics (2012) 28 (14): 1943-1944. doi: 10.1093/bioinformatics/bts278)

Abstract:

We present a Cytoscape plugin called Mosaic to support interactive network annotation, partitioning, layout and coloring based on gene ontology or other relevant annotations.

From the Introduction:

The increasing throughput and quality of molecular measurements in the domains of genomics, proteomics and metabolomics continue to fuel the understanding of biological processes. Collected per molecule, the scope of these data extends to physical, genetic and biochemical interactions that in turn comprise extensive networks. There are software tools available to visualize and analyze data-derived biological networks (Smoot et al., 2011). One challenge faced by these tools is how to make sense of such networks often represented as massive ‘hairballs’. Many network analysis algorithms filter or partition networks based on topological features, optionally weighted by orthogonal node or edge data (Bader and Hogue, 2003; Royer et al., 2008). Another approach is to mathematically model networks and rely on their statistical properties to make associations with other networks, phenotypes and drug effects, sidestepping the issue of making sense of the network itself altogether (Machado et al., 2011). Acknowledging that there is still great value in engaging the minds of researchers in exploratory data analysis at the level of networks (Kelder et al., 2010), we have produced a Cytoscape plugin called Mosaic to support interactive network annotation and visualization that includes partitioning, layout and coloring based on biologically relevant ontologies (Fig. 1). Mosaic shows slices of a given network in the visual language of biological pathways, which are familiar to any biologist and are ideal frameworks for integrating knowledge.

[Fig. 1 omitted]

Cytoscape is a free and open source network visualization platform that actively supports independent plugin development (Smoot et al., 2011). For annotation, Mosaic relies primarily on the full gene ontology (GO) or simplified ‘slim’ versions (http://www.geneontology.org/GO.slims.shtml). The cellular layout of partitioned subnetworks strictly depends on the cellular component branch of GO, but the other two functions, partitioning and coloring, can be driven by any annotation associated with a major gene or protein identifier system.

You will need the prerequisites listed on the Mosaic project page.

The Mosaic page offers additional documentation, which will take a while to process. I am particularly interested in annotations of the network driving partitioning.
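The partitioning step Mosaic performs, slicing a network by node annotation, has a simple core that can be sketched in a few lines of Python. The gene names and component terms below are invented for illustration, not taken from Mosaic itself:

```python
# Hypothetical protein interactions and GO-slim style cellular
# component annotations (both invented for illustration).
interactions = [("TP53", "MDM2"), ("TP53", "BAX"),
                ("BAX", "CASP9"), ("HK1", "PFKM")]
annotation = {"TP53": "nucleus", "MDM2": "nucleus",
              "BAX": "mitochondrion", "CASP9": "mitochondrion",
              "HK1": "cytoplasm", "PFKM": "pytoplasm" if False else "cytoplasm"}

def partition_by_annotation(interactions, annotation):
    """Group edges into subnetworks, one per annotation term.

    An edge whose endpoints carry different terms lands in both
    subnetworks, so no interaction is silently dropped."""
    parts = {}
    for u, v in interactions:
        for term in {annotation[u], annotation[v]}:
            parts.setdefault(term, []).append((u, v))
    return parts

parts = partition_by_annotation(interactions, annotation)
```

Edges spanning two terms appear in both subnetworks, which is one simple-minded answer to the question of how annotation should drive partitioning.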

GraphStream 1.1

Filed under: Graphs,Networks — Patrick Durusau @ 10:28 am

GraphStream 1.1 was released in November 2011. Sorry for the late notice: I ran across slides from a recent presentation and thence the updated release.

From the release notes:

We are happy to announce a new minor release of GraphStream stable version, 1.1. We hope it will fulfill your needs and that you will enjoy the new features that come with it. As usual, please do not hesitate to provide us with your comments through the mailing list and to submit bugs on the issue tracking system.

What is new in release 1.1?

  • GraphStream 1.1 supports most of the commonly used graph file formats (DOT, GML, GEXF, Pajek, GraphML, TLP). It can read files in these formats thus making the interface with other graph libraries easier. Some of these parsers (DOT, GML, Pajek, TLP) are (re)written using a JavaCC grammar to reproduce the exact format specifications.
  • There is a new way to access graph elements (nodes and edges) by index in addition to the access by identifier. The access by index is faster and allows easy interfacing with APIs that use arrays.
  • New methods are added to Graph and Node interfaces for more flexibility. In general, there are three ways to pass a graph element to a method: by id, by index and by reference.
  • The Graph implementations (AdjacencyListGraph, SingleGraph and MultiGraph) were completely rewritten. The common code (Sink and Source implementation) was refactored. The new implementations are more stable and provide faster access and iteration (especially breadth-first and depth-first iteration) with almost no memory overhead.
  • Concept of “Camera” has been extracted from the previous implementation. With this new version, each view of a viewer has to return a camera object. This object allows getting information about the view (view center, zoom, etc.), controlling the view, and converting pixels to graphic units and vice-versa.
  • There is a new directive in the DGS specifications. This directive, called “cl”, is linked to the “graphCleared()” event of a sink.
  • Dijkstra’s algorithm was reimplemented. The new implementation is much faster. The API has slightly changed.
  • With the help of our users many bugs were detected and fixed. Special thanks to all of them for their feedback.

The presentation?

Dynamic Graphs… and a tool to handle them, 6th Complex Systems Summer School, Paris, July 5th, 2012.

On GitHub: github.com/organizations/graphstream

Connect the Stars (Graphs Anyone?)

Filed under: Graphs,Mathematics,Networks — Patrick Durusau @ 7:59 am

Connect the Stars (How papers are like constellations) by KW Regan.

From the post:

Bob Vaughan is a mathematician at Penn State University. He is also a Fellow of the Royal Society—not ours, Ben Franklin helped make it tough for us to have one about 236 years ago this Wednesday. He is a great expert on analytic number theory, especially applied to the prime numbers. His work involves the deep connections between integers and complex numbers that were first charted by Leonhard Euler in the time of Franklin.

Today we examine how connections are made in the literature, and how choosing them influences our later memory of what is known and what is not.

Proved mathematical statements are like stars of various magnitudes: claim, proposition, lemma, theorem… A paper usually connects several of the former kinds to a few bright theorems. Often there are different ways the connections could go, and a lengthened paper may extend them to various corollaries and other theorems. Thus we can get various constellations even from the same stars. Consider the Big Dipper and the larger Ursa Major:

I lack the mathematical chops to follow the substance of the post but can read along to see the connections that were made at different times by different people that contributed to what is reported as the present state of knowledge.

How to capture that, dare I say network/graph of interconnections?

Search seems haphazard and lossy.

Writing it out in prose monographs or articles isn’t much better because you still have to find the monograph or article.

What if there were a dynamic network/graph of connections that overlays and grows with publications? One capturing not only formal citations but also less formal connections, including connections to less than an entire article?

The social life of research as it is read, assimilated, used, revised and extended by members of a discipline.

That is to say that research isn’t separate from us, research is us. It is as much a social phenomenon as prose, plays or poetry. Just written in a different style.

July 4, 2012

Plane Old Networks

Filed under: Graphs,Networks,Visualization — Patrick Durusau @ 7:29 pm

Plane Old Networks by Skye Bender-deMoll.

From the post:

This is a catchall post to collect together a number of interesting network images I’ve run across in the last few years. The common feature is that they are all networks that are based in or arise from geography or spatial processes. Unlike most of the networks we often have to work with, these are mostly “planar” (or nearly so) meaning that they can usually be drawn in two dimensions with minimal crossing and distortion.

I had to reference this post because the networks are interesting and my hometown gets mentioned on one of them. 😉

I do think the author is correct when he speculates:

I have a hunch (but no stats to back it up) that the sorts of networks generated by processes that essentially operate on a flat substrate may be structurally different (have certain specific network properties) than the kinds of networks generated from processes like citations, campaign contributions, ownership relations, or other less-geographic systems.

Assuming there are different network properties, my question would be what underlying cause creates that difference?

July 2, 2012

Bisociative Knowledge Discovery

Filed under: Bisociative,Graphs,Knowledge Discovery,Marketing,Networks,Topic Maps — Patrick Durusau @ 8:43 am

Bisociative Knowledge Discovery: An Introduction to Concept, Algorithms, Tools, and Applications by Michael R. Berthold. (Lecture Notes in Computer Science, Volume 7250, 2012, DOI: 10.1007/978-3-642-31830-6)

The volume where Berthold’s Towards Bisociative Knowledge Discovery appears.

Follow the links for article abstracts and additional information. “PDFs” are available under Springer Open Access.

If you are familiar with Steve Newcomb’s universes of discourse, this will sound hauntingly familiar.

How will diverse methodologies of bisociative knowledge discovery, being in different universes of discourse, interchange information?

Topic maps anyone?

June 30, 2012

Simple network diagrams in R

Filed under: Graphs,Networks,R — Patrick Durusau @ 6:51 pm

Simple network diagrams in R by Steve Powell.

From the post:

Why study networks?

Development and aid projects these days are more and more often focussing on supporting networks, so tools to analyse networks are always welcome.

In this post I am going to present a very easy-to-use package for the stats program R which makes nice-looking graphs of these kinds of networks.

In a recent project for a client, one of the outcomes is to improve how a bunch of different local and regional organisations work together. The local organisations in particular are associated with one of three ethnicities, and one project goal is to encourage these organisations to work with one another as peers.

One tool we used to look at this is the old friend of the educational psychologist, the sociogram. We made a numbered list of about 80 relevant local and regional organisations. Then we sent this list to each of the local organisations and asked them to list the five with which they communicated the most, the five with which they cooperated the most, and the five which they think make the biggest contribution to solving their collective problems.

You won’t always need to spin up a server farm at Amazon or Google for preliminary data analysis.

A tool for quick and dirty analysis but also capable of client friendly presentation.
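The sociogram construction the post describes is straightforward to reproduce even without R. A stdlib Python sketch with invented nominations (shortened from five to three per organisation):

```python
# Hypothetical survey data: each organisation names the peers it
# communicates with most. All names and nominations are invented.
nominations = {
    "OrgA": ["OrgB", "OrgC", "OrgD"],
    "OrgB": ["OrgA", "OrgC", "OrgE"],
    "OrgC": ["OrgA", "OrgB", "OrgD"],
    "OrgD": ["OrgC", "OrgA", "OrgB"],
    "OrgE": ["OrgB", "OrgA", "OrgC"],
}

# In a sociogram the interesting quantity is often in-degree:
# how many peers name a given organisation.
in_degree = {}
for source, named in nominations.items():
    for target in named:
        in_degree[target] = in_degree.get(target, 0) + 1

# Organisations ranked by how often they are nominated.
ranked = sorted(in_degree.items(), key=lambda kv: -kv[1])
```

The in-degree ranking immediately shows which organisations act as communication hubs and which sit at the periphery, which is the question the client engagement was asking.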

June 25, 2012

Scholarly network similarities…

Filed under: Networks,Similarity — Patrick Durusau @ 4:40 pm

Scholarly network similarities: How bibliographic coupling networks, citation networks, cocitation networks, topical networks, coauthorship networks, and coword networks relate to each other by Erjia Yan and Ying Ding. (Journal of the American Society for Information Science and Technology Volume 63, Issue 7, pages 1313–1326, July 2012)

Abstract:

This study explores the similarity among six types of scholarly networks aggregated at the institution level, including bibliographic coupling networks, citation networks, cocitation networks, topical networks, coauthorship networks, and coword networks. Cosine distance is chosen to measure the similarities among the six networks. The authors found that topical networks and coauthorship networks have the lowest similarity; cocitation networks and citation networks have high similarity; bibliographic coupling networks and cocitation networks have high similarity; and coword networks and topical networks have high similarity. In addition, through multidimensional scaling, two dimensions can be identified among the six networks: Dimension 1 can be interpreted as citation-based versus noncitation-based, and Dimension 2 can be interpreted as social versus cognitive. The authors recommend the use of hybrid or heterogeneous networks to study research interaction and scholarly communications.
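The cosine measure the authors use can be sketched by treating each network as a vector of edge weights over the same node pairs. The institution names and weights below are invented for illustration:

```python
import math

# Two hypothetical scholarly networks over the same institutions,
# as dicts mapping node pairs to edge weights (invented numbers).
citation = {("MIT", "UCL"): 10, ("MIT", "KU"): 4, ("UCL", "KU"): 1}
cocitation = {("MIT", "UCL"): 8, ("MIT", "KU"): 5, ("UCL", "KU"): 0}

def cosine_similarity(net1, net2):
    """Cosine of the edge-weight vectors of two networks."""
    pairs = set(net1) | set(net2)
    v1 = [net1.get(p, 0) for p in pairs]
    v2 = [net2.get(p, 0) for p in pairs]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

similarity = cosine_similarity(citation, cocitation)
```

A value near 1 means the two networks weight the same institutional connections similarly, which is the sense in which the article calls citation and cocitation networks "highly similar."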

Interesting that I should come across this article after posting about data sets. See http://info.slis.indiana.edu/~eyan/papers/citation/ for the post-processing data and interactive visualizations that are reproduced as static views in the article.

At page 1323 the authors say:

In addition to social networks versus information networks, another distinction of real connection-based networks versus artificial connection-based networks can be made. Coauthorship networks and citation networks are constructed based on real connections, whereas cocitation, bibliographic coupling, topical, and coword networks are constructed based on artificial connections,5 usually in the form of similarity measurements.

I am not sure about the “real” versus “artificial” connection that comes so easily to hand. In part because authors, in my experience, tend to use terminology similar to other scholars with whom they agree. So the connection between the work of scholars isn’t “artificial,” although it isn’t accounted for in this study.

There is much to admire and even imitate in this article but the interaction between scholars is more complex than its representation by the networks here.

June 22, 2012

Uncovering disassortativity in large scale-free networks

Filed under: Disassortativity,Networks,Scale-Free — Patrick Durusau @ 3:09 pm

Uncovering disassortativity in large scale-free networks by Nelly Litvak and Remco van der Hofstad.

Abstract:

Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, social and biological networks are often characterized by degree-degree {dependencies} between neighbouring nodes. In this paper we propose a new way of measuring degree-degree {dependencies}. We show that the commonly used assortativity coefficient significantly underestimates the magnitude of {dependencies}, especially in large disassortative networks. We mathematically explain this phenomenon and validate the results on synthetic graphs and real-world network data. As an alternative, we suggest to use rank correlation measures such as the well-known Spearman’s rho. Our experiments convincingly show that Spearman’s rho produces consistent values in graphs of different sizes but similar structure, and it is able to reveal strong (positive or negative) {dependencies} in large graphs. In particular, using the Spearman’s rho we show that preferential attachment model exhibits significant negative degree-degree {dependencies}. We also discover much stronger negative {degree-degree dependencies} in Web graphs than was previously thought. We conclude that rank correlations provide a suitable and informative method for uncovering network mixing patterns.
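The contrast the authors draw, the standard assortativity coefficient (a Pearson correlation of the degrees at edge endpoints) versus Spearman's rho (the same correlation computed on ranks), can be illustrated on a toy graph in plain Python. For real work networkx's degree_assortativity_coefficient and scipy.stats.spearmanr are the obvious tools; this is only a self-contained sketch:

```python
# Toy hub-and-spoke graph: node 0 links mostly to low-degree nodes,
# so the network should come out disassortative (negative correlation).
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]

deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1

# Degrees at edge endpoints, counted in both directions for symmetry.
pairs = [(deg[u], deg[v]) for u, v in edges] + \
        [(deg[v], deg[u]) for u, v in edges]

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    return cov / (sx * sy)

def rank(values):
    """Average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(pairs):
    rx = rank([x for x, _ in pairs])
    ry = rank([y for _, y in pairs])
    return pearson(list(zip(rx, ry)))

assortativity = pearson(pairs)  # the usual assortativity coefficient
rho = spearman(pairs)           # the rank-based alternative
```

On this small graph the two measures agree (both come out at −0.5, disassortative); the paper's point is that on large real-world networks the Pearson version can badly understate the dependence while the rank version stays consistent across graph sizes.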

If you are using graphs/networks and your analysis relies on dependencies between nodes, this could be of interest.

Graphs that involve large numbers of nodes in terrorism analysis, for example.

Being mindful that one person’s “terrorist” is another person’s defender of the homeland.

June 12, 2012

Network Medicine: Using Visualization to Decode Complex Diseases

Filed under: Bioinformatics,Biomedical,Genome,Graphs,Networks — Patrick Durusau @ 6:26 pm

Network Medicine: Using Visualization to Decode Complex Diseases

From the post:

Albert Làszló Barabàsi is a physicist, but maybe best known for his work in the field of network theory. In his TEDMED talk titled “Network Medicine: A Network Based Approach to Decode Complex Diseases” [tedmed.com], Albert-Làszló applies advanced network theory to the field of biology.

Using a metaphor of Manhattan maps, he explains how an all-encompassing map of the relationships between genes, proteins and metabolites can form the key to truly understand the mechanisms behind many diseases. He further makes the point that diseases should not be divided up into separate organ-based branches of medicine, but rather viewed as a tightly interconnected network.

More information and movies at the post (information aesthetics)

Turns out that relationships (can you say graph/network?) are going to be critical in the treatment of disease. (Not treatment of symptoms, treatment of disease.)

June 10, 2012

Inferring General Relations between Network Characteristics from Specific Network Ensembles

Filed under: Networks,Sampling,Topology — Patrick Durusau @ 8:17 pm

Inferring General Relations between Network Characteristics from Specific Network Ensembles by Stefano Cardanobile, Volker Pernice, Moritz Deger, and Stefan Rotter.

Abstract:

Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget’s Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.

The key insight is that sampling can provide the basis for reliable estimates of global properties of networks too large to fully model.
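A back-of-the-envelope version of that insight: estimate a global property (here, mean degree) from a small node sample of a synthetic network. This is naive uniform sampling, not the regression approach of the paper, and the network model and sizes below are invented:

```python
import random

random.seed(42)  # make the sketch reproducible

# A toy "large" network: a ring of 1,000 nodes plus 500 random chords.
n = 1000
edges = [(i, (i + 1) % n) for i in range(n)]
edges += [(random.randrange(n), random.randrange(n)) for _ in range(500)]

# Tabulate node degrees.
deg = {i: 0 for i in range(n)}
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

# True global mean degree: 2 * 1500 edges / 1000 nodes = 3.0 exactly.
true_mean = sum(deg.values()) / n

# Estimate it from only 10% of the nodes.
sample = random.sample(range(n), 100)
estimated_mean = sum(deg[i] for i in sample) / len(sample)
```

With a tenth of the nodes sampled, the estimate lands close to the true mean of 3.0; the paper's regression models aim to do the same for much subtler global properties of networks, like the brain's, that cannot be fully mapped.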

I first saw this at: Understanding Complex Relationships: How Global Properties of Networks Become Apparent Locally.

If you don’t already get one of the ScienceDaily newsletters, you should consider it.

May 30, 2012

Graph/Network World View

Filed under: Graphs,Networks — Patrick Durusau @ 12:28 pm

Or should I say: graph/network view of the world?

To name just a few of the available graph databases:

AllegroGraph

FlockDB

GraphDB

HypergraphDB

InfiniteGraph

InfoGrid

Neo4j

OrientDB

A more extensive list: Graph Database at Wikipedia.

I ask because, while graph/network software is popular, it may not be the best for you. Or your data.

With Facebook at 900 million users, social networks are on the tip of the tongue of just about everyone.

What if you are interested in a subset of a social network? Or a subset of characteristics in a social network? Or a “social network” with strict limits on what can appear?

Sounds like the schema for a relational database doesn’t it? Has specified properties, relationships (foreign keys), tables, etc.

Is it surprising a schema can be viewed as a network? Albeit one with predefined limits and contours?

But a schema, the relational kind, can be implemented and optimized for particular operations. Those may be operations that are of interest to you.

Or not.

You may have a need for a greater range of operations or different operations than are supported by a relational schema.

One of the NoSQL database offerings, viewed as networks, “from a certain point of view,” may be more appropriate.

Or you may need one of the graph databases I listed earlier or that you find elsewhere.

A “graph” is an abstraction onto which you can map relationships, characteristics and capabilities.

A graph/network database comes with built-in relationships, characteristics and capabilities, chosen by its implementers.

Just like relational, NoSQL, New SQL and other databases.

So, saying “graph or network” doesn’t mean your requirements are going to be met.

Comparing your requirements to assumed relationships, characteristics and capabilities is up to you.

Introductions to Graph Databases and Theory

Filed under: Graphs,Networks — Patrick Durusau @ 9:42 am

Introductions to Graph Databases and Theory

Links to a couple of introductory videos on graphs and to Graph Theory and Complex Networks: An Introduction by Maarten van Steen.

Life sciences/bioinformatics orientation.

A blog to revisit for graphs + life sciences + bioinformatics.

