Archive for the ‘Hypergraphs’ Category

Simit: A Language for Physical Simulation

Sunday, August 14th, 2016

Simit: A Language for Physical Simulation by Fredrik Kjolstad, et al.

Abstract:

With existing programming tools, writing high-performance simulation code is labor intensive and requires sacrificing readability and portability. The alternative is to prototype simulations in a high-level language like Matlab, thereby sacrificing performance. The Matlab programming model naturally describes the behavior of an entire physical system using the language of linear algebra. However, simulations also manipulate individual geometric elements, which are best represented using linked data structures like meshes. Translating between the linked data structures and linear algebra comes at significant cost, both to the programmer and to the machine. High-performance implementations avoid the cost by rephrasing the computation in terms of linked or index data structures, leaving the code complicated and monolithic, often increasing its size by an order of magnitude.

In this article, we present Simit, a new language for physical simulations that lets the programmer view the system both as a linked data structure in the form of a hypergraph and as a set of global vectors, matrices, and tensors depending on what is convenient at any given time. Simit provides a novel assembly construct that makes it conceptually easy and computationally efficient to move between the two abstractions. Using the information provided by the assembly construct, the compiler generates efficient in-place computation on the graph. We demonstrate that Simit is easy to use: a Simit program is typically shorter than a Matlab program; that it is high performance: a Simit program running sequentially on a CPU performs comparably to hand-optimized simulations; and that it is portable: Simit programs can be compiled for GPUs with no change to the program, delivering 4 to 20× speedups over our optimized CPU code.

Very deep sledding ahead but consider the contributions:


Simit is the first system that allows the development of physics code that is simultaneously:

Concise. The Simit language has Matlab-like syntax that lets algorithms be implemented in a compact, readable form that closely mirrors their mathematical expression. In addition, Simit matrices assembled from hypergraphs are indexed by hypergraph elements like vertices and edges rather than by raw integers, significantly simplifying indexing code and eliminating bugs.

Expressive. The Simit language consists of linear algebra operations augmented with control flow that let developers implement a wide range of algorithms ranging from finite elements for deformable bodies to cloth simulations and more. Moreover, the powerful hypergraph abstraction allows easy specification of complex geometric data structures.

Fast. The Simit compiler produces high-performance executable code comparable to that of hand-optimized end-to-end libraries and tools, as validated against the state-of-the-art SOFA [Faure et al. 2007] and Vega [Sin et al. 2013] real-time simulation frameworks. Simulations can now be written as easily as a traditional prototype and yet run as fast as a high-performance implementation without manual optimization.

Performance Portable. A Simit program can be compiled to both CPUs and GPUs with no additional programmer effort, while generating efficient code for each architecture. Where Simit delivers performance comparable to hand-optimized CPU code on the same processor, the same simple Simit program delivers roughly an order of magnitude higher performance on a modern GPU in our benchmarks, with no changes to the program.

Interoperable. Simit hypergraphs and program execution are exposed as C++ APIs, so developers can seamlessly integrate with existing C++ programs, algorithms, and libraries.
(emphasis in original)

Additional resources:

http://simit-lang.org/

Getting Started

Simit mailing list

Source code (MIT license)

Enjoy!

Graph for Scala

Monday, December 23rd, 2013

Graph for Scala

From the webpage:

Welcome to scalax.collection.Graph

Graph for Scala provides basic graph functionality that seamlessly fits into the Scala standard collections library. Like members of scala.collection, graph instances are in-memory containers that expose a rich, user-friendly interface.

Graph for Scala also has ready-to-go implementations of JSON-Import/Export and Dot-Export. Database emulation and distributed graph processing are due to be supported.

Backed by the Scala core team, Graph for Scala started in 2011 as an open source project in the EPFL Scala incubator space on Assembla. Meanwhile it is also hosted on Github.

Want to take it for a spin? Grab the latest release to get started, then visit the Core User Guide (Warning: Broken Link)to learn more!

If you follow the “Core” option under “Users Guides” on the top menu bar, you will find: Core User Guide: Introduction, which reads in part:

Why Use Graph for Scala?

The most important reasons why Graph for Scala speeds up your development are:

  • Simplicity: Creating, manipulating and querying Graph is intuitive.
  • Consistency: Graph for Scala seamlessly maintains a consistent state of nodes and edges including prevention of duplicates, intelligent addition and removal.
  • Conformity: As a regular collection class, Graph has the same “look and feel” as other members of the Scala collection framework. Whenever appropriate, result types are Scala collection types themselves.
  • Flexibility: All kinds of graphs including mixed graphs, multi-graphs and hypergraphs are supported.
  • Functional Style: Graph for Scala facilitates a concise, functional style of utilizing graph functionality, including traversals, not seen in Java-based libraries.
  • Extendibility: You can easily customize Graph for Scala to reflect the needs of you application retaining all benefits of Graph.
  • Documentation: Ideal progress curve through adequate documentation.

Look and see!

You will find support for hyperedges, directed hyperedges, edges and directed edges.

Further documentation covers exporting to Dot, moving data into and out of JSON, and constraining graphs.

Scalable Property and Hypergraphs in RDF

Wednesday, November 6th, 2013

From the description:

There is a misconception that Triple Stores are not ‘true’ graph databases because they supposedly do not support Property Graphs and Hypergraphs.

We will demonstrate that Property and Hypergraphs are not only natural to Triple Stores and RDF but allow for potentially even more powerful graph models than non-RDF approaches.

AllegroGraph defends their implementation of Triple Stores as both property and hypergraphs.

The second story (see also A Letter Regarding Native Graph Databases) I have heard in two days based upon an unnamed vendor trash talking other graph databases.

Are graph databases catching on enough for that kind of marketing effort?

BTW, AllegroGraph does have a Free Server Edition Download.

Limited to 5 million triples but that should capture your baseball card collection or home recipe book. 😉

Database and Query Analysis Tools for MySQL:…

Thursday, October 24th, 2013

Database and Query Analysis Tools for MySQL: Exploiting Hypertree and Hypergraph Decompositions by Selvameenal Chokkalingam.

Abstract:

A database is an organized collection of data. Database systems are widely used and have a broad range of applications. It is thus essential to find efficient database query evaluation techniques. In the recent years, new theories and algorithms for database query optimization have been developed that exploit advanced graph theoretic concepts. In particular, the graph theoretic concepts of hypergraphs, hypergraph decompositions, and hypertree decompositions have played an important role in the recent research.

This thesis studies algorithms that employ hypergraph decompositions in order to detect the cyclic or acyclic degree of database schema, and describes implementations of those algorithms. The main contribution of this thesis is a collection of software tools for MySQL that exploit hypergraph properties associated with database schema and query structures.

If you remember hypergraphs from database theory this may be the refresher for you.

I stumbled across it earlier today while running down references on hypergraphs.

Hypergraphs and Cellular Networks

Thursday, October 24th, 2013

Hypergraphs and Cellular Networks by Steffen Klamt, Utz-Uwe Haus, Fabian Theis. (Klamt S, Haus U-U, Theis F (2009) Hypergraphs and Cellular Networks. PLoS Comput Biol 5(5): e1000385. doi:10.1371/journal.pcbi.1000385)

Background:

The understanding of biological networks is a fundamental issue in computational biology. When analyzing topological properties of networks, one often tends to substitute the term “network” for “graph”, or uses both terms interchangeably. From a mathematical perspective, this is often not fully correct, because many functional relationships in biological networks are more complicated than what can be represented in graphs.

In general, graphs are combinatorial models for representing relationships (edges) between certain objects (nodes). In biology, the nodes typically describe proteins, metabolites, genes, or other biological entities, whereas the edges represent functional relationships or interactions between the nodes such as “binds to”, “catalyzes”, or “is converted to”. A key property of graphs is that every edge connects two nodes. Many biological processes, however, are characterized by more than two participating partners and are thus not bilateral. A metabolic reaction such as A+B→C+D (involving four species), or a protein complex consisting of more than two proteins, are typical examples. Hence, such multilateral relationships are not compatible with graph edges. As illustrated below, transformation to a graph representation is usually possible but may imply a loss of information that can lead to wrong interpretations afterward.

Hypergraphs offer a framework that helps to overcome such conceptual limitations. As the name indicates, hypergraphs generalize graphs by allowing edges to connect more than two nodes, which may facilitate a more precise representation of biological knowledge. Surprisingly, although hypergraphs occur ubiquitously when dealing with cellular networks, their notion is known to a much lesser extent than that of graphs, and sometimes they are used without explicit mention.

This contribution does by no means question the importance and wide applicability of graph theory for modeling biological processes. A multitude of studies proves that meaningful biological properties can be extracted from graph models (for a review see [1]). Instead, this contribution aims to increase the communities’ awareness of hypergraphs as a modeling framework for network analysis in cell biology. We will give an introduction to the notion of hypergraphs, thereby highlighting their differences from graphs and discussing examples of using hypergraph theory in biological network analysis. For this Perspective, we propose using hypergraph statistics of biological networks, where graph analysis is predominantly used but where a hypergraph interpretation may produce novel results, e.g., in the context of a protein complex hypergraph.

Like graphs, hypergraphs may be classified by distinguishing between undirected and directed hypergraphs, and, accordingly, we divide the introduction to hypergraphs given below into two major parts.

When I read:

Many biological processes, however, are characterized by more than two participating partners and are thus not bilateral.

I am reminded of Steve Newcomb’s multilateral models of subject identity.

Possibly overkill for some subject identity use cases and just right for other subject identity use cases. A matter of requirements.

This is a very good introduction to hypergraphs that concludes with forty-four (44) references to literature you may find useful.

3 Myths about graph query languages…

Tuesday, October 15th, 2013

3 Myths about graph query languages. Busted by Pixy. by Sridhar Ramachandran.

A very short slide deck that leaves you wanting more information.

It got my attention because I didn’t know there were any myths about graph query languages. 😉

I think the sense of “myth” here is more “misunderstanding” or simply “incorrect information.”

Some references that may be helpful while reading these slides:

I must confess that “Myth #3: GQLs [Graph Query Languages] can’t be relational” has always puzzled me.

In part because hypergraphs have been used to model databases for quite some time.

For example:

Making use of arguments from information theory it is shown that a boolean function can represent multivalued dependencies. A method is described by which a hypergraph can be constructed to represent dependencies in a relation. A new normal form called generalized Boyce-Codd normal form is defined. An explicit formula is derived for representing dependencies that would remain in a projection of a relation. A definition of join is given which makes the derivation of many theoretical results easy. Another definition given is that of information in a relation. The information gets conserved whenever lossless decompositions are involved. It is shown that the use of null elements is important in handling data.

Would you believe: Some analytic tools for the design of relational database systems by K. K. Nambiar in 1980?

So far as I know, hypergraphs are a form of graph so it isn’t true that “graphs can only express binary relations/predicates.”

One difference (there are others) is that a hypergraph database doesn’t require derivation of relationships because those relationships are already captured by a hyperedge.

Moreover, a vertex can (whether it “may” or not in a particular hypergraph is another issue) be a member of more than one hyperedge.

Determination of common members becomes a straight forward query as opposed to two or more derivations of associations and then calculation of any intersection.

For all of that, it remains important as notice of a new declarative graph query language (GQL).

Hypergraph-Based Image Retrieval for Graph-Based Representation

Friday, September 13th, 2013

Hypergraph-Based Image Retrieval for Graph-Based Representation by Salim Jouili and Salvatore Tabbone.

Abstract:

In this paper, we introduce a novel method for graph indexing. We propose a hypergraph-based model for graph data sets by allowing cluster overlapping. More precisely, in this representation one graph can be assigned to more than one cluster. Using the concept of the graph median and a given threshold, the proposed algorithm detects automatically the number of classes in the graph database. We consider clusters as hyperedges in our hypergraph model and we index the graph set by the hyperedge centroids. This model is interesting to traverse the data set and efficient to retrieve graphs.

(Salim Jouili and Salvatore Tabbone, Hypergraph-based image retrieval for graph-based representation. Journal of the Pattern Recognition Society, April 2012. © 2012 Elsevier Ltd.)

From the introduction:

In the present work, we address the problematic of graph indexing using directly the graph domain. We provide a new approach based on the hypergraph model. The main idea of this contribution is first to re-organize the graph space (domain) into a hypergraph structure. In this hypergraph, each vertex is a graph and each hyperedge corresponds to a set of similar graphs. Second, our method uses this hypergraph structure to index the graph set by making use of the centroids of the hyperedges as index entries. By this way, our method does not need to store additional information about the graph set. In fact, our method creates an index that contains only pointers to some selected graphs from the dataset which is an interesting feature, especially, in the case of large datasets. Besides indexing, our method addresses also the navigation problem in a database of images represented by graphs. Thanks to the hypergraph structure, the navigation through the data set can be performed by a classical traversal algorithm. The experimental results show that our method provides good performance in term of indexing for tested image databases as well as for a chemical database containing about 35,000 graphs, which points out that the proposed method is scalable and can be applied in different domains to retrieve graphs including clustering, indexing and navigation steps.

Sounds very exciting until I think about the difficulty of constructing a generalized “semantic centroid.”

For example, what is the semantic distance between black and white?

Was disambiguation of black and white a useful thing. Yes/No?

Suggestions on how to develop domain specific “semantic centroids?”

HyperGraphDB (Google+)

Wednesday, June 12th, 2013

HyperGraphDB (Google+)

Jack Park forwarded a note pointing out this Google+ group for HyperGraphDB.

Not a lot of traffic, must be waiting on you!

See also: HyperGraphDB 1.2 Final for links to the software.

ConceptNet5 [Herein of Hypergraphs]

Friday, May 17th, 2013

ConceptNet5

From the website:

ConceptNet is a semantic network containing lots of things computers should know about the world, especially when understanding text written by people.

It is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people’s goals. If you wanted to build your own Watson, this should be a good place to start!

ConceptNet contains everyday basic knowledge:

(…)

ConceptNet 5 is a graph

To be precise, it’s a hypergraph, meaning it has edges about edges. Each statement in ConceptNet has justfications pointing to it, explaining where it comes from and how reliable the information seems to be.

Previous versions of ConceptNet has been distributed as idiosyncratic database structures plus some software to interact with them. ConceptNet 5 is not a piece of software or a database; it is a graph. It’s a set of nodes and edges, which we can represent in multiple formats including JSON. You probably know better than we do what software you want to use to interact with it!

(That said, you can have our idiosyncratic Solr index if you want, but that’s not ConceptNet, it’s just a system for quickly looking things up in ConceptNet.)

Some other interesting properties:

  • The ConceptNet graph is ID-less. Every node and assertion contains all the information necessary to identify it and no more in its URI, and does not rely on arbitrarily-assigned IDs. The advantage of this is that if multiple branches of ConceptNet are developed in multiple places, we can later merge them simply by taking the union of the nodes and edges. (And we hope for this to happen!)
  • ConceptNet supports linked data: you can download a list of links to the greater Semantic Web, via DBPedia and via RDF/OWL WordNet. For example, our concept cat is linked to the DBPedia node at http://dbpedia.org/resource/Cat.

In addition to being a data source, interesting notion of “ID-less” nodes and edges.

Information on the software setup, Solr and Python to deliver ConceptNet5 as a hypergraph is also available.

I first saw this in Max De Marzi’s Knowledge Bases in Neo4j. You will find that Max’s approach involves dumbing down the hypergraph.

Why Hypergraphs?

Thursday, April 25th, 2013

Why Hypergraphs? by Linas Vepstas.

From the post:

OpenCog uses hypergraphs to represent knowledge. Why? I don’t think this is clearly, succinctly explained anywhere, so I will try to do so here. This is a very important point: I can’t begin to tell you how many times I went searching for some whiz-bang logic programming system, or inference engine, or theorem-prover, or some graph re-writing engine, or some probabilistic programming system, only to throw up my hands up and realize that, after many wasted hours, none of them do what I want. If you’re interested in AGI, then let me assure you: they don’t do what you want, either. So, what do I want them to do, and why?

Well, lets begin easy: with graph re-writing systems. These days, almost everyone agrees that a great way to represent knowledge is with graphs. The structure IsA(Cat, Animal) looks like a graph with two vertexes, Cat and Animal, and a labelled edge, IsA, between them. If I also know that IsA(Binky, Cat), then, in principle, I should be able to deduce that IsA(Binky, Animal). This is a simple transitive relationship, and the act of logical deduction, for this example, is a simple graph re-write rule: If you see two IsA edges in a row, you should draw a third IsA edge between the first and the last vertex. Easy, right?

So perhaps you’d think that all logic induction and reasoning engines have graph rewrite systems at their core, right? So you’d think. In fact, almost none of them do. And those that do, do it in some internal, ad hoc, non-public, undocumented way: there’s no API, its not exposed externally; its not an ‘official’ part of the system for you to use or tinker with.

You know how I feel about AI triumphalism so I won’t bother to repeat the rant.

However, the hypergraph part of this work looks interesting. Whatever your views on AI.

A good place to start would be the OpenCog Development page.

HyperGraphDB: A Generalized Graph Database

Thursday, February 14th, 2013

HyperGraphDB: A Generalized Graph Database by Borislav Iordanov.

Abstract:

We present HyperGraphDB, a novel graph database based on generalized hypergraphs where hyperedges can contain other hyperedges. This generalization automatically reifies every entity expressed in the database thus removing many of the usual difficulties in dealing with higher-order relationships. An open two-layered architecture of the data organization yields a highly customizable system where specific domain representations can be optimized while remaining within a uniform conceptual framework. HyperGraphDB is an embedded, transactional database designed as a universal data model for highly complex, large scale knowledge representation applications such as found in artificial intelligence, bioinformatics and natural language processing.

A formal treatment of HyperGraphDB.

Merits being printed out and given a slow read.

Borisla comments on both RDF and Topic Maps:

…Two other prominent issues are contextuality (scoping) and reification.

Those and other considerations from semantic web research disappear or find natural solutions in the model implemented by HyperGraphDB.

But when I search the paper, scoping comes up in an NLP example as:

The tree-like structure of the document is also recorded in HyperGraphDB with scoping parent-child binary links between (a) the document and its paragraphs, (b) a paragraph and its sentences, (c) a sentence and each linguistic relationship inferred from it.

Scoping at least in one sense of the word, but not the in the sense of say a name being “scoped” by the language French.

Reification, other than the discussion of RDF and topic maps, doesn’t appear again in the paper.

As I said, it needs a slow read but if you see something about scoping and/or reification that I have missed, please give a shout!

Hypergraph-based multidimensional data modeling…

Thursday, February 14th, 2013

Hypergraph-based multidimensional data modeling towards on-demand business analysis by Duong Thi Anh Hoang, Torsten Priebe and A. Min Tjoa. (Proceeding iiWAS ’11 Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services Pages 36-43 )

Abstract:

In the last few years, web-based environments have witnessed the emergence of new types of on-demand business analysis that facilitate complex and integrated analytical information from multidimensional databases. In these on-demand environments, users of business intelligence architectures can have very different reporting and analytical needs, requiring much greater flexibility and adaptability of today’s multidimensional data modeling. While structured data models for OLAP have been studied in detail, a majority of current approaches has not put its focus on the dynamic aspect of the multidimensional design and/or semantic enriched impact model. Within the scope of this paper, we present a flexible approach to model multidimensional databases in the context of dynamic web-based analysis and adaptive users’ requirements. The introduced formal approach is based on hypergraphs with the ability to provide formal constructs specifying the different types of multidimensional elements and relationships which enable the support of highly customized business analysis. The introduced hypergraphs are used to formally define the semantics of multidimensional models with unstructured ad-hoc analytic activities. The proposed model also supports a formal representation of advanced concepts like dynamic hierarchies, many-to-many associations, additivity constraints etc. Some scenario example are also provided to motivate and illustrate the proposed approach.

If you like illustrations of technologies with examples from the banking industry, this is the paper on hypergraphs for you.

Besides, banks are where they keep the money. 😉

Seriously, a very well illustrated introduction to the use of hypergraphs and multidimensional data modeling, plus why multidimensional data models matter to clients. (Another place where they keep money.)

HyperGraphDB 1.2 Final

Thursday, December 27th, 2012

HyperGraphDB 1.2 Final

From the post:

HyperGraphDB is a general purpose, free open-source data storage mechanism. Geared toward modern applications with complex and evolving domain models, it is suitable for semantic web, artificial intelligence, social networking or regular object-oriented business applications.

This release contains numerous bug fixes and improvements over the previous 1.1 release. A fairly complete list of changes can be found at the Changes for HyperGraphDB, Release 1.2 wiki page.

  1. Introduction of a new HyperNode interface together with several implementations, including subgraphs and access to remote database peers. The ideas behind are documented in the blog post HyperNodes Are Contexts.
  2. Introduction of a new interface HGTypeSchema and generalized mappings between arbitrary URIs and HyperGraphDB types.
  3. Implementation of storage based on the BerkeleyDB Java Edition (many thanks to Alain Picard and Sebastian Graf!). This version of BerkeleyDB doesn’t require native libraries, which makes it easier to deploy and, in addition, performs better for smaller datasets (under 2-3 million atoms).
  4. Implementation of parametarized pre-compiled queries for improved query performance. This is documented in the Variables in HyperGraphDB Queries blog post.

HyperGraphDB is a Java based product built on top of the Berkeley DB storage library.

This release dates from November 4, 2012. Apologies for missing the news until now.

Getting Started With Hyperdex

Saturday, August 11th, 2012

Getting Started With Hyperdex by Ṣeyi Ogunyẹ́mi.

From the post:

Alright, let’s start this off with a fitting soundtrack just because we can. Open it up in a tab and come back?

Greetings, valiant adventurer!

So, I heard you care about data. You aren’t storing your precious data in anything that acknowledges PUT requests before being certain it’ll be able to return it to you? Well then, you’ve come to the right place.

Okay, I’m clearly excited, but with good reason. Some time in the past few months, I ran into a paper; “HyperDex: A Distributed, Searchable Key-Value Store”1 from a team at Cornell. By now the typical reaction to NoSQL news tends to be that your eyes glaze over and you start mouthing “…is Web-Scale™”, but this isn’t “yet another NoSQL database”. So, I’ve finally gotten round to writing this piece in hopes of sharing it with others.

Before plunging into the deep end, it’s probably a good idea to discuss why I’ve found HyperDex to be particularly exciting. For reasons that will probably be in a different blog post, I’ve been researching the design of a distributed key/value store with support for strong consistency (for the morbidly curious, it’s connected to Ampify). You must realise that the state-of-the-art distributed key/value stores such as Dynamo (and it’s open-source clone, Riak) tend to aim for eventual consistency.

If you aren’t already experimenting with Hyperdex you may well be after reading this post.

Hypergraphs and Colored Maps

Monday, July 23rd, 2012

Hypergraphs and Colored Maps by James Mallos.

From the post:

A graph, in general terms, is a set of vertices connected by edges. Finding good colorings for the vertices (or edges) of a graph may seem like a hobby interest, but, in fact, graphs bearing certain rule-based colorings represent mathematical objects that are more general than graphs themselves. By bearing colors, graphs let us see objects that could not be quite so easily drawn.

I discovered James’ site, Weave Anything while searching for blogs on hypergraphs.

I recommend it as an entertaining way to learn more about graphs, hypergraphs, hypermaps and similar structures.

Software support for SBGN maps: SBGN-ML and LibSBGN

Friday, July 20th, 2012

Software support for SBGN maps: SBGN-ML and LibSBGN (Martijn P. van Iersel, Alice C. Villéger, Tobias Czauderna, Sarah E. Boyd, Frank T. Bergmann, Augustin Luna, Emek Demir, Anatoly Sorokin, Ugur Dogrusoz, Yukiko Matsuoka, Akira Funahashi, Mirit I. Aladjem, Huaiyu Mi, Stuart L. Moodie, Hiroaki Kitano, Nicolas Le Novère, and Falk Schreiber
Software support for SBGN maps: SBGN-ML and LibSBGN Bioinformatics 2012 28: 2016-2021. )

Warning: Unless you really like mapping and markup languages this is likely to be a boring story. If you do (and I do), it is the sort of thing you will print out and enjoy reading. Just so you know.

Abstract:

Motivation: LibSBGN is a software library for reading, writing and manipulating Systems Biology Graphical Notation (SBGN) maps stored using the recently developed SBGN-ML file format. The library (available in C++ and Java) makes it easy for developers to add SBGN support to their tools, whereas the file format facilitates the exchange of maps between compatible software applications. The library also supports validation of maps, which simplifies the task of ensuring compliance with the detailed SBGN specifications. With this effort we hope to increase the adoption of SBGN in bioinformatics tools, ultimately enabling more researchers to visualize biological knowledge in a precise and unambiguous manner.

Availability and implementation: Milestone 2 was released in December 2011. Source code, example files and binaries are freely available under the terms of either the LGPL v2.1+ or Apache v2.0 open source licenses from http://libsbgn.sourceforge.net.

Contact: sbgn-libsbgn@lists.sourceforge.net

I included the hyperlinks to standards and software for the introduction but not the article references. Those are of interest too but for the moment I only want to entice you to read the article in full. There is a lot of graph work going on in bioinformatics and we would all do well to be more aware of it.

The Systems Biology Graphical Notation (SBGN, Le Novère et al., 2009) facilitates the representation and exchange of complex biological knowledge in a concise and unambiguous manner: as standardized pathway maps. It has been developed and supported by a vibrant community of biologists, biochemists, software developers, bioinformaticians and pathway databases experts.

SBGN is described in detail in the online specifications (see http://sbgn.org/Documents/Specifications). Here we summarize its concepts only briefly. SBGN defines three orthogonal visual languages: Process Description (PD), Entity Relationship (ER) and Activity Flow (AF). SBGN maps must follow the visual vocabulary, syntax and layout rules of one of these languages. The choice of language depends on the type of pathway or process being depicted and the amount of available information. The PD language, which originates from Kitano’s Process Diagrams (Kitano et al., 2005) and the related CellDesigner tool (Funahashi et al., 2008), is equivalent to a bipartite graph (with a few exceptions) with one type of nodes representing pools of biological entities, and a second type of nodes representing biological processes such as biochemical reactions, transport, binding and degradation. Arcs represent consumption, production or control, and can only connect nodes of differing types. The PD language is very suitable for metabolic pathways, but struggles to concisely depict the combinatorial complexity of certain proteins with many phosphorylation states. The ER language, on the other hand, is inspired by Kohn’s Molecular Interaction Maps (Kohn et al., 2006), and describes relations between biomolecules. In ER, two entities can be linked with an interaction arc. The outcome of an interaction (for example, a protein complex), is considered an entity in itself, represented by a black dot, which can engage in further interactions. Thus ER represents dependencies between interactions, or putting it differently, it can represent which interaction is necessary for another one to take place. Interactions are possible between two or more entities, which make ER maps roughly equivalent to a hypergraph in which an arc can connect more than two nodes. ER is more concise than PD when it comes to representing protein modifications and protein interactions, although it is less capable when it comes to presenting biochemical reactions. Finally, the third language in the SBGN family is AF, which represents the activities of biomolecules at a higher conceptual level. AF is suitable to represent the flow of causality between biomolecules even when detailed knowledge on biological processes is missing.

Efficient integration of the SBGN standard into the research cycle requires adoption by visualization and modeling software. Encouragingly, a growing number of pathway tools (see http://sbgn.org/SBGN_Software) offer some form of SBGN compatibility. However, current software implementations of SBGN are often incomplete and sometimes incorrect. This is not surprising: as SBGN covers a broad spectrum of biological phenomena, complete and accurate implementation of the full SBGN specifications represents a complex, error-prone and time-consuming task for individual tool developers. This development step could be simplified, and redundant implementation efforts avoided, by accurately translating the full SBGN specifications into a single software library, available freely for any tool developer to reuse in their own project. Moreover, the maps produced by any given tool usually cannot be reused in another tool, because SBGN only defines how biological information should be visualized, but not how the maps should be stored electronically. Related community standards for exchanging pathway knowledge, namely BioPAX (Demir et al., 2010) and SBML (Hucka et al., 2003), have proved insufficient for this role (more on this topic in Section 4). Therefore, we observed a second need, for a dedicated, standardized SBGN file format.

Following these observations, we started a community effort with two goals: to encourage the adoption of SBGN by facilitating its implementation in pathway tools, and to increase interoperability between SBGN-compatible software. This has resulted in a file format called SBGN-ML and a software library called LibSBGN. Each of these two components will be explained separately in the next sections.

Of course, there is always the data prior to this markup and the data that comes afterwards, so you could say I see a role for topic maps. 😉

Clustering by hypergraphs and dimensionality of cluster systems

Friday, May 11th, 2012

Clustering by hypergraphs and dimensionality of cluster systems by S. Albeverio and S.V. Kozyrev.

Abstract:

In the present paper we discuss the clustering procedure in the case where instead of a single metric we have a family of metrics. In this case we can obtain a partially ordered graph of clusters which is not necessarily a tree. We discuss a structure of a hypergraph above this graph. We propose two definitions of dimension for hyperedges of this hypergraph and show that for the multidimensional p-adic case both dimensions are reduced to the number of p-adic parameters.

We discuss the application of the hypergraph clustering procedure to the construction of phylogenetic graphs in biology. In this case the dimension of a hyperedge will describe the number of sources of genetic diversity.

A pleasant reminder that hypergraphs and hyperedges are simplifications of the complexity we find in nature.

If hypergraphs/hyperedges are simplifications, what would you call a graph/edges?

A simplification of a simplification?

Graphs are useful sometimes.

Useful sometimes doesn’t mean useful at all times.

Neo4j – Hyperedges and Cypher – Suggested Revisions

Friday, March 30th, 2012

Recently “Hyperedges and Cypher” was cited to illustrate “improvements” to Neo4j documentation. It is deeply problematic.

The first paragraph and header read:

5.1 Hyperedges and Cypher

Imagine a user being part of different groups. A group can have different roles, and a user can be part of different groups. He also can have different roles in different groups apart from the membership. The association of a User, a Group and a Role can be referred to as a HyperEdge. However, it can be easily modeled in a property graph as a node that captures this n-ary relationship, as depicted below in the U1G2R1 node.

This is the first encounter of hyperedge (other than in the table of contents) for the reader. The manual offers no definition for or illustration of a “hyperedge.”

When terms are introduced, they need to be defined.

Here is the Neo4j illustration for the preceding description (from the latest milestone release):

Neo4j-Cypher-HyperEdge

I don’t get that graph from the description in the text.

This graph comes closer:

User-Roles-Group

You may object that role1 and role2 should be nodes rather than an edges, but that is a modeling decision, another area where the Neo4j manual is weak. The reader doesn’t share in that process, nodes and edges suddenly appear and the reader must work out why?

If the current prose were cleaned up, by providing a better prose description, modeling choices and alternatives could be illustrated, along with Cypher queries.

On hypergraphs/hyperedges:

A user having different roles in different groups could be modeled with a hyperedge, but not necessarily so. If Neo4j isn’t going to support hyperedges, why bring it up? Show the modeling that Neo4j does support.

If I were going to discuss hyperedges/hypergraphs at all, I would point out examples of where they are used, along with citations to the literature.

Visualization of Hyperedges in Fixed Graph Layouts

Wednesday, March 28th, 2012

Visualization of Hyperedges in Fixed Graph Layouts by Martin Junghans.

Abstract:

Graphs and their visualizations are widely used to communicate the structure of complex data in a formal way. Hypergraphs are dedicated to represent real-world data as they allow to relate multiple objects with each other. However, existing graph drawing techniques lack the ability to embed hyperedges into fixed two-dimensional graph layouts. We utilize a set of curves to visualize hyperedges and employ an energy-based technique to position them in the layout. By avoiding node occlusion and cluster intersections we are able to preserve the expressiveness of the given graph layout. Additionally, we investigate techniques to reduce the visual complexity of hypergraph drawings. A comprehensive evaluation using real-world data sets demonstrates the suitability of the proposed hyperedge layout techniques.

A thesis I ran across today while researching the display of hyperedges.

Graphs are being used for the storage/analysis/visualization of data. Given the history of hypergraphs in CS research, hypergraphs aren’t far behind. Now would be the time to get ahead of the curve, however briefly.

…trimming the spring algorithm for drawing hypergraphs

Tuesday, March 20th, 2012

…trimming the spring algorithm for drawing hypergraphs by Harri Klemetti, Ismo Lapinleimu, Erkki Mäkinen, and Mika Sieranta. ACM SIGCSE Bulletin, Volume 27 Issue 3, Sept. 1995.

Abstract:

Graph drawing problems provide excellent material for programming projects. As an example, this paper describes the results of an undergraduate project which dealt with hypergraph drawing. We introduce a practical method for drawing hypergraphs. The method is based on the spring algorithm, a well-known method for drawing normal graphs.

Not the earliest or the latest on drawing hypergraphs (for which there is apparently no consensus) but something I ran across while researching the issue. Thought it best to write it down so I can refer to it from other posts.

Hypergraphs have a long history with analysis of relational databases and I suspect their applications to modeling NoSQL databases has already happened or at least isn’t far off. Not to mention their relevance to graph databases.

In any event, being able to visualize hypergraphs, by one of more methods, is likely to be useful both for topic map authors and users but other investigators as well.

Bipartite Graphs as Intermediate Model for RDF

Saturday, March 3rd, 2012

Bipartite Graphs as Intermediate Model for RDF by Jonathan Hayes and Claudio Gutierrez.

Abstract:

RDF Graphs are sets of assertions in the form of subject-predicate-object triples of information resources. Although for simple examples they can be understood intuitively as directed labeled graphs, this representation does not scale well for more complex cases, particularly regarding the central notion of connectivity of resources.

We argue in this paper that there is need for an intermediate representation of RDF to enable the application of well-established methods from Graph Theory. We introduce the concept of Bipartite Statement-Value Graph and show its advantages as intermediate model between the abstract triple syntax and data structures used by applications. In the light of this model we explore issues like transformation costs, data/schema structure, the notion of connectivity, and database mappings.

A quite different take on the representation of RDF than in Is That A Graph In Your Cray? Here we encounter hypergraphs for modeling RDF.

Suggestions on how to rank graph representations of RDF?

Or perhaps better, suggestion on how to rank graph representations for use cases?

Putting the question of what (connections/properties) we want to model before the question of how (RDF, etc.) we intend to model it.

Isn’t that the right order?

Comments?

The hypernode model and its associated query language

Wednesday, February 8th, 2012

The hypernode model and its associated query language

Abstract:

A data model called the hypernode model, whose single data structure is the hypernode, is introduced. Hypernodes are graphs whose node set can contain graphs in addition to primitive nodes. Hypernodes can be used to represent arbitrarily complex objects and can support the encapsulation of information, to any level. A declarative logic-based language for the hypernode model is introduced and shown to be query complete. It is then shown that hypernodes can represent extensional functions, nested relations, and composite objects. Thus, the model is at least as expressive as the functional and nested relational database models. It is demonstrated that the hypernode model can be regarded as an object-oriented one.

Interesting departure from hypergraphs with hyperedges, the latter being replaced in this model with hypernodes. Hypernodes consists of a unique label, nodes, which may be primitive or hypernodes, and, edges between nodes.

The authors went on to create an implementation and storage model for this model.

A transient hypergraph-based model for data access

Monday, February 6th, 2012

A transient hypergraph-based model for data access by Carolyn Watters and Michael A. Shepherd.

Abstract:

Two major methods of accessing data in current database systems are querying and browsing. The more traditional query method returns an answer set that may consist of data values (DBMS), items containing the answer (full text), or items referring the user to items containing the answer (bibliographic). Browsing within a database, as best exemplified by hypertext systems, consists of viewing a database item and linking to related items on the basis of some attribute or attribute value.

A model of data access has been developed that supports both query and browse access methods. The model is based on hypergraph representation of data instances. The hyperedges and nodes are manipulated through a set of operators to compose new nodes and to instantiate new links dynamically, resulting in transient hypergraphs. These transient hypergraphs are virtual structures created in response to user queries, and lasting only as long as the query session. The model provides a framework for general data access that accommodates user-directed browsing and querying, as well as traditional models of information and data retrieval, such as the Boolean, vector space, and probabilistic models. Finally, the relational database model is shown to provide a reasonable platform for the implementation of this transient hypergraph-based model of data access.

I call your attention to the line that reads:

The hyperedges and nodes are manipulated through a set of operators to compose new nodes and to instantiate new links dynamically, resulting in transient hypergraphs.

For a topic map to create subject representatives (nodes) and relationships between subjects (edges) dynamically and differently depending upon the user, would be a very useful thing.

Don’t be daunted by the complexity of the proposal.

The authors had a working prototype 22 years ago using a relational database.

(Historical note: You will not find HyTime mentioned in this paper because it was published prior to the first edition of HyTime.)

Dynamic Shortest Path Algorithms for Hypergraphs

Monday, February 6th, 2012

Dynamic Shortest Path Algorithms for Hypergraphs by Jianhang Gao, Qing Zhao, Wei Ren, Ananthram Swami, Ram Ramanathan and Amotz Bar-Noy.

Abstract:

A hypergraph is a set V of vertices and a set of non-empty subsets of V, called hyperedges. Unlike graphs, hypergraphs can capture higher-order interactions in social and communication networks that go beyond a simple union of pairwise relationships. In this paper, we consider the shortest path problem in hypergraphs. We develop two algorithms for finding and maintaining the shortest hyperpaths in a dynamic network with both weight and topological changes. These two algorithms are the first to address the fully dynamic shortest path problem in a general hypergraph. They complement each other by partitioning the application space based on the nature of the change dynamics and the type of the hypergraph.

The applicability of hypergraphs to “…social and communication networks…” should push this item to near the top of your reading list. In addition to the alphabet soup of various government agencies from around the world mining such networks are e-commerce and traditional vendors. Developing solutions and/or having the skills to mine such networks should make you a hot-ticket item.

Cypher Cookbook

Friday, October 14th, 2011

Cypher Cookbook

I have been learning to bake bread so you can imagine my disappointment when I saw “Cypher Cookbook” only to find that Peter was talking about Neo4j queries. Really! 😉

From the first entry:

Hyperedges and Cypher

Imagine a user being part of different groups. A group can have different roles, and a user can be part of different groups. He also can have different roles in different groups apart from the membership. The association of a User, a Group and a Role can be referred to as a HyperEdge. However, it can be easily modeled in a property graph as a node that captures this n-ary relationship, as depicted below in the U1G2R1 node.

The graph model is necessary to illustrate the query but hyperedges need to also be treated under modeling or the current Domain Modeling Galllery. I would argue that full examples should be provided for as many domains as possible. (See how easy it is to assign work to others? I don’t know how soon but I hope to be a contributor in that respect.)

Another “cookbook” section could address importing data into Neo4j. Particularly from some of the larger public databases.

If anyone who wants wider adoption of Neo4j needs a motivating example, consider the number of people who use DocBook (its an XML format) versus ODF or OOXML (used by OpenOffice and MS Office (well, MS Office saves as both). If you want wide adoption (which I personally think is a good idea for graph databases) then use can’t be a test of user dedication or integrity.

…Advanced Graph-­‐Analysis Algorithms on Very Large Graphs

Tuesday, July 26th, 2011

Enabling Rapid Development and Execution of Advanced Graph-­‐Analysis Algorithms on Very Large Graphs by Aydin Buluc, John Gilbert, Adam Lugowski, and, Steve Reinhardt.

Great overview of the Knowledge Discovery Toolkit project and its goals.

From the website:

The Knowledge Discovery Toolbox (KDT) provides domain experts with a simple interface to analyze very large graphs quickly and effectively without requiring knowledge of the underlying graph representation or algorithms. The current version provides a tiny selection of functions on directed graphs, from simple exploratory functions to complex algorithms. Because KDT is open-source, it can be customized or extended by interested (and intrepid) users.

HyperGraphDB

Sunday, June 5th, 2011

HyperGraphDB has changed in appearance since my last visit.

From the website:

HyperGraphDB is a general purpose, open-source data storage mechanism based on a powerful knowledge management formalism known as directed hypergraphs. While a persistent memory model designed mostly for knowledge management, AI and semantic web projects, it can also be used as an embedded object-oriented database for Java projects of all sizes. Or a graph database. Or a (non-SQL) relational database.

Read Alex Popescu’s HyperGraphDB interview with Borislav Iordanov for a high-level overview.

Watch Borislav Iordanov’s HyperGraphDB Presentation at StrangeLoop 2010.

Feature Summary

  • Powerful data modeling and knowledge representation.
  • Graph-oriented storage.
  • N-ary, higher order relationships (edges) between graph nodes.
  • Graph traversals and relational-style queries.
  • Customizable indexing.
  • Customizable storage management.
  • Extensible, dynamic DB schema through custom typing.
  • Out of the box Java OO database.
  • Fully transactional and multi-threaded, MVCC/STM.
  • P2P framework for data distribution.

HyperGraphDB implements TopicMaps 1.0, TuProlog and a number of other models/standards.

Definitely worth taking out for a spin!

HyperGraphDB – Data Management for Complex Systems

Wednesday, December 22nd, 2010

HyperGraphDB – Data Management for Complex Systems Author: Borislav Iordanov

Presentation on the architecture of HyperGraphDB.

Slides and MP3 file are available at the presentation link.

Covers the architecture of HyperGraphDB in just under 20 minutes.

Good for an overview but I would suggest looking at the documentation, etc. for a more detailed view.

The documentation describes its topic map component in part as:

In HGTM, all topic maps constructs are represented as HGDB atoms. The Java classes implementing those atoms are in the package org.hypergraphdb.apps.tm. The API is an almost complete implementation of the 1.0 specification. Everything except merging is implementing. Merging wouldn’t be hard, but I haven’t found the need for it yet.

I will be following up with the HyperGraphDB project on how merging was understood.

Will report back on what comes of that discussion.

Peter McBrien

Saturday, May 22nd, 2010

Peter McBrien focuses on data modeling and integration.

Part of the AutoMed project on database integration. Recent work includes temporal constraints and P2P exchange of heterogeneous data.

Publications (dblp).

Homepage

Databases: Tools and Data for Teaching and Research: Useful collection of datasets and other materials on databases, data modeling and integration.

I first encountered Peter’s research in Comparing and Transforming Between Data Models via an Intermediate Hypergraph Data Model.

From a topic map perspective, the authors assumed the identities of the subjects to which their transformation rules were applied. Someone less familiar with the schema languages could have made other choices.

That’s the hard question isn’t it? How to have reliable integration without presuming a common perspective/interpretation of the schema languages?

*****
PS: This is the first of many posts on researchers working in areas of interest to the topic maps community.