Archive for the ‘SQLite’ Category

Planform:… [Graph vs. SQL?]

Sunday, April 14th, 2013

Planform: an application and database of graph-encoded planarian regenerative experiments by Daniel Lobo, Taylor J. Malone and Michael Levin. Bioinformatics (2013) 29 (8): 1098-1100. doi: 10.1093/bioinformatics/btt088

Abstract:

Summary: Understanding the mechanisms governing the regeneration capabilities of many organisms is a fundamental interest in biology and medicine. An ever-increasing number of manipulation and molecular experiments are attempting to discover a comprehensive model for regeneration, with the planarian flatworm being one of the most important model species. Despite much effort, no comprehensive, constructive, mechanistic models exist yet, and it is now clear that computational tools are needed to mine this huge dataset. However, until now, there is no database of regenerative experiments, and the current genotype–phenotype ontologies and databases are based on textual descriptions, which are not understandable by computers. To overcome these difficulties, we present here Planform (Planarian formalization), a manually curated database and software tool for planarian regenerative experiments, based on a mathematical graph formalism. The database contains more than a thousand experiments from the main publications in the planarian literature. The software tool provides the user with a graphical interface to easily interact with and mine the database. The presented system is a valuable resource for the regeneration community and, more importantly, will pave the way for the application of novel artificial intelligence tools to extract knowledge from this dataset.

Availability: The database and software tool are freely available at http://planform.daniel-lobo.com.

Watch the video tour for an example of a domain specific authoring tool.

The tool does not use any formal graph notation/terminology or attempt a new form of ASCII art.

Users can enter data about worms with four (4) heads. That bodes well for new techniques to author topic maps.

On the use of graphs, the authors write:

We have created a formalism based on graphs to encode the resultant morphologies and manipulations of regenerative experiments (Lobo et al., 2013). Mathematical graphs are ideal to encode relationships between individuals and have been previously used to encode morphologies (Lobo et al., 2011). The formalism divided a morphology into adjacent regions (graph nodes) connected to each other (graph edges). The geometrical characteristics of the regions (connection angles, distances, shapes, type, etc.) are stored as node and link labels. Importantly, the formalism permits automatic comparisons between morphologies: we implemented a metric to quantify the difference between two morphologies based on the graph edit distance algorithm.

The experiment manipulations are encoded in a tree structure. Nodes represent specific manipulations (cuts, irradiation and transplantations) where links define the order and relations between manipulations. This approach permits encoding the majority of published planarian regenerative experiments.
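
To make the quoted formalism concrete, here is a minimal sketch in Python of a morphology as a labeled graph, with a much simplified difference measure. It is not the authors' code or their metric: the `Morphology` class, the region names and the `simple_edit_distance` function are all hypothetical, and the distance assumes that regions with the same name correspond to each other, which avoids the node-matching search a real graph edit distance needs.

```python
from dataclasses import dataclass, field


@dataclass
class Morphology:
    """A morphology as a labeled graph: regions are nodes, adjacencies are edges."""
    regions: dict = field(default_factory=dict)  # region name -> region type label
    links: set = field(default_factory=set)      # each link is frozenset({name_a, name_b})

    def add_region(self, name, kind):
        self.regions[name] = kind

    def connect(self, a, b):
        self.links.add(frozenset((a, b)))


def simple_edit_distance(m1, m2):
    """Count node additions/removals, relabelings, and edge additions/removals.

    Assumption: equal region names denote corresponding regions, so no
    search over node matchings is performed (unlike true graph edit distance).
    """
    names1, names2 = set(m1.regions), set(m2.regions)
    node_changes = len(names1 ^ names2)
    relabels = sum(1 for n in names1 & names2 if m1.regions[n] != m2.regions[n])
    edge_changes = len(m1.links ^ m2.links)
    return node_changes + relabels + edge_changes


def make_worm(posterior_kind, extra_heads=()):
    """Build a head-trunk-posterior worm, optionally with extra heads on the trunk."""
    worm = Morphology()
    worm.add_region("anterior", "head")
    worm.add_region("trunk", "trunk")
    worm.add_region("posterior", posterior_kind)
    worm.connect("anterior", "trunk")
    worm.connect("trunk", "posterior")
    for name in extra_heads:
        worm.add_region(name, "head")
        worm.connect("trunk", name)
    return worm


wild_type = make_worm("tail")
two_headed = make_worm("head")
four_headed = make_worm("head", extra_heads=("left", "right"))

print(simple_edit_distance(wild_type, two_headed))   # 1: the tail is relabeled as a head
print(simple_edit_distance(wild_type, four_headed))  # 5: 1 relabel + 2 new regions + 2 new links
```

The geometric labels the authors mention (connection angles, distances, shapes) are left out here; they would be further attributes on the regions and links.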

The graph vs. relational crowd will be disappointed to learn the project uses SQLite (“the most widely deployed SQL database engine in the world”) for the storage/access to its data. ;-)
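
Storing a graph in a relational engine is less of a contradiction than it sounds. As a rough sketch, assuming a two-table layout that I made up for illustration (it is not Planform's actual schema), nodes and edges fit into SQLite like this, using Python's standard `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table for regions (nodes), one for links (edges).
conn.executescript("""
CREATE TABLE region (
    id         INTEGER PRIMARY KEY,
    morphology TEXT NOT NULL,
    kind       TEXT NOT NULL
);
CREATE TABLE link (
    source INTEGER NOT NULL REFERENCES region(id),
    target INTEGER NOT NULL REFERENCES region(id)
);
""")

# A two-headed worm: head - trunk - head.
conn.executemany(
    "INSERT INTO region (id, morphology, kind) VALUES (?, ?, ?)",
    [(1, "two_headed", "head"), (2, "two_headed", "trunk"), (3, "two_headed", "head")],
)
conn.executemany("INSERT INTO link (source, target) VALUES (?, ?)", [(1, 2), (2, 3)])
conn.commit()


def neighbors(conn, region_id):
    """Return ids of regions adjacent to region_id, treating links as undirected."""
    rows = conn.execute(
        "SELECT target FROM link WHERE source = ? "
        "UNION SELECT source FROM link WHERE target = ? "
        "ORDER BY 1",
        (region_id, region_id),
    ).fetchall()
    return [row[0] for row in rows]


def count_heads(conn, morphology):
    """Count regions labeled 'head' in the named morphology."""
    row = conn.execute(
        "SELECT COUNT(*) FROM region WHERE morphology = ? AND kind = 'head'",
        (morphology,),
    ).fetchone()
    return row[0]


print(neighbors(conn, 2))
print(count_heads(conn, "two_headed"))
```

Adjacency lookups like `neighbors` are plain SQL; it is the multi-hop traversals and graph comparisons that would be done in application code.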

You were aware that hypergraphs were used to model relational databases in the “old days.” Yes?

I will try to pull together some of those publications in the near future.

Saving Tweets

Sunday, November 4th, 2012

No, it is not another social cause to save X but rather Pierre Lindenbaum saving his own tweets in: Saving your tweets in a database using sqlite, rhino, scribe, javascript.

Requires sqlite, Apache Rhino, Scribe and Apache codec.
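
Pierre's version is JavaScript running under Rhino. For a feel of the storage step only, here is a hedged sketch in Python instead, using the standard `sqlite3` module. The `tweet` table layout, the `save_tweets` function and the sample records are all invented for illustration, and fetching from the Twitter API (the part Scribe handles) is omitted.

```python
import json
import sqlite3


def save_tweets(conn, tweets):
    """Insert tweet records, skipping ids already stored. Returns rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweet ("
        "id INTEGER PRIMARY KEY, created_at TEXT, text TEXT, raw TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM tweet").fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweet (id, created_at, text, raw) VALUES (?, ?, ?, ?)",
            [(t["id"], t["created_at"], t["text"], json.dumps(t)) for t in tweets],
        )
    after = conn.execute("SELECT COUNT(*) FROM tweet").fetchone()[0]
    return after - before


# Invented sample records, shaped loosely like tweet JSON.
sample = [
    {"id": 1, "created_at": "2012-11-03", "text": "first saved tweet"},
    {"id": 2, "created_at": "2012-11-04", "text": "second saved tweet"},
]

conn = sqlite3.connect(":memory:")
first_run = save_tweets(conn, sample)
second_run = save_tweets(conn, sample)  # same ids again: nothing new is stored
print(first_run, second_run)
```

Keeping the full JSON in the `raw` column means nothing is lost if later uses need fields that were not split out into columns.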

Mapping the saved tweets comes to mind. I am sure you can imagine other uses in a network of topic maps.

Accelerating SQL Database Operations on a GPU with CUDA (merging spreadsheet data?)

Tuesday, January 31st, 2012

Accelerating SQL Database Operations on a GPU with CUDA by Peter Bakkum and Kevin Skadron.

Abstract:

Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. This paper implements a subset of the SQLite command processor directly on the GPU. This dramatically reduces the effort required to achieve GPU acceleration by avoiding the need for database programmers to use new programming languages such as CUDA or modify their programs to use non-SQL libraries.

This paper focuses on accelerating SELECT queries and describes the considerations in an efficient GPU implementation of the SQLite command processor. Results on an NVIDIA Tesla C1060 achieve speedups of 20-70X depending on the size of the result set.

Important lessons to be learned from this paper:

  • Don’t invent new languages for the average user to learn.
  • Avoid the need to modify existing programs.
  • Write against common software.

Remember that 75% of the BI market is still using spreadsheets, for all sorts of data but numeric data in particular.

I don’t have any experience with importing files into Excel but I assume there is a macro language that can be used to create import processes.

Curious if there has been any work on creating import macros for Excel that incorporate merging as part of those imports?

That would:

  • Not be a new language for users to learn.
  • Avoid modification of existing programs (or data).
  • Be written against common software.

I am not sure about the requirements for merging numeric data but that should make the exploration process all the more enjoyable.
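
To give the question some shape, here is a small sketch of what merge-on-import logic might look like. It is written in Python rather than as an Excel macro, so it shows only the logic and not the "no new language" goal. The `merge_rows` function, the column names and the sample data are all made up, and the policy (first value wins, disagreements are reported rather than resolved) is just one assumption among many possible ones.

```python
import csv
import io


def merge_rows(sources, key):
    """Merge rows from several CSV sources that share a key column.

    Rows with the same key become one record. Empty cells are ignored.
    When two sources disagree on a value, the first value is kept and the
    disagreement is recorded in the conflicts list instead of being resolved.
    """
    merged = {}
    conflicts = []
    for source in sources:
        for row in csv.DictReader(source):
            target = merged.setdefault(row[key], {})
            for column, value in row.items():
                if not value:
                    continue  # skip empty or missing cells
                if column in target and target[column] != value:
                    conflicts.append((row[key], column, target[column], value))
                    continue
                target[column] = value
    return merged, conflicts


# Invented sample data standing in for two exported spreadsheets.
sheet_a = io.StringIO("region,q1\nnorth,100\nsouth,250\n")
sheet_b = io.StringIO("region,q1,q2\nnorth,100,120\nsouth,260,\n")

merged, conflicts = merge_rows([sheet_a, sheet_b], key="region")
print(merged)
print(conflicts)
```

Note that values are compared as strings here, so "100" and "100.0" would count as a conflict. That is exactly the kind of requirement for numeric data that would need to be worked out.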