Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

November 14, 2017

Datasette: instantly create and publish an API for your SQLite databases

Filed under: SQL,SQLite — Patrick Durusau @ 8:01 pm

Datasette: instantly create and publish an API for your SQLite databases by Simon Willison.

From the webpage:

I just shipped the first public version of datasette, a new tool for creating and publishing JSON APIs for SQLite databases.

You can try it out right now at fivethirtyeight.datasettes.com, where you can explore SQLite databases I built from Creative Commons licensed CSV files published by FiveThirtyEight. Or you can check out parlgov.datasettes.com, derived from the parlgov.org database of world political parties, which illustrates some advanced features such as SQLite views.

That sounds really great but then I read:


Or you can try it out on your own machine. If you run OS X and use Google Chrome, try running the following:

pip3 install datasette
datasette ~/Library/Application\ Support/Google/Chrome/Default/History

This will start a web server on http://127.0.0.1:8001/ displaying an interface that will let you browse your Chrome browser history, which is conveniently stored in a SQLite database.
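The JSON side is what makes this interesting for scripting: every table the server exposes is also available as JSON over HTTP. A minimal sketch in Python, assuming the Chrome History database shows up under /History/ (early datasette versions put a content hash in that path, so check the index page at http://127.0.0.1:8001/ for the real link):

import json
from urllib.request import urlopen

# Hypothetical table endpoint; verify the actual path on the datasette index page.
url = "http://127.0.0.1:8001/History/urls.json"

with urlopen(url) as response:
    data = json.load(response)

# Early datasette releases return "columns" and "rows" keys; adjust if your
# version's JSON is shaped differently.
print(data.get("columns"))
for row in data.get("rows", [])[:5]:
    print(row)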

Warning – Warning: Don’t leave datasette serving your browser history from your laptop at a conference. Yes?

Other than the caution about your own security, this looks very cool!

Enjoy!

March 20, 2016

sqlite3 test suite

Filed under: Cybersecurity,Software,SQL,SQLite — Patrick Durusau @ 2:00 pm

sqlite3 test suite by Nelson Minar.

From the post:

I felt guilty complaining about sqlite3’s source distribution, so I went to look at the real source, what the authors work with. It’s not managed by git but rather in Fossil (an SCM written by the sqlite3 author). Happily the web view is quite good.

One of the miraculous things about sqlite3 is its incredible test suite. There are 683,932 lines of test code. Compare to 273,000 lines of C code for the library and all its extensions. sqlite3 has a reputation for being solid and correct. It’s not an accident.

The test size is overcounted a bit because there’s a lot of test data. For instance the test for the Porter Stemmer is 24k lines of code, but almost all of that is a giant list of words and their correct stemming. Still very useful tests! But not quite as much human effort as it looks on first blush.
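If you want to reproduce Nelson’s rough numbers against your own checkout, here is a quick Python sketch (the test/*.test and src/*.c layout is my assumption about the tree, so adjust the globs to what you find):

import pathlib

# Rough line counts over a checkout of the SQLite source tree.
root = pathlib.Path("sqlite")

test_lines = sum(len(p.read_text(errors="ignore").splitlines())
                 for p in (root / "test").glob("*.test"))
c_lines = sum(len(p.read_text(errors="ignore").splitlines())
              for p in (root / "src").glob("*.c"))

print("test code:", test_lines, "lines; library C code:", c_lines, "lines")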

Just a quick reminder that test suites have the same mixture of code and data subjects as the code being tested.

So your software passes the test. What was being tested? What was not being tested (the inputs that feed “weird machines”)?

If you don’t think that is a serious question, consult the page of SQLite vulnerabilities.

I saw this in a tweet by Julia Evans.

March 11, 2015

termsql

Filed under: Spark,SQL,SQLite,Text Analytics — Patrick Durusau @ 2:06 pm

termsql

From the webpage:

Convert text from a file or from stdin into a SQL table and query it instantly. Uses sqlite as the backend. The idea is to make SQL into a tool on the command line or in scripts.

Online manual: http://tobimensch.github.io/termsql

So what can it do?

  • convert text/CSV files into sqlite database/table
  • work on stdin data on-the-fly
  • it can be used as a swiss army knife kind of tool for extracting information from other processes that send their information to termsql via a pipe on the command line or in scripts
  • termsql can also pipe into another termsql of course
  • you can quickly sort and extract data
  • creates string/integer/float column types automatically
  • gives you the syntax and power of SQL on the command line
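The sketch below is not termsql itself (its flags are in the manual above); it is the pattern termsql automates, reading stdin into an in-memory SQLite table and querying it with SQL, which is worth seeing once in plain Python:

import sqlite3
import sys

# Read whitespace-separated lines from stdin into an in-memory SQLite table.
rows = [line.split() for line in sys.stdin if line.strip()]
width = max(len(r) for r in rows)

conn = sqlite3.connect(":memory:")
cols = ", ".join("col{} TEXT".format(i) for i in range(width))
conn.execute("CREATE TABLE tbl ({})".format(cols))
conn.executemany(
    "INSERT INTO tbl VALUES ({})".format(", ".join("?" * width)),
    [r + [None] * (width - len(r)) for r in rows],
)

# An ad-hoc query: count rows grouped by the first column.
for row in conn.execute("SELECT col0, COUNT(*) FROM tbl GROUP BY col0 ORDER BY 2 DESC LIMIT 5"):
    print(row)

Pipe something like ps aux into that and you get a rough count of processes per owner, which is exactly the kind of one-liner termsql packages up.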

Sometimes you need the esoteric and sometimes not!

Enjoy!

I first saw this in a tweet by Christophe Lalanne.

October 3, 2014

How does SQLite work? Part 2: btrees!…

Filed under: Database,SQLite — Patrick Durusau @ 1:50 pm

How does SQLite work? Part 2: btrees! (or: disk seeks are slow don’t do them!) by Julia Evans.

From the post:

Welcome back to fun with databases! In Part 1 of this series, we learned that:

  • SQLite databases are organized into fixed-size pages. I made an example database which had 1k pages.
  • The pages are all part of a kind of tree called a btree.
  • There are two kinds of pages: interior pages and leaf pages. Data is only stored in leaf pages.

I mentioned last time that I put in some print statements to tell me every time I read a page, like this:
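If you want to see those page types for yourself, here is a minimal Python sketch that reads the page size from the 100-byte database header and tallies the type byte of every page (the test.db file name is just a placeholder):

import struct

with open("test.db", "rb") as f:
    data = f.read()

# Bytes 16-17 of the header hold the page size, big-endian; the value 1 means 65536.
page_size = struct.unpack(">H", data[16:18])[0]
if page_size == 1:
    page_size = 65536

counts = {}
for page_no in range(len(data) // page_size):
    offset = page_no * page_size
    if page_no == 0:
        offset += 100  # page 1 carries the 100-byte database header first
    page_type = data[offset]
    counts[page_type] = counts.get(page_type, 0) + 1

# 2/5 are interior (index/table) pages, 10/13 are leaf pages; other values
# belong to freelist or overflow pages rather than the btree.
print("pages:", len(data) // page_size, "of", page_size, "bytes")
print("type byte counts:", counts)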

I suspect Chris Granger would consider this “plumbing” that prevents some users from using computation.

Chris would be right, to a degree, but Julia keeps lowering the barrier that “plumbing” poses to users.

Looking forward to more untangling and clarifying of SQLite plumbing!

September 29, 2014

How does SQLite work? Part 1: pages!

Filed under: Database,Programming,SQLite — Patrick Durusau @ 4:58 pm

How does SQLite work? Part 1: pages! by Julia Evans.

From the post:

This evening the fantastic Kamal and I sat down to learn a little more about databases than we did before.

I wanted to hack on SQLite, because I’ve used it before, it requires no configuration or separate server process, I’d been told that its source code is well-written and approachable, and all the data is stored in one file. Perfect!
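That “no configuration, no separate server, one file” property is easy to verify for yourself; a minimal Python sketch (example.db is just an illustrative name):

import os
import sqlite3

# Connecting to a path is all the setup there is: the whole database
# lives in this single ordinary file.
conn = sqlite3.connect("example.db")
conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello, pages!",))
conn.commit()
conn.close()

print(os.path.getsize("example.db"), "bytes in one file")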

Following Julia down a rabbit hole to program internals encourages you to venture out on your own!

I can’t say why her posts have that quality, but they do.

Enjoy!

I first saw this in a tweet by FoundationDB.

April 14, 2013

Planform:… [Graph vs. SQL?]

Filed under: Bioinformatics,Graphs,SQL,SQLite — Patrick Durusau @ 3:16 pm

Planform: an application and database of graph-encoded planarian regenerative experiments by Daniel Lobo, Taylor J. Malone and Michael Levin. Bioinformatics (2013) 29 (8): 1098-1100. doi: 10.1093/bioinformatics/btt088

Abstract:

Summary: Understanding the mechanisms governing the regeneration capabilities of many organisms is a fundamental interest in biology and medicine. An ever-increasing number of manipulation and molecular experiments are attempting to discover a comprehensive model for regeneration, with the planarian flatworm being one of the most important model species. Despite much effort, no comprehensive, constructive, mechanistic models exist yet, and it is now clear that computational tools are needed to mine this huge dataset. However, until now, there is no database of regenerative experiments, and the current genotype–phenotype ontologies and databases are based on textual descriptions, which are not understandable by computers. To overcome these difficulties, we present here Planform (Planarian formalization), a manually curated database and software tool for planarian regenerative experiments, based on a mathematical graph formalism. The database contains more than a thousand experiments from the main publications in the planarian literature. The software tool provides the user with a graphical interface to easily interact with and mine the database. The presented system is a valuable resource for the regeneration community and, more importantly, will pave the way for the application of novel artificial intelligence tools to extract knowledge from this dataset.

Availability: The database and software tool are freely available at http://planform.daniel-lobo.com.

Watch the video tour for an example of a domain specific authoring tool.

It does not use any formal graph notation/terminology or attempt a new form of ASCII art.

Users can enter data about worms with four (4) heads. That bodes well for new techniques to author topic maps.

On the use of graphs, the authors write:

We have created a formalism based on graphs to encode the resultant morphologies and manipulations of regenerative experiments (Lobo et al., 2013). Mathematical graphs are ideal to encode relationships between individuals and have been previously used to encode morphologies (Lobo et al., 2011). The formalism divided a morphology into adjacent regions (graph nodes) connected to each other (graph edges). The geometrical characteristics of the regions (connection angles, distances, shapes, type, etc.) are stored as node and link labels. Importantly, the formalism permits automatic comparisons between morphologies: we implemented a metric to quantify the difference between two morphologies based on the graph edit distance algorithm.

The experiment manipulations are encoded in a tree structure. Nodes represent specific manipulations (cuts, irradiation and transplantations) where links define the order and relations between manipulations. This approach permits encoding the majority of published planarian regenerative experiments.
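To make the formalism concrete, here is a toy Python sketch of a morphology encoded as a labeled graph in the spirit the authors describe; this is not Planform’s actual schema, only an illustration:

# Regions become nodes, adjacencies become edges, and geometric
# characteristics live in node/edge labels.
morphology = {
    "nodes": {
        "head": {"type": "head", "shape": "round"},
        "trunk": {"type": "trunk", "shape": "elongated"},
        "tail": {"type": "tail", "shape": "tapered"},
    },
    "edges": [
        ("head", "trunk", {"angle": 90, "distance": 1.0}),
        ("trunk", "tail", {"angle": 90, "distance": 1.2}),
    ],
}

def region_count(m, region_type):
    """Count regions of a given type, e.g. how many heads a worm has."""
    return sum(1 for attrs in m["nodes"].values() if attrs["type"] == region_type)

print(region_count(morphology, "head"))

A graph like that is also what makes the four-headed worms mentioned above representable: a head region is just another labeled node.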

The graph vs. relational crowd will be disappointed to learn the project uses SQLite (“the most widely deployed SQL database engine in the world”) for the storage/access to its data. 😉

You were aware that hypergraphs were used to model relational databases in the “old days.” Yes?

I will try to pull together some of those publications in the near future.

November 4, 2012

Saving Tweets

Filed under: Javascript,SQLite,Tweets — Patrick Durusau @ 3:24 pm

No, it’s not another social cause to save X, but rather Pierre Lindenbaum saving his own tweets in: Saving your tweets in a database using sqlite, rhino, scribe, javascript.

Requires sqlite, Mozilla Rhino, Scribe and Apache Commons Codec.

Mapping the saved tweets comes to mind. I am sure you can imagine other uses in a network of topic maps.

January 31, 2012

Accelerating SQL Database Operations on a GPU with CUDA (merging spreadsheet data?)

Filed under: CUDA,GPU,Spreadsheets,SQL,SQLite — Patrick Durusau @ 4:33 pm

Accelerating SQL Database Operations on a GPU with CUDA by Peter Bakkum and Kevin Skadron.

Abstract:

Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. This paper implements a subset of the SQLite command processor directly on the GPU. This dramatically reduces the effort required to achieve GPU acceleration by avoiding the need for database programmers to use new programming languages such as CUDA or modify their programs to use non-SQL libraries.

This paper focuses on accelerating SELECT queries and describes the considerations in an efficient GPU implementation of the SQLite command processor. Results on an NVIDIA Tesla C1060 achieve speedups of 20-70X depending on the size of the result set.

Important lessons to be learned from this paper:

  • Don’t invent new languages for the average user to learn.
  • Avoid the need to modify existing programs.
  • Write against common software.

Remember that 75% of the BI market is still using spreadsheets, for all sorts of data but numeric data in particular.

I don’t have any experience with importing files into Excel but I assume there is a macro language that can be used to create import processes.

Curious if there has been any work on creating import macros for Excel that incorporate merging as part of those imports?

That would:

  • Not be a new language for users to learn.
  • Avoid modification of existing programs (or data).
  • Be written against common software.

I am not sure about the requirements for merging numeric data but that should make the exploration process all the more enjoyable.
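I can’t answer the Excel macro question, but the merge-while-importing idea itself is easy to sketch; a toy Python example (the file names and the account/amount columns are placeholders, not anything from the paper):

import csv
from collections import defaultdict

# Rows from two CSV files that share a key are merged into one record,
# with the numeric field summed during the import itself.
merged = defaultdict(lambda: {"amount": 0.0})

for path in ("sales_q1.csv", "sales_q2.csv"):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            record = merged[row["account"]]
            record["account"] = row["account"]
            record["amount"] += float(row["amount"])

for record in merged.values():
    print(record["account"], record["amount"])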
