Semi-structured data and P2P graph databases by Jeff Rose.
From the post:
In a previous post I introduced the Plasma graph query engine that I’ve been working on as part of my thesis project. With Plasma you can declaratively define queries and evaluate them against a graph database. The heart of the system is a library of dataflow query operators, and on top of them sits a fairly simplistic query “language”. (I put it in quotes because in a lisp based language like Clojure the line between a mini-language and an API gets blurry.) In this post I’ll write a bit about why I think graph databases could be an interesting foundation for next generation P2P networks, and then I’ll give some examples of performing distributed graph queries using Plasma. First I think it is important to motivate the use of a graph database though. While most of the marketing speak on the web regarding graph databases is all about representing social network data, this is just one of many potential applications.
I am not convinced the categories of “structured,” “semi-structured,” and “unstructured” data are all that helpful.
For example, when did the New Testament become a structured text? With its division into chapters (13th century)? With its division into verses (mid-16th century)? Or is it still “unstructured”? The same question applies to the Tanakh, which relies on a much richer system of divisions.
If by “structured” you mean a particular form of internal representation and reference, such as the one presented to users as relational tables, why not say so? That is one particular form of structuring data, not the only one.
And as Wikipedia observes (Table (database)):
An equally valid representation of a relation is as an n-dimensional chart, where n is the number of attributes (a table’s columns). For example, a relation with two attributes and three values can be represented as a table with two columns and three rows, or as a two-dimensional graph with three points. The table and graph representations are only equivalent if the ordering of rows is not significant, and the table has no duplicate rows.
I take that to mean that I can treat a graph as a data structure with more “structure” as it were.
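To make the Wikipedia point concrete, here is a minimal Python sketch of the two-attribute, three-value case. The names and values (`table_rows`, `points`, `same_relation`) are my own illustrations, not anything from Wikipedia or from Plasma. The table is a list of rows and the chart is a set of points. The two agree only when row order is ignored and no row is repeated.

```python
# Hypothetical relation with two attributes and three values.
table_rows = [(1, 2), (3, 4), (5, 6)]   # table: two columns, three rows
points = {(1, 2), (3, 4), (5, 6)}       # chart: three points in two dimensions


def same_relation(rows, pts):
    """Return True when the table and the point set carry the same information.

    That holds only if the table has no duplicate rows and row order is
    treated as insignificant (converting to a set discards the order).
    """
    has_no_duplicates = len(rows) == len(set(rows))
    return has_no_duplicates and set(rows) == pts


reordered = [(5, 6), (1, 2), (3, 4)]                 # same rows, different order
with_duplicate = [(1, 2), (1, 2), (3, 4), (5, 6)]    # one row repeated

print(same_relation(table_rows, points))      # the original table
print(same_relation(reordered, points))       # order does not matter
print(same_relation(with_duplicate, points))  # the duplicate breaks equivalence
```

The list can record order and repetition while the set cannot, which is why the equivalence needs both conditions.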
I am equally unconvinced that P2P networks are the key to avoiding the control and censorship issues of architectures like the Internet. If you think the telcos rolled over quickly when asked for information for “national security,” just think about your CIO or even your local network administrator. And being P2P means arbitrary peers can pick up the data stream. Want to see the folks in dark shades and cheap suits?
P2P may be a better technological choice to lessen the chances of censorship, but social institutions that oppose censorship or make it more difficult are equally important, if not more so.