Archive for the ‘Triplestore’ Category

The Truth About Triplestores [Opaqueness]

Friday, August 22nd, 2014

The Truth About Triplestores

A vendor “truth” document from Ontotext. Not that being from a vendor is a bad thing, but you should always consider the source of a document when evaluating its claims.

Quite naturally I jumped to: “6. Data Integration & Identity Resolution: Identifying the same entity across disparate data sources.”

With so many different databases and systems existing inside any single organization, how do companies integrate all of their data? How do they recognize that an entity in one database is the same entity in a completely separate database?

Resolving identities across disparate sources can be tricky. First, they need to be identified and then linked.

To do this effectively, you need two things. Earlier, we mentioned that through the use of text analysis, the same entity spelled differently can be recognized. Once this happens, the references to entities need to be stored correctly in the triplestore. The triplestore needs to support predicates that can declare two different Universal Resource Indicators (URIs) as one in the same. By doing this, you can align the same real-world entity used in different data sources. The most standard and powerful predicate used to establish mappings between multiple URIs of a single object is owl:sameAs. In turn, this allows you to very easily merge information from multiple sources including linked open data or proprietary sources. The ability to recognize entities across multiple sources holds great promise helping to manage your data more effectively and pinpointing connections in your data that may be masked by slightly different entity references. Merging this information produces more accurate results, a clearer picture of how entities are related to one another and the ability to improve the speed with which your organization operates.

In case you are unfamiliar with owl:sameAS, here is an example from OWL Web Ontology Language Reference

<rdf:Description rdf:about="#William_Jefferson_Clinton">:
  <owl:sameAs rdf:resource="#BillClinton"/>
</rdf:Description>

The owl:sameAs in this case is opaque because there is no way to express why an author thought #William_Jefferson_Clinton and #BillClinton were about the same subject. You could argue that any prostitute in Columbia would recognize that mapping so let’s try a harder case.

<rdf:Description rdf:about="#United States of America">:
  <owl:sameAs rdf:resource="#الولايات المتحدة الأمريكية"/>
</rdf:Description>

Less confident than you were about the first one?

The problem with owl:sameAs is its opaqueness. You don’t know why an author used owl:sameAs. You don’t know what property or properties they saw that caused them to use one of the various understandings of owl:sameAs.

Without knowing those properties, accepting any owl:sameAs mapping is buying a pig in a poke. Not a proposition that interests me. You?

I first saw this in a tweet by graphityhq.

How to Use Graph Databases… [Topic Maps as Graph++?]

Tuesday, November 26th, 2013

You have a choice of titles:

How to Use Graph Databases to Analyze Relationships, Risks and Business Opportunities (YouTube)

Graph Databases, Triple Stores and their uses… (slides of Jans Aasman at Enterprise Data World 2012)

From the description:

Graph databases are one of the new technologies encouraging a rapid re-thinking of the analytics landscape. By tracking relationships – in a network of people, organizations, events and data – and applying reasoning (inference) to the data and connections, powerful new answers and insights are enabled.

This presentation will explain how graph databases work, and how graphs can be used for a number of important functions, including risk management, relationship analysis and the identification of new business opportunities. It will use a case study in the manufacturing sector to demonstrate how complex relationships can be discovered and integrated into analytical systems. For example, what are the repercussions for the supply chain of a major flood in China? Which products are affected by political unrest in Thailand? Has a sub-subcontractor started selling to our competition and what does that mean for us? What happened historically to the price of an important sub-component when the prices for crude oil or any other raw material went up? Lots of answers can be provided by graph (network) analysis that cannot be answered any other way, so it is crucial that business and BI executives learn how to use this important new tool.

At time marks 18:30 to 19:09, major customers who are interested in graph databases.
An impressive list of potential customers.

If you wanted to find comments about this presentation you could search for:

How to Use Graph Databases to Analyze Relationships, Risks and Business Opportunities (YouTube) (9,530 “hits”)

Graph Databases, Triple Stores and their uses… (slides of Jans Aasman at Enterprise Data World 2012) (7 “hits”)

If you pick the wrong title as your search string, you will miss 9,523 mentions of this video on the WWW.

The same danger comes up when you rely on normalized data, the sort of data you saw in this video.

If the data you are searching has missed data that needs to be normalized, well, you just don’t find the data.

With a topic map based system, normalization isn’t necessary so long as there is mapping in the topic map.

Think of it this way, you can normalize data over and over again, making it unusable by its source, or you can create a mapping rule into a topic map once.

And the data remains findable by its original creator or source.

I would say yes, topic maps are graphs++, they don’t require normalization.

Scalable Property and Hypergraphs in RDF

Wednesday, November 6th, 2013

From the description:

There is a misconception that Triple Stores are not ‘true’ graph databases because they supposedly do not support Property Graphs and Hypergraphs.

We will demonstrate that Property and Hypergraphs are not only natural to Triple Stores and RDF but allow for potentially even more powerful graph models than non-RDF approaches.

AllegroGraph defends their implementation of Triple Stores as both property and hypergraphs.

The second story (see also A Letter Regarding Native Graph Databases) I have heard in two days based upon an unnamed vendor trash talking other graph databases.

Are graph databases catching on enough for that kind of marketing effort?

BTW, AllegroGraph does have a Free Server Edition Download.

Limited to 5 million triples but that should capture your baseball card collection or home recipe book. 😉

TripleRush: A Fast and Scalable Triple Store

Monday, October 21st, 2013

TripleRush: A Fast and Scalable Triple Store by Philip Stutz, Mihaela Verman, Lorenz Fischer, and Abraham Bernstein.

Abstract:

TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that quickly answer queries over large-scale graph data. To that end it leverages a novel, graph-based architecture.

Specifi cally, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph where each index vertex corresponds to a triple pattern. Partially matched copies of a query are routed in parallel along di fferent paths of this index structure.

We show experimentally that TripleRush takes less than a third of the time to answer queries compared to the fastest of three state-of-the-art triple stores, when measuring time as the geometric mean of all queries for two benchmarks. On individual queries, TripleRush is up to three orders of magnitude faster than other triple stores.

If the abstract hasn’t already gotten your interest, consider the following:

The index graph we just described is di fferent from traditional index structures, because it is designed for the efficient parallel routing of messages to triples that correspond to a given triple pattern. All vertices that form the index structure are active parallel processing elements that only interact via message passing.

That is the beginning to section “3.2 Query Processing.” It has a worked example that will repay a close reading.

The processing model outlined here is triple specific, but I don’t see any reason why the principles would not work for other graph structures.

This is going to the top of my reading list.

I first saw this in a tweet by Stefano Bertolo.

Introduction to: Triplestores [Perils of Inferencing]

Monday, February 4th, 2013

Introduction to: Triplestores Juan Sequeda.

From the post:

Triplestores are Database Management Systems (DBMS) for data modeled using RDF. Unlike Relational Database Management Systems (RDBMS), which store data in relations (or tables) and are queried using SQL, triplestores store RDF triples and are queried using SPARQL.

A key feature of many triplestores is the ability to do inference. It is important to note that a DBMS typically offers the capacity to deal with concurrency, security, logging, recovery, and updates, in addition to loading and storing data. Not all Triplestores offer all these capabilities (yet).

Unless you have been under a rock or in another dimension, triplestores are not news.

This is a short list of some of the more popular ones and illustrates one of the problems with “inferencing,” inside or outside of a triple store.

The inference in this article says that “full professors,” “assistant professors,” and “teachers” are all “professors.”

Suggest you drop by the local university to see if “full professors” think of instructors or “adjunct professors” as “professors.”

BTW, the “inferencing” is “correct” as far as the OWL ontology in the article goes. But that’s part of the problem.

Being “correct” in OWL may or may not have any relationship to the world as you experience it.


My wife reminded me at lunch that piano players in whore houses around the turn of the 19th century were also called “professor.”

Another inference not accounted for.