Semantic Queries by Example by Lipyeow Lim, Haixun Wang, Min Wang.
With the ever increasing quantities of electronic data, there is a growing need to make sense out of the data. Many advanced database applications are beginning to support this need by integrating domain knowledge encoded as ontologies into queries over relational data. However, it is extremely difficult to express queries against graph structured ontology in the relational SQL query language or its extensions. Moreover, semantic queries are usually not precise, especially when data and its related ontology are complicated. Users often only have a vague notion of their information needs and are not able to specify queries precisely. In this paper, we address these challenges by introducing a novel method to support semantic queries in relational databases with ease. Instead of casting ontology into relational form and creating new language constructs to express such queries, we ask the user to provide a small number of examples that satisfy the query she has in mind. Using those examples as seeds, the system infers the exact query automatically, and the user is therefore shielded from the complexity of interfacing with the ontology. Our approach consists of three steps. In the first step, the user provides several examples that satisfy the query. In the second step, we use machine learning techniques to mine the semantics of the query from the given examples and related ontologies. Finally, we apply the query semantics on the data to generate the full query result. We also implement an optional active learning mechanism to find the query semantics accurately and quickly. Our experiments validate the effectiveness of our approach.
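To make the three steps in the abstract concrete, here is a minimal sketch in Python. It is not the authors' method: the paper uses machine learning over the examples and the ontology, while this sketch swaps in a much simpler generalization rule (pick the most specific ontology concept shared by all examples). The ontology, the rows, and every function name below are hypothetical and chosen only for illustration.

```python
# Toy ontology as child -> parent links (hypothetical concepts, not from the paper).
ONTOLOGY = {
    "aspirin": "nsaid",
    "ibuprofen": "nsaid",
    "nsaid": "analgesic",
    "acetaminophen": "analgesic",
    "analgesic": "drug",
    "amoxicillin": "antibiotic",
    "antibiotic": "drug",
}

# Toy relational data: each row refers to one ontology concept.
ROWS = [
    {"id": 1, "drug": "aspirin"},
    {"id": 2, "drug": "ibuprofen"},
    {"id": 3, "drug": "acetaminophen"},
    {"id": 4, "drug": "amoxicillin"},
]


def ancestors(concept):
    """Return the concept itself plus every concept above it in the ontology."""
    found = {concept}
    while concept in ONTOLOGY:
        concept = ONTOLOGY[concept]
        found.add(concept)
    return found


def depth(concept):
    """Number of parent links between the concept and the ontology root."""
    return len(ancestors(concept)) - 1


def infer_query(example_rows):
    """Step 2 (simplified): find the most specific concept shared by all examples."""
    shared = set.intersection(*(ancestors(row["drug"]) for row in example_rows))
    return max(shared, key=depth)


def run_query(rows, concept):
    """Step 3: return every row whose concept falls under the inferred concept."""
    return [row for row in rows if concept in ancestors(row["drug"])]


# Step 1: the user marks a few rows as examples of what she wants.
examples = [ROWS[0], ROWS[2]]  # aspirin and acetaminophen
concept = infer_query(examples)
print(concept, [row["id"] for row in run_query(ROWS, concept)])
```

The user never names "analgesic" or writes a join against the ontology; the concept is recovered from the examples alone. A real system would also need negative examples or the paper's active learning step to avoid over-generalizing from too few seeds.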
Potentially deeply important work for both a topic map query language and topic map authoring.
The authors conclude:
In this paper, we introduce a machine learning approach to support semantic queries in relational database. In semantic query processing, the biggest hurdle is to represent ontological data in relational form so that the relational database engine can manipulate the ontology in a way consistent with manipulating the data. Previous approaches include transforming the graph ontological data into tabular form, or representing ontological data in XML and leveraging database extenders on XML such as DB2’s Viper. These approaches, however, are either expensive (materializing a transitive relationship represented by a graph may increase the data size exponentially) or requiring changes in the database engine and new extensions to SQL. Our approach shields the user from the necessity of dealing with the ontology directly. Indeed, as our user study indicates, the difficulty of expressing ontology-based query semantics in a query language is the major hurdle of promoting semantic query processing. With our approach, the users do not even need to know ontology representation. All that is required is that the user gives some examples that satisfy the query he has in mind. The system then automatically finds the answer to the query. In this process, semantics, which is a concept usually hard to express, remains as a concept in the mind of user, without having to be expressed explicitly in a query language. Our experiments and user study results show that the approach is efficient, effective, and general in supporting semantic queries in terms of both accuracy and usability. (emphasis added)
I rather like: “In this process, semantics, which is a concept usually hard to express, remains as a concept in the mind of user, without having to be expressed explicitly in a query language.”
To take it a step further, it should apply to the authoring of topic maps as well.
A user selects, from a set of examples, the subjects they want to talk about. That is quite different from any topic map authoring interface I have seen to date.
The “details” of capturing and querying semantics have stymied RDF:
(From: The Semantic Web Is Failing — But Why? (Part 4))
And topic map authoring as well.
Is your next authoring/querying interface going to be by example?
I first saw this in a tweet by Stefano Bertolo.