Scalable reduction of large datasets to interesting subsets

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 26, 2010

Scalable reduction of large datasets to interesting subsets

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 11:04 am

Scalable reduction of large datasets to interesting subsets
Authors: Gregory Todd Williams, Jesse Weaver, Medha Atre, James A. Hendler
Keywords: Billion Triples Challenge, Scalability, Parallel, Inferencing, Query, Triplestore

Abstract:

With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and reasoning fall short when faced with web-scale data. We present a system that combines the computational power of large clusters for enabling large-scale reasoning and data access with an efficient data structure for storing and querying the accessed data on a traditional personal computer or other resource-constrained device. We present results of using this system to load the 2009 Billion Triples Challenge dataset, materialize RDFS inferences, extract an “interesting” subset of the data using a large cluster, and further analyze the extracted data using a personal computer, all in the order of tens of minutes.
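To make "materialize RDFS inferences, extract an 'interesting' subset" concrete, here is a minimal single-machine sketch. It is not the authors' cluster system. It assumes triples are plain (subject, predicate, object) string tuples, implements only four RDFS rules (rdfs5, rdfs7, rdfs9, rdfs11) by naive forward chaining, and assumes that "interesting" means "every triple about an instance of a chosen class". The names `materialize_rdfs` and `extract_subset` are hypothetical.

```python
# Hypothetical sketch: naive RDFS forward chaining plus subset extraction.
# Only rdfs5, rdfs7, rdfs9 and rdfs11 are applied; this is not the full
# RDFS entailment regime and nothing here is parallelized.
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def materialize_rdfs(triples: Set[Triple]) -> Set[Triple]:
    """Apply the four rules repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        subprop = [(s, o) for s, p, o in closure if p == SUBPROP]
        # rdfs11: subClassOf is transitive
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs5: subPropertyOf is transitive
        for a, b in subprop:
            for c, d in subprop:
                if b == c:
                    new.add((a, SUBPROP, d))
        for s, p, o in closure:
            # rdfs9: an instance of a subclass is an instance of its superclass
            if p == TYPE:
                for sub, sup in subclass:
                    if o == sub:
                        new.add((s, TYPE, sup))
            # rdfs7: a statement using a subproperty also holds for the superproperty
            for sub, sup in subprop:
                if p == sub:
                    new.add((s, sup, o))
        if new <= closure:  # fixpoint reached
            return closure
        closure |= new


def extract_subset(closure: Set[Triple], cls: str) -> Set[Triple]:
    """Keep every triple whose subject is an instance of cls (assumed criterion)."""
    members = {s for s, p, o in closure if p == TYPE and o == cls}
    return {t for t in closure if t[0] in members}


if __name__ == "__main__":
    data = {
        ("ex:a", TYPE, "ex:Student"),
        ("ex:Student", SUBCLASS, "ex:Person"),
        ("ex:Person", SUBCLASS, "ex:Agent"),
        ("ex:a", "ex:hasFullName", "Ann"),
        ("ex:hasFullName", SUBPROP, "ex:name"),
    }
    for triple in sorted(extract_subset(materialize_rdfs(data), "ex:Person")):
        print(triple)
```

Recomputing joins on every pass is quadratic and only suitable for toy data; at Billion Triples Challenge scale the same rules have to be partitioned across a cluster, which is the part the paper addresses.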

I wonder about the use of the phrase “…web-scale data.”

If a billion triples is a real challenge, then what happens when RDF/RDFa is deployed across an entity- and inference-rich body of material like legal texts? Or property descriptions? Or the ownership rights based on property descriptions?

In any event, the prep of the data for inferencing illustrates a use case for topic maps:

Information about people is represented in different ways in the BTC2009 dataset, including the use of the FOAF, SIOC, DBpedia, and AKT ontologies. We create a simple upper ontology to bring together concepts and properties pertaining to people. For example, we define the class up:Person which is defined as a superclass to existing person classes, e.g., foaf:Person. We do the same for relevant properties, e.g., up:full name is a superproperty of akt:full-name. Note that “up” is the namespace prefix for our upper ontology.
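Here is a rough sketch of what such an upper ontology buys you. Source terms are declared as subclasses or subproperties of `up:` terms, and queries are then written against the `up:` terms only. Some assumptions: `up:full_name` is a stand-in identifier for the property the excerpt writes as "up:full name", `dbo:Person` is an illustrative second source class, and the functions `upper_ontology_schema`, `lift` and `people_with_names` are hypothetical. Because every mapping in this sketch is a single flat step, one pass of rules rdfs9 and rdfs7 is enough to "lift" the data.

```python
# Hypothetical sketch: a tiny upper ontology mapping source terms to up:* terms.
# up:full_name stands in for the excerpt's "up:full name"; dbo:Person is illustrative.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Source class -> upper-ontology superclass
UP_CLASSES: Dict[str, str] = {
    "foaf:Person": "up:Person",
    "dbo:Person": "up:Person",
}
# Source property -> upper-ontology superproperty
UP_PROPERTIES: Dict[str, str] = {
    "akt:full-name": "up:full_name",
}


def upper_ontology_schema() -> Set[Triple]:
    """The mappings expressed as the RDFS triples the upper ontology would declare."""
    schema = {(src, "rdfs:subClassOf", up) for src, up in UP_CLASSES.items()}
    schema |= {(src, "rdfs:subPropertyOf", up) for src, up in UP_PROPERTIES.items()}
    return schema


def lift(triples: Set[Triple]) -> Set[Triple]:
    """One pass of rdfs9/rdfs7 against the flat mappings: add the up:* triples."""
    lifted = set(triples)
    for s, p, o in triples:
        if p == "rdf:type" and o in UP_CLASSES:
            lifted.add((s, "rdf:type", UP_CLASSES[o]))
        if p in UP_PROPERTIES:
            lifted.add((s, UP_PROPERTIES[p], o))
    return lifted


def people_with_names(triples: Set[Triple]) -> Set[Tuple[str, str]]:
    """Query using upper-ontology terms only, whatever vocabulary the source used."""
    people = {s for s, p, o in triples if p == "rdf:type" and o == "up:Person"}
    return {(s, o) for s, p, o in triples if s in people and p == "up:full_name"}


if __name__ == "__main__":
    data = {
        ("ex:alice", "rdf:type", "foaf:Person"),
        ("ex:alice", "akt:full-name", "Alice Jones"),
        ("ex:bob", "rdf:type", "dbo:Person"),
        ("ex:bob", "akt:full-name", "Bob Smith"),
    }
    print(sorted(people_with_names(lift(data))))
```

Notice that the mapping dictionaries record only *that* two terms are related. Nothing in them says why they are related or who decided so.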

What subject represented by akt:full-name was responsible for the mapping in question? How does that translate to other ontologies? Oh, sorry, no place to record that mapping.

Questions:

How do you evaluate the claims of “…web-scale data”? (3-5 pages, citations)
Does creating ad-hoc upper ontologies scale? Yes/No/Why? (3-5 pages, citations)
How does interchange of ad-hoc upper ontologies work? (3-5 pages, citations)
