Archive for the ‘DBpedia’ Category

Normalizing company names with SPARQL and DBpedia

Wednesday, December 5th, 2012

Bob DuCharme writes:

Wikipedia page redirection data, waiting for you to query it.

If you send your browser to http://en.wikipedia.org/wiki/Big_Blue, you’ll end up at IBM’s page, because Wikipedia knows that this nickname usually refers to this company. (Apparently, it’s also a nickname for several high schools and universities.) This data pointing from nicknames to official names is also stored in DBpedia, which means that we can use SPARQL queries to normalize company names. You can use the same technique to normalize other kinds of names (for example, trying to send your browser to http://en.wikipedia.org/wiki/Bobby_Kennedy will actually send it to http://en.wikipedia.org/wiki/Robert_F._Kennedy), but a query that sticks to one domain will have a simpler job. Description Logics and all that.

As always, Bob is on the cutting edge of the use of a markup standard!

Possible topic map analogies:

  • create a second name cluster and the “normalized name” is an additional base name
  • move the “nickname” to a variant name (scope?) and update the base name to be the normalized name (with changes to sort/display as necessary)

I am assuming that Bob’s lang(?redirectsTo) = "en" operates like scope in topic maps.

Except that scope in a topic map is represented by one or more topics, which means merging can occur between topics that represent the same language.
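
A minimal sketch of the kind of query Bob describes, written in Python so the query text can be built and inspected without touching the network. The property dbpedia-owl:wikiPageRedirects, the use of rdfs:label, the variable names other than ?redirectsTo, and the helper function names are my assumptions, not Bob's code. The endpoint is the public DBpedia one.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def build_redirect_query(nickname, lang="en"):
    """Return a SPARQL query that follows a Wikipedia redirect for a label.

    Assumption: redirects are stored with dbpedia-owl:wikiPageRedirects and
    names with rdfs:label; check both against the dataset you query.
    """
    # Escape backslashes and double quotes so the name is a valid literal.
    safe = nickname.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "SELECT ?redirectsTo WHERE {\n"
        f'  ?nick rdfs:label "{safe}"@{lang} ;\n'
        "        dbpedia-owl:wikiPageRedirects ?target .\n"
        "  ?target rdfs:label ?redirectsTo .\n"
        f'  FILTER (lang(?redirectsTo) = "{lang}")\n'
        "}"
    )


def build_request_url(query):
    """Encode the query as a GET URL using the SPARQL protocol 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = build_redirect_query("Big Blue")
url = build_request_url(query)
print(query)
```

The FILTER line is the part I compare to scope: it restricts the returned names to one language, but the language is a plain string tag rather than a topic that could itself merge with others.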

DBpedia 3.8 Downloads

Tuesday, November 20th, 2012

From the webpage:

This page provides downloads of the DBpedia datasets. The DBpedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License. The downloads are provided as N-Triples and N-Quads, where the N-Quads version contains additional provenance information for each statement. All files are bzip2 packed.

I had to ask to find this one.

One interesting feature that would bear repetition elsewhere is the ability to see a sample of a data file.

For example, at Links to Wikipedia Article, next to “nt” (N-Triples), there is a “?” that, when followed, displays in part:

<http://dbpedia.org/resource/AccessibleComputing> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AccessibleComputing> .
<http://en.wikipedia.org/wiki/AccessibleComputing> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AccessibleComputing> .
<http://en.wikipedia.org/wiki/AccessibleComputing> <http://purl.org/dc/elements/1.1/language> "en"@en .
<http://dbpedia.org/resource/AfghanistanHistory> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AfghanistanHistory> .
<http://en.wikipedia.org/wiki/AfghanistanHistory> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AfghanistanHistory> .
<http://en.wikipedia.org/wiki/AfghanistanHistory> <http://purl.org/dc/elements/1.1/language> "en"@en .
<http://dbpedia.org/resource/AfghanistanGeography> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AfghanistanGeography> .
<http://en.wikipedia.org/wiki/AfghanistanGeography> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AfghanistanGeography> .
<http://en.wikipedia.org/wiki/AfghanistanGeography> <http://purl.org/dc/elements/1.1/language> "en"@en .

That let me conclude that, for my purposes, the reverse pointing from DBpedia to Wikipedia was repetitious. And since the entire dataset covers only the English version of Wikipedia, the declaration of language was superfluous.

That may not be true for your intended use of DBpedia data.
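
If you reach the same conclusion, dropping the extra statements is a small filtering job. The sketch below keeps one direction of the link and discards the rest. Keeping foaf:isPrimaryTopicOf is my choice for illustration, and the regular expression is a simplification that only handles lines shaped like the sample, not the full N-Triples grammar.

```python
import re

# Matches one N-Triples line whose subject and predicate are IRIs. Whitespace
# between terms is optional so that tightly packed samples also parse.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s*<([^>]*)>\s*(.+?)\s*\.\s*$')

IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"


def parse_line(line):
    """Return (subject, predicate, object) or None if the line does not match."""
    match = TRIPLE.match(line)
    return match.groups() if match else None


def keep_forward_links(lines):
    """Keep only the isPrimaryTopicOf statements; drop everything else."""
    triples = (parse_line(line) for line in lines)
    return [t for t in triples if t and t[1] == IS_PRIMARY_TOPIC_OF]


sample = [
    "<http://dbpedia.org/resource/AccessibleComputing> "
    "<http://xmlns.com/foaf/0.1/isPrimaryTopicOf> "
    "<http://en.wikipedia.org/wiki/AccessibleComputing> .",
    "<http://en.wikipedia.org/wiki/AccessibleComputing> "
    "<http://xmlns.com/foaf/0.1/primaryTopic> "
    "<http://dbpedia.org/resource/AccessibleComputing> .",
    "<http://en.wikipedia.org/wiki/AccessibleComputing> "
    '<http://purl.org/dc/elements/1.1/language> "en"@en .',
]
kept = keep_forward_links(sample)
print(len(kept), "of", len(sample), "statements kept")
```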

My point being that seeing sample data allows a quick evaluation before downloading large amounts of data.
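
Where a site offers no sample link, something similar can be approximated by decompressing only the start of a bzip2 file. This is a rough sketch, assuming you can open the download as a binary stream (for example an HTTP response object); the function name and chunk size are mine, and the in-memory data below merely stands in for a real file.

```python
import bz2
import io


def head_of_bz2_stream(stream, n=10, chunk_size=64 * 1024):
    """Return the first n text lines from a bzip2-compressed binary stream.

    Reads the stream chunk by chunk and stops once n complete lines are
    available, so the rest of the file is never read.
    """
    decompressor = bz2.BZ2Decompressor()
    buffer = b""
    while buffer.count(b"\n") < n and not decompressor.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # stream ended early; use whatever was decompressed
        buffer += decompressor.decompress(chunk)
    return [line.decode("utf-8", errors="replace")
            for line in buffer.splitlines()[:n]]


# Stand-in for a downloaded file: three short lines, compressed in memory.
packed = bz2.compress(b"line one\nline two\nline three\n")
first_two = head_of_bz2_stream(io.BytesIO(packed), n=2)
everything = head_of_bz2_stream(io.BytesIO(packed), n=10)
print(first_two)
```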

A feature I would like to see for other data sets.

Path Report: dbpedia graph

Saturday, November 10th, 2012

Marko A. Rodriguez tweets:

There are 251,818,304,970,074,185 (251 quadrillion) length 5 paths in the #dbpedia graph.

Just in case you are curious.

With a pointer to: Faunus.
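
For a sense of how such a number can be computed without listing the paths, here is a toy sketch. It assumes "path" means a directed walk in which vertices and edges may repeat, which is my reading of the tweet and not something it states. This is not how Faunus does it; it is the same counting idea on a four-edge graph.

```python
from collections import defaultdict


def count_walks(edges, length):
    """Count directed walks with exactly `length` edges.

    counts[v] holds the number of walks of the current length that end at v;
    each step pushes those counts along every outgoing edge.
    """
    successors = defaultdict(list)
    vertices = set()
    for source, target in edges:
        successors[source].append(target)
        vertices.update((source, target))

    counts = {v: 1 for v in vertices}  # one zero-length walk per vertex
    for _ in range(length):
        step = defaultdict(int)
        for vertex, ways in counts.items():
            for target in successors.get(vertex, ()):
                step[target] += ways
        counts = step
    return sum(counts.values())


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
print(count_walks(edges, 5))
```

Only one counter per vertex is kept at each step, which is why a count in the quadrillions is feasible when enumerating the paths themselves would not be.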

One of the use cases for Faunus is graph derivation:

Given an input graph, derive a new graph based upon the input graph’s structure and semantics. Other terms include graph rewriting and graph transformations.

Sounds like merging would fit into “derivation,” “graph rewriting” and “graph transformation,” doesn’t it?

Or even spawning content in one graph based on its structure or semantics, using structure and semantics from one or more other graphs as sources.

Much to be thought about here.
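
To make the merging-as-derivation idea concrete, here is a small sketch under my own assumptions: every vertex carries an identity key (a hypothetical stand-in for a subject identifier), vertices that share a key collapse into one, and edges are rewritten onto the merged vertices. The data is a toy example, not drawn from DBpedia.

```python
def merge_by_key(vertices, edges):
    """Derive a new graph in which vertices sharing a key become one vertex.

    vertices: dict mapping vertex id -> identity key
    edges:    iterable of (source id, label, target id)
    Returns (key -> set of original vertex ids, set of rewritten edges).
    """
    merged_vertices = {}
    for vertex_id, key in vertices.items():
        merged_vertices.setdefault(key, set()).add(vertex_id)
    # Rewrite each edge onto the merged vertices; the set drops duplicates.
    merged_edges = {(vertices[s], label, vertices[t]) for s, label, t in edges}
    return merged_vertices, merged_edges


vertices = {"g1:ibm": "IBM", "g2:big_blue": "IBM", "g1:armonk": "Armonk"}
edges = [
    ("g1:ibm", "headquarters", "g1:armonk"),
    ("g2:big_blue", "headquarters", "g1:armonk"),
]
merged_vertices, merged_edges = merge_by_key(vertices, edges)
print(merged_vertices)
print(merged_edges)
```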

DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents

Friday, September 30th, 2011

DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents by Pablo Mendes (email announcement)

We are happy to announce the release of DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents.

DBpedia Spotlight is a tool for annotating mentions of DBpedia entities and concepts in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. The DBpedia Spotlight Architecture is composed of the following modules:

  • Web application, a demonstration client (HTML/Javascript UI) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
  • Web Service, a RESTful Web API that exposes the functionality of annotating and/or disambiguating resources in text. The service returns XML, JSON or XHTML+RDFa.
  • Annotation Java / Scala API, exposing the underlying logic that performs the annotation/disambiguation.
  • Indexing Java / Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.

In this release we have provided many enhancements to the Web Service, installation process, as well as the spotting, candidate selection, disambiguation and annotation stages. More details on the enhancements are provided below.

The new version is deployed at:

Instructions on how to use the Web Service are available at: http://spotlight.dbpedia.org

We invite your comments on the new version before we deploy it on our production server. We will keep it on the “dev” server until October 6th, when we will finally make the switch to the production server at http://spotlight.dbpedia.org/demo/ and http://spotlight.dbpedia.org/rest/

If you are a user of DBpedia Spotlight, please join dbp-spotlight-users@lists.sourceforge.net for announcements and other discussions.
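
For readers who want to try the Web Service from code, a request might be put together along these lines. The base URL comes from the announcement; the "annotate" path, the parameter names (text, confidence, support) and their default values are my assumptions and should be checked against the service instructions. The sketch builds the request but does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://spotlight.dbpedia.org/rest/"  # production URL from the announcement


def build_annotate_request(text, confidence=0.2, support=20):
    """Build (but do not send) a GET request for the annotation service.

    Assumptions: the 'annotate' path and the 'text', 'confidence' and
    'support' parameter names; confirm them before relying on this.
    """
    params = urlencode({"text": text, "confidence": confidence, "support": support})
    # The service is said to return XML, JSON or XHTML+RDFa; ask for JSON.
    return Request(BASE_URL + "annotate?" + params,
                   headers={"Accept": "application/json"})


request = build_annotate_request("Big Blue is a nickname for IBM.")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it.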

Warning: I think they are serious about the requirement of Firefox 6.0.2 and Chromium 12.0.

I tried it on an older version of Firefox on Ubuntu and got no results at all. Will upgrade Firefox but only in my directory.

DBpedia Live

Saturday, June 25th, 2011

From the website:

The main objective of DBpedia is to extract structured information from Wikipedia, convert it into RDF, and make it freely available on the Web. In a nutshell, DBpedia is the Semantic Web mirror of Wikipedia.

Wikipedia users constantly revise Wikipedia articles with updates happening almost each second. Hence, data stored in the official DBpedia endpoint can quickly become outdated, and Wikipedia articles need to be re-extracted. DBpedia-Live enables such a continuous synchronization between DBpedia and Wikipedia.

OK, so you have a live feed. Now how do you judge the importance of updates and which ones trigger alerts to the user? Or are important enough to trigger merges? (Assuming not all possible merges are worth the expense.)

DBpedia4Neo

Saturday, April 9th, 2011

Claudio Martella walks through loading DBpedia into a graphDB.

DISCLAIMER: this is a bit of a hack, but it should get you started. I managed to get the core dataset of DBpedia into Neo4J, but this procedure should actually be working for any Blueprints-ready vendor, like OrientDB.

Ok, a little background first: we want to store DBpedia inside of a GraphDB, instead of the typical TripleStore, and run SPARQL queries over it. DBpedia is a project aiming to extract structured content from Wikipedia, information such as the one you can find in the infoboxes, the links, the categorization infos, geo-coordinates etc. This information is extracted and exported as triples to form a graph, a network of properties and relationships between Wikipedia resources.

Note the claim that queries against a graph database are more efficient than the same queries against a triple store.
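
To picture what moving triples into a property graph can look like, here is one common mapping, sketched under my own assumptions and not taken from Claudio's procedure: a triple with a literal object becomes a property on the subject vertex, and a triple with a resource object becomes a labelled edge. The input format, with an explicit literal flag, is hypothetical.

```python
def triples_to_property_graph(triples):
    """Split triples into vertex properties and labelled edges.

    triples: iterable of (subject, predicate, object, object_is_literal)
    Returns (vertex id -> {property: [values]}, list of (source, label, target)).
    """
    vertices = {}
    edges = []
    for subject, predicate, obj, object_is_literal in triples:
        properties = vertices.setdefault(subject, {})
        if object_is_literal:
            # Literal objects are stored on the subject vertex.
            properties.setdefault(predicate, []).append(obj)
        else:
            # Resource objects become vertices joined by a labelled edge.
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges


triples = [
    ("dbpedia:IBM", "rdfs:label", "IBM", True),
    ("dbpedia:Big_Blue", "dbpedia-owl:wikiPageRedirects", "dbpedia:IBM", False),
]
vertices, edges = triples_to_property_graph(triples)
print(vertices)
print(edges)
```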

DBpedia 3.6 – Release

Monday, January 17th, 2011

From the announcement:

The new DBpedia dataset describes more than 3.5 million things, of which 1.67 million are classified in a consistent ontology, including 364,000 persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video games, 148,000 organizations, 148,000 species and 5,200 diseases.

The DBpedia dataset features labels and abstracts for 3.5 million things in up to 97 different languages; 1,850,000 links to images and 5,900,000 links to external web pages; 6,500,000 external links into other RDF datasets, and 632,000 Wikipedia categories.

The dataset consists of 672 million pieces of information (RDF triples) out of which 286 million were extracted from the English edition of Wikipedia and 386 million were extracted from other language editions and links to external datasets.

Quick Links:

DBpedia MappingTool: a graphical user interface to support the community in creating and editing mappings as well as the ontology.

Improved DBpedia Ontology as well as improved Infobox mappings.

Some commonly used property names changed. Please see http://dbpedia.org/ChangeLog and http://dbpedia.org/Datasets/Properties to know which relations changed and update your applications accordingly!

Download the new DBpedia dataset from http://dbpedia.org/Downloads36

Available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql