Archive for the ‘DBpedia’ Category

DBpedia now available as triple pattern fragments

Friday, October 31st, 2014

DBpedia now available as triple pattern fragments by Ruben Verborgh.

From the post:

DBpedia is perhaps the most widely known Linked Data source on the Web. You can use DBpedia in a variety of ways: by querying the SPARQL endpoint, by browsing Linked Data documents, or by downloading one of the data dumps. Access to all of these data sources is offered free of charge.

Last week, a fourth way of accessing DBpedia became publicly available: DBpedia’s triple pattern fragments at http://fragments.dbpedia.org/. This interface offers a different balance of trade-offs: it maximizes the availability of DBpedia by offering a simple server and thus moving SPARQL query execution to the client side. Queries will execute slower than on the public SPARQL endpoint, but their execution should be possible close to 100% of the time.

Here are some fun things to try:
– browse the new interface: http://fragments.dbpedia.org/2014/en?object=dbpedia%3ALinked_Data
– make your browser execute a SPARQL query: http://fragments.dbpedia.org/
– add live queries to your application: https://github.com/LinkedDataFragments/Client.js#using-the-library

Learn all about triple pattern fragments at the Linked Data Fragments website http://linkeddatafragments.org/, the ISWC2014 paper http://linkeddatafragments.org/publications/iswc2014.pdf,
and ISWC2014 slides: http://www.slideshare.net/RubenVerborgh/querying-datasets-on-the-web-with-high-availability.
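The core idea behind triple pattern fragments is that the server answers only single triple patterns, while the client performs joins itself. A toy sketch of that division of labor (in-memory data and helper names are illustrative, not the actual fragments API):

```python
# Sketch of client-side triple pattern matching, the building block a
# triple pattern fragments client composes into full SPARQL queries.
# The triples and prefixes here are invented for illustration.

TRIPLES = [
    ("dbpedia:IBM", "rdf:type", "dbo:Company"),
    ("dbpedia:IBM", "rdfs:label", "IBM"),
    ("dbpedia:Linked_Data", "rdf:type", "skos:Concept"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# The client evaluates a join (?x rdf:type dbo:Company . ?x rdfs:label ?l)
# by requesting two simple patterns and joining the results itself:
companies = {t[0] for t in match(p="rdf:type", o="dbo:Company")}
labels = [t[2] for t in match(p="rdfs:label") if t[0] in companies]
```

A server that only has to answer `match`-style requests is cheap to host, which is exactly the availability trade-off the post describes.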

A new effort to achieve robust processing of triples.

Enjoy!

Want to see how #SchemaOrg #Dbpedia and #SKOS taxonomies can be seamlessly integrated?

Friday, September 12th, 2014

Want to see how #SchemaOrg #Dbpedia and #SKOS taxonomies can be seamlessly integrated? Register for our webinar: http://www.poolparty.biz/webinar-taxonomy-management-content-management-well-integrated/

is how the tweet read.

From the seminar registration page:

With the arrival of semantic web standards and linked data technologies, new options for smarter content management and semantic search have become available. Taxonomies and metadata management shall play a central role in your content management system: by combining text mining algorithms with taxonomies and knowledge graphs from the web, more accurate annotation and categorization of documents, and more complex queries over text-oriented repositories like SharePoint, Drupal, or Confluence, are now possible.

Nevertheless, the predominant opinion that taxonomy management is a tedious process currently impedes a widespread implementation of professional metadata strategies.

In this webinar, key people from the Semantic Web Company will describe how content management and collaboration systems like SharePoint, Drupal or Confluence can benefit from professional taxonomy management. We will also discuss why taxonomy management is not necessarily a tedious process when well integrated into content management workflows.

I’ve had mixed luck with webinars this year. Some were quite good and others were equally bad.

I have fairly firm opinions about #Schema.org, #Dbpedia and #SKOS taxonomies but tedium isn’t one of them. 😉

You can register for free for: Webinar “Taxonomy management & content management – well integrated!”, October 8th, 2014.

The usual marketing harvesting of contact information applies. Linux users will have to run a Windows VM or use a Mac.

If you attend, be sure to look for my post reviewing the webinar and post your comments there.

DBpedia – Wikipedia Data Extraction

Friday, August 1st, 2014

DBpedia – Wikipedia Data Extraction by Gaurav Vaidya.

From the post:

We are happy to announce an experimental RDF dump of the Wikimedia Commons. A complete first draft is now available online at http://nl.dbpedia.org/downloads/commonswiki/20140705/, and will eventually be accessible from http://commons.dbpedia.org. A small sample dataset, which may be easier to browse, is available on GitHub at https://github.com/gaurav/commons-extraction/tree/master/commonswiki/20140101.

Just in case you are looking for some RDF data to experiment with this weekend!

SKOSsy – Thesauri on the fly!

Tuesday, January 14th, 2014

SKOSsy – Thesauri on the fly!

From the webpage:

SKOSsy extracts data from LOD sources like DBpedia (and basically from any RDF based knowledge base you like) and works well for automatic text mining and whenever a seed thesaurus should be generated for a certain domain, organisation or a project.

If automatically generated thesauri are loaded into an editor like PoolParty Thesaurus Manager (PPT), you can start to enrich the knowledge model with additional concepts, relations and links to other LOD sources. With SKOSsy, thesaurus projects no longer have to be started from scratch. See also how SKOSsy is integrated into PPT.

  • SKOSsy makes heavy use of Linked Data sources, especially DBpedia
  • SKOSsy can generate SKOS thesauri for virtually any domain within a few minutes
  • Such thesauri can be improved, curated and extended to one's individual needs, but they usually serve as “good-enough” knowledge models for any semantic search application you like
  • SKOSsy thesauri serve as a basis for domain specific text extraction and knowledge enrichment
  • SKOSsy-based semantic search usually outperforms search algorithms based on pure statistics, since the thesauri contain high-quality information about relations, labels and disambiguation
  • SKOSsy works perfectly together with the PoolParty product family
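A generated seed thesaurus ultimately boils down to SKOS triples. A minimal sketch of what such output might look like (the concepts and the serialization helper are invented for illustration, not SKOSsy's actual output):

```python
# Hedged sketch: serialize a tiny seed thesaurus as SKOS in Turtle.
# The example concepts and hierarchy are hypothetical.

PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"

def concept(uri, label, broader=None):
    """Emit one skos:Concept with a prefLabel and optional broader link."""
    lines = [f"<{uri}> a skos:Concept ;",
             f'    skos:prefLabel "{label}"@en' + (" ;" if broader else " .")]
    if broader:
        lines.append(f"    skos:broader <{broader}> .")
    return "\n".join(lines)

thesaurus = PREFIX + "\n".join([
    concept("http://example.org/Beverage", "Beverage"),
    concept("http://example.org/Coffee", "Coffee",
            broader="http://example.org/Beverage"),
])
```

Anything shaped like this can be loaded into a SKOS editor such as PPT and enriched by hand, which is the workflow the bullet points describe.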

DBpedia is probably closer to some user’s vocabulary than most formal ones. 😉

I have the sense that rather than asking experts for their semantics (and how to represent them), we are about to turn to users to ask about their semantics (and choose simple ways to represent them).

If results that are useful to the average user are the goal, it is a move in the right direction.

DBpedia as Tables [21st Century Interchange]

Monday, November 25th, 2013

DBpedia as Tables

From the webpage:

As some of the potential users of DBpedia might not be familiar with the RDF data model and the SPARQL query language, we provide some of the core DBpedia 3.9 data also in tabular form as Comma-Separated-Values (CSV) files, which can easily be processed using standard tools, such as spreadsheet applications, relational databases or data mining tools.

For each class in the DBpedia ontology (such as Person, Radio Station, Ice Hockey Player, or Band) we provide a single CSV file which contains all instances of this class. Each instance is described by its URI, an English label and a short abstract, the mapping-based infobox data describing the instance (extracted from the English edition of Wikipedia), and geo-coordinates.

Altogether we provide 530 CSV files in the form of a single ZIP file (size 3 GB compressed and 73.4 GB when uncompressed).

The ZIP file can be downloaded here (3 GB).

😉

I have to admit that I applaud the move to release DBpedia as CSV files.

Despite my long time and continuing allegiance to XML, there are times when CSV is the optimum format for interchange.
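The per-class layout described above is easy to work with in any CSV-aware tool. A hedged sketch (the column names are guesses based on the description, not the exact file headers):

```python
import csv
import io

# Hypothetical excerpt of a per-class file such as Person.csv; the real
# headers may differ, but the shape (URI, label, abstract, infobox data)
# follows the description quoted above.
sample = io.StringIO(
    "URI,rdfs:label,abstract,birthDate\n"
    "http://dbpedia.org/resource/Alan_Turing,Alan Turing,"
    "English mathematician,1912-06-23\n"
)

rows = list(csv.DictReader(sample))
mathematicians = [r["rdfs:label"] for r in rows
                  if "mathematician" in r["abstract"]]
```

No RDF stack required: a spreadsheet, a relational database, or a ten-line script all work, which is the whole point of the CSV release.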

You need to mark this date on your calendar.

I am curious what projects using the CSV version of DBpedia data will appear in the next calendar year.

I first saw this in a tweet by Nicolas Torzec.

DBpedia 3.9 released…

Monday, September 23rd, 2013

DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links by Christopher Sahnwaldt.

From the post:

we are happy to announce the release of DBpedia 3.9.

The most important improvements of the new release compared to DBpedia 3.8 are:

1. the new release is based on updated Wikipedia dumps dating from March / April 2013 (the 3.8 release was based on dumps from June 2012), leading to an overall increase in the number of concepts in the English edition from 3.7 to 4.0 million things.

2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner concept descriptions.

3. we extended the DBpedia type system to also cover Wikipedia articles that do not contain an infobox.

4. we provide links pointing from DBpedia concepts to Wikidata concepts and updated the links pointing at YAGO concepts and classes, making it easier to integrate knowledge from these sources.

The English version of the DBpedia knowledge base currently describes 4.0 million things, out of which 3.22 million are classified in a consistent Ontology, including 832,000 persons, 639,000 places (including 427,000 populated places), 372,000 creative works (including 116,000 music albums, 78,000 films and 18,500 video games), 209,000 organizations (including 49,000 companies and 45,000 educational institutions), 226,000 species and 5,600 diseases.

We provide localized versions of DBpedia in 119 languages. All these versions together describe 24.9 million things, out of which 16.8 million overlap (are interlinked) with the concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 12.6 million unique things in 119 different languages; 24.6 million links to images and 27.6 million links to external web pages; 45.0 million external links into other RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million YAGO categories.

Altogether the DBpedia 3.9 release consists of 2.46 billion pieces of information (RDF triples) out of which 470 million were extracted from the English edition of Wikipedia, 1.98 billion were extracted from other language editions, and about 45 million are links to external data sets.

Detailed statistics about the DBpedia data sets in 24 popular languages are provided at Dataset Statistics.

The main changes between DBpedia 3.8 and 3.9 are described below. For additional, more detailed information please refer to the Change Log.

Almost like an early holiday present isn’t it? 😉

I continue to puzzle over the notion of “extraction.”

Not that I have an alternative but extracting data only kicks the data can one step down the road.

When someone wants to use my extracted data, they are going to extract data from my extraction. And so on.

That seems incredibly wasteful and error-prone.

Enough money is spent doing the ETL shuffle every year that research on ETL avoidance should be a viable proposition.

Visualizing the News with VivaGraphJS

Friday, May 31st, 2013

Visualizing the News with Vivagraph.js by Max De Marzi.

From the post:

Today I want to introduce you to VivaGraphJS – a JavaScript Graph Drawing Library made by Andrei Kashcha of Yasiv. It supports rendering graphs using WebGL, SVG or CSS formats and currently supports a force directed layout. The Library provides an API which tracks graph changes and reflects changes on the rendering surface, which makes it fantastic for graph exploration.

The post includes AlchemyAPI (entity extraction), DBpedia (additional information), Feedzilla (news feeds), and Neo4j (graphs).

The technology rocks but the content, well, your mileage will vary.

Normalizing company names with SPARQL and DBpedia

Wednesday, December 5th, 2012

Normalizing company names with SPARQL and DBpedia

Bob DuCharme writes:

Wikipedia page redirection data, waiting for you to query it.

If you send your browser to http://en.wikipedia.org/wiki/Big_Blue, you’ll end up at IBM’s page, because Wikipedia knows that this nickname usually refers to this company. (Apparently, it’s also a nickname for several high schools and universities.) This data pointing from nicknames to official names is also stored in DBpedia, which means that we can use SPARQL queries to normalize company names. You can use the same technique to normalize other kinds of names—for example, trying to send your browser to http://en.wikipedia.org/wiki/Bobby_Kennedy will actually send it to http://en.wikipedia.org/wiki/Robert_F._Kennedy—but a query that sticks to one domain will have a simpler job. Description Logics and all that.

As always, Bob is on the cutting edge of the use of a markup standard!

Possible topic map analogies:

  • create a second name cluster and the “normalized name” is an additional base name
  • move the “nickname” to a variant name (scope?) and update the base name to be the normalized name (with changes to sort/display as necessary)

I am assuming that Bob’s lang(?redirectsTo) = "en" operates like scope in topic maps.

Except that scope in topic maps is represented by one or more topics, which means merging can occur between topics that represent the same language.
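Stripped of SPARQL, Bob's technique is a redirect lookup from nickname to canonical name. A minimal sketch with a hand-made mapping (in the real workflow this table comes from DBpedia's wikiPageRedirects data):

```python
# Toy redirects table; in Bob's post this mapping is obtained by querying
# DBpedia's wikiPageRedirects triples via SPARQL.
REDIRECTS = {
    "Big_Blue": "IBM",
    "Bobby_Kennedy": "Robert_F._Kennedy",
}

def normalize(name):
    """Follow redirects until a canonical name is reached.

    The `seen` set guards against redirect cycles, which would
    otherwise loop forever.
    """
    seen = set()
    while name in REDIRECTS and name not in seen:
        seen.add(name)
        name = REDIRECTS[name]
    return name
```

In topic map terms, `normalize` picks the base name and leaves the keys of `REDIRECTS` available as variant names.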

DBpedia 3.8 Downloads

Tuesday, November 20th, 2012

DBpedia 3.8 Downloads

From the webpage:

This page provides downloads of the DBpedia datasets. The DBpedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License. The downloads are provided as N-Triples and N-Quads, where the N-Quads version contains additional provenance information for each statement. All files are bzip2 packed.

I had to ask to find this one.

One interesting feature that would bear repetition elsewhere is the ability to see a sample of a data file.

For example, at Links to Wikipedia Article, next to “nt” (N-Triple), there is a “?” that when followed displays in part:

<http://dbpedia.org/resource/AccessibleComputing> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AccessibleComputing> .
<http://en.wikipedia.org/wiki/AccessibleComputing> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AccessibleComputing> .
<http://en.wikipedia.org/wiki/AccessibleComputing> <http://purl.org/dc/elements/1.1/language> "en"@en .
<http://dbpedia.org/resource/AfghanistanHistory> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AfghanistanHistory> .
<http://en.wikipedia.org/wiki/AfghanistanHistory> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AfghanistanHistory> .
<http://en.wikipedia.org/wiki/AfghanistanHistory> <http://purl.org/dc/elements/1.1/language> "en"@en .
<http://dbpedia.org/resource/AfghanistanGeography> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AfghanistanGeography> .
<http://en.wikipedia.org/wiki/AfghanistanGeography> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AfghanistanGeography> .
<http://en.wikipedia.org/wiki/AfghanistanGeography> <http://purl.org/dc/elements/1.1/language> "en"@en .

That enabled me to conclude that, for my purposes, the reverse pointing from DBpedia to Wikipedia was repetitious. And since the entire dataset covers only the English version of Wikipedia, the declaration of language was superfluous.

That may not be true for your intended use of DBpedia data.

My point being that seeing sample data allows a quick evaluation before downloading large amounts of data.

A feature I would like to see for other data sets.
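Lines like the sample above are simple enough to parse with a regular expression. A hedged sketch that handles only the two line shapes shown in the sample, not the full N-Triples grammar:

```python
import re

# Minimal parser for the two line shapes in the sample:
#   <s> <p> <o> .            (URI object)
#   <s> <p> "literal"@lang . (language-tagged literal object)
LINE = re.compile(
    r'<([^>]+)>\s*<([^>]+)>\s*(?:<([^>]+)>|"([^"]*)"@(\w+))\s*\.')

def parse(line):
    """Return (subject, predicate, object) or None for unmatched lines."""
    m = LINE.match(line.strip())
    if not m:
        return None
    s, p, uri, lit, lang = m.groups()
    return (s, p, uri if uri is not None else (lit, lang))

triple = parse('<http://dbpedia.org/resource/AccessibleComputing> '
               '<http://xmlns.com/foaf/0.1/isPrimaryTopicOf> '
               '<http://en.wikipedia.org/wiki/AccessibleComputing> .')
```

For production use an actual N-Triples parser is the right tool; a sketch like this is only good for a quick evaluation of sample data, which is exactly the use case discussed above.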

Path Report: dbpedia graph

Saturday, November 10th, 2012

Marko A. Rodriguez tweets:

There are 251,818,304,970,074,185 (251 quadrillion) length 5 paths in the #dbpedia graph.

Just in case you are curious.

With a pointer to: Faunus.

One of the use cases for Faunus is graph derivation:

Given an input graph, derive a new graph based upon the input graph’s structure and semantics. Other terms include graph rewriting and graph transformations.

Sounds like merging would fit into “derivation,” “graph rewriting” and “graph transformation” doesn’t it?

Or even spawning content in one graph based in its structure or semantics, using structure and semantics from one or more other graphs as sources.

Much to be thought about here.
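Counts like the 251 quadrillion above come straight from the adjacency matrix: the number of length-k walks in a graph is the sum of the entries of A^k. A small sketch on a toy graph (not the DBpedia data; Faunus computes this at scale with MapReduce):

```python
# Count length-k walks in a directed graph via adjacency-matrix powers.
# Toy 3-node graph with edges 0->1, 0->2, 1->2.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]

def matmul(X, Y):
    """Plain O(n^3) matrix multiplication over nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_paths(A, k):
    """Sum of the entries of A^k = number of length-k walks."""
    P = A
    for _ in range(k - 1):
        P = matmul(P, A)
    return sum(sum(row) for row in P)
```

On this toy graph there are 3 walks of length 1 (the edges) and a single walk of length 2 (0 -> 1 -> 2); at DBpedia scale the same arithmetic produces the quadrillions Rodriguez tweeted.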

DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents

Friday, September 30th, 2011

DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents by Pablo Mendes (email announcement)

We are happy to announce the release of DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents.

DBpedia Spotlight is a tool for annotating mentions of DBpedia entities and concepts in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. The DBpedia Spotlight Architecture is composed of the following modules:

  • Web application, a demonstration client (HTML/Javascript UI) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
  • Web Service, a RESTful Web API that exposes the functionality of annotating and/or disambiguating resources in text. The service returns XML, JSON or XHTML+RDFa.
  • Annotation Java / Scala API, exposing the underlying logic that performs the annotation/disambiguation.
  • Indexing Java / Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.

In this release we have provided many enhancements to the Web Service, installation process, as well as the spotting, candidate selection, disambiguation and annotation stages. More details on the enhancements are provided below.

The new version is deployed at:

Instructions on how to use the Web Service are available at: http://spotlight.dbpedia.org

We invite your comments on the new version before we deploy it on our production server. We will keep it on the “dev” server until October 6th, when we will finally make the switch to the production server at http://spotlight.dbpedia.org/demo/ and http://spotlight.dbpedia.org/rest/

If you are a user of DBpedia Spotlight, please join dbp-spotlight-users@lists.sourceforge.net for announcements and other discussions.

Warning: I think they are serious about the requirement of Firefox 6.0.2 and Chromium 12.0.

I tried it on an older version of Firefox on Ubuntu and got no results at all. Will upgrade Firefox but only in my directory.
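At its simplest, the annotation the announcement describes means spotting known surface forms in text and linking them to DBpedia URIs. A toy sketch of that stage (the dictionary and output shape are invented for illustration; real Spotlight uses statistical candidate selection and disambiguation):

```python
# Toy "spotting" stage: link known surface forms to DBpedia resources.
# The surface-form dictionary below is hypothetical.
SURFACE_FORMS = {
    "Berlin": "http://dbpedia.org/resource/Berlin",
    "Semantic Web": "http://dbpedia.org/resource/Semantic_Web",
}

def annotate(text):
    """Return annotations for every known surface form found in text."""
    annotations = []
    for form, uri in SURFACE_FORMS.items():
        pos = text.find(form)
        if pos != -1:
            annotations.append({"surfaceForm": form, "uri": uri,
                                "offset": pos})
    return sorted(annotations, key=lambda a: a["offset"])

result = annotate("The Semantic Web community met in Berlin.")
```

The real web service wraps this kind of result in XML, JSON or XHTML+RDFa, as listed in the module description above.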

DBpedia Live

Saturday, June 25th, 2011

DBpedia Live

From the website:

The main objective of DBpedia is to extract structured information from Wikipedia, convert it into RDF, and make it freely available on the Web. In a nutshell, DBpedia is the Semantic Web mirror of Wikipedia.

Wikipedia users constantly revise Wikipedia articles with updates happening almost each second. Hence, data stored in the official DBpedia endpoint can quickly become outdated, and Wikipedia articles need to be re-extracted. DBpedia-Live enables such a continuous synchronization between DBpedia and Wikipedia.

Important Links:

OK, so you have a live feed. Now how do you judge the importance of updates and which ones trigger alerts to the user? Or are important enough to trigger merges? (Assuming not all possible merges are worth the expense.)
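One possible answer to that question: filter the live change stream by the predicates you care about and ignore no-op edits. A hedged sketch (the change-record format and predicate names are hypothetical, not the actual DBpedia-Live feed format):

```python
# Decide which live updates are worth an alert or a merge.
# The change-record shape here is invented for illustration.
WATCHED = {"dbo:population", "rdfs:label"}

def important(changes, watched=WATCHED):
    """Keep only changes that touch a watched predicate and actually differ."""
    return [c for c in changes
            if c["predicate"] in watched and c["old"] != c["new"]]

changes = [
    {"subject": "dbr:Berlin", "predicate": "dbo:population",
     "old": "3400000", "new": "3500000"},
    {"subject": "dbr:Berlin", "predicate": "dbo:abstract",
     "old": "x", "new": "y"},
]
```

A predicate whitelist is the cheapest possible importance test; anything smarter (rate of change, downstream merge cost) would replace the filter condition.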

DBpedia4Neo

Saturday, April 9th, 2011

DBpedia4Neo

Claudio Martella walks through loading DBpedia into a graphDB.

DISCLAIMER: this is a bit of a hack, but it should get you started. I managed to get the core dataset of DBpedia into Neo4J, but this procedure should actually be working for any Blueprints-ready vendor, like OrientDB.

Ok, a little background first: we want to store DBpedia inside of a GraphDB, instead of the typical TripleStore, and run SPARQL queries over it. DBpedia is a project aiming to extract structured content from Wikipedia, information such as what you find in the infoboxes, the links, the categorization info, geo-coordinates, etc. This information is extracted and exported as triples to form a graph, a network of properties and relationships between Wikipedia resources.

Noting that graph queries are more efficient than the same queries run against a triple store.
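Mapping triples into a property graph is mostly mechanical: resources become vertices, triples with URI objects become edges, and triples with literal objects become vertex properties. A toy sketch of that loading step, independent of Neo4j/Blueprints (the URI test is deliberately naive):

```python
# Hedged sketch: load triples into a dict-based "property graph".
# URI objects become edges; anything else becomes a vertex property.
def load(triples):
    vertices, edges = {}, []
    for s, p, o in triples:
        vertices.setdefault(s, {})
        if o.startswith("http"):  # naive URI detection, fine for a sketch
            vertices.setdefault(o, {})
            edges.append((s, p, o))
        else:
            vertices[s][p] = o
    return vertices, edges

V, E = load([
    ("http://dbpedia.org/resource/Berlin", "rdfs:label", "Berlin"),
    ("http://dbpedia.org/resource/Berlin", "dbo:country",
     "http://dbpedia.org/resource/Germany"),
])
```

A Blueprints-based loader like Claudio's does essentially this, just against a real graph database API instead of dictionaries.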

DBpedia 3.6 – Release

Monday, January 17th, 2011

DBpedia 3.6 – Release

From the announcement:

The new DBpedia dataset describes more than 3.5 million things, of which 1.67 million are classified in a consistent ontology, including 364,000 persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video games, 148,000 organizations, 148,000 species and 5,200 diseases.

The DBpedia dataset features labels and abstracts for 3.5 million things in up to 97 different languages; 1,850,000 links to images and 5,900,000 links to external web pages; 6,500,000 external links into other RDF datasets, and 632,000 Wikipedia categories.

The dataset consists of 672 million pieces of information (RDF triples) out of which 286 million were extracted from the English edition of Wikipedia and 386 million were extracted from other language editions and links to external datasets.

Quick Links:

DBpedia MappingTool: a graphical user interface to support the community in creating and editing mappings as well as the ontology.

Improved DBpedia Ontology as well as improved Infobox mappings.

Some commonly used property names changed. Please see http://dbpedia.org/ChangeLog and http://dbpedia.org/Datasets/Properties to know which relations changed and update your applications accordingly!

Download the new DBpedia dataset from http://dbpedia.org/Downloads36

Available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql
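The warning about renamed properties translates into a one-pass rewrite of stored triples (or of the queries that use them). A sketch with a hypothetical old-to-new mapping; the real renames are listed in the ChangeLog linked above:

```python
# Hypothetical property renames; consult the DBpedia ChangeLog for the
# actual list of changed property names.
RENAMES = {"dbo:placeOfBirth": "dbo:birthPlace"}

def migrate(triples, renames=RENAMES):
    """Rewrite predicates according to the rename map, leaving others as-is."""
    return [(s, renames.get(p, p), o) for s, p, o in triples]

old = [("dbr:Alan_Turing", "dbo:placeOfBirth", "dbr:London"),
       ("dbr:Alan_Turing", "rdfs:label", "Alan Turing")]
new = migrate(old)
```

Running something like this over local data (or a search-and-replace over saved queries) is what "update your applications accordingly" amounts to in practice.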