DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links, by Christopher Sahnwaldt.
From the post:
we are happy to announce the release of DBpedia 3.9.
The most important improvements of the new release compared to DBpedia 3.8 are:
1. the new release is based on updated Wikipedia dumps dating from March / April 2013 (the 3.8 release was based on dumps from June 2012), leading to an overall increase in the number of concepts in the English edition from 3.7 to 4.0 million things.
2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner concept descriptions.
3. we extended the DBpedia type system to also cover Wikipedia articles that do not contain an infobox.
4. we provide links pointing from DBpedia concepts to Wikidata concepts and updated the links pointing at YAGO concepts and classes, making it easier to integrate knowledge from these sources.
The English version of the DBpedia knowledge base currently describes 4.0 million things, out of which 3.22 million are classified in a consistent Ontology, including 832,000 persons, 639,000 places (including 427,000 populated places), 372,000 creative works (including 116,000 music albums, 78,000 films and 18,500 video games), 209,000 organizations (including 49,000 companies and 45,000 educational institutions), 226,000 species and 5,600 diseases.
We provide localized versions of DBpedia in 119 languages. All these versions together describe 24.9 million things, out of which 16.8 million overlap (are interlinked) with the concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 12.6 million unique things in 119 different languages; 24.6 million links to images and 27.6 million links to external web pages; 45.0 million external links into other RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million YAGO categories.
Altogether the DBpedia 3.9 release consists of 2.46 billion pieces of information (RDF triples) out of which 470 million were extracted from the English edition of Wikipedia, 1.98 billion were extracted from other language editions, and about 45 million are links to external data sets.
Detailed statistics about the DBpedia data sets in 24 popular languages are provided at Dataset Statistics.
The main changes between DBpedia 3.8 and 3.9 are described below. For additional, more detailed information please refer to the Change Log.
Almost like an early holiday present, isn’t it? 😉
I continue to puzzle over the notion of “extraction.”
Not that I have an alternative, but extracting data only kicks the data can one step down the road.
When someone wants to use my extracted data, they are going to extract data from my extraction. And so on.
That seems incredibly wasteful and error-prone.
Enough money is spent doing the ETL shuffle every year that research on ETL avoidance should be a viable proposition.
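To make the "extract data from my extraction" point concrete, here is a minimal sketch of what a consumer of the new Wikidata links might end up writing. It assumes the links are distributed as N-Triples lines using owl:sameAs, which I have not checked against the 3.9 download files. The sample lines and the same_as_links helper are hypothetical, made up for illustration.

```python
import re

# Matches only the simplest N-Triples shape: three IRIs and a terminating dot.
# Literals and blank nodes are deliberately ignored in this sketch.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

# Assumption: the DBpedia-to-Wikidata links use this predicate.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_links(lines):
    """Yield (subject, object) pairs for every owl:sameAs triple in lines."""
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == OWL_SAME_AS:
            yield match.group(1), match.group(3)


# Hypothetical sample lines, not copied from the actual 3.9 dump files.
SAMPLE = [
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2002/07/owl#sameAs> '
    '<http://www.wikidata.org/entity/Q64> .',
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
]

links = list(same_as_links(SAMPLE))
print(links)
```

Note what just happened: DBpedia extracted from Wikipedia, and this script extracts from DBpedia. Whoever consumes its output will probably write a third extractor.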