YAGO: A High-Quality Knowledge Base
Overview:
YAGO is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.
YAGO is special in several ways:
- The accuracy of YAGO has been manually evaluated, proving a confirmed accuracy of 95%. Every relation is annotated with its confidence value.
- YAGO combines the clean taxonomy of WordNet with the richness of the Wikipedia category system, assigning the entities to more than 350,000 classes.
- YAGO is an ontology that is anchored in time and space. YAGO attaches a temporal dimension and a spatial dimension to many of its facts and entities.
- In addition to a taxonomy, YAGO has thematic domains such as "music" or "science" from WordNet Domains.
- YAGO extracts and combines entities and facts from 10 Wikipedias in different languages.
YAGO is developed jointly with the DBWeb group at Télécom ParisTech University.
…
Before you are too impressed by the numbers, which are impressive, realize that 10 million entities is only about 3% of the current US population, to say nothing of any other entities we might want to include along with them. It’s a good start and very useful, but realize it is a limited set of entities.
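For anyone who wants to check the 3% figure, here is a quick back-of-the-envelope sketch. The US population value of roughly 320 million is my assumption, not something from the YAGO overview, and the function name is just illustrative.

```python
# Assumed figure, not from the YAGO overview: a US population of roughly 320 million.
US_POPULATION = 320_000_000

# "More than 10 million entities" per the overview, so this is a lower bound.
YAGO_ENTITIES = 10_000_000


def coverage_percent(entities: int, population: int) -> float:
    """Return `entities` as a percentage of `population`."""
    return 100.0 * entities / population


share = coverage_percent(YAGO_ENTITIES, US_POPULATION)
print(f"{share:.1f}% of the assumed US population")
```

Swap in a different population estimate and the share moves a little, but it stays in the low single digits either way.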
All the source data is available, along with the source code.
Would be interesting to see how useful the entity set is when used with US campaign contribution data.
Thoughts?