Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 4, 2015

Google Freebase Annotations of TREC KBA 2014 Stream Corpus, v1 (FAKBA1)

Filed under: Entities,Freebase,TREC — Patrick Durusau @ 5:26 pm

Google Freebase Annotations of TREC KBA 2014 Stream Corpus, v1 (FAKBA1)

From the webpage:

Researchers at Google annotated the English-language pages from the TREC KBA Stream Corpus 2014 with links to Freebase. The annotation was performed automatically and is imperfect. For each entity recognized with high confidence, an annotation with a link to Freebase is provided (see the details below).

For any questions, join this discussion forum: https://groups.google.com/group/streamcorpus.

Data Description

The entity annotations are for the TREC KBA Stream Corpus 2014. These annotations are freely available. The annotation data for the corpus is provided as a collection of 2000 files (the partitioning is somewhat arbitrary) that total 196 GB, compressed (gz). Each file contains annotations for a batch of pages and the entities identified on each page.
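A note on scale, not part of the quoted webpage: 196 GB across 2000 gzip files is roughly 98 MB per file on average, so it makes sense to stream the files rather than unpack them all to disk. Below is a minimal Python sketch of that idea. The record layout is not described in the excerpt above, so treating each file as UTF-8 text read line by line is an assumption, and the names `iter_lines` and `average_file_size_mb` are hypothetical helpers, not anything shipped with the release.

```python
import gzip
from pathlib import Path
from typing import Iterator


def iter_lines(directory: str, pattern: str = "*.gz") -> Iterator[str]:
    """Yield text lines from every gzip file in `directory`, one at a time.

    Assumption: the files are line-oriented UTF-8 text. The excerpt does not
    state the actual record format, so adapt the parsing as needed.
    """
    for path in sorted(Path(directory).glob(pattern)):
        # gzip.open in "rt" mode decompresses on the fly, so only a small
        # buffer is held in memory regardless of file size.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")


def average_file_size_mb(total_gb: float, n_files: int) -> float:
    """Average compressed size per file in MB (decimal units, 1 GB = 1000 MB)."""
    return total_gb * 1000 / n_files
```

For example, `average_file_size_mb(196, 2000)` gives the per-file average for the figures quoted above, and `for line in iter_lines("/path/to/fakba1"): ...` (the path is a placeholder) walks the whole collection without ever holding more than one line in memory.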

I first saw this in a tweet by Jeff Dalton.

Jeff has a blog post about this release at: Google Research Entity Annotations of the KBA Stream Corpus (FAKBA1). Jeff speculates on the application of this corpus to other TREC tasks.

Jeff suggests that you monitor Knowledge Data Releases for future data releases. I need to ping Jeff, as the FAKBA1 release does not appear on the Knowledge Data Releases page.

BTW, don’t be misled by the “9.4 billion entity annotations from over 496 million documents” statistic. It is impressive, but ask yourself: how many of your co-workers, their friends and families, your relationships at work, the projects where you work, etc. appear in Freebase? It sounds like there is a lot of work to be done with your own documents and data, which have little or nothing to do with Freebase. Yes?

Enjoy!
