Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

November 4, 2011

More Data: Tweets & News Articles

Filed under: Dataset,News,TREC,Tweets — Patrick Durusau @ 6:07 pm

From Max Lin’s blog, Ian Soboroff posted:

Two new collections being released from TREC today:

The first is the long-awaited Tweets2011 collection. This is 16 million tweets sampled by Twitter for use in the TREC 2011 microblog track. We distribute the tweet identifiers and a crawler, and you download the actual tweets using the crawler. http://trec.nist.gov/data/tweets/

The second is TRC2, a collection of 1.8 million news articles from Thomson Reuters used in the TREC 2010 blog track. http://trec.nist.gov/data/reuters/reuters.html

Both collections are available under extremely permissive usage agreements that limit their use to research and forbid redistribution, but otherwise are very open as data usage agreements go.

It may just be my memory, but I don’t recall seeing any topic map research that used the older Reuters data set (the new one is too recent). Is that true?

Anyway, more large data sets for your research pleasure.
