Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 1, 2011

Facebook100 data and a parser for it

Filed under: Data,Dataset — Patrick Durusau @ 3:33 pm

Facebook100 data and a parser for it

From the post:

A few weeks ago, Mason Porter posted a goldmine of data, the Facebook100 dataset. The dataset contains all of the Facebook friendships at 100 US universities at some time in 2005, as well as a number of node attributes such as dorm, gender, graduation year, and academic major. The data was apparently provided directly by Facebook.

As far as I know, the dataset is unprecedented and has the potential to advance both network methods and insights into the structure of acquaintanceship. Unfortunately, the Facebook Data Team requested that Porter no longer distribute the dataset. It does not include the names of individuals or even of any of the node attributes (they have been given integer ids), but Facebook seems to be concerned. Anonymized network data is, after all, vulnerable to de-anonymization (for some nice examples of why, see the last 20 minutes of this video lecture from Jon Kleinberg).

It’s a shame that Porter can no longer distribute the data. On the other hand, once a dataset like that has been released, will the internet be able to forget it? After a bit of poking around I found the dataset as a torrent file. In fact, if anyone is seeding the torrent, you can download it by following this link; it also appears to be on rapidshare.

Can anyone confirm a location for the Facebook100 data? I get “file removed” from the brave folks at rapidshare and ads to register for various download services (before knowing the file is available) from the torrent site. Thanks!
