Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 29, 2016

Scraping Reddit

Filed under: Reddit — Patrick Durusau @ 8:02 pm

Scraping Reddit by Daniel Donohue.

From the post:

For our third project here at NYC Data Science, we were tasked with writing a web scraping script in Python. Since I spend (probably too much) time on Reddit, I decided that it would be the basis for my project. For the uninitiated, Reddit is a content-aggregator, where users submit text posts or links to thematic subforums (called “subreddits”), and other users vote them up or down and comment on them. With over 36 million registered users and nearly a million subreddits, there is a lot of content to scrape.

Daniel walks through his scraping and display of the resulting data.
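For a rough sense of what such a script involves, here is a minimal sketch. It is not Daniel's code. It assumes Reddit's public JSON listing layout (a "data" object holding a "children" list, each child carrying its own "data"), and the URL pattern, User-Agent string, and function names are illustrative assumptions.

```python
import json
from urllib.request import Request, urlopen


def extract_posts(listing):
    """Pull title, subreddit, and score out of a Reddit-style listing dict.

    Assumes the layout {"data": {"children": [{"data": {...}}, ...]}}.
    Missing fields fall back to empty or zero values.
    """
    posts = []
    for child in listing.get("data", {}).get("children", []):
        data = child.get("data", {})
        posts.append({
            "title": data.get("title", ""),
            "subreddit": data.get("subreddit", ""),
            "score": data.get("score", 0),
        })
    return posts


def fetch_listing(subreddit, limit=25):
    """Fetch one page of top posts as a dict (hypothetical helper).

    The URL pattern and User-Agent value are assumptions; check Reddit's
    current API rules and rate limits before running this against the site.
    """
    url = f"https://www.reddit.com/r/{subreddit}/top.json?limit={limit}"
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the parsing in extract_posts, separate from the network call, means it can be tried on a saved listing without touching the site, for example extract_posts(fetch_listing("python")) once you are ready to go live.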

In case you are short on encrypted core dumps, you can fill up a stack of DVDs with randomized and encrypted Reddit posts. Just something to leave for unexpected visitors to find.

Be sure to use a Sharpie to copy Arabic letters on some of the DVDs.

Who knows? Someday your post to Reddit, in its encrypted form, may serve to confound and confuse the FBI.
