Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 13, 2014

Scrape the Gibson: Python skills for data scrapers

Filed under: Python,Web Scrapers — Patrick Durusau @ 7:32 pm

Scrape the Gibson: Python skills for data scrapers by Brian Abelson.

From the post:

Two years ago, I learned I had superpowers. Steve Romalewski was working on some fascinating analyses of CitiBike locations and needed some help scraping information from the city’s data portal. Cobbling together the little I knew about R, I wrote a simple scraper to fetch the json files for each bike share location and output it as a csv. When I opened the clean data in Excel, the feeling was tantamount to this scene from Hackers:

Ever since then I’ve spent a good portion of my life scraping data from websites. From movies, to bird sounds, to missed connections, and john boards (don’t ask, I promise it’s for good!), there’s not much I haven’t tried to scrape. In many cases, I don’t even analyze the data I’ve obtained, and the whole process amounts to a nerdy version of sport hunting, with my comma-delimited trophies mounted proudly on Amazon S3.

Important post for two reasons:

  • It is a good introduction to the art of scraping data.
  • It sets the norm for sharing scraped data.
    • The people who force you to scrape their data don’t want it shared, combined, merged or analyzed.

      You can help disappoint them! 😉
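For readers new to scraping, here is a rough sketch of the kind of job the quoted post describes: fetch a JSON file for each bike-share location and write the results out as CSV. It is not Abelson's code. The function names `fetch_json` and `records_to_csv`, the placeholder `station_urls`, and the sample records are all hypothetical, and the sketch sticks to the Python standard library (`urllib.request`, `json`, `csv`).

```python
import csv
import io
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and parse its body as JSON (hypothetical helper)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_csv(records):
    """Flatten a list of JSON objects (dicts) into CSV text.

    The header is the union of keys across all records, in first-seen
    order, so records with missing fields still line up; missing values
    are written as empty cells.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up records standing in for parsed per-location JSON files.
    # Against a real endpoint you might build this list with
    # [fetch_json(url) for url in station_urls], where station_urls is hypothetical.
    stations = [
        {"id": 1, "name": "Station A", "docks": 30},
        {"id": 2, "name": "Station B"},
    ]
    print(records_to_csv(stations), end="")
```

Taking the union of keys, rather than trusting the first record's fields, is a small defensive choice: scraped JSON is often inconsistent from one record to the next.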
