Scrape the Gibson: Python skills for data scrapers by Brian Abelson.
From the post:
Two years ago, I learned I had superpowers. Steve Romalewski was working on some fascinating analyses of CitiBike locations and needed some help scraping information from the city’s data portal. Cobbling together the little I knew about R, I wrote a simple scraper to fetch the json files for each bike share location and output it as a csv. When I opened the clean data in Excel, the feeling was tantamount to this scene from Hackers:
Ever since then I’ve spent a good portion of my life scraping data from websites. From movies, to bird sounds, to missed connections, and john boards (don’t ask, I promise it’s for good!), there’s not much I haven’t tried to scrape. In many cases, I don’t even analyze the data I’ve obtained, and the whole process amounts to a nerdy version of sport hunting, with my comma-delimited trophies mounted proudly on Amazon S3.
…
Important post for two reasons:
- Good introduction to the art of scraping data
- Sets the norm for sharing scraped data
The people who force scraping of data don’t want it shared, combined, merged or analyzed.
You can help in disappointing them! 😉
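If you want a feel for the workflow the quoted post describes (fetch a JSON file per location, write it all out as a CSV), here is a minimal sketch in Python using only the standard library. It is not the author's scraper. The function names, the field names and the sample records are all made up for illustration, and a real scraper would also need error handling and polite rate limiting.

```python
import csv
import io
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download one JSON document and parse it (no retries or rate limiting)."""
    with urlopen(url) as response:
        return json.load(response)


def records_to_csv(records, fieldnames):
    """Write a list of dicts to CSV text, keeping only the named columns."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical sample records standing in for fetched JSON; a real run would
# call fetch_json() on each location's URL instead.
sample = [
    json.loads('{"id": 1, "name": "Station A", "lat": 40.7, "lon": -74.0, "extra": "x"}'),
    json.loads('{"id": 2, "name": "Station B", "lat": 40.8, "lon": -73.9}'),
]
csv_text = records_to_csv(sample, ["id", "name", "lat", "lon"])
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch easy to test. To share the result, write `csv_text` to a file and publish it wherever others can find it.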