Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 6, 2017

Web Scraping Reference: …

Filed under: Python,Web Scrapers — Patrick Durusau @ 1:15 pm

Web Scraping Reference: A Simple Cheat Sheet for Web Scraping with Python by Hartley Brody.

From the post:

Once you’ve put together enough web scrapers, you start to feel like you can do it in your sleep. I’ve probably built hundreds of scrapers over the years for my own projects, as well as for clients and students in my web scraping course.

Occasionally though, I find myself referencing documentation or re-reading old code looking for snippets I can reuse. One of the students in my course suggested I put together a “cheat sheet” of commonly used code snippets and patterns for easy reference.

I decided to publish it publicly as well – as an organized set of easy-to-reference notes – in case they’re helpful to others.

Brody uses Beautiful Soup, a Python library that will parse even badly malformed HTML.

I mention this so that the next time I scrape Wikileaks, I will remember there are easier ways to do the job than downloading, repairing with Tidy, and then parsing with Saxon/XQuery!
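To make the point concrete, here is a minimal sketch of the Beautiful Soup route. It assumes the `bs4` package is installed and uses Python's built-in `html.parser` backend. The broken markup and the helper name `extract_links` are made up for illustration and do not come from Brody's cheat sheet.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed HTML: unclosed <p> and <li> tags, a stray </div>,
# and no closing </ul> or </html>.
BROKEN_HTML = """
<html><body>
<p>Leaked documents
<ul>
<li><a href="/doc/1.html">First document</a>
<li><a href="/doc/2.html">Second document</a>
</div>
</body>
"""


def extract_links(html):
    """Return (href, link text) pairs for every anchor that carries an href."""
    # Beautiful Soup builds a tree despite the unclosed and stray tags.
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a["href"], a.get_text(strip=True))
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    for href, text in extract_links(BROKEN_HTML):
        print(href, text)
```

No separate repair pass is needed here: the parser tolerates the broken markup and the links come out of a single call.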

Enjoy!
