Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 1, 2011

ScraperWiki

Filed under: Data,Data Mining,Data Source,Text Extraction — Patrick Durusau @ 2:49 pm

ScraperWiki

From the About page:

What is ScraperWiki?

There’s lots of useful data on the internet – crime statistics, government spending, missing kittens…

But getting at it isn’t always easy. There’s a table here, a report there, a few web pages, PDFs, spreadsheets… And it can be scattered over thousands of different places on the web, making it hard to see the whole picture and the story behind it. It’s like trying to build something from Lego when someone has hidden the bricks all over town and you have to find them before you can start building!

To get at data, programmers write bits of code called ‘screen scrapers’, which extract the useful bits so they can be reused in other apps, or rummaged through by journalists and researchers. But these bits of code tend to break, get thrown away or forgotten once they have been used, and so the data is lost again. Which is bad.

ScraperWiki is an online tool to make that process simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it’s a wiki, other programmers can contribute to and improve the code.

Something to keep an eye on and, whenever possible, to contribute to.

People make data difficult to access for a reason. Let’s disappoint them.
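
For anyone who has not written one, the kind of "screen scraper" the About page describes can be quite small. What follows is only a sketch using Python's standard library, not ScraperWiki's own API or editor: the `TableScraper` class, the `scrape_table` function and the sample HTML are all made up for illustration, and a real scraper would fetch the page over the network and cope with far messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Treat the first row found as the header; return the rest as dicts."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a real web page.
SAMPLE_HTML = """
<table>
  <tr><th>Department</th><th>Spending</th></tr>
  <tr><td>Parks</td><td>1200</td></tr>
  <tr><td>Libraries</td><td>800</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
```

Once the rows are plain dictionaries they can be written to a database or a CSV file and reused, which is the point of the exercise. It is also easy to see why such code breaks: rename a column or restructure the table and the scraper quietly returns something else.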
