Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 27, 2012

Building a Scalable Web Crawler with Hadoop

Filed under: Hadoop,Webcrawler — Patrick Durusau @ 4:31 pm

Ahad Rana of Common Crawl presents an architectural view of a web crawler based on Hadoop.

You can access the data from Common Crawl.

But the architecture notes may be useful if you decide to crawl a subset of the web, or if you need to crawl “deep web” data inside your organization.
