Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

May 28, 2012

The Anatomy of Search Technology: blekko’s NoSQL database [part 1]

Filed under: blekko, Search Engines. Posted by Patrick Durusau @ 6:57 pm

The Anatomy of Search Technology: blekko’s NoSQL database by Greg Lindahl.

From the post:

This is a guest post by Greg Lindahl, CTO of blekko, the spam free search engine that had over 3.5 million unique visitors in March. Greg Lindahl was Founder and Distinguished Engineer at PathScale, at which he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters.

Imagine that you’re crazy enough to think about building a search engine. It’s a huge task: the minimum index size needed to answer most queries is a few billion webpages. Crawling and indexing a few billion webpages requires a cluster with several petabytes of usable disk — that’s several thousand 1 terabyte disks — and produces an index that’s about 100 terabytes in size.

Greg starts with the storage aspects of the blekko search engine before taking on crawling in part 2 of this series.
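
As a rough check on the disk figures in the quoted passage, here is a minimal sketch of the arithmetic. It assumes decimal units (1 PB = 1000 TB) and ignores replication and filesystem overhead. The function name and the sample values for "several" are mine, not Greg's.

```python
import math

TB_PER_PB = 1000  # decimal units, an assumption; the post does not say which it uses


def disks_needed(usable_pb: float, disk_tb: float = 1.0) -> int:
    """Disks required to hold usable_pb petabytes of usable space.

    Hypothetical helper: ignores replication and filesystem overhead.
    """
    return math.ceil(usable_pb * TB_PER_PB / disk_tb)


if __name__ == "__main__":
    # Hypothetical readings of "several petabytes"
    for pb in (2, 3, 5):
        print(f"{pb} PB -> {disks_needed(pb)} x 1 TB disks")
```

Any reading of "several petabytes" lands in the thousands of 1 terabyte disks, which matches the quoted estimate.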

Pay special attention to the combinators. You will be glad you did.
