The Scalable Hyperlink Store

The Scalable Hyperlink Store by Marc Najork.

Abstract:

This paper describes the Scalable Hyperlink Store, a distributed in-memory “database” for storing large portions of the web graph. SHS is an enabler for research on structural properties of the web graph as well as new link-based ranking algorithms. Previous work on specialized hyperlink databases focused on finding efficient compression algorithms for web graphs. By contrast, this work focuses on the systems issues of building such a database. Specifically, it describes how to build a hyperlink database that is fast, scalable, fault-tolerant, and incrementally updateable.

The design goals call for partitioning because:

…the maximum memory size on commodity machines is limited to a few tens of gigabytes….
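A toy sketch of that idea follows. It is not SHS's implementation. The class name `PartitionedLinkStore`, the choice to partition URLs by a CRC32 hash of their host name, and the decision to keep both forward and backward links as uncompressed sets are all assumptions made for illustration. Each partition stands in for the memory of one commodity machine. Grouping by host keeps a site's pages together.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import zlib


class PartitionedLinkStore:
    """Hypothetical in-memory web graph split across a fixed number of partitions.

    Assumption (illustrative only): a URL's partition is chosen by hashing its
    host name, so all pages from one site land on the same partition.
    """

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # One forward-link map and one backward-link map per partition; in a
        # distributed store each partition would live on a separate machine.
        self.forward = [defaultdict(set) for _ in range(num_partitions)]
        self.backward = [defaultdict(set) for _ in range(num_partitions)]

    def partition_of(self, url):
        host = urlsplit(url).hostname or ""
        # crc32 gives the same value on every run, unlike Python's salted hash().
        return zlib.crc32(host.encode("utf-8")) % self.num_partitions

    def add_link(self, src, dst):
        # The forward edge is stored with the source page, the backward edge
        # with the target page, so each lookup touches a single partition.
        self.forward[self.partition_of(src)][src].add(dst)
        self.backward[self.partition_of(dst)][dst].add(src)

    def out_links(self, url):
        # .get() avoids inserting empty entries for unknown URLs.
        return sorted(self.forward[self.partition_of(url)].get(url, ()))

    def in_links(self, url):
        return sorted(self.backward[self.partition_of(url)].get(url, ()))


store = PartitionedLinkStore(num_partitions=4)
store.add_link("http://a.example/1", "http://b.example/2")
store.add_link("http://c.example/3", "http://b.example/2")
print(store.in_links("http://b.example/2"))
# ['http://a.example/1', 'http://c.example/3']
```

Adding machines in this sketch means raising `num_partitions`, which reshuffles every URL. Avoiding that kind of cost, along with fault tolerance and incremental updates, is the systems work the paper focuses on.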

So the paper is a bit dated, but it is still instructive on how to build a hyperlink store.

Consider this as background for the notion of a hyperlink store that, rather than sending the user off to another site, could return to the user the content a hyperlink points to.

The Scalable Hyperlink Store at MS Research has more details and software.
