Front-end view generation with Hadoop by Pere Ferrera.
From the post:
One of the most common uses for Hadoop is building “views”. The usual case is that of websites serving data in a front-end that uses a search index. Why do we want to use Hadoop to generate the index being served by the website? There are several reasons:
- Parallelism: When the front-end needs to serve a lot of data, it is a good idea to divide it into “shards”. With Hadoop we can parallelize the creation of each of these shards, so that both generating the view and serving it scale efficiently.
- Efficiency: To maximize the efficiency and speed of a front-end, it is advisable to separate generating the view from serving it. The generation is done by a back-end process while the serving is done by the front-end; this way the front-end is freed from the load that indexing would otherwise put on it.
- Atomicity: It is often convenient to have a method for generating and deploying views atomically. That way, if a deployment fails, we can easily go back to a previous complete version (rollback). If the generation went badly, we can always generate a new full view in which the error is fixed in all the records. Hadoop allows us to generate views atomically because it is batch-oriented. Some search engines / databases allow atomic deployment by doing a hot-swap of their data.
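The three properties above fit together as one pipeline: a batch job partitions records into shards, writes a complete new version of the view, and only then makes it live with a single atomic switch, which also makes rollback trivial. The sketch below is a minimal, local illustration of that idea, not the post's code and not how Solr or Voldemort actually store data. Everything in it is an assumption for illustration: the hypothetical helpers `shard_for`, `build_view`, `write_version`, `activate` and `lookup`, the `version-N/shard-i.tsv` layout, and the use of a `current` symlink swapped via `os.replace` (atomic on POSIX filesystems). The shard choice uses a stable hash modulo the shard count, in the spirit of Hadoop's default hash partitioning.

```python
import os
import tempfile
import zlib
from pathlib import Path


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard deterministically (hypothetical helper).
    crc32 is used instead of hash() because Python salts str hashes per process."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_view(records, num_shards):
    """Back-end 'generation' step: group records into shards.
    In Hadoop, each shard could be produced by a separate reducer in parallel."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)][key] = value
    return shards


def write_version(root: Path, version: int, shards) -> Path:
    """Write a complete, immutable version of the view next to older ones."""
    vdir = root / f"version-{version}"  # assumed layout, for illustration only
    vdir.mkdir(parents=True)
    for i, shard in enumerate(shards):
        with open(vdir / f"shard-{i}.tsv", "w", encoding="utf-8") as f:
            for key in sorted(shard):
                f.write(f"{key}\t{shard[key]}\n")
    return vdir


def activate(root: Path, vdir: Path) -> None:
    """Atomically repoint 'current' at a version (deploy or rollback).
    Readers see either the old view or the new one, never a mix."""
    tmp = root / "current.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(vdir.name)  # relative link inside root
    os.replace(tmp, root / "current")  # rename(2): atomic on POSIX


def lookup(root: Path, key: str, num_shards: int):
    """Front-end read path: only ever reads the active version."""
    path = root / "current" / f"shard-{shard_for(key, num_shards)}.tsv"
    with open(path, encoding="utf-8") as f:
        for line in f:
            k, v = line.rstrip("\n").split("\t", 1)
            if k == key:
                return v
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        v1 = write_version(root, 1, build_view([("a", "1"), ("b", "2")], 4))
        activate(root, v1)
        print(lookup(root, "a", 4))  # prints 1
        v2 = write_version(root, 2, build_view([("a", "10"), ("b", "20")], 4))
        activate(root, v2)
        print(lookup(root, "a", 4))  # prints 10
        activate(root, v1)  # rollback to the previous complete version
        print(lookup(root, "a", 4))  # prints 1
```

Because old versions are never modified, a bad generation or a failed deployment is undone by pointing `current` back at the previous directory; the front-end never sees a half-built index.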
Covers the use of Solr and Voldemort, with examples.
Concludes by noting that this isn’t a solution for real-time updating, but one suspects real-time updating isn’t a universal requirement across the web.
Plus, see the additional resources suggested at the end of the post. You won’t (shouldn’t) be disappointed.