Managing Terabytes of Web Semantics Data
Authors: Michele Catasta, Renaud Delbru, Nickolai Toupikov, and Giovanni Tummarello
Abstract:
A large amount of semi-structured data is now made available on the Web in the form of RDF, RDFa and Microformats. In this chapter, we discuss a general model for the Web of Data and, based on our experience in Sindice.com, we discuss how this is reflected in the architecture and components of a large-scale infrastructure. Aspects such as data collection, processing, indexing, and ranking are touched upon, and we give an ample example of an application built on top of said infrastructure.
Appears as Chapter 6 in R. De Virgilio et al. (eds.), Semantic Web Information Management, © Springer-Verlag Berlin Heidelberg 2010.
Hopefully this is not too repetitive of the other Sindice.com material I have been posting.
It is a good overview of the area, in addition to specifics about Sindice.com.