ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis is one of the projects behind the self-similarity and MapReduce posting.
From the project page:
The ASTERIX project is developing new technologies for ingesting, storing, managing, indexing, querying, analyzing, and subscribing to vast quantities of semi-structured information. The project is combining ideas from three distinct areas – semi-structured data, parallel databases, and data-intensive computing – to create a next-generation, open source software platform that scales by running on large, shared-nothing computing clusters.
Home of Hyrax: Demonstrating a New Foundation for Data-Parallel Computation, with “out-of-the-box support for common distributed communication patterns and set-oriented data operators.” (Need I say more?)