Faceboook Gets Smarter with Graph Engine Optimization by Alex Woodie.
From the post:
Last fall, the folks in Facebook’s engineering team talked about how they employed the Apache Giraph engine to build a graph on its Hadoop platform that can host more than a trillion edges. While the Graph Search engine is capable of massive graphing tasks, there were some workloads that remained outside the company’s technical capabilities–until now.
Facebook turned to the Giraph engine to power its new Graph Search offering, which it unveiled in January 2013 as a way to let users perform searches on other users to determine, for example, what kind of music their Facebook friends like, what kinds of food they’re into, or what activities they’ve done recently. An API for Graph Search also provides advertisers with a new revenue source for Facebook. It’s likely the world’s largest graph implementation, and a showcase of what graph engines can do.
The company picked Giraph because it worked on their existing Hadoop implementation, including HDFS and its MapReduce infrastructure stack (known as Corona). Compared to running the computation workload on Hive, an internal Facebook test of a 400-billion edge graph ran 126x faster on Giraph, and had a 26x performance advantage, as we explained in a Datanami story last year.
When Facebook scaled its internal test graph up to 1 trillion edges, they were able to keep the processing of each iteration of the graph under four minutes on a 200-server cluster. That amazing feat was done without any optimization, the company claimed. “We didn’t cheat,” Facebook developer Avery Ching declared in a video. “This is a random hashing algorithm, so we’re randomly assigning the vertices to different machines in the system. Obviously, if we do some separation and locality optimization, we can get this number down quite a bit.”
…
High level view with technical references on how Facebook is optimizing its Apache Giraph engine.
If you are interested in graphs, this is much more of a real world scenario than building “big” graphs out of uniform time slices.