Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 8, 2011

Counting Triangles

Filed under: Hadoop,MPP,Vectors — Patrick Durusau @ 8:14 pm

Counting Triangles

From the post:

Recently I’ve heard from or read about people who use Hadoop because their analytic jobs can’t achieve the same level of performance in a database. In one case, a professor I visited said his group uses Hadoop to count triangles “because a database doesn’t perform the necessary joins efficiently.”

Perhaps I’m being dense but I don’t understand why a database doesn’t efficiently support these use-cases. In fact, I have a hard time believing they wouldn’t perform better in a columnar, MPP database like Vertica – where memory and storage are laid out and accessed efficiently, query jobs are automatically tuned by the optimizer, and expression execution is vectorized at run-time. There are additional benefits when several, similar jobs are run or data is updated and the same job is re-run multiple times. Of course, performance isn’t everything; ease-of-use and maintainability are important factors that Vertica excels at as well.

Since the “gauntlet was thrown down”, to steal a line from Good Will Hunting, I decided to take up the challenge of computing the number of triangles in a graph (and include the solutions in GitHub so others can experiment – more on this at the end of the post).

I don’t think you will surprised at the outcome but the exercise is instructive in a number of ways. Primarily don’t assume performance without testing. If all the bellowing leads to more competition and close work on software and algorithms, I think there will be some profit from it.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress