Joseph M. Hellerstein writes:
When the folks at ACM SIGMOD asked me to be a guest blogger this month, I figured I should highlight the most community-facing work I’m involved with. So I wrote up a discussion of MADlib and the fact that this open-source in-database analytics library is now open to community contributions. (A bunch of us recently wrote a paper on the design and use of MADlib, which made my writing job a bit easier.) I’m optimistic about MADlib closing a gap between algorithm researchers and working data scientists, using familiar SQL as a vector for adoption on both fronts.
I kicked off MADlib as a part-time consulting project for Greenplum during my sabbatical in 2010-2011. As I built out the first two methods (FM and CountMin sketches) and an installer, Greenplum started assembling a team of their own engineers and data scientists to overlap with and eventually replace me when I returned to campus. They also developed a roadmap of additional methods that their customers wanted in the field. Eighteen months later, Greenplum now contributes the bulk of the labor, management and expertise for the project, and has built bridges to leading academics as well.
Like they said at Woodstock, “if you don’t think SQL is all that weird….” you might want to stop by the MADlib project. (I will have to go listen to the soundtrack. That may not be an exact quote.)
This is an important project for database analytics in an SQL context.