Representing Solutions with PMML (ACM Data Mining Talk)
Dr. Alex Guazzelli’s talk on PMML and Predictive Analytics to the ACM Data Mining Bay Area/SF group at the LinkedIn auditorium in Sunnyvale, CA.
Abstract:
Data mining scientists work hard to analyze historical data and to build the best predictive solutions out of it. IT engineers, on the other hand, are usually responsible for bringing these solutions to life, by recoding them into a format suitable for operational deployment. Given that data mining scientists and engineers tend to inhabit different information worlds, the process of moving a predictive solution from the scientist’s desktop to the operational environment can get lost in translation and take months. The advent of data mining specific open standards such as the Predictive Model Markup Language (PMML) has turned this view upside down: the deployment of models can now be achieved by the same team who builds them, in a matter of minutes.
In this talk, Dr. Alex Guazzelli not only provides the business rationale behind PMML, but also describes its main components. Besides being able to describe the most common modeling techniques, as of version 4.0, released in 2009, PMML is also capable of handling complex pre-processing tasks. As of version 4.1, released in December 2011, PMML has also incorporated complex post-processing into its structure, as well as the ability to represent model ensembles, segmentation, chaining, and composition within a single language element. This combined representation power, in which an entire predictive solution (from pre-processing to model(s) to post-processing) can be represented in a single PMML file, attests to the language’s refinement and maturity.
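To make the "single PMML file" idea concrete, here is a rough sketch (mine, not from the talk) that uses Python's standard library to assemble a toy PMML 4.1 document with one pre-processing step, one regression model, and one post-processing step. The field names (`age`, `score`, `age_norm`, `raw_score`, `score_pct`) and all numbers are invented for illustration, and the result has not been validated against the PMML schema.

```python
import xml.etree.ElementTree as ET


def build_pmml_sketch() -> ET.Element:
    """Assemble a tiny, hypothetical PMML 4.1 document in memory."""
    pmml = ET.Element("PMML", {"version": "4.1",
                               "xmlns": "http://www.dmg.org/PMML-4_1"})
    ET.SubElement(pmml, "Header", {"description": "Illustrative sketch only"})

    # Raw input and target fields (names are made up for this example).
    fields = ET.SubElement(pmml, "DataDictionary", {"numberOfFields": "2"})
    for name in ("age", "score"):
        ET.SubElement(fields, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    # Pre-processing: linearly rescale age from [18, 90] to [0, 1].
    transforms = ET.SubElement(pmml, "TransformationDictionary")
    derived = ET.SubElement(transforms, "DerivedField",
                            {"name": "age_norm", "optype": "continuous",
                             "dataType": "double"})
    norm = ET.SubElement(derived, "NormContinuous", {"field": "age"})
    ET.SubElement(norm, "LinearNorm", {"orig": "18", "norm": "0"})
    ET.SubElement(norm, "LinearNorm", {"orig": "90", "norm": "1"})

    # Model: a one-predictor linear regression on the derived field.
    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", {"name": "age"})
    ET.SubElement(schema, "MiningField", {"name": "score", "usageType": "predicted"})

    # Post-processing: expose the raw prediction and a value scaled by 100.
    output = ET.SubElement(model, "Output")
    ET.SubElement(output, "OutputField",
                  {"name": "raw_score", "feature": "predictedValue"})
    scaled = ET.SubElement(output, "OutputField",
                           {"name": "score_pct", "feature": "transformedValue",
                            "optype": "continuous", "dataType": "double"})
    times = ET.SubElement(scaled, "Apply", {"function": "*"})
    ET.SubElement(times, "FieldRef", {"field": "raw_score"})
    ET.SubElement(times, "Constant", {"dataType": "double"}).text = "100"

    table = ET.SubElement(model, "RegressionTable", {"intercept": "0.2"})
    ET.SubElement(table, "NumericPredictor",
                  {"name": "age_norm", "coefficient": "0.5"})
    return pmml


if __name__ == "__main__":
    print(ET.tostring(build_pmml_sketch(), encoding="unicode"))
```

Running the script prints the XML as one string. The part that matters is the layout: the data dictionary, the transformation, the model, and its output post-processing all sit under one `PMML` root, which is what would let a scoring engine consume the whole solution from a single file.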
I hesitated at the story of replacing IT engineers with data scientists. Didn’t we try that one before?
Back then it was replacing programmers with business managers, and it was called COBOL. 😉
Nothing against COBOL; it is still in use today. Widespread use, as a matter of fact.
But all tasks, including IT engineering, look easy from a distance. Only after getting poor results is that lesson learned. Again.
What have your experiences been with PMML?