Archive for the ‘Predictive Model Markup Language (PMML)’ Category

The Data Mining Group releases PMML v 4.2

Tuesday, February 25th, 2014

From the announcement:

“As a standard, PMML provides the glue to unify data science and operational IT. With one common process and standard, PMML is the missing piece for Big Data initiatives to enable rapid deployment of data mining models. Broad vendor support and rapid customer adoption demonstrates that PMML delivers on its promise to reduce cost, complexity and risk of predictive analytics,” says Alex Guazzelli, Vice President of Analytics, Zementis. “You can not build and deploy predictive models over big data without using multiple models and no one should build multiple models without PMML,” says Bob Grossman, Founder and Partner at Open Data Group.

Some of the elements that are new to PMML v4.2 include:

  • Improved support for post-processing, model types, and model elements
  • A completely new element for text mining
  • Scorecards now introduce the ability to compute points based on expressions
  • New built-in functions, including “matches” and “replace” for the use of regular expressions

(emphasis added)

Hmmm, do you think they meant that before 4.2 PMML didn’t have “matches” and “replace”? (I checked; it didn’t.)
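
For readers who have not seen the two functions, here is a rough sketch of how a “matches” or “replace” call might look as a PMML Apply fragment, together with a toy Python evaluator. The field names, the patterns and the evaluate helper are all hypothetical, and mapping “matches” to a regular expression search (rather than a full match) is my assumption about the semantics, not something taken from the announcement.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical PMML fragments: field names and patterns are made up for illustration.
MATCHES_XML = r"""
<Apply function="matches">
  <FieldRef field="month"/>
  <Constant>ar?y</Constant>
</Apply>
"""

REPLACE_XML = r"""
<Apply function="replace">
  <FieldRef field="text"/>
  <Constant>\s+</Constant>
  <Constant>_</Constant>
</Apply>
"""


def evaluate(apply_xml, record):
    """Evaluate a single, non-nested Apply element against a dict of field values.

    Only "matches" and "replace" are handled; this is a toy, not a PMML engine.
    """
    node = ET.fromstring(apply_xml)
    args = []
    for child in node:
        if child.tag == "FieldRef":
            # Look the field value up in the input record.
            args.append(record[child.get("field")])
        elif child.tag == "Constant":
            args.append(child.text or "")
        else:
            raise ValueError("unsupported argument: " + child.tag)

    function = node.get("function")
    if function == "matches":
        value, pattern = args
        # Assumption: "matches" is true when the pattern occurs anywhere in the input.
        return re.search(pattern, value) is not None
    if function == "replace":
        value, pattern, replacement = args
        # Replace every occurrence of the pattern.
        return re.sub(pattern, replacement, value)
    raise ValueError("unsupported function: " + str(function))


if __name__ == "__main__":
    print(evaluate(MATCHES_XML, {"month": "January"}))
    print(evaluate(REPLACE_XML, {"text": "big   data"}))
```

Python's re module stands in for whatever regular expression dialect a real PMML consumer uses, so patterns that rely on dialect-specific syntax may behave differently there.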

However, kudos on the presentation of their schema, both current and prior versions.

If only more XML schemas had such documentation/presentation.

See PMML v4.2 General Structure.

I first saw this at: The Data Mining Group releases PMML v4.2 Predictive Modeling Standard.

Representing Solutions with PMML (ACM Data Mining Talk)

Friday, September 28th, 2012

Dr. Alex Guazzelli’s talk on PMML and Predictive Analytics to the ACM Data Mining Bay Area/SF group at the LinkedIn auditorium in Sunnyvale, CA.

Abstract:

Data mining scientists work hard to analyze historical data and to build the best predictive solutions out of it. IT engineers, on the other hand, are usually responsible for bringing these solutions to life, by recoding them into a format suitable for operational deployment. Given that data mining scientists and engineers tend to inhabit different information worlds, the process of moving a predictive solution from the scientist’s desktop to the operational environment can get lost in translation and take months. The advent of data mining specific open standards such as the Predictive Model Markup Language (PMML) has turned this view upside down: the deployment of models can now be achieved by the same team who builds them, in a matter of minutes.

In this talk, Dr. Alex Guazzelli not only provides the business rationale behind PMML, but also describes its main components. Besides being able to describe the most common modeling techniques, as of version 4.0, released in 2009, PMML is also capable of handling complex pre-processing tasks. As of version 4.1, released in December 2011, PMML has also incorporated complex post-processing to its structure as well as the ability to represent model ensemble, segmentation, chaining, and composition within a single language element. This combined representation power, in which an entire predictive solution (from pre-processing to model(s) to post-processing) can be represented in a single PMML file, attests to the language’s refinement and maturity.
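
To make the "single PMML file" idea in the abstract concrete, here is a rough sketch of what such a file might contain, plus a few lines of Python that pull out the pre-processing, model and post-processing pieces. The skeleton is hypothetical: the field names and coefficients are invented, it has not been validated against the PMML schema, and the summarize helper is mine, not part of any PMML tool.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_1"

# Hypothetical skeleton: one derived field (pre-processing), one regression
# model, and one output field (post-processing) in a single document.
PMML_SKELETON = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="hypothetical skeleton"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="log_income" optype="continuous" dataType="double">
      <Apply function="ln"><FieldRef field="income"/></Apply>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <Output>
      <OutputField name="predicted_risk" feature="predictedValue"/>
    </Output>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="log_income" coefficient="-0.1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(pmml_text):
    """Return the names of the derived fields, models and output fields found."""
    root = ET.fromstring(pmml_text)
    ns = {"p": PMML_NS}
    derived = [f.get("name") for f in
               root.iterfind(".//p:TransformationDictionary/p:DerivedField", ns)]
    # Simplification: only top-level elements whose names end in "Model" count
    # as models here, which misses model elements named differently.
    models = [local_name(c.tag) for c in root if local_name(c.tag).endswith("Model")]
    outputs = [f.get("name") for f in root.iterfind(".//p:Output/p:OutputField", ns)]
    return {"pre_processing": derived, "models": models, "post_processing": outputs}


if __name__ == "__main__":
    print(summarize(PMML_SKELETON))
```

The point is only that all three stages sit in one XML document, so a consumer can discover them with ordinary XML tooling.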

I hesitated at the story of replacing IT engineers with data scientists. Didn’t we try that one before?

But back then it was replacing programmers with business managers. And it was called COBOL. 😉

Nothing against COBOL; it is still in use today. Widespread use, as a matter of fact.

But all tasks, including IT engineering, look easy from a distance. Only after getting poor results is that lesson learned. Again.

What have your experiences been with PMML?

Google Prediction API graduates from labs

Sunday, October 16th, 2011

Google Prediction API graduates from labs, adds new features by Zachary Goldberg, Product Manager.

From the post:

Since the general availability launch of the Prediction API this year at Google I/O, we have been working hard to give every developer access to machine learning in the cloud to build smarter apps. We’ve also been working on adding new features, accuracy improvements, and feedback capability to the API.

Today we take another step by announcing Prediction v1.4. With the launch of this version, Prediction is graduating from Google Code Labs, reflecting Google’s commitment to the API’s development and stability. Version 1.4 also includes two new features:

  • Data Anomaly Analysis
    • One of the hardest parts of building an accurate predictive model is gathering and curating a high quality data set. With Prediction v1.4, we are providing a feature to help you identify problems with your data that we notice during the training process. This feedback makes it easier to build accurate predictive models with proper data.
  • PMML Import
    • PMML has become the de facto industry standard for transmitting predictive models and model data between systems. As of v1.4, the Google Prediction API can programmatically accept your PMML for data transformations and preprocessing.
    • The PMML spec is vast and covers many, many features. You can find more details about the specific features that the Google Prediction API supports here.

(I added a paragraph break in the first text block for readability. It should be re-written but I am quoting.)

I suggest taking a close look at the PMML features that Google does not support. Quite an impressive array of non-support.

Predictive Model Markup Language

Tuesday, August 23rd, 2011

From the wiki page:

The Predictive Model Markup Language (PMML) is an XML-based markup language developed by the Data Mining Group (DMG) to provide a way for applications to define models related to predictive analytics and data mining and to share those models between PMML-compliant applications.

PMML provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. It allows users to develop models within one vendor’s application and use other vendors’ applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is straightforward.

Since PMML is an XML-based standard, the specification comes in the form of an XML schema.

Curious whether anyone has experience with PMML, with or without topic maps?