
Cross-Industry Standard Process for Data Mining (CRISP-DM 1.0)

Sunday, November 13th, 2011

Cross-Industry Standard Process for Data Mining (CRISP-DM 1.0) (pdf file)

From the foreword:

CRISP-DM was conceived in late 1996 by three “veterans” of the young and immature data mining market. DaimlerChrysler (then Daimler-Benz) was already experienced, ahead of most industrial and commercial organizations, in applying data mining in its business operations. SPSS (then ISL) had been providing services based on data mining since 1990 and had launched the first commercial data mining workbench – Clementine – in 1994. NCR, as part of its aim to deliver added value to its Teradata data warehouse customers, had established teams of data mining consultants and technology specialists to service its clients’ requirements.

At that time, early market interest in data mining was showing signs of exploding into widespread uptake. This was both exciting and terrifying. All of us had developed our approaches to data mining as we went along. Were we doing it right? Was every new adopter of data mining going to have to learn, as we had initially, by trial and error? And from a supplier’s perspective, how could we demonstrate to prospective customers that data mining was sufficiently mature to be adopted as a key part of their business processes? A standard process model, we reasoned, non-proprietary and freely available, would address these issues for us and for all practitioners.

CRISP-DM has not been built in a theoretical, academic manner working from technical principles, nor did elite committees of gurus create it behind closed doors. Both these approaches to developing methodologies have been tried in the past, but have seldom led to practical, successful and widely-adopted standards. CRISP-DM succeeds because it is soundly based on the practical, real-world experience of how people do data mining projects. And in that respect, we are overwhelmingly indebted to the many practitioners who contributed their efforts and their ideas throughout the project.

You might want to note that, despite its issue date of 2000, CRISP-DM is still in use:

Eric King, founder and president of The Modeling Agency, a Pittsburgh-based consulting firm that focuses on analytics and data mining, [said]:

While King believes a guide in the form of a consultant is an invaluable resource for businesses in the planning phase, he noted that his firm follows the Cross Industry Standard Process for Data Mining, a public document he describes as “a cheat sheet,” when it’s working with clients. (emphasis added. Source: Developing a predictive analytics program doable on a limited budget)