Archive for the ‘Regression’ Category

Classification and regression trees

Tuesday, July 15th, 2014

Classification and regression trees by Wei-Yin Loh.


Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weakness in two examples. 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 14–23 DOI: 10.1002/widm.8.
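To make the "recursively partitioning" idea concrete, here is a minimal sketch of a regression tree on a single predictor. It is my own illustration, not one of the algorithms compared in the paper: the function names, the toy data, and the `max_depth` and `min_size` parameters are all assumptions. Each split minimizes the summed squared error of the two resulting partitions, and each leaf predicts the mean of its observations.

```python
def sse(values):
    """Sum of squared differences from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split, or None."""
    best = None
    points = sorted(set(xs))
    # Candidate thresholds are midpoints between adjacent distinct x values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


def build_tree(xs, ys, max_depth=2, min_size=2):
    """Recursively partition the data; a leaf predicts the mean of its ys."""
    split = None
    if max_depth > 0 and len(xs) >= min_size:
        split = best_split(xs, ys)
    if split is None:
        return {"predict": sum(ys) / len(ys)}
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           max_depth - 1, min_size),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            max_depth - 1, min_size),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's mean."""
    while "predict" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["predict"]


# Toy data (made up): two clusters with clearly different response levels.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = build_tree(xs, ys, max_depth=1)
print(tree["threshold"], predict(tree, 2.5), predict(tree, 11.0))
```

A classification tree would follow the same recursion but score splits by misclassification cost (or an impurity measure) instead of squared error.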

A bit more challenging than CSV formats but also very useful.

I heard a joke many years ago from a then U.S. Assistant Attorney General, who said:

To create a suspect list for a truck hijacking in New York, you choose files with certain name characteristics, delete the ones that are currently in prison and those that remain are your suspect list. (paraphrase)

If topic maps can represent any “subject” then they should be able to represent “group subjects” as well. We may know that our particular suspect is a member of a group, but we just don’t know which member of the group is our suspect.

Think of it as a topic map that evolves as more data/analysis is brought to the map and members of a group subject can be broken out into smaller groups or even individuals.

In fact, displaying summaries of characteristics of members of a group in response to classification/regression could well help with the subject analysis process. An interactive construction/mining of the topic map as it were.

Great paper whether you use it for topic map subject analysis or more traditional purposes.

The Evolution of Regression Modeling… [Webinar]

Wednesday, February 6th, 2013

The Evolution of Regression Modeling: From Classical Linear Regression to Modern Ensembles by Mikhail Golovnya and Illia Polosukhin.


Part 1: Friday, March 1, 10 am, PST

Part 2: Friday, March 15, 10 am, PST

Part 3: Friday, March 29, 10 am, PST

Part 4: Friday, April 12, 10 am, PST

From the webpage:

Class Description: Regression is one of the most popular modeling methods, but the classical approach has significant problems. This webinar series addresses these problems. Are you working with larger datasets? Is your data challenging? Does your data include missing values, nonlinear relationships, local patterns and interactions? This webinar series is for you! We will cover improvements to conventional and logistic regression, and will include a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. This series will be of value to any classically trained statistician or modeler.


Part 1: March 1 – Regression methods discussed

  •     Classical Regression
  •     Logistic Regression
  •     Regularized Regression: GPS Generalized Path Seeker
  •     Nonlinear Regression: MARS Regression Splines

Part 2: March 15 – Hands-on demonstration of concepts discussed in Part 1

  •     Step-by-step demonstration
  •     Datasets and software available for download
  •     Instructions for reproducing demo at your leisure
  •     For the dedicated student: apply these methods to your own data (optional)

Part 3: March 29 – Regression methods discussed
*Part 1 is a recommended pre-requisite

  •     Nonlinear Ensemble Approaches: TreeNet Gradient Boosting; Random Forests; Gradient Boosting incorporating RF
  •     Ensemble Post-Processing: ISLE; RuleLearner

Part 4: April 12 – Hands-on demonstration of concepts discussed in part 3

  •     Step-by-step demonstration
  •     Datasets and software available for download
  •     Instructions for reproducing demo at your leisure
  •     For the dedicated student: apply these methods to your own data (optional)

Salford Systems offers other introductory videos, webinars, tutorials, and case studies.

Regression modeling is a tool you will encounter in data analysis and is likely to be an important part of your exploration toolkit.

I first saw this at KDNuggets.

“All Models are Right, Most are Useless”

Sunday, March 11th, 2012

“All Models are Right, Most are Useless”

A counter to George Box’s saying, “all models are wrong, some are useful,” by Thad Tarpey. Pointer to slides for the presentation.

Covers the fallacy of “reification” (in the modeling sense) among other amusements.

Useful to remember that maps are approximations as well.

R Regression Diagnostics Part 1

Monday, January 23rd, 2012

R Regression Diagnostics Part 1 By Vik Paruchuri.

From the post:

Linear regression can be a fast and powerful tool to model complex phenomena. However, it makes several assumptions about your data, and quickly breaks down when these assumptions, such as the assumption that a linear relationship exists between the predictors and the dependent variable, break down. In this post, I will introduce some diagnostics that you can perform to ensure that your regression does not violate these basic assumptions. To begin with, I highly suggest reading this article on the major assumptions that linear regression is predicated on.
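As a rough illustration of checking the linearity assumption, here is a small sketch. The post itself works in R; this is Python and is not the post's code. The function names and toy data are mine, and correlating the residuals with the squared, centered predictor is just one simple curvature check that I picked for illustration. A value near 1 or -1 suggests the straight-line fit is missing a bend in the data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correlation(a, b):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def curvature_score(xs, ys):
    """Correlate the residuals of a line fit with the squared, centered x."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mean_x = sum(xs) / len(xs)
    squared = [(x - mean_x) ** 2 for x in xs]
    return correlation(residuals, squared)


xs = [float(i) for i in range(11)]
# Made-up data: a straight line with small alternating noise, and a parabola.
linear_ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5)
             for i, x in enumerate(xs)]
curved_ys = [x ** 2 for x in xs]

linear_score = curvature_score(xs, linear_ys)
curved_score = curvature_score(xs, curved_ys)
print(linear_score, curved_score)
```

In practice a residuals-versus-fitted plot tells you the same thing at a glance; the score just puts a number on what the plot shows.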

Just like any other tool, the more you know about it, the better use you will make of it.

Domain Adaptation with Hierarchical Logistic Regression

Thursday, October 6th, 2011

Domain Adaptation with Hierarchical Logistic Regression

Bob Carpenter continues his series on domain adaptation:

Last post, I explained how to build hierarchical naive Bayes models for domain adaptation. That post covered the basic problem setup and motivation for hierarchical models.

Hierarchical Logistic Regression

Today, we’ll look at the so-called (in NLP) “discriminative” version of the domain adaptation problem. Specifically, using logistic regression. For simplicity, we’ll stick to the binary case, though this could all be generalized to K-way classifiers.

Logistic regression is more flexible than naive Bayes in allowing other features (aka predictors) to be brought in along with the words themselves. We’ll start with just the words, so the basic setup looks more like naive Bayes.
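For readers who want to see the "just the words" binary setup in code, here is a minimal sketch of plain logistic regression over word counts, trained by stochastic gradient descent. It is not Carpenter's model: there is no hierarchy and no prior, and the documents, labels, learning rate, and function names are all invented for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def featurize(text, vocab):
    """Word-count features in vocabulary order; unknown words are ignored."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]


def train(rows, labels, epochs=500, rate=0.5):
    """Fit weights and a bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            score = bias + sum(w * f for w, f in zip(weights, features))
            error = sigmoid(score) - label  # gradient of log loss w.r.t. score
            bias -= rate * error
            weights = [w - rate * error * f
                       for w, f in zip(weights, features)]
    return weights, bias


def predict_proba(text, vocab, weights, bias):
    """Probability that the text belongs to the positive class."""
    features = featurize(text, vocab)
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


# Made-up training documents: 1 = positive review, 0 = negative review.
docs = ["good great film", "great acting", "bad boring film", "boring plot"]
labels = [1, 1, 0, 0]
vocab = sorted({word for doc in docs for word in doc.split()})
rows = [featurize(doc, vocab) for doc in docs]
weights, bias = train(rows, labels)

print(predict_proba("great film", vocab, weights, bias))
print(predict_proba("boring film", vocab, weights, bias))
```

Bringing in other predictors, as the post mentions, would only mean appending more columns to each feature vector; the training loop stays the same.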

Machine Learning

Wednesday, March 30th, 2011

Machine Learning

From the site:

This page documents all the machine learning algorithms present in the library. In particular, there are algorithms for performing classification, regression, clustering, anomaly detection, and feature ranking, as well as algorithms for doing more specialized computations.

A good tutorial and introduction to the general concepts used by most of the objects in this part of the library can be found in the svm example program. After reading this example another good one to consult would be the model selection example program. Finally, if you came here looking for a binary classification or regression tool then I would try the krr_trainer first as it is generally the easiest method to use.

The major design goal of this portion of the library is to provide a highly modular and simple architecture for dealing with kernel algorithms….
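The krr_trainer mentioned above is, as far as I can tell from the library's documentation, a kernel ridge regression trainer. The sketch below shows that technique in plain Python for a single predictor; it is not Dlib's C++ API. The function names, the RBF kernel with `gamma=0.5`, the `ridge` value, and the toy data are all my own assumptions.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function kernel between two scalars."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_krr(xs, ys, ridge=1e-3, gamma=0.5):
    """Return dual coefficients alpha solving (K + ridge * I) alpha = y."""
    gram = [[rbf_kernel(a, b, gamma) + (ridge if i == j else 0.0)
             for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    return solve(gram, ys)


def predict_krr(x, xs, alphas, gamma=0.5):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(alpha * rbf_kernel(x, xi, gamma)
               for alpha, xi in zip(alphas, xs))


# Made-up data: samples of a sine curve at integer points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [math.sin(x) for x in xs]
alphas = fit_krr(xs, ys)
estimate = predict_krr(2.5, xs, alphas)
print(estimate, math.sin(2.5))
```

The appeal of the method is that training comes down to one linear solve, which may be part of why the page calls it the easiest to use.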

Update: Dlib – machine learning. Why I left out the library name I cannot say. Sorry!