Big-data Naive Bayes and Classification Trees with R and Netezza
From the post:
The IBM Netezza analytics appliances combine high-capacity storage for Big Data with a massively-parallel processing platform for high-performance computing. With the addition of Revolution R Enterprise for IBM Netezza, you can use the power of the R language to build predictive models on Big Data.
In the demonstration below, Revolution Analytics’ Derek Norton analyzes loan approval data stored on the IBM appliance. You’ll see the R code used to:
- Explore the raw data (with summary statistics and charts)
- Prepare the data for statistical analysis, and create training and test sets
- Create predictive models using classification trees and Naïve Bayes
- Predict using the models, and evaluate model performance using confusion matrices
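The demonstration itself uses R against the appliance. As a language-neutral illustration of the steps listed above (split the data, fit a Naïve Bayes model, predict, tabulate a confusion matrix), here is a minimal Python sketch using only the standard library. The loan records, column meanings, and function names are all made up for illustration; none of them come from Derek Norton's demo or from any Revolution R or Netezza API.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy records: (employment, credit_history, approved).
# Invented for illustration; this is NOT the data used in the demo.
ROWS = [
    ("salaried", "good", "yes"), ("salaried", "good", "yes"),
    ("salaried", "good", "yes"), ("salaried", "poor", "no"),
    ("salaried", "poor", "yes"), ("self-employed", "good", "yes"),
    ("self-employed", "good", "yes"), ("self-employed", "poor", "no"),
    ("self-employed", "poor", "no"), ("unemployed", "good", "no"),
    ("unemployed", "poor", "no"), ("unemployed", "poor", "no"),
]

def split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and return (training set, test set)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_naive_bayes(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for *_, label in rows)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict(model, features):
    """Return the label with the highest (unnormalized) posterior score."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(features):
            # Laplace (add-one) smoothing so unseen values do not zero the score.
            score *= (feature_counts[(i, label)][value] + 1) / (n + len(values[i]))
        scores[label] = score
    return max(scores, key=scores.get)

def confusion_matrix(model, rows):
    """Map (actual label, predicted label) -> count."""
    matrix = Counter()
    for *features, actual in rows:
        matrix[(actual, predict(model, features))] += 1
    return matrix

train_rows, test_rows = split(ROWS)
model = train_naive_bayes(train_rows)
matrix = confusion_matrix(model, test_rows)
for (actual, predicted), count in sorted(matrix.items()):
    print(f"actual={actual} predicted={predicted} count={count}")
```

The diagonal entries of the matrix (actual equals predicted) are the correct classifications; everything off the diagonal is an error. On an appliance like the one described, the counting in the training step is the part that would be pushed down to run next to the data.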
[embedded presentation omitted]
Note that while R code is being run on Derek’s laptop, the raw data is never moved from the appliance, and the analytic computations take place “in-database” within the appliance itself (where the Revolution R Enterprise engine is also running on each parallel core).
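The "in-database" pattern in that note can be sketched roughly as follows: the client sends an instruction, the engine that holds the data does the computation, and only a small summary travels back. This sketch uses Python's built-in sqlite3 module purely as a stand-in for the appliance; the table name, columns, and values are hypothetical, and nothing here reflects how Revolution R Enterprise actually talks to Netezza.

```python
import sqlite3

# An in-memory SQLite database stands in for the appliance (an assumption
# made for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (amount REAL, approved INTEGER)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [(5000.0, 1), (12000.0, 0), (7500.0, 1), (20000.0, 0)],  # made-up rows
)

# The aggregation runs inside the database engine. The client never fetches
# the raw rows, only one summary row per approval outcome.
summary = conn.execute(
    "SELECT approved, COUNT(*), AVG(amount) "
    "FROM loans GROUP BY approved ORDER BY approved"
).fetchall()

for approved, n, mean_amount in summary:
    print(f"approved={approved} n={n} mean_amount={mean_amount:.2f}")
```

With four rows the saving is nil, but the shape is the point: the result set stays the same size no matter how many rows sit in the table.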
Another incentive to learn R.
Does it sound to you like “Derek’s computer” is a terminal entering instructions that are executed elsewhere? 😉 (If the computing fabric develops fast enough, we may lose the distinction of a “personal” computer. There will simply be computing.)
Meant to mention this the other day. Enjoy!