Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 19, 2012

Big-data Naive Bayes and Classification Trees with R and Netezza

Filed under: Bayesian Data Analysis,Classification Trees,Netezza,R — Patrick Durusau @ 6:54 pm

Big-data Naive Bayes and Classification Trees with R and Netezza

From the post:

The IBM Netezza analytics appliances combine high-capacity storage for Big Data with a massively-parallel processing platform for high-performance computing. With the addition of Revolution R Enterprise for IBM Netezza, you can use the power of the R language to build predictive models on Big Data.

In the demonstration below, Revolution Analytics’ Derek Norton analyzes loan approval data stored on the IBM appliance. You’ll see the R code used to:

  • Explore the raw data (with summary statistics and charts)
  • Prepare the data for statistical analysis, and create training and test sets
  • Create predictive models using classification trees and Naïve Bayes
  • Predict using the models, and evaluate model performance using confusion matrices
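For readers who want to see the shape of those steps without the appliance, here is a minimal sketch in plain Python. It is not the R code from the demonstration and uses no Netezza or Revolution R API. The loan records are synthetic, and the feature names, the approval rule, and every function name below are hypothetical, chosen only for illustration. It covers the split into training and test sets, a categorical Naïve Bayes model with add-one smoothing, and a confusion matrix (classification trees are left out to keep it short).

```python
import math
import random
from collections import Counter, defaultdict


def make_loans(n, seed=0):
    """Generate synthetic (hypothetical) loan records as (features, label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.choice(["low", "mid", "high"])
        history = rng.choice(["poor", "good"])
        # Made-up approval rule plus noise, so the model has a signal to learn.
        p_approve = {"low": 0.2, "mid": 0.5, "high": 0.8}[income]
        p_approve += 0.15 if history == "good" else -0.15
        label = "approved" if rng.random() < p_approve else "denied"
        rows.append(((income, history), label))
    return rows


def train_naive_bayes(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, features):
    """Return the label with the highest log posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(features):
            # Add-one (Laplace) smoothing avoids log(0) for unseen values.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def confusion_matrix(model, rows):
    """Count (actual, predicted) label pairs over the given rows."""
    matrix = Counter()
    for features, actual in rows:
        matrix[(actual, predict(model, features))] += 1
    return matrix


rows = make_loans(1000)
train, test = rows[:700], rows[700:]  # simple 70/30 split
model = train_naive_bayes(train)
matrix = confusion_matrix(model, test)
accuracy = sum(c for (actual, pred), c in matrix.items() if actual == pred) / len(test)

for (actual, pred), count in sorted(matrix.items()):
    print(f"actual={actual:9s} predicted={pred:9s} count={count}")
print(f"accuracy={accuracy:.3f}")
```

The diagonal entries of the confusion matrix (actual equals predicted) are the correct calls. Everything else is a loan the model got wrong, in one direction or the other.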

[embedded presentation omitted]

Note that while R code is being run on Derek’s laptop, the raw data is never moved from the appliance, and the analytic computations take place “in-database” within the appliance itself (where the Revolution R Enterprise engine is also running on each parallel core).
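The "in-database" idea can be tried at toy scale. In the sketch below, Python's built-in sqlite3 module stands in for the appliance. That substitution is my assumption, not how Netezza or Revolution R Enterprise works internally, and the loans table and its columns are hypothetical. The point is only the contrast: the client sends a query, the engine does the aggregation where the data lives, and just the summary rows come back.

```python
import sqlite3

# An in-memory SQLite database stands in for the remote appliance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (amount REAL, approved INTEGER)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [(1000.0 * (i % 10 + 1), int(i % 3 == 0)) for i in range(9000)],
)

# In-database: the engine groups and averages; one row per group is returned.
summary = conn.execute(
    "SELECT approved, COUNT(*), AVG(amount) "
    "FROM loans GROUP BY approved ORDER BY approved"
).fetchall()

# For contrast: pulling every raw row to the client before doing any work.
raw_rows = conn.execute("SELECT amount, approved FROM loans").fetchall()

print("rows returned by the in-database summary:", len(summary))
print("rows returned when moving the raw data:", len(raw_rows))
conn.close()
```

With a few thousand rows the difference is trivial. With appliance-sized tables, the second query is the one you would not want to run from a laptop.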

Another incentive for you to be learning R.

Does it sound to you like “Derek’s computer” is a terminal entering instructions that are executed elsewhere? 😉 (If the computing fabric develops fast enough, we may lose the distinction of a “personal” computer. There will simply be computing.)

Meant to mention this the other day. Enjoy!

June 22, 2011

Cloudera – Apache Hadoop Connector for Netezza

Filed under: Hadoop,Netezza — Patrick Durusau @ 6:40 pm

Cloudera Delivers Apache Hadoop Connector for Netezza

From the announcement:

Cloudera Inc., a leading provider of Apache Hadoop-based data management software and services, today announced the immediate general availability of the Cloudera Connector for IBM Netezza appliances. The connector allows Netezza users to leverage Cloudera’s Distribution including Apache Hadoop (CDH) and Cloudera Enterprise services, support and management tools to derive highly articulated analytical insights from large unstructured data sets. The Cloudera Connector, which is the first of its kind for CDH and Cloudera Enterprise, enables high-speed, bilateral data transfer between CDH and Netezza environments.

“As the amount of data that organizations need to process, especially for analytics, continues to increase, Apache Hadoop is increasingly becoming an important data integration tool to enhance performance in reducing very large amounts of data to only what is needed in the data warehouse,” said Donald Feinberg, VP and distinguished analyst at Gartner. “Hadoop presents a viable solution for organizations looking to address the challenges presented by large scale data and has the potential to extend the capabilities of a company’s data warehouse by providing expanded opportunities for analysis and storage for complex data sets.”

See also: VoltDB Announces Hadoop Integration.

I can’t imagine a better environment for promotion of topic maps than “big data.” The more data that is processed, the more semantic integration issues will come to the fore, at least for clients paying the bills for sensible answers. It is sorta like putting teenagers in Indy race cars. It won’t take all that long before some of them decide they need driving lessons.
