From the post:
0xdata (www.0xdata.com), the open source machine learning and predictive analytics company for big data, today announced general availability of the latest release of H2O, the industry’s fastest prediction engine for big data users of Hadoop, R and Excel. H2O delivers parallel and distributed advanced algorithms on big data at speeds up to 100X faster than other predictive analytics providers.
The second generation H2O “Fluid Vector” release — currently in use at two of the largest insurance companies in the world, the largest provider of streaming video entertainment and the largest online real estate services company — delivers new levels of performance, ease of use and integration with R. Early H2O customers include Netflix, Trulia and Vendavo.
“We developed H2O to unlock the predictive power of big data through better algorithms,” said SriSatish Ambati, CEO and co-founder of 0xdata. “H2O is simple, extensible and easy to use and deploy from R, Excel and Hadoop. The big data science world is one of algorithm-haves and have-nots. Amazon, Goldman Sachs, Google and Netflix have proven the power of algorithms on data. With our viral and open Apache software license philosophy, along with close ties into the math, Hadoop and R communities, we bring the power of Google-scale machine learning and modeling without sampling to the rest of the world.”
“Big data by itself is useless. It is only when you have big data plus big analytics that one has the capability to achieve big business impact. H2O is the platform for big analytics that we have found gives us the biggest advantage compared with other alternatives,” said Chris Pouliot, Director of Algorithms and Analytics at Netflix and advisor to 0xdata. “Our data scientists can build sophisticated models, minimizing their worries about data shape and size on commodity machines. Over the past year, we partnered with the talented 0xdata team to work with them on building a great product that will meet and exceed our algorithm needs in the cloud.”
…
From the H2O Github page:
H2O makes Hadoop do math!
H2O scales statistics, machine learning and math over BigData. H2O is extensible and users can build blocks using simple math legos in the core.
H2O keeps familiar interfaces like R, Excel & JSON so that big data enthusiasts & experts can explore, munge, model and score datasets using a range of simple to advanced algorithms.
Data collection is easy. Decision making is hard. H2O makes it fast and easy to derive insights from your data through faster and better predictive modeling.
Product Vision for first cut:
- H2O, the Analytics Engine, will scale Classification and Regression.
- RandomForest, Generalized Linear Modeling (GLM), logistic regression and k-Means, available over R / REST / JSON-API
- Basic Linear Algebra as building blocks for custom algorithms
- High predictive power of the models
- High speed and scale for modeling and validation over BigData
- Data Sources:
  - We read and write from/to HDFS, S3
  - We ingest data in CSV format from local and distributed filesystems (NFS)
  - A JDBC driver for SQL and DataAdapters for NoSQL data sources are on the roadmap (v2).
- Ad hoc Data Analytics at scale via R-like Parser on BigData
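To make the "linear algebra as building blocks" idea concrete, here is a minimal sketch of how a Gaussian GLM (ordinary least squares) might be fit over data that arrives in chunks: each chunk contributes its own partial sums, the partial sums are added together, and only the small combined system is solved. This is not H2O's code or API; the function names (`partial_sums`, `combine`, `solve`, `fit_linear`) and the toy data are hypothetical and used only for illustration.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Partial = Tuple[Matrix, Vector]


def partial_sums(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> Partial:
    """Map step: accumulate X^T X and X^T y for one chunk of rows."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in zip(rows, targets):
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(chunks: Iterable[Tuple[Sequence[Sequence[float]], Sequence[float]]]) -> Vector:
    """Fit least-squares coefficients from an iterable of (rows, targets) chunks."""
    total: Optional[Partial] = None
    for rows, targets in chunks:
        part = partial_sums(rows, targets)
        total = part if total is None else combine(total, part)
    if total is None:
        raise ValueError("no data chunks supplied")
    return solve(*total)


# Toy data following y = 1 + 2x, split across two chunks.
# The first column of each row is the constant 1.0 for the intercept.
chunks = [
    ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0]),
    ([[1.0, 3.0], [1.0, 4.0]], [7.0, 9.0]),
]
intercept, slope = fit_linear(chunks)
```

Because the per-chunk sums are simply added, the chunks could in principle be processed on different machines and merged in any order, which is the general reason this style of algorithm suits distributed data without sampling.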
Machine learning is not as ubiquitous as Excel, yet.
But like Excel, the quality of results depends on the skills of the user, not the technology.