From HPCC Systems:
An extensible set of Machine Learning (ML) and Matrix processing algorithms to assist with business intelligence; covering supervised and unsupervised learning, document and text analysis, statistics and probabilities, and general inductive inference related problems.
The ML project is designed to create an extensible library of fully parallel machine learning routines; the early stages of a bottom-up implementation of a set of algorithms which are easy to use and efficient to execute. This library leverages the distributed nature of the HPCC Systems architecture, providing for extreme scalability to both the high-level implementation of the machine learning algorithms and the underlying matrix algebra library, extensible to tens of thousands of features on billions of training examples.
Some of the most representative algorithms in the different areas of machine learning have been implemented, including k-means for clustering, naive Bayes classifiers, ordinary linear regression, logistic regression, correlations (including Pearson and Kendall's tau), and association routines to perform association analysis and pattern prediction. The document tokenization and text classifiers included, with n-gram extraction and analysis, provide the basis to perform statistical grammar inference based natural language processing. Univariate statistics such as mean, median, mode, variance and percentile ranking are supported along with standard statistical measures such as Student-t, Normal, Poisson, Binomial, Negative Binomial and Exponential.
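For readers who have not met the two correlation measures named above, here is a minimal single-machine sketch in plain Python. It is not the HPCC Systems ML library's API (that library is written for the HPCC platform and runs in parallel); the function names `pearson` and `kendall_tau_a` are made up for illustration, and the Kendall variant shown is the simple tau-a, which makes no adjustment for ties.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Tied pairs (direction == 0) count toward neither total.
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [1, 3, 2, 5, 4]  # mostly increasing, with two swapped pairs
    print("Pearson:", pearson(xs, ys))
    print("Kendall tau-a:", kendall_tau_a(xs, ys))
```

Pearson measures linear association on the values themselves, while Kendall's tau only looks at whether pairs of observations are ordered the same way, so it is less sensitive to outliers. The pairwise loop here is quadratic in the number of observations, which is exactly the kind of cost a parallel implementation is meant to spread out.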
In case you need reminding, this is the open-sourced LexisNexis engine.
Unlike algorithms that run on top of summarized big data, these algorithms run on big data.
See if that makes a difference for your use cases.
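One way to see why it could matter, as a toy sketch only: I am assuming here that "summarized" means each partition of the data has been reduced to a single statistic before analysis. The partitions and numbers below are invented for illustration and have nothing to do with HPCC itself.

```python
from statistics import mean, median

# Two hypothetical partitions of unequal size (made-up values).
partition_a = [1, 2, 3]
partition_b = [10, 11, 12, 13, 14, 15, 16]

# "Summarized" route: reduce each partition first, then combine the summaries.
median_of_medians = median([median(partition_a), median(partition_b)])
mean_of_means = mean([mean(partition_a), mean(partition_b)])

# "Raw" route: compute directly over every record.
all_values = partition_a + partition_b
true_median = median(all_values)
true_mean = mean(all_values)

if __name__ == "__main__":
    print("median of partition medians:", median_of_medians)
    print("median of all records:      ", true_median)
    print("mean of partition means:    ", mean_of_means)
    print("mean of all records:        ", true_mean)
```

The summarized answers differ from the raw ones because the per-partition summaries throw away how many records each partition held and how they were distributed. The mean can be rescued by carrying counts along; the median cannot be recovered from per-partition medians alone.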