From the webpage:
MPI LIBLINEAR is an extension of LIBLINEAR for distributed environments. The usage and the data format are the same as LIBLINEAR. Currently only two solvers are supported:
- L2-regularized logistic regression (LR)
- L2-regularized L2-loss linear SVM
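Per the LIBLINEAR documentation, both solvers minimize a regularized sum of per-instance losses, f(w) = ½wᵀw + C Σᵢ ξ(w; xᵢ, yᵢ). For LR the loss is log(1 + exp(−yᵢwᵀxᵢ)); for the L2-loss SVM it is max(0, 1 − yᵢwᵀxᵢ)². Because the loss is a sum over instances, each machine can evaluate the part of the sum for its own rows, and the partial sums are then added together. The sketch below simulates that with Python lists standing in for nodes. It does not use MPI, it is not MPI LIBLINEAR's code, and the function names and data split are hypothetical.

```python
import math

# Dense vectors as plain Python lists keep the sketch dependency-free.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def lr_loss(w, x, y):
    # Logistic loss log(1 + exp(-y * w^T x)), rearranged to avoid overflow.
    z = -y * dot(w, x)
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def l2svm_loss(w, x, y):
    # Squared hinge loss max(0, 1 - y * w^T x)^2.
    return max(0.0, 1.0 - y * dot(w, x)) ** 2

def local_loss_sum(w, partition, loss):
    # The work one (hypothetical) node does: sum the loss over its own rows only.
    return sum(loss(w, x, y) for x, y in partition)

def distributed_objective(w, partitions, C, loss):
    # f(w) = 0.5 * w^T w + C * sum_i loss_i.
    # Adding the per-partition sums stands in for an MPI reduce across nodes.
    total = sum(local_loss_sum(w, part, loss) for part in partitions)
    return 0.5 * dot(w, w) + C * total

# Usage: three instances split across two simulated nodes.
data = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1)]
partitions = [data[:2], data[2:]]
w = [0.5, -0.5]
print(distributed_objective(w, partitions, 1.0, lr_loss))
```

However the rows are split, the result matches the single-machine value, which is what makes this kind of objective a natural fit for distributed training.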
NOTICE: This extension can only run on Unix-like systems. (We tested it on Ubuntu 13.10.) Python and Matlab interfaces are not supported.
Spark LIBLINEAR is a Spark implementation based on LIBLINEAR and integrated with the Hadoop Distributed File System (HDFS). This package is written in Scala. Currently it supports the same two solvers as MPI LIBLINEAR.
If you are unfamiliar with LIBLINEAR:
LIBLINEAR is a linear classifier for data with millions of instances and features. It supports
- L2-regularized classifiers: L2-loss linear SVM, L1-loss linear SVM, and logistic regression (LR)
- L1-regularized classifiers (after version 1.4): L2-loss linear SVM and logistic regression (LR)
- L2-regularized support vector regression (after version 1.9): L2-loss linear SVR and L1-loss linear SVR
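The variants above differ only in the regularizer and the loss plugged into the same "regularizer + C × sum of losses" form. Following the LIBLINEAR README, the L1-regularized classifiers use ‖w‖₁ in place of ½wᵀw, and the two SVR variants use the epsilon-insensitive loss max(0, |yᵢ − wᵀxᵢ| − ε) or its square. The sketch below just evaluates these objectives. The function names are hypothetical and no solver is included.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def l1_reg(w):
    # ||w||_1, the regularizer of the L1-regularized classifiers; tends to give sparse w.
    return sum(abs(wi) for wi in w)

def l2_reg(w):
    # w^T w / 2, as used by the L2-regularized models.
    return 0.5 * dot(w, w)

def svr_l1_loss(w, x, y, eps):
    # Epsilon-insensitive loss: residuals smaller than eps cost nothing.
    return max(0.0, abs(y - dot(w, x)) - eps)

def svr_l2_loss(w, x, y, eps):
    # Squared version of the epsilon-insensitive loss.
    return svr_l1_loss(w, x, y, eps) ** 2

def objective(reg, loss, w, data, C):
    # Generic form: regularizer(w) + C * sum of per-instance losses.
    return reg(w) + C * sum(loss(w, x, y) for x, y in data)

# Usage: L2-regularized L2-loss SVR on one instance with eps = 0.25.
w = [1.0, -2.0]
data = [([1.0, 0.0], 1.5)]
print(objective(l2_reg, lambda w, x, y: svr_l2_loss(w, x, y, 0.25), w, data, 1.0))
```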
Main features of LIBLINEAR include:
- Same data format as LIBSVM, our general-purpose SVM solver, and also similar usage
- Multi-class classification: 1) one-vs-the-rest, 2) Crammer & Singer
- Cross validation for model selection
- Probability estimates (logistic regression only)
- Weights for unbalanced data
- MATLAB/Octave, Java, Python, Ruby interfaces
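To make the first two features concrete: a LIBSVM-format file has one instance per line, written as `<label> <index>:<value> ...` with 1-based, ascending feature indices, and zero-valued features may be left out. In one-vs-the-rest, one weight vector is kept per class and the class with the largest decision value wins. The sketch below parses a line and applies that rule to hand-picked (not trained) weights with no bias term. The probability function, which squashes each decision value with the logistic function and then normalizes, is my assumption about how multi-class LR estimates can be formed and should not be read as LIBLINEAR's exact code. All function names are hypothetical, and Crammer & Singer is not shown.

```python
import math

def parse_libsvm_line(line):
    # "<label> <index>:<value> ..." with 1-based feature indices.
    # Omitted features are zero, so the instance is stored sparsely as a dict.
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return float(label), features

def sparse_dot(w, x):
    # w is a dict {feature_index: weight}; missing entries count as zero.
    return sum(w.get(i, 0.0) * v for i, v in x.items())

def ovr_predict(models, x):
    # One-vs-the-rest: pick the class whose weight vector scores x highest.
    return max(models, key=lambda label: sparse_dot(models[label], x))

def ovr_probabilities(models, x):
    # Assumed scheme: logistic function per class, then normalize to sum to 1.
    raw = {label: 1.0 / (1.0 + math.exp(-sparse_dot(w, x)))
           for label, w in models.items()}
    total = sum(raw.values())
    return {label: p / total for label, p in raw.items()}

# Usage: one instance and three hypothetical per-class weight vectors.
label, x = parse_libsvm_line("2 1:0.5 3:1.0")
models = {1.0: {1: 1.0}, 2.0: {3: 2.0}, 3.0: {1: -1.0, 3: -1.0}}
print(ovr_predict(models, x), ovr_probabilities(models, x))
```

Because the logistic function is monotonic, the most probable class under this scheme is always the one-vs-the-rest prediction.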
You will also find instructions for creating distributed environments using VirtualBox for both MPI LIBLINEAR and Spark LIBLINEAR. I am going to post on that separately to draw attention to it.
The phrase “standalone computer” is rapidly becoming a misnomer. Forward-looking algorithm designers and power users will begin gaining experience with the new distributed “normal” at every opportunity.
I first saw this in a tweet by Reynold Xin.