Shogun – A Large Scale Machine Learning Toolbox (3.0.0 release)
Highlights of the Shogun 3.0.0 release:
This release features 8 successful Google Summer of Code projects and it is the result of an incredible effort by our students. All projects come with very cool ipython-notebooks that contain background, code examples and visualizations. These can be found on our webpage!
The projects are:
- Gaussian Processes for binary classification [Roman Votjakov]
- Sampling log-determinants for large sparse matrices [Soumyajit De]
- Metric Learning via LMNN [Fernando Iglesias]
- Independent Component Analysis (ICA) [Kevin Hughes]
- Hashing Feature Framework [Evangelos Anagnostopoulos]
- Structured Output Learning [Hu Shell]
- A web-demo framework [Liu Zhengyang]
Other important changes are the change of our build-system to cmake and the addition of clone/equals methods to our base-class. In addition, you get the usual ton of bugfixes, new unit-tests, and new mini-features.
Features:
In addition, the following features have been added:
- Added method to importance sample the (true) marginal likelihood of a Gaussian Process using a posterior approximation.
- Added a new class for classical probability distribution that can be sampled and whose log-pdf can be evaluated. Added the multivariate Gaussian with various numerical flavours.
- Cross-validation framework works now with Gaussian Processes
- Added nu-SVR for LibSVR class
- Modelselection is now supported for parameters of sub-kernels of combined kernels in the MKL context. Thanks to Evangelos Anagnostopoulos
- Probability output for multi-class SVMs is now supported using various heuristics. Thanks to Shell Xu Hu.
- Added an “equals” method to all Shogun objects that recursively compares all registered parameters with those of another instance — up to a specified accuracy.
- Added a “clone” method to all Shogun objects that creates a deep copy
- Multiclass LDA. Thanks to Kevin Hughes.
- Added a new datatype, complex128_t, for complex numbers. Math functions, support for SGVector/Matrix, SGSparseVector/Matrix, and serialization with Ascii and Xml files added. [Soumyajit De].
- Added mini-framework for numerical integration in one variable. Implemented Gauss-Kronrod and Gauss-Hermite quadrature formulas.
- Changed from configure script to CMake by Viktor Gal.
- Add C++0x and C++11 cmake detection scripts
- ND-Array typemap support for python and octave modular.
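To make the "clone" and "equals" items above concrete, here is a minimal sketch in plain Python of what a deep copy plus a recursive, tolerance-based comparison of registered parameters might look like. This is not Shogun's API: the `Params` class, the `_values_equal` helper, and the `accuracy` argument name are all hypothetical, chosen only to illustrate the idea.

```python
import copy


class Params:
    """Hypothetical stand-in for an object with registered parameters."""

    def __init__(self, **params):
        self.params = dict(params)

    def clone(self):
        # Deep copy: nested containers and nested Params are duplicated too.
        return copy.deepcopy(self)

    def equals(self, other, accuracy=0.0):
        # Compare every registered parameter, recursing into nested objects.
        if not isinstance(other, Params):
            return False
        return _values_equal(self.params, other.params, accuracy)


def _values_equal(a, b, accuracy):
    """Recursively compare two values; floats may differ by up to `accuracy`."""
    if isinstance(a, Params) and isinstance(b, Params):
        return a.equals(b, accuracy)
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            _values_equal(a[k], b[k], accuracy) for k in a
        )
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            _values_equal(x, y, accuracy) for x, y in zip(a, b)
        )
    if isinstance(a, float) or isinstance(b, float):
        # Numeric comparison up to the requested absolute accuracy.
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return abs(a - b) <= accuracy
        return False
    return a == b


# Usage: a nested "machine" whose kernel is itself a parameterised object.
kernel = Params(width=2.0, weights=[0.1, 0.2])
machine = Params(C=1.0, kernel=kernel)

duplicate = machine.clone()
duplicate.params["kernel"].params["width"] += 1e-9  # tiny perturbation

exact_match = machine.equals(duplicate)                # strict comparison
close_match = machine.equals(duplicate, accuracy=1e-6)  # tolerant comparison
```

Because the clone is deep, perturbing the copy's kernel width leaves the original untouched, and the tolerance is what lets two numerically near-identical objects still count as equal.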
Toolbox machine learning lacks the bells and whistles of custom code, but it is a great way to experiment with data and machine learning techniques.
Experimenting with data and techniques will help immunize you against the common frauds and deceptions that use machine learning techniques.
Darrell Huff wrote How to Lie with Statistics in the 1950s.
Is there anything equivalent to that for machine learning? Given the technical nature of many of the techniques, a guide to which questions to ask could be a real boon, at least to one side of machine learning based discussions.