Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

July 17, 2014

Scikit-learn 0.15 release

Filed under: Machine Learning, Python, Scikit-Learn — Patrick Durusau @ 6:16 pm

Scikit-learn 0.15 release by Gaël Varoquaux.

From the post:

Highlights:

Quality— Looking at the commit log, there has been a huge amount of work to fix minor annoying issues.

Speed— A huge effort has gone into making many parts of scikit-learn faster, with little details improved all over the codebase. We do hope that you’ll find that your applications run faster. For instance, we find that the worst-case speed of Ward clustering is 1.5 times faster in 0.15 than in 0.14. K-means clustering is often 1.1 times faster. KNN, when used in brute-force mode, got faster by a factor of 2 or 3.
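A minimal sketch of two of the estimators named above, K-means and brute-force KNN, just to show the APIs the speedups apply to; the synthetic data and parameter choices here are my own illustrative assumptions, not the benchmark setup from the post.

```python
# Illustrative only: random data and arbitrary parameters, not a benchmark.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(1000, 20)           # 1000 samples, 20 features
y = rng.randint(0, 2, 1000)      # binary labels for the KNN demo

# K-means clustering, one of the estimators reported as modestly faster
km = KMeans(n_clusters=5, random_state=0).fit(X)

# KNN forced into brute-force mode, the mode reported as 2-3x faster
knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)

print(km.inertia_, knn.score(X, y))
```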

Random Forest and various tree methods— The random forest and various tree methods are much, much faster, make much better use of parallel computing, and use less memory. For instance, the picture in the original post shows the scikit-learn random forest running in parallel on a fat Amazon node, nicely using all the CPUs with little RAM usage.
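From the API side, the parallelism is requested with n_jobs; a rough sketch on placeholder data (the shapes and settings are assumptions for the demo):

```python
# Illustrative only: random data standing in for a real training set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(5000, 50)
y = rng.randint(0, 2, 5000)

# n_jobs=-1 trains the individual trees in parallel on every available core
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))
```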

Hierarchical agglomerative clustering— Complete linkage and average linkage clustering have been added. The benefit of these approaches compared to the existing Ward clustering is that they can take an arbitrary distance matrix.
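A minimal sketch of average-linkage clustering on a precomputed distance matrix; the toy data and the choice of Manhattan distances are assumptions, and the parameter names follow the 0.15-era AgglomerativeClustering API.

```python
# Illustrative only: toy data and an arbitrary choice of distance metric.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
X = rng.rand(100, 10)

# Any distance matrix will do; here, Manhattan distances between the samples
D = pairwise_distances(X, metric="manhattan")

model = AgglomerativeClustering(n_clusters=3, linkage="average",
                                affinity="precomputed")
model.fit(D)
print(model.labels_[:10])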

Robust linear models— Scikit-learn now includes RANSAC for robust linear regression.
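A minimal sketch of the new RANSACRegressor fitting a line through data salted with outliers; the data generation below is an assumption made for the demo, not an example from the post.

```python
# Illustrative only: a noisy line with a handful of injected gross outliers.
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 1)
y = 3.0 * X.ravel() + 0.1 * rng.randn(200)
y[:20] += 10.0 * rng.rand(20)        # gross outliers in the first 20 samples

ransac = RANSACRegressor(LinearRegression(), random_state=0)
ransac.fit(X, y)

# The fitted base estimator and how many samples RANSAC judged to be inliers
print(ransac.estimator_.coef_, ransac.inlier_mask_.sum())
```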

HMMs are deprecated— We have been discussing removing HMMs for a long time, as they do not fit scikit-learn’s focus on predictive modeling. We have created a separate hmmlearn repository for the HMM code. It is looking for maintainers.

And much more— plenty of “minor things”, such as better support for sparse data, better support for multi-label data…
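On the sparse-data point, many scikit-learn estimators accept scipy.sparse matrices directly; a small hedged illustration, where the random sparse matrix and the choice of LogisticRegression are assumptions rather than examples from the post:

```python
# Illustrative only: a random sparse matrix standing in for real sparse features.
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = sp.rand(1000, 500, density=0.01, format="csr", random_state=0)
y = rng.randint(0, 2, 1000)

# The CSR matrix is passed straight to the estimator, no densification needed
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```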

Get thee to Scikit-learn!
