Archive for the ‘Ensemble Methods’ Category

Random Forests…

Monday, July 7th, 2014

Random Forests of Very Fast Decision Trees on GPU for Mining Evolving Big Data Streams by Diego Marron, Albert Bifet, Gianmarco De Francisci Morales.


Random Forests is a classical ensemble method used to improve the performance of single tree classifiers. It is able to obtain superior performance by increasing the diversity of the single classifiers. However, in the more challenging context of evolving data streams, the classifier also has to be adaptive and work under very strict constraints of space and time. Furthermore, the computational load of using a large number of classifiers can make its application extremely expensive. In this work, we present a method for building Random Forests that use Very Fast Decision Trees for data streams on GPUs. We show how this method can benefit from the massive parallel architecture of GPUs, which are becoming an efficient hardware alternative to large clusters of computers. Moreover, our algorithm minimizes the communication between CPU and GPU by building the trees directly inside the GPU. We run an empirical evaluation and compare our method to two well-known machine learning frameworks, VFML and MOA. Random Forests on the GPU are at least 300x faster while maintaining a similar accuracy.

The authors should get a special mention for honesty in research publishing. Figure 11 shows their GPU Random Forest algorithm seeming to scale almost constantly. The authors explain:

In this dataset MOA scales linearly while GPU Random Forests seems to scale almost constantly. This is an effect of the scale, as GPU Random Forests runs in milliseconds instead of minutes.

How fast/large are your data streams?
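The Very Fast Decision Trees the forest is built from decide when to split a leaf using the Hoeffding bound: split only once enough examples have arrived that the observed best attribute is reliably best. A minimal sketch of that bound (my illustration, not the authors' GPU code):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the true mean of a random variable
    with range value_range lies within epsilon of the mean observed
    over n examples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# A VFDT splits a leaf once the gain gap between its two best split
# attributes exceeds epsilon; more examples shrink epsilon.
print(round(hoeffding_bound(1.0, 1e-7, 1_000), 4))    # 0.0898
print(round(hoeffding_bound(1.0, 1e-7, 100_000), 4))  # 0.009
```

The bound shrinks as the square root of the example count, which is why a stream learner can commit to splits early without revisiting old data.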

I first saw this in a tweet by Stefano Bertolo.

Predictive Analytics: Decision Tree and Ensembles [part 5]

Thursday, June 7th, 2012

Predictive Analytics: Decision Tree and Ensembles by Ricky Ho.

From the post:

Continuing from my last post walking down the list of machine learning techniques. In this post, I will cover Decision Tree and Ensemble methods. We'll continue using the iris data we prepared in this earlier post.

Ricky covers Decision Trees to illustrate early machine learning and continues under Ensemble methods to cover Random Forests and Gradient Boosted Trees.

Ricky’s next post will cover performance of the methods he has discussed in this series of posts.
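Ricky's examples are in R; for orientation, here is a stdlib-Python sketch of the bagging idea at the core of Random Forests — fit each model on a bootstrap resample, then majority-vote. The "stump" learner and labels are made up for illustration, not taken from Ricky's post:

```python
import random
from collections import Counter

def bagged_predict(train, train_fn, x, n_models=25, seed=0):
    """Bagging: fit each model on a bootstrap resample of the
    training set, then return the majority vote of their predictions."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # resample with replacement
        votes.append(train_fn(sample)(x))
    return Counter(votes).most_common(1)[0][0]

# Toy "learner": a stump that always predicts its sample's majority label.
def train_stump(sample):
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda x: majority

data = [(i, "setosa") for i in range(8)] + [(i, "versicolor") for i in range(2)]
print(bagged_predict(data, train_stump, x=0))  # setosa
```

A real Random Forest also randomizes the features considered at each split; the bootstrap-and-vote loop above is the part that all bagging ensembles share.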

Machine Learning: Ensemble Methods

Sunday, January 15th, 2012

Machine Learning: Ensemble Methods by Ricky Ho.

Ricky gives a brief overview of ensemble methods in machine learning.

Not enough for practical application but enough to orient yourself to learn more.

From the post:

Ensemble Method is a popular approach in Machine Learning based on the idea of combining multiple models. For example, by mixing different machine learning algorithms (e.g. SVM, Logistic regression, Bayesian network), an ensemble method can automatically pick the algorithmic model that fits the data best. On the other hand, by mixing different parameter sets of the same algorithmic model (e.g. Random forest, Boosting tree), it can pick the best set of parameters for that model.
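The "mixing different machine learning algorithms" half of that idea is, at its simplest, a majority vote over heterogeneous classifiers. A toy stdlib-Python sketch — the spam-ish features and decision rules here are invented for illustration, standing in for trained SVM, logistic regression, and tree models:

```python
from collections import Counter

def ensemble_vote(models, x):
    """Combine heterogeneous classifiers by simple majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Hypothetical stand-ins for three differently-trained classifiers.
models = [
    lambda x: "spam" if x["caps"] > 0.5 else "ham",
    lambda x: "spam" if x["links"] > 3 else "ham",
    lambda x: "spam" if x["caps"] + 0.1 * x["links"] > 0.7 else "ham",
]
print(ensemble_vote(models, {"caps": 0.9, "links": 1}))  # spam (2-1 vote)
```

The vote helps exactly when the models make different mistakes; identical models gain nothing from being combined.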

Efficient P2P Ensemble Learning with Linear Models on Fully Distributed Data

Sunday, September 11th, 2011

Efficient P2P Ensemble Learning with Linear Models on Fully Distributed Data by Róbert Ormándi, István Hegedűs, and Márk Jelasity.


Machine learning over fully distributed data poses an important problem in peer-to-peer (P2P) applications. In this model we have one data record at each network node, but without the possibility to move raw data due to privacy considerations. For example, user profiles, ratings, history, or sensor readings can represent this case. This problem is difficult, because there is no possibility to learn local models, yet the communication cost needs to be kept low. Here we propose gossip learning, a generic approach that is based on multiple models taking random walks over the network in parallel, while applying an online learning algorithm to improve themselves, and getting combined via ensemble learning methods. We present an instantiation of this approach for the case of classification with linear models. Our main contribution is an ensemble learning method which, through the continuous combination of the models in the network, implements a virtual weighted voting mechanism over an exponential number of models at practically no extra cost as compared to independent random walks. Our experimental analysis demonstrates the performance and robustness of the proposed approach.

Interesting. In a topic map context, I wonder about creating associations based on information that is not revealed to the peer making the association, or to the peer suggesting the association?
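The gossip idea can be caricatured in a few lines: one labeled point per node, linear models random-walking over the nodes, an online (perceptron-style) update at each hop, and averaging with the model last seen at a node as the ensemble combination. This is my toy simulation of that scheme, not the authors' algorithm:

```python
import random

def gossip_learning(data, rounds=200, n_models=5, lr=0.1, seed=1):
    """Toy gossip learning: each walker model hops to a random node,
    averages with the model cached there (the ensemble combination),
    then takes an online perceptron step on that node's single point."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    walkers = [[0.0] * dim for _ in range(n_models)]
    resident = [[0.0] * dim for _ in data]  # last model seen at each node
    for _ in range(rounds):
        for w in walkers:
            node = rng.randrange(len(data))
            w[:] = [(a + b) / 2 for a, b in zip(w, resident[node])]
            x, y = data[node]  # y in {-1, +1}
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w[:] = [wi + lr * y * xi for wi, xi in zip(w, x)]
            resident[node] = w[:]
    return walkers[0]

# Four "nodes", each holding one linearly separable labeled point.
data = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
w = gossip_learning(data)
print(all(sum(wi * xi for wi, xi in zip(w, x)) * y > 0 for x, y in data))  # True
```

No raw data ever moves here — only model weights travel — which is the privacy property the paper is after; the continuous averaging is what the authors elevate into a virtual vote over exponentially many model combinations.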

Taxonomy for Characterizing Ensemble Methods in Classification Tasks

Wednesday, September 15th, 2010

Taxonomy for Characterizing Ensemble Methods in Classification Tasks by Lior Rokach.

Keywords: Ensemble methods; Classification; Boosting; Bagging; Partitioning; Decision trees; Neural networks.

A review and annotated bibliography of work on ensemble methods.

Ensemble methods, I like the sound of that.

Extend it to mean human authors + other methods creating a topic map.