Classification and Novel Class Detection in Data Streams with Active Mining Authors(s): Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, Bhavani Thuraisingham
We present ActMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and limited labeled data. Most of the existing data stream classification techniques address only the infinite length and concept-drift problems. Our previous work, MineClass, addresses the concept-evolution problem in addition to addressing the infinite length and concept-drift problems. Concept-evolution occurs in the stream when novel classes arrive. However, most of the existing data stream classification techniques, including MineClass, require that all the instances in a data stream be labeled by human experts and become available for training. This assumption is impractical, since data labeling is both time consuming and costly. Therefore, it is impossible to label a majority of the data points in a high-speed data stream. This scarcity of labeled data naturally leads to poorly trained classifiers. ActMiner actively selects only those data points for labeling for which the expected classification error is high. Therefore, ActMiner extends MineClass, and addresses the limited labeled data problem in addition to addressing the other three problems. It outperforms the state-of-the-art data stream classification techniques that use ten times or more labeled data than ActMiner.
I would have liked this article better had it not said that the details of the test data could be found in another article.
Specifically: Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B.M.: “Integrating novel class detection with classification for concept-drifting data streams.” In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD c 2009. LNCS, vol. 5782, pp. 79–94. Springer, Heidelberg (2009)
Which directed me to: “Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams,” http://www.utdallas.edu/?mmm058000/reports/UTDCS-13-09.pdf
I leave it as an exercise for the readers to guess the names of the authors of the last paper.
Otherwise interesting research marred by presentation in dribs and drabs.
Now that I have all three papers I will have to see what questions arise, other than questionable publishing practices.