Orthogonal Nonnegative Matrix Tri-factorization for Semi-supervised Document Co-clustering
Authors: Huifang Ma, Weizhong Zhao, Qing Tan and Zhongzhi Shi
Keywords: Semi-supervised Clustering, Pairwise Constraints, Word-Level Constraints, Nonnegative Matrix Tri-factorization
Abstract:
Semi-supervised clustering is often viewed as using labeled data to aid the clustering process. However, existing algorithms fail to consider dual constraints between data points (e.g., documents) and features (e.g., words). To address this problem, we propose OSS-NMF, a novel semi-supervised document co-clustering model based on orthogonal nonnegative matrix tri-factorization. Our model incorporates prior knowledge on both the document side and the word side to aid construction of the new word-category and document-cluster matrices. We also prove the correctness and convergence of our model to demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model achieves remarkable performance improvements with certain constraints.
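For orientation, the unconstrained form of orthogonal nonnegative matrix tri-factorization is commonly written as below. This is a sketch of the generic formulation only; the symbols are assumptions and may not match the paper's notation. Here X is taken to be an m-by-n word-by-document matrix, F (m-by-k) the word-category matrix, G (n-by-l) the document-cluster matrix, and S (k-by-l) the matrix linking word categories to document clusters. The extra terms OSS-NMF adds for document-side and word-side constraints are not reproduced here.

```latex
\min_{F \ge 0,\; S \ge 0,\; G \ge 0} \; \lVert X - F S G^{\top} \rVert_F^2
\quad \text{subject to} \quad F^{\top} F = I, \;\; G^{\top} G = I
```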
Questions:
- The approach relies on user input, but is that input transferable, or is it document/collection specific? (3-5 pages, no citations)
- Is document-level retrieval too coarse? (discussion)
- Subset selection is understandable for testing/development, but doesn’t it seem odd that no tests were run against entire collections? (discussion)
- What of the exclusion of words that occur fewer than 3 times? Aren’t infrequent terms more likely to be significant? (3-5 pages, no citations)
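
To make the last question concrete, here is a minimal sketch of the kind of frequency cutoff it refers to. It assumes the threshold applies to a term's total count across the collection (it could equally be document frequency), and the function name `prune_rare_terms` and the toy documents are hypothetical, not taken from the paper.

```python
from collections import Counter


def prune_rare_terms(docs, min_count=3):
    """Drop terms whose total count across all documents is below min_count.

    Assumption: the cutoff is on collection-wide term count, not on the
    number of documents a term appears in.
    """
    totals = Counter(term for doc in docs for term in doc)
    keep = {term for term, count in totals.items() if count >= min_count}
    pruned = [[term for term in doc if term in keep] for doc in docs]
    return pruned, keep


# Toy collection: each document is a list of tokens (hypothetical data).
docs = [
    ["matrix", "factorization", "cluster"],
    ["matrix", "cluster", "constraint"],
    ["matrix", "cluster", "orthogonal"],
]

pruned, vocab = prune_rare_terms(docs)
print(sorted(vocab))
print(pruned)
```

In this toy collection the terms that distinguish one document from another are exactly the ones the cutoff removes, which is the concern the question raises. Comparing clustering results with and without the cutoff would be one way to test it.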