Normalized Kernels as Similarity Indices
Author(s): Julien Ah-Pine
Keywords: kernels normalization, similarity indices, kernel PCA based clustering
Abstract:
Measuring similarity between objects is a fundamental issue for numerous applications in data-mining and machine learning domains. In this paper, we are interested in kernels. We particularly focus on kernel normalization methods that aim at designing proximity measures that better fit the definition and the intuition of a similarity index. To this end, we introduce a new family of normalization techniques which extends the cosine normalization. Our approach aims at refining the cosine measure between vectors in the feature space by considering another geometrical based score which is the mapped vectors’ norm ratio. We show that the designed normalized kernels satisfy the basic axioms of a similarity index unlike most unnormalized kernels. Furthermore, we prove that the proposed normalized kernels are also kernels. Finally, we assess these different similarity measures in the context of clustering tasks by using a kernel PCA based clustering approach. Our experiments employing several real-world datasets show the potential benefits of normalized kernels over the cosine normalization and the Gaussian RBF kernel.
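The cosine normalization the paper builds on can be sketched numerically. This is my own toy example (NumPy, linear kernel), not code from the paper: divide each kernel entry by the square roots of the corresponding diagonal entries, K'[i,j] = K[i,j] / sqrt(K[i,i] K[j,j]).

```python
import numpy as np

def cosine_normalize(K):
    """Cosine-normalize a kernel matrix: K'[i,j] = K[i,j] / sqrt(K[i,i]*K[j,j])."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Linear kernel on a few 2-D points (assumed example data).
X = np.array([[1.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
K = X @ X.T
Kn = cosine_normalize(K)

# After normalization every object has maximal self-similarity of 1,
# matching the "basic axioms of a similarity index" the abstract mentions.
print(np.diag(Kn))  # → [1. 1. 1.]
```

Note the remaining off-diagonal entries are cosines in the feature space, bounded by 1; the paper's extension further weighs in the mapped vectors' norm ratio, which this sketch does not implement.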
Points out that, under some (unnormalized) kernels, an object need not be most similar to… itself. What an odd result.
Moreover, it is possible for vectors that represent different scores to be treated as identical.
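Both oddities are easy to reproduce; here is my own minimal sketch with a plain linear kernel (not an example from the paper):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([3.0, 0.0])  # y is just a rescaling of x

# Unnormalized linear kernel: x comes out "more similar" to y than to itself.
k_xx = x @ x   # 1.0
k_xy = x @ y   # 3.0
print(k_xy > k_xx)  # → True (the odd result)

# Cosine normalization restores maximal self-similarity, but it collapses
# vectors with different magnitudes: x and y = 3x become perfectly similar.
cos_xy = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_xy)  # → 1.0
```

The norm-ratio refinement proposed in the paper is precisely meant to keep x and y distinguishable here while still keeping self-similarity maximal.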
Questions:
- What axioms of similarity indices should we take notice of? (3-5 pages, citations)
- What methods treat vectors with different scores as identical? (3-5 pages, citations)
- Are geometry-based similarity indices measuring semantic or geometric similarity? Are those the same concepts or different concepts? (10-15 pages, citations, you can make this a final paper if you like.)