From datasets to algorithms in R by John Johnson.
From the post:
Many statistical algorithms are taught and implemented in terms of linear algebra. Statistical packages often borrow heavily from optimized linear algebra libraries such as LINPACK, LAPACK, or BLAS. When implementing these algorithms in systems such as Octave or MATLAB, it is up to you to translate the data from the use case terms (factors, categories, numerical variables) into matrices.
In R, much of the heavy lifting is done for you through the formula interface. Formulas resemble y ~ x1 + x2 + …, and are defined in relation to a data.frame….
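A minimal sketch of what the quoted passage describes, using the built-in mtcars dataset (my choice of example, not from the post): the formula plus a data.frame does the translation to a design matrix that you would otherwise build by hand in Octave or MATLAB.

```r
## Treat cylinder count as a categorical variable (a factor)
df <- transform(mtcars, cyl = factor(cyl))

## Formula + data.frame; no matrices constructed by hand
fit <- lm(mpg ~ wt + cyl, data = df)

## The design matrix R builds behind the scenes,
## with the factor expanded into dummy-coded columns
head(model.matrix(mpg ~ wt + cyl, data = df))
```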
It is interesting to consider whether R would be a useful language for exploring similarity measures. After all, in Analysis of Amphibian Biodiversity Data I pointed to work that reviewed forty-six (46) similarity measures, and I suspect that is a small fraction of all the similarity measures in use. I also recall a report (in an astronomy context) claiming that more than 100 algorithms/data models for data integration appear every month.
That is obviously a rough estimate, but one that should give us pause about being too wedded to any one measure of similarity.
Suggestions for existing collections of similarity measures, either in the literature or in code?
I'm thinking it would be instructive to throw some of the open government data sources against a range of similarity measures and compare the results.
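A hedged sketch of that kind of comparison, using only base R and the built-in USArrests dataset as a stand-in for an open government dataset (both are my assumptions, not named above). The point is simply that swapping the measure is a one-argument change, so trying many measures is cheap; packages such as proxy on CRAN register many more measures than base dist() offers.

```r
## Standardize the variables so no single column dominates the distances
data <- scale(USArrests)

## Several distance/similarity measures supported by base dist()
methods <- c("euclidean", "manhattan", "canberra", "maximum")
dists <- lapply(methods, function(m) dist(data, method = m))
names(dists) <- methods

## How strongly do the measures agree with one another?
## Correlate the pairwise distances produced by each method.
round(cor(sapply(dists, as.vector)), 2)
```

If the measures largely agree on a given dataset, the choice may not matter much there; where they disagree is exactly where being wedded to a single measure becomes risky.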