The history of the vector space model by Suresh Venkatasubramanian.
From the post:
Gerard Salton is generally credited with the invention of the vector space model: the idea that we could represent a document as a vector of keywords and use things like cosine similarity and dimensionality reduction to compare documents and represent them.
But the path to this modern interpretation was a lot twistier than one might think. David Dubin wrote an article in 2004 titled ‘The Most Influential Paper Gerard Salton Never Wrote’. In it, he points out that most citations that refer to the vector space model refer to a paper that doesn’t actually exist (hence the title). Taking that as a starting point, he then traces the lineage of the ideas in Salton’s work.
…
Suresh summarizes some of Dubin’s discoveries in his post, but this sounds like an interesting research project: take Dubin’s article as a starting point and follow the development of the vector space model.
That is particularly worthwhile since the model is used so often for “similarity.” Understanding the mathematics is good; understanding how that particular model was arrived at would be even better.
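For anyone who wants the mathematics in front of them, here is a minimal sketch of the usual "similarity" computation: each document becomes a vector of keyword counts, and two documents are compared by the cosine of the angle between their vectors. The function names (term_vector, cosine_similarity) and the whitespace tokenization are my own assumptions for illustration; none of this is taken from Salton, Dubin, or Suresh's post.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a document to a sparse vector of lowercase keyword counts.

    Assumption: naive whitespace tokenization, no stemming or weighting.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = term_vector("vector space model")
    d2 = term_vector("vector space retrieval")
    print(cosine_similarity(d1, d2))
```

The two sample documents share two of their three terms, so the score lands between 0 (no terms in common) and 1 (identical term distributions). Real systems typically add term weighting such as tf-idf on top of this, which the sketch leaves out.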
Enjoy!