Only a small part of arXiv appears at http://www.arxiv-sanity.com/, but it is enough to show that this approach is feasible.
What interests me most is the potential to modify or extend the program so that it uses other similarity measures.
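To make the idea of swapping similarity measures concrete, here is a minimal sketch. It does not reproduce arxiv-sanity's actual code. It assumes, for illustration only, that each paper is reduced to a naive bag of word counts, and the names `bag_of_words`, `cosine`, `jaccard` and `most_similar` are hypothetical. The ranking function takes the similarity measure as a parameter, so trying a different measure means passing a different function.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical representation: a paper is a mapping from term to count.
Vector = Dict[str, float]
Similarity = Callable[[Vector, Vector], float]


def bag_of_words(text: str) -> Vector:
    """Count lowercase whitespace-separated tokens (a deliberately naive tokenizer)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a: Vector, b: Vector) -> float:
    """Jaccard similarity on the sets of terms, ignoring how often each term occurs."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def most_similar(query: str, papers: Dict[str, str], sim: Similarity, k: int = 3) -> List[str]:
    """Return the ids of the k papers most similar to the query under `sim`.

    Ties are broken by paper id so the ranking is deterministic.
    """
    q = bag_of_words(query)
    scored = [(sim(q, bag_of_words(text)), pid) for pid, text in papers.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:k]]


if __name__ == "__main__":
    papers = {
        "a": "neural networks for image recognition",
        "b": "graph neural networks",
        "c": "bayesian inference for time series",
    }
    # Same ranking code, two different similarity measures.
    print(most_similar("neural networks", papers, cosine, k=2))
    print(most_similar("neural networks", papers, jaccard, k=2))
```

On this toy corpus both measures rank "b" ahead of "a". They diverge on texts with repeated terms, since cosine weighs term counts while Jaccard only looks at which terms are present; a richer weighting such as tf-idf could be plugged in the same way.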
Bear in mind, though, that searching is only the first step towards acquiring and preserving knowledge.
PS: I first saw this in a tweet by Data Science Renee.