Random Walks on the Click Graph by Nick Craswell and Martin Szummer.
Search engines can record which documents were clicked for which query, and use these query-document pairs as ‘soft’ relevance judgments. However, compared to the true judgments, click logs give noisy and sparse relevance information. We apply a Markov random walk model to a large click log, producing a probabilistic ranking of documents for a given query. A key advantage of the model is its ability to retrieve relevant documents that have not yet been clicked for that query and rank those effectively. We conduct experiments on click logs from image search, comparing our (‘backward’) random walk model to a different (‘forward’) random walk, varying parameters such as walk length and self-transition probability. The most effective combination is a long backward walk with high self-transition probability.
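To make the forward/backward distinction concrete, here is a minimal sketch in plain Python. It is my own reading of the abstract, not the authors' code: the function names, the toy click log, and the uniform prior over starting nodes in the backward walk are all assumptions for illustration. Each node keeps probability `self_prob` of staying put and spreads the rest over its neighbors in proportion to click counts.

```python
def transition_matrix(clicks, self_prob):
    """Build a row-stochastic matrix over query and document nodes.

    clicks: dict mapping (query, doc) -> positive click count (hypothetical format).
    Returns (nodes, A) where A[i][j] is the probability of moving from node i to j.
    """
    nodes = sorted({("q", q) for q, _ in clicks} | {("d", d) for _, d in clicks})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    weights = [[0.0] * n for _ in range(n)]
    for (q, d), count in clicks.items():
        i, j = index[("q", q)], index[("d", d)]
        weights[i][j] += count  # the click graph is undirected:
        weights[j][i] += count  # a click links query and document both ways
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = sum(weights[i])
        for j in range(n):
            if i == j:
                A[i][j] = self_prob
            else:
                A[i][j] = (1.0 - self_prob) * weights[i][j] / total
    return nodes, A


def forward_scores(nodes, A, query, steps):
    """Probability of ending at each document after `steps` steps from `query`."""
    n = len(A)
    dist = [1.0 if node == ("q", query) else 0.0 for node in nodes]
    for _ in range(steps):
        dist = [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]
    return {name: p for (kind, name), p in zip(nodes, dist) if kind == "d"}


def backward_scores(nodes, A, query, steps):
    """Probability that a walk ending at `query` started at each document.

    Assumes a uniform prior over starting nodes (an assumption of this sketch).
    """
    n = len(A)
    # v[k] becomes P(walk is at `query` after `steps` steps | it started at k).
    v = [1.0 if node == ("q", query) else 0.0 for node in nodes]
    for _ in range(steps):
        v = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    total = sum(v)
    return {name: p / total for (kind, name), p in zip(nodes, v) if kind == "d"}


# Toy click log: img2 has never been clicked for the query "panda".
toy_clicks = {
    ("panda", "img1"): 3,
    ("panda bear", "img1"): 1,
    ("panda bear", "img2"): 2,
}
nodes, A = transition_matrix(toy_clicks, self_prob=0.5)
one_step = backward_scores(nodes, A, "panda", steps=1)
three_steps = backward_scores(nodes, A, "panda", steps=3)
```

In this toy log, a one-step walk gives `img2` a score of zero for "panda", while a three-step walk reaches it through the shared document `img1` and the related query "panda bear". That is the "retrieve documents not yet clicked for that query" behavior the abstract describes, and note that nothing in the sketch looks at query text or image content.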
Two points that may capture your interest:
- The model does not consider query or document content. “Just the clicks, Ma’am.”
- Image-search click data is said to have “less noise” since users can see thumbnails before they follow a link. (True?)
I saw this paper cited quite recently, but it is about five years old now (2007). Is there any recent literature on click graphs that you would point out?