The Britney Spears Problem by Brian Hayes.
From the article:
Back in 1999, the operators of the Lycos Internet portal began publishing a weekly list of the 50 most popular queries submitted to their Web search engine. Britney Spears—initially tagged a “teen songstress,” later a “pop tart”—was No. 2 on that first weekly tabulation. She has never fallen off the list since then—440 consecutive appearances when I last checked. Other perennials include Pamela Anderson and Paris Hilton. What explains the enduring popularity of these celebrities, so famous for being famous? That’s a fascinating question, and the answer would doubtless tell us something deep about modern culture. But it’s not the question I’m going to take up here. What I’m trying to understand is how we can know Britney’s ranking from week to week. How are all those queries counted and categorized? What algorithm tallies them up to see which terms are the most frequent? (emphasis added)
A deeply interesting discussion of the analysis of stream data and of algorithms for it. Well worth a close read if you work on, or are interested in, such issues.
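To give a flavor of what a "most frequent terms" algorithm over a stream can look like, here is a minimal sketch of the Misra-Gries frequent-items technique. It is one well-known approach to this kind of problem and is not necessarily the exact method the article walks through. The function name, the parameter k, and the sample queries are my own illustrative choices. The idea is to keep at most k - 1 counters no matter how long the stream is; any query that makes up more than 1/k of the stream is guaranteed to survive as a candidate, although its stored count may be an underestimate.

```python
from typing import Dict, Iterable


def misra_gries(stream: Iterable[str], k: int) -> Dict[str, int]:
    """Return candidate frequent items using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to be among the returned keys. The stored counts are
    lower bounds on the true counts, not exact tallies.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[str, int] = {}
    for item in stream:
        if item in counters:
            # Already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A counter is free: start tracking this item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and
            # drop any that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical sample stream, invented for illustration only.
queries = (
    ["britney spears"] * 6
    + ["pamela anderson"] * 3
    + ["paris hilton", "pop tart", "lycos"]
)
candidates = misra_gries(queries, k=3)
print(candidates)
```

Because the counts are only lower bounds, a second pass over the data (where that is possible) is the usual way to confirm which candidates really are frequent.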
The article concludes:
All this mathematics and algorithmic engineering seems like a lot of work just for following the exploits of a famous “pop tart.” But I like to think the effort might be justified. Years from now, someone will type “Britney Spears” into a search engine and will stumble upon this article listed among the results. Perhaps then a curious reader will be led into new lines of inquiry. (emphasis added)
But what if the user enters “pop tart”? Will they still find this article? Or will it be “hit” number 100,000, which almost no one reaches? As of 20 July 2011, there were some 13 million “hits” for “pop tart” on a popular search engine. I suspect at least some of them are not about Britney Spears.
So, should I encounter a resource about Britney Spears that uses the term “pop tart,” how am I going to gather such resources together for posterity?
Or do we all have to winnow search chaff for ourselves?*
*Question for office managers: How much time do you think your staff spends winnowing search chaff already winnowed by another user in your office?