Puzzling outcomes in A/B testing by Greg Linden.
Greg writes:
The paper “Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained” (PDF) has a lot of great insights into A/B testing and the real issues you hit when running it in practice.
I like where Greg quotes the paper as saying:
When Bing had a bug in an experiment, which resulted in very poor results being shown to users, two key organizational metrics improved significantly: distinct queries per user went up over 10%, and revenue per user went up over 30%! …. Degrading algorithmic results shown on a search engine result page gives users an obviously worse search experience but causes users to click more on ads, whose relative relevance increases, which increases short-term revenue … [This shows] it’s critical to understand that long-term goals do not always align with short-term metrics.
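To see how that mismatch can arise, here is a toy simulation (all numbers are hypothetical, not from the paper): if worse organic results make users reissue more queries and click ads more often, then queries per user and revenue per user both go up even though the experience got worse.

```python
import random

random.seed(0)

def simulate_user(degraded: bool):
    # Hypothetical behavior: with worse organic results, a user reissues
    # more queries and is more likely to fall back on clicking an ad.
    queries = random.randint(3, 5) + (2 if degraded else 0)
    ad_click_prob = 0.12 if degraded else 0.08   # assumed click rates
    revenue = sum(0.50 for _ in range(queries)   # assumed $0.50 per ad click
                  if random.random() < ad_click_prob)
    return queries, revenue

def group_metrics(degraded: bool, n: int = 10_000):
    results = [simulate_user(degraded) for _ in range(n)]
    queries_per_user = sum(q for q, _ in results) / n
    revenue_per_user = sum(r for _, r in results) / n
    return queries_per_user, revenue_per_user

q_ctrl, r_ctrl = group_metrics(degraded=False)
q_deg, r_deg = group_metrics(degraded=True)
print(f"control:  {q_ctrl:.2f} queries/user, ${r_ctrl:.3f}/user")
print(f"degraded: {q_deg:.2f} queries/user, ${r_deg:.3f}/user")
```

Both short-term metrics improve for the degraded group, which is exactly why the paper warns against optimizing them in isolation.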
I am not really sure what an “obviously worse search experience” would look like. Maybe I don’t want to know. 😉
Anyway, kudos to Greg for finding an amusing and useful paper on testing.