“The reality is most A/B tests fail, and Facebook is here to help” by Kaiser Fung.
From the post:
Two years ago, Wired breathlessly extolled the virtues of A/B testing (link). A lot of Web companies are at the forefront of running hundreds or thousands of tests daily. The reality is that most A/B tests fail.
A/B tests fail for many reasons. Typically, business leaders consider a test to have failed when the analysis fails to support their hypothesis. “We ran all these tests varying the color of the buttons, and nothing significant ever surfaced, and it was all a waste of time!” For smaller websites, it may take weeks or even months to collect enough samples to read a test, and so business managers are understandably upset when no action can be taken at its conclusion. It feels like waiting for the train which is running behind schedule.
A bad outcome isn’t the primary reason for A/B test failure. The main ways in which A/B tests fail are:
- Bad design (or no design);
- Bad execution;
- Bad measurement.
These issues are often ignored or dismissed. They may not even be noticed if the engineers running the tests have not taken a proper design of experiments class. However, even though I earned an A at school, it wasn’t until I started running real-world experiments that I really learned the subject. This is an area in which theory and practice are both necessary.
The Facebook Data Science team just launched an open platform for running online experiments, called PlanOut. This looks like a helpful tool to avoid design and execution problems. I highly recommend looking into how to integrate it with your website. An overview is here, and a more technical paper (PDF) is also available. There is also a GitHub page.
The rest of this post gets into some technical, sausage-factory stuff, so be warned.
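An aside that is not from Kaiser’s post: the remark about smaller websites needing “weeks or even months” to read a test is easy to put into rough numbers. Below is a minimal sketch using the common normal-approximation formula for comparing two conversion rates. The function names, the 5% and 5.5% conversion rates, and the 500 visitors per day are all my own assumptions for illustration, and the sketch has nothing to do with PlanOut’s API.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH arm of a two-sided test
    comparing two conversion rates (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def days_to_read(p_control, p_variant, daily_visitors, **kwargs):
    """Days needed if all traffic is split evenly between the two arms."""
    n = samples_per_arm(p_control, p_variant, **kwargs)
    return math.ceil(2 * n / daily_visitors)


if __name__ == "__main__":
    # Hypothetical small site: 5% baseline conversion, hoping to detect 5.5%.
    print(samples_per_arm(0.05, 0.055))
    print(days_to_read(0.05, 0.055, daily_visitors=500))
```

Under those made-up numbers the answer comes out to roughly 31,000 visitors per arm, or about four months of traffic, which is why a button-color test on a small site can feel like waiting for that late train. Halving the effect you want to detect roughly quadruples the wait.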
For all of your software tests, do you run any A/B tests on your interface?
Or is your response to UI criticism, “…well, but all of us like it”? That’s a great test for a UI.
If you don’t read any other blog post this weekend, read Kaiser’s take on A/B testing.