7 Traps to Avoid Being Fooled by Statistical Randomness

Monday, February 16th, 2015

7 Traps to Avoid Being Fooled by Statistical Randomness by Kirk Borne.

From the post:

Randomness is all around us. Its existence sends fear into the hearts of predictive analytics specialists everywhere — if a process is truly random, then it is not predictable, in the analytic sense of that term. Randomness refers to the absence of patterns, order, coherence, and predictability in a system.

Unfortunately, we are often fooled by random events whenever apparent order emerges in the system. In moments of statistical weakness, some folks even develop theories to explain such “ordered” patterns. However, if the events are truly random, then any correlation is purely coincidental and not causal. I remember learning in graduate school a simple joke about erroneous scientific data analysis related to this concept: “Two points in a monotonic sequence display a tendency. Three points in a monotonic sequence display a trend. Four points in a monotonic sequence define a theory.” The message was clear — beware of apparent order in a random process, and don’t be tricked into developing a theory to explain random data.

Suppose I have a fair coin (with a head or a tail being equally likely to appear when I toss the coin). Of the following 3 sequences (each representing 12 sequential tosses of the fair coin), which sequence corresponds to a bogus sequence (i.e., a sequence that I manually typed on the computer)?




(d) None of the above.

In each case, a coin toss of head is listed as “H”, and a coin toss of tail is listed as “T”.

The answer is “(d) None of the Above.”

None of the above sequences was generated manually. They were all actual subsequences extracted from a larger sequence of random coin tosses. I admit that I selected these 3 subsequences non-randomly (which induces a statistical bias known as a selection effect) in order to try to fool you. The small-numbers phenomenon is evident here — it corresponds to the fact that when only 12 coin tosses are considered, the occurrence of any “improbable result” may lead us (incorrectly) to believe that it is statistically significant. Conversely, if we saw answer (b) continuing for dozens of more coin tosses (nothing but Tails, all the way down), then that would be truly significant.
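As an aside that is not part of Kirk's post, here is a minimal Python sketch of the small-numbers phenomenon and the selection effect. It assumes a simulated fair coin, and the names `longest_run` and `striking_window_fraction` are hypothetical helpers made up for this illustration. The run length of 6 used as the cutoff for "looks suspicious" and the 10,000 tosses are arbitrary choices, not values from the post.

```python
import random


def longest_run(tosses):
    """Return the length of the longest run of identical consecutive outcomes."""
    if not tosses:
        return 0
    best = current = 1
    for prev, nxt in zip(tosses, tosses[1:]):
        # Extend the current run on a repeat, otherwise start a new run.
        current = current + 1 if prev == nxt else 1
        best = max(best, current)
    return best


def striking_window_fraction(n_tosses=10_000, window=12, min_run=6, seed=0):
    """Toss a simulated fair coin n_tosses times and return the fraction of
    overlapping windows of length `window` whose longest run is at least
    `min_run`. All defaults are illustrative assumptions."""
    rng = random.Random(seed)
    tosses = "".join(rng.choice("HT") for _ in range(n_tosses))
    n_windows = n_tosses - window + 1
    striking = sum(
        1
        for i in range(n_windows)
        if longest_run(tosses[i:i + window]) >= min_run
    )
    return striking / n_windows


# Any one specific sequence of 12 fair tosses is as likely as any other,
# whether it "looks" random or not.
p_specific_sequence = 0.5 ** 12

if __name__ == "__main__":
    print("P(one specific 12-toss sequence) =", p_specific_sequence)
    print("Fraction of 12-toss windows with a run of 6 or more:",
          striking_window_fraction())
```

Running it prints the fraction of 12-toss windows in one long random sequence that contain a run of six or more identical outcomes. If that fraction is not tiny, then anyone scanning a long sequence can easily pick out subsequences that look "bogus", which is the selection effect described above.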

Great post on randomness where Kirk references a fun example using Nobel Prize winners with various statistical “facts” for your amusement.

Kirk suggests a reading pack for partial avoidance of this issue in your work:

  1. Fooled By Randomness, by Nassim Nicholas Taleb.
  2. The Flaw of Averages, by Sam L. Savage.
  3. The Drunkard’s Walk – How Randomness Rules Our Lives, by Leonard Mlodinow.

I wonder if you could get Amazon to create an also-bought-with package of those three books? Something you could buy for your friends in big data and intelligence work. 😉

Interesting that I saw this just after posting Structuredness coefficient to find patterns and associations. The call on “likely” or “unlikely” comes down to human agency. Yes?