The structural virality of online diffusion by Sharad Goel, Ashton Anderson, Jake Hofman, and Duncan J. Watts.
Viral products and ideas are intuitively understood to grow through a person-to-person diffusion process analogous to the spread of an infectious disease; however, until recently it has been prohibitively difficult to directly observe purportedly viral events, and thus to rigorously quantify or characterize their structural properties. Here we propose a formal measure of what we label “structural virality” that interpolates between two conceptual extremes: content that gains its popularity through a single, large broadcast, and that which grows through multiple generations with any one individual directly responsible for only a fraction of the total adoption. We use this notion of structural virality to analyze a unique dataset of a billion diffusion events on Twitter, including the propagation of news stories, videos, images, and petitions. We find that across all domains and all sizes of events, online diffusion is characterized by surprising structural diversity. Popular events, that is, regularly grow via both broadcast and viral mechanisms, as well as essentially all conceivable combinations of the two. Correspondingly, we find that the correlation between the size of an event and its structural virality is surprisingly low, meaning that knowing how popular a piece of content is tells one little about how it spread. Finally, we attempt to replicate these findings with a model of contagion characterized by a low infection rate spreading on a scale-free network. We find that while several of our empirical findings are consistent with such a model, it does not replicate the observed diversity of structural virality.
Before you get too excited, the authors do not provide a how-to-go-viral manual.
In part because:
Large and potentially viral cascades are therefore necessarily very rare events; hence one must observe a correspondingly large number of events in order to find just one popular example, and many times that number to observe many such events. As we will describe later, in fact, even moderately popular events occur in our data at a rate of only about one in a thousand, while “viral hits” appear at a rate closer to one in a million. Consequently, in order to obtain a representative sample of a few hundred viral hits arguably just large enough to estimate statistical patterns reliably one requires an initial sample on the order of a billion events, an extraordinary data requirement that is difficult to satisfy even with contemporary data sources.
The authors clearly advance the state of research on “viral hits” and conclude with suggestions for future modeling work.
You can imagine the reaction of marketing departments should anyone get closer to designing successful viral advertising.
A good illustration that something we can observe, “viral hits,” in an environment where the spread can be tracked (Twitter), can still resist our best efforts to model and/or explain how to repeat the “viral hit” on command.
A good story to remember when a client claims that some action is transparent. It may well be, but that doesn’t mean there are enough instances to draw any useful conclusions.
I first saw this in a tweet by Steven Strogatz.