Mining the Web to Predict Future Events by Kira Radinsky and Eric Horvitz.
Abstract:
We describe and evaluate methods for learning to forecast forthcoming events of interest from a corpus containing 22 years of news stories. We consider the examples of identifying significant increases in the likelihood of disease outbreaks, deaths, and riots in advance of the occurrence of these events in the world. We provide details of methods and studies, including the automated extraction and generalization of sequences of events from news corpora and multiple web resources. We evaluate the predictive power of the approach on real-world events withheld from the system.
The paper starts off well enough:
Mark Twain famously said that “the past does not repeat itself, but it rhymes.” In the spirit of this reflection, we develop and test methods for leveraging large-scale digital histories captured from 22 years of news reports from the New York Times (NYT) archive to make real-time predictions about the likelihoods of future human and natural events of interest. We describe how we can learn to predict the future by generalizing sets of specific transitions in sequences of reported news events, extracted from a news archive spanning the years 1986–2008. In addition to the news corpora, we leverage data from freely available Web resources, including Wikipedia, FreeBase, OpenCyc, and GeoNames, via the LinkedData platform [6]. The goal is to build predictive models that generalize from specific sets of sequences of events to provide likelihoods of future outcomes, based on patterns of evidence observed in near-term newsfeeds. We propose the methods as a means of generating actionable forecasts in advance of the occurrence of target events in the world.
But when it gets down to actual predictions, the experiment predicts:
- Cholera following flooding in Bangladesh.
- Riots following police shootings in immigrant/poor neighborhoods.
Both are generally true, but I don’t need 22 years’ worth of New York Times (NYT) archives to make those predictions.
Test offers of predictive advice by asking for specific predictions relevant to your enterprise. Also ask long-time staff to make their own predictions. Then compare the two sets of predictions against what actually happens.
Unless the automated solution is significantly better, reward the staff and drive on.
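One way to make that comparison concrete is to score both sets of predictions against the eventual outcomes. The sketch below uses the Brier score (mean squared error between a forecast probability and a 0/1 outcome), which is my choice of metric and not something taken from the paper. The function names, the `margin` threshold, and the sample numbers are all hypothetical, for illustration only.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 means every forecast was exactly right.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def prefer_automated(system, staff, outcomes, margin=0.05):
    """Return True only if the system beats the staff by more than `margin`.

    `margin` is an assumed stand-in for "significantly better"; pick a value
    (or a proper statistical test) that suits your enterprise.
    """
    return brier_score(staff, outcomes) - brier_score(system, outcomes) > margin


# Hypothetical data: 1 = the event happened, 0 = it did not.
outcomes = [1, 0, 1, 0]
staff = [0.8, 0.2, 0.7, 0.3]   # probabilities given by long-time staff
system = [0.9, 0.1, 0.6, 0.4]  # probabilities given by the automated solution

print("staff: ", round(brier_score(staff, outcomes), 3))
print("system:", round(brier_score(system, outcomes), 3))
print("switch to automated?", prefer_automated(system, staff, outcomes))
```

With these made-up numbers the staff score slightly better than the system, so the rule says to stay with the staff. A handful of predictions is far too few to settle the question in practice; the point is only that both sides are scored the same way on the same events.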
I first saw this in Nat Torkington’s Four short links: 26 December 2013.