DM SIG “Bayesian Statistical Reasoning” 5/23/2011 by Prof. David Draper, PhD.
I think you will be surprised at how interesting and even compelling this presentation becomes at points, particularly his comments early in the presentation about needing an analogy machine to find things that are not expressed in the way you usually look for them. He has concrete examples of where that has been needed.
Title: Bayesian Statistical Reasoning: an inferential, predictive and decision-making paradigm for the 21st century
Professor Draper gives examples of Bayesian inference, prediction and decision-making in the context of several case studies from medicine and health policy. There will be points of potential technical interest for applied mathematicians, statisticians, and computer scientists.

Broadly speaking, statistics is the study of uncertainty: how to measure it well, and how to make good choices in the face of it. Statistical activities are of four main types: description of a data set, inference about the underlying process generating the data, prediction of future data, and decision-making under uncertainty. The last three of these activities are probability-based.

Two main probability paradigms are in current use: the frequentist (or relative-frequency) approach, in which you restrict attention to phenomena that are inherently repeatable under “identical” conditions and define P(A) to be the limiting relative frequency with which A would occur in hypothetical repetitions, as n goes to infinity; and the Bayesian approach, in which the arguments A and B of the probability operator P(A|B) are true-false propositions (with the truth status of A unknown to you and B assumed by you to be true), and P(A|B) represents the weight of evidence in favor of the truth of A, given the information in B.

The Bayesian approach includes the frequentist paradigm as a special case, so you might think it would be the only version of probability used in statistical work today, but (a) in quantifying your uncertainty about something unknown to you, the Bayesian paradigm requires you to bring all relevant information to bear on the calculation; this involves combining information both internal and external to the data you’ve gathered, and (somewhat strangely) the external-information part of this approach was controversial in the 20th century, and (b) Bayesian calculations require approximating high-dimensional integrals (whereas the frequentist approach mainly relies on maximization rather than integration), and this was a severe limitation of the Bayesian paradigm for a long time (from the 1750s to the 1980s).

The external-information problem has been solved by developing methods that separately handle the two main cases: (1) substantial external information, which is addressed by elicitation techniques, and (2) relatively little external information, which is covered by any of several methods for (in the jargon) specifying diffuse prior distributions. Good Bayesian work also involves sensitivity analysis: varying the manner in which you quantify the internal and external information across reasonable alternatives, and examining the stability of your conclusions.

Around 1990 two things happened roughly simultaneously that completely changed the Bayesian computational picture:
* Bayesian statisticians belatedly discovered that applied mathematicians (led by Metropolis), working at the intersection between chemistry and physics in the 1940s, had used Markov chains to develop a clever algorithm for approximating integrals arising in thermodynamics that are similar to the kinds of integrals that come up in Bayesian statistics (a minimal sketch of the idea appears after this list), and
* desk-top computers finally became fast enough to implement the Metropolis algorithm in a feasibly short amount of time.
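To make the Metropolis idea concrete, here is a minimal, hypothetical sketch, not taken from the talk or from any of Professor Draper’s case studies: a random-walk Metropolis sampler in Python/NumPy for the posterior of a binomial success probability under a uniform (diffuse) prior. The data (7 successes in 10 trials), step size, and number of draws are invented for illustration; the point is only that averages over the Markov-chain draws approximate the posterior integrals mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sketch only (not code from the talk): random-walk Metropolis
# sampling from the posterior of a binomial success probability theta under
# a uniform prior, given hypothetical data: s successes in n trials.
s, n = 7, 10

def log_posterior(theta):
    # log(prior * likelihood) up to an additive constant; uniform prior on (0, 1)
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf                                  # outside the support
    return s * np.log(theta) + (n - s) * np.log(1.0 - theta)

def metropolis(n_draws=50_000, step=0.1, theta0=0.5):
    draws = np.empty(n_draws)
    theta, logp = theta0, log_posterior(theta0)
    for i in range(n_draws):
        proposal = theta + step * rng.normal()          # symmetric proposal
        logp_prop = log_posterior(proposal)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        draws[i] = theta
    return draws

draws = metropolis()[5_000:]                            # discard burn-in
# Monte Carlo approximations of posterior summaries that would otherwise
# require integration; the exact posterior here is Beta(s + 1, n - s + 1).
print("posterior mean of theta ~", draws.mean())        # exact value: 8/12
print("P(theta > 0.5 | data)   ~", (draws > 0.5).mean())
```

In this toy one-dimensional case the integrals could be done exactly, but the same accept/reject recipe works unchanged in the high-dimensional problems the abstract describes, which is why the algorithm mattered so much once computers caught up.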
As a result of these developments, the Bayesian computational problem has been solved in a wide range of interesting application areas with small-to-moderate amounts of data; with large data sets, variational methods are available that offer a different approach to useful approximate solutions.

The Bayesian paradigm for uncertainty quantification does appear to have one remaining weakness, which coincides with a strength of the frequentist paradigm: nothing in the Bayesian approach to inference and prediction requires you to pay attention to how often you get the right answer (this is a form of calibration of your uncertainty assessments; a toy coverage check along these lines is sketched below), which is an activity that’s (i) central to good science and decision-making and (ii) natural to emphasize from the frequentist point of view. However, it has recently been shown that calibration can readily be brought into the Bayesian story by means of decision theory, turning the Bayesian paradigm into an approach that is (in principle) both logically internally consistent and well-calibrated.

In this talk I’ll (a) offer some historical notes about how we have arrived at the present situation and (b) give examples of Bayesian inference, prediction and decision-making in the context of several case studies from medicine and health policy. There will be points of potential technical interest for applied mathematicians, statisticians and computer scientists.
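As a small illustration of what calibration means in this frequency sense, here is a hypothetical Python/SciPy sketch, not the decision-theoretic construction the abstract refers to: it simulates repeated binomial data sets from a known truth, forms the 90% central posterior credible interval under a uniform prior for each, and reports how often those intervals contain the true value. All numbers (n = 25, true theta = 0.3, 2,000 replications) are invented for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative coverage check: binomial data with a uniform prior, so the
# posterior is Beta(s + 1, n - s + 1); ask how often the nominal 90% central
# credible interval contains the true theta across repeated simulated data sets.
n, true_theta, n_reps = 25, 0.3, 2_000

covered = 0
for _ in range(n_reps):
    s = rng.binomial(n, true_theta)                      # simulate one data set
    lo, hi = stats.beta.ppf([0.05, 0.95], s + 1, n - s + 1)
    covered += (lo <= true_theta <= hi)

print("empirical coverage of nominal 90% intervals:", covered / n_reps)
```

A well-calibrated procedure would report empirical coverage close to the nominal 90%; checking this kind of frequency behavior is the activity the abstract says can be folded back into the Bayesian paradigm via decision theory.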