Ida Sim, MD, PhD
This article was published at www.annals.org on 26 January 2016.
Presented in part at the 3rd Annual Cochrane Lecture, Vienna, Austria, 4 October 2015 (available at www.youtube.com/watch?v=RgOgcs95fRk).
Disclosures: Disclosures can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M15-2970.
Requests for Single Reprints: Ida Sim, MD, PhD, Division of General Internal Medicine, University of California, San Francisco, 1545 Divisadero Street, Suite 308, San Francisco, CA 94143-0320; e-mail, email@example.com.
Author Contributions: Conception and design: I. Sim.
Drafting of the article: I. Sim.
Critical revision of the article for important intellectual content: I. Sim.
Final approval of the article: I. Sim.
Administrative, technical, or logistic support: I. Sim.
Sim I. Two Ways of Knowing: Big Data and Evidence-Based Medicine. Ann Intern Med. 2016;164:562-563. doi: 10.7326/M15-2970
Published: Ann Intern Med. 2016;164(8):562-563.
Evidence-based medicine (EBM) is more than 20 years old (1). Although EBM's painstaking path of careful clinical studies, critical appraisal of published evidence, and methodologically rigorous systematic reviews has been the template for knowing what works in medicine, new “big data” approaches seem to offer a powerful and tempting alternative. Big data are a distinct “cultural, technological, and scholarly phenomenon” (2) centered on the application of machine learning algorithms to diverse, large-scale data. As clinics and hospitals generate huge amounts of electronic health record (EHR) data and systems such as IBM's Watson combine genomic data, published literature, and EHR data to guide cancer treatment (3), the pace, data sources, and methods for generating medical evidence are changing radically. Traditional clinical researchers rightly wonder whether, how, and why to engage with big data.
Gregory Mints, M.D., F.A.C.P., Deanna P. Jannat-Khah, DrPH, MSPH, Arthur Thomas Evans, M.D., M.P.H.
Weill Cornell Medical College
May 10, 2016
Bayes is Back
We agree that “big data” not only has caused substantial disruption to statistical science but also offers promise for improving EBM and the practice of medicine. The main impact, in our opinion, is its challenge to conventional frequentist statistics (exemplified by p-values) and its reinvigoration of the Bayesian approach. Consider the recent governmental promotion of hospital ratings on various objective (such as in-hospital mortality) and subjective (such as patient ratings of the quality of physician communication) performance measures. Under the classical frequentist paradigm, the individual hospital’s mean score is the best estimate of its performance. However, it has been known since the 1950s that individual mean scores are not the best estimates of performance when many individuals are evaluated together (1, 2). The best metric, paradoxically, depends not just on the individual, but on the performance of all other individuals being evaluated (1-3). What sense does this make? What do my communication skills have to do with yours? Simply, collective performance establishes the benchmark (or base rate) of what can be expected. Any deviation from the expected is due to a combination of pure chance and true difference in performance. Consequently, all individual scores should be adjusted to reflect the role of chance. Each score is thus “pulled” towards the overall mean, with the magnitude of the “pull” directly related to the deviation from the expected: wild outliers will be “pulled in” more; those closer to the overall mean, only a little. As a result, the variability (the spread) of the individual scores after adjustment is reduced, and the distribution shrinks. As with “regression to the mean,” spectacular scores may not really represent spectacular performance, and horrific scores may not indicate truly terrible performance.
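The adjustment described above can be illustrated with a minimal sketch. It assumes a simple normal-normal empirical Bayes model in which every hospital's raw score carries the same, known chance variance; the function name `shrink_scores`, the parameter `sampling_var`, and the scores themselves are hypothetical and are not taken from any actual rating program.

```python
from statistics import mean, pvariance


def shrink_scores(scores, sampling_var):
    """Pull raw scores toward the overall mean (empirical Bayes sketch).

    scores       -- raw mean score per hospital (made-up illustration data)
    sampling_var -- assumed known variance of each raw score due to chance
    """
    grand_mean = mean(scores)
    # Observed spread mixes true differences in performance with chance.
    total_var = pvariance(scores)
    # Method-of-moments estimate of the true-performance variance (assumption),
    # floored at zero so the estimate is never negative.
    true_var = max(total_var - sampling_var, 0.0)
    denom = true_var + sampling_var
    # Fraction of each deviation from the overall mean that is kept.
    keep = true_var / denom if denom > 0 else 1.0
    return [grand_mean + keep * (s - grand_mean) for s in scores]


raw = [60.0, 70.0, 75.0, 80.0, 95.0]   # hypothetical hospital scores
adjusted = shrink_scores(raw, sampling_var=40.0)
for before, after in zip(raw, adjusted):
    print(f"{before:5.1f} -> {after:6.2f}  (moved {abs(after - before):.2f})")
```

In this sketch every deviation is multiplied by the same factor, so the score furthest from the overall mean moves the most in absolute terms and the adjusted scores are less spread out than the raw ones. A real rating system would likely need hospital-specific variances, which this simplified version does not handle.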
This “big data” method of shrinking predictions to be closer to the overall mean has major practical implications in a healthcare system where rewards and punishments are tied directly to the above metrics. A similar logic powers a multitude of “big data” applications, from gene chips (4) to models of disease epidemics (5). At the heart of this reasoning is the Bayesian view that estimates and their certainty are determined not only by the current data sample, but also by prior expectations. Bayesian analysis is a very powerful tool, but its major limitation is that priors are commonly unknown and have to be either arbitrarily chosen or guessed, often yielding wildly inaccurate results. When the amount of data is large, however, this limitation can be overcome. The sample, itself, can be thought of as containing its own priors, which can now be measured with great accuracy. Bayes is back.
1. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1956:197-206.
2. Robbins H. An empirical Bayes approach to statistics. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1956:157-63.
3. Efron B. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press; 2010.
4. Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002;23(1):70-86.
5. Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R. Flexible modeling of epidemics with an empirical Bayes framework. PLoS Comput Biol. 2015;11(8):e1004382.
Copyright © 2016 American College of Physicians. All Rights Reserved.
Print ISSN: 0003-4819 | Online ISSN: 1539-3704