Ideas and Opinions

Two Ways of Knowing: Big Data and Evidence-Based Medicine

Ida Sim, MD, PhD
This article was published at www.annals.org on 26 January 2016.

From University of California, San Francisco, San Francisco, California.

Presented in part at the 3rd Annual Cochrane Lecture, Vienna, Austria, 4 October 2015 (available at www.youtube.com/watch?v=RgOgcs95fRk).

Disclosures: Disclosures can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M15-2970.

Requests for Single Reprints: Ida Sim, MD, PhD, Division of General Internal Medicine, University of California, San Francisco, 1545 Divisadero Street, Suite 308, San Francisco, CA 94143-0320; e-mail, ida.sim@ucsf.edu.

Author Contributions: Conception and design: I. Sim.

Drafting of the article: I. Sim.

Critical revision of the article for important intellectual content: I. Sim.

Final approval of the article: I. Sim.

Administrative, technical, or logistic support: I. Sim.

Ann Intern Med. 2016;164(8):562-563. doi:10.7326/M15-2970
© 2016 American College of Physicians

Evidence-based medicine and big data are very different approaches to producing evidence. In this commentary, the author posits that “combining these 2 ways of knowing offers the best path for enlarging and strengthening the knowledge base of clinical medicine.”

Appendix Figure.

Taxonomy of traditional and big data study types.

Clinical studies include descriptive studies, which aim to describe a state of affairs, and analytic studies, which aim to quantify a relationship. Blue boxes represent traditional clinical study designs. Orange boxes represent examples of big data methods. Both traditional and big data methods are applicable to modeling and simulation. Adapted from reference 6.

Bayes is Back
Posted on May 10, 2016
Gregory Mints, M.D., F.A.C.P., Deanna P. Jannat-Khah, DrPH, MSPH, Arthur Thomas Evans, M.D., M.P.H.
Weill Cornell Medical College
Conflict of Interest: None Declared
We agree that "big data" not only has caused substantial disruption to statistical science but also offers promises for improving EBM and the practice of medicine. The main impact, in our opinion, is in its challenge to the conventional frequentist statistics (exemplified by p-values), and its re-invigoration of the Bayesian approach.

Consider the recent governmental promotion of hospital ratings on various objective (such as in-hospital mortality) and subjective (such as patient ratings of the quality of physician communication) performance measures. Under the classical frequentist paradigm, the individual hospital’s mean score is the best estimate of its performance. However, it has been known since the 1950s that individual mean scores are invalid as estimates of performance (1, 2). The best metric, paradoxically, depends not just on the individual, but on the performance of all other individuals being evaluated (1-3). What sense does this make? What do my communication skills have to do with yours? Simply, collective performance establishes the benchmark (or base rate) of what can be expected. Any deviation from the expected is due to the combination of pure chance and true difference in performance. Consequently, all individual scores should be adjusted to reflect the role of chance. Each score is thus “pulled” towards the overall mean, with the magnitude of the “pull” directly related to the deviation from the expected: wild outliers will be “pulled in” more; those closer to the overall mean, only a little. As a result, the variability—the spread—of the individual scores after adjustment is reduced and the distribution will be shrunken. Similar to the concept of “regression to the mean,” spectacular scores may not really represent spectacular performance, and horrific scores may not indicate truly terrible performance.
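The adjustment described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the letter: every hospital's score is assumed to have the same known sampling variance (`sampling_var`), scores are treated as roughly normal, and the function name `shrink_scores` and the example numbers are hypothetical. A real rating model would likely also need to handle differing sample sizes and case mix.

```python
from statistics import mean, variance


def shrink_scores(scores, sampling_var):
    """Pull each observed score toward the overall mean (empirical Bayes sketch).

    Assumes every score has the same known sampling variance `sampling_var`.
    """
    grand_mean = mean(scores)
    observed_var = variance(scores)  # sample variance across hospitals
    # Part of the observed spread not explained by chance (floored at 0).
    true_var = max(observed_var - sampling_var, 0.0)
    total_var = true_var + sampling_var
    # Share of each deviation kept as "true difference"; the rest is treated as chance.
    keep = true_var / total_var if total_var > 0 else 1.0
    return [grand_mean + keep * (s - grand_mean) for s in scores]


# Hypothetical scores for five hospitals, with an assumed sampling variance of 100.
observed = [60, 70, 80, 90, 100]
adjusted = shrink_scores(observed, sampling_var=100)
for raw, adj in zip(observed, adjusted):
    print(f"observed {raw:5.1f} -> adjusted {adj:5.1f} (pull {raw - adj:+.1f})")
```

In this toy example every deviation from the overall mean is scaled by the same factor, so the scores farthest from the mean move the most in absolute terms, and the adjusted scores are less spread out than the observed ones.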

This “big data” method of shrinking predictions to be closer to the overall mean has major practical implications in a healthcare system where rewards and punishments are tied directly to the metrics described above. A similar logic powers a multitude of “big data” applications, from gene chips (4) to models of disease epidemics (5). At the heart of this reasoning is the Bayesian view that estimates and their certainty are determined not only by the current data sample, but also by prior expectations. Bayesian analysis is a very powerful tool, but its major limitation is that priors are commonly unknown and have to be either arbitrarily chosen or guessed, often yielding wildly inaccurate results. When the amount of data is large, however, this limitation can be overcome. The sample itself can be thought of as containing its own priors, which can now be measured with great accuracy.
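One common way to formalize the idea that a sample supplies its own prior is the normal-normal empirical Bayes model. The sketch below rests on assumptions not stated in the letter: each observed score $x_i$ has a known, common sampling variance $\sigma^2$, and the symbols $\theta_i$ (true performance), $\mu$, and $\tau^2$ (prior mean and variance) are introduced only for illustration.

```latex
\begin{align*}
x_i \mid \theta_i &\sim \mathcal{N}(\theta_i, \sigma^2), \qquad \theta_i \sim \mathcal{N}(\mu, \tau^2), \\
\mathbb{E}[\theta_i \mid x_i] &= \mu + \frac{\tau^2}{\tau^2 + \sigma^2}\,(x_i - \mu), \\
\hat{\mu} &= \bar{x}, \qquad \hat{\tau}^2 = \max\!\left(0,\; s_x^2 - \sigma^2\right).
\end{align*}
```

Here $\bar{x}$ and $s_x^2$ are the mean and variance of the observed scores across all units. Because the marginal variance of $x_i$ is $\tau^2 + \sigma^2$, the prior parameters can be estimated from the data themselves, and with many units those estimates become precise.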

Bayes is back.

1. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1955. Berkeley, CA: University of California Press; 1956:197-206.
2. Robbins H. An empirical Bayes approach to statistics. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1955. Berkeley, CA: University of California Press; 1956:157-63.
3. Efron B. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press; 2010.
4. Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002;23(1):70-86.
5. Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R. Flexible modeling of epidemics with an empirical Bayes framework. PLoS Comput Biol. 2015;11(8):e1004382.