Beyond the Usual Prediction Accuracy Metrics: Reporting Results for Clinical Decision Making

A. Russell Localio, PhD; and Steven Goodman, MD, MHS, PhD

From University of Pennsylvania, Philadelphia, PA 19104, and Stanford University School of Medicine, Stanford, CA 94305.

Potential Conflicts of Interest: Disclosures can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M12-1744.

Requests for Single Reprints: A. Russell Localio, PhD, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania, 635 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104; e-mail, rlocalio@mail.med.upenn.edu.

Current Author Addresses: Dr. Localio: Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania, 635 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104.

Dr. Goodman: Stanford University, 259 Campus Drive, HRP/Redwood Building, Stanford, CA 94305.

Ann Intern Med. 2012;157(4):294-295. doi:10.7326/0003-4819-157-4-201208210-00014

In this issue, Raji and colleagues evaluate strategies based on alternative risk-prediction models for the use of computed tomography screening for lung cancer. The editorialists applaud the study because it illustrates that a test's ability to predict or diagnose a disease is not an adequate determination of its clinical usefulness. The relevant question is whether people are better or worse off if the test is used as part of clinical care.

Using relative utility curves to evaluate a new marker for risk prediction
Posted on August 21, 2012
Stuart G. Baker, ScD; Olaide Y. Raji, PhD; Stephen W. Duffy, MSc; Olorunshola F. Agbaje, PhD; David C. Christiani, MD, MPH; Adrian Cassidy, PhD; and John K. Field, PhD
University of Liverpool
Conflict of Interest: Primary Funding Source - Roy Castle Lung Cancer Foundation

We thank Drs. Localio and Goodman for their editorial based on our recent publication, ‘Predictive accuracy of the Liverpool Lung Project (LLP) risk model for stratifying patients for computed tomography screening for lung cancer’ (1). We would like to clarify and expand on their discussion of reporting results for clinical decision making when evaluating markers for risk prediction, particularly with regard to relative utility curves. Drs. Localio and Goodman note the relevant question is “are people better or worse off if the test is used as part of clinical care?” We agree.
Many studies aiming to address the relevant question are observational, involving a risk model fit to data from persons not receiving treatment. Evaluation involves an anticipated benefit of a true positive and an anticipated cost (or harm) of a false positive. Net benefit is the total benefit minus the total harm, measured in the same units as benefit. The goal in formal decision making is to select the alternative that produces the greatest net benefit (2). The net benefit of risk prediction is a function of the (anticipated) benefit of a true positive, the (anticipated) cost of a false positive, and the data in the study. A key simplification involves using the cost-benefit ratio instead of a separate cost of a false positive and benefit of a true positive. In a 1975 paper, Pauker and Kassirer showed that the cost-benefit ratio can be transformed into a risk threshold, the risk level at which a person would be indifferent between treatment and no treatment (3).
Decision curves (4) and relative utility curves (5, 6) are based on the maximum net benefit of risk prediction as a function of the risk threshold. In decision curves, the benefit of a true positive is set to a reference value of 1, so that the units of net benefit are the number of true positive equivalents. Relative utility is the ratio of the maximum net benefit of risk prediction to the net benefit of perfect prediction.
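
As a minimal sketch of the quantities just defined, the following Python functions compute a risk threshold from a cost and a benefit, the net benefit in true positive equivalents, and the relative utility. The function names and the example numbers are illustrative assumptions, not taken from the cited papers. The sketch evaluates a single operating point (one pair of true and false positive rates) rather than maximizing over operating points, and it assumes a risk threshold at or above the probability of disease, so that treating no one is the default in the absence of risk prediction.

```python
def risk_threshold(cost_fp, benefit_tp):
    """Risk at which treatment and no treatment are equally attractive.

    Indifference means R * benefit_tp == (1 - R) * cost_fp,
    which gives R = cost_fp / (cost_fp + benefit_tp).
    """
    return cost_fp / (cost_fp + benefit_tp)


def net_benefit(tpr, fpr, prevalence, threshold):
    """Net benefit in true positive equivalents (benefit of a true positive = 1)."""
    cost_benefit_ratio = threshold / (1.0 - threshold)
    return tpr * prevalence - fpr * (1.0 - prevalence) * cost_benefit_ratio


def relative_utility(tpr, fpr, prevalence, threshold):
    """Net benefit of risk prediction divided by that of perfect prediction."""
    if threshold < prevalence:
        # Below the prevalence the default is to treat everyone, and a
        # different formula applies; this sketch does not cover that case.
        raise ValueError("sketch assumes threshold >= prevalence")
    perfect = net_benefit(1.0, 0.0, prevalence, threshold)
    return net_benefit(tpr, fpr, prevalence, threshold) / perfect


# Hypothetical inputs: a false positive costs 1 unit, a true positive gains 19.
r = risk_threshold(cost_fp=1.0, benefit_tp=19.0)
print(r)
print(relative_utility(tpr=0.6, fpr=0.2, prevalence=0.02, threshold=r))
```

With perfect prediction (true positive rate 1, false positive rate 0) the relative utility is 1 by construction, which is what gives the relative utility curve its sense of perspective.
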
Conclusions from decision and relative utility curves are similar, but the emphasis differs, with the relative utility curve adding perspective. Also, relative utility curves have a close link to ROC curves, and it is informative to plot both ROC and relative utility curves (6).
An important use of decision and relative utility curves is to evaluate a new marker for risk prediction. A simple calculation that does not require specialized software has been developed in the context of relative utility curves (although it could also apply to decision curves). Let Model 1 denote a baseline risk prediction model, and let Model 2 denote a risk prediction model with the same predictors as in Model 1 supplemented by a predictor involving the new marker. Computations begin with risk stratification tables for cases and controls, which are also a starting point for some purely statistical evaluations of risk prediction (7). The rows of the risk stratification table are intervals of predicted risk for Model 1, and the columns are intervals of predicted risk for Model 2. Ideally, Models 1 and 2 are derived from separate data and then applied to the persons used to form the risk stratification tables. Counts are the number of persons falling into each cell of the risk stratification table. The margins of the tables, which sum the counts, are used to compute risks, false and true positive rates, and relative utilities for Models 1 and 2 (5, 6). This computational approach is easily extended to survival data, case-control data, and confidence intervals (5, 6). With this approach, the evaluation of a new marker is based on the number needed to test (NNTest), previously called the test tradeoff or test threshold.
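
The margin calculation described above can be sketched in a few lines of Python. The counts below are invented purely for illustration and are not data from any study; the function names and the choice of three risk intervals are likewise assumptions. Row totals give the distribution of Model 1 risk intervals and column totals give the distribution of Model 2 risk intervals, separately for cases and controls.

```python
# Hypothetical counts for illustration only (not data from any study).
# Rows: Model 1 risk interval; columns: Model 2 risk interval (low -> high).
CASES = [[10, 5, 0],
         [4, 16, 15],
         [1, 4, 45]]
CONTROLS = [[600, 60, 5],
            [90, 150, 20],
            [25, 30, 20]]


def margins(table):
    """Return (row totals, column totals) of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals


def positive_rate(marginal, first_positive):
    """Fraction of persons in intervals at or above index first_positive."""
    return sum(marginal[first_positive:]) / sum(marginal)


def rates_at_cut(cases, controls, first_positive):
    """(true positive rate, false positive rate) for each model at one cut."""
    case_rows, case_cols = margins(cases)
    ctrl_rows, ctrl_cols = margins(controls)
    return {
        "model1": (positive_rate(case_rows, first_positive),
                   positive_rate(ctrl_rows, first_positive)),
        "model2": (positive_rate(case_cols, first_positive),
                   positive_rate(ctrl_cols, first_positive)),
    }


# Call only the highest risk interval "positive" for both models.
print(rates_at_cut(CASES, CONTROLS, first_positive=2))
```

In these made-up tables, Model 2 moves more cases into the highest interval and more controls out of it, so at this cut its true positive rate is higher and its false positive rate lower than those of Model 1. The resulting rates are the inputs needed for a relative utility at the corresponding risk threshold.
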
NNTest (at a particular risk threshold) is the minimum number of marker tests per true positive such that risk prediction yields a positive net benefit; it is a function of the difference in relative utility curves multiplied by the probability of developing disease (5, 6). A large NNTest would be acceptable with an inexpensive, noninvasive ascertainment of a new marker, while only a small NNTest would be acceptable when the ascertainment of a new marker is invasive (5, 6).
We agree with Russell Localio and Steven Goodman’s comments in their final paragraph. First, the LLP risk model needs to be adapted to each population, depending on the ethnic and cultural setting, as well as the clinical information available to the participants. Second, decision and relative utility curves provide more useful information than measures of predictive accuracy.
Lastly, we note that there are at least two ways risk prediction models can be used with a randomized trial. One approach is to use the risk prediction model developed previously from persons not receiving treatment as a criterion for eligibility in both arms of the trial. For example, participants in the UK Lung cancer CT screening trial (UKLS) will be selected on the basis of their LLP risk (8). Another approach, with more ambitious data requirements, is the following modification of the adaptive signature design (9, 10). Data from a randomized trial are randomly split into a training sample and a test sample. A risk prediction model is fit to data from each arm in the training sample. The difference in risk between the two arms is used to identify a promising subgroup, and treatment is then evaluated in that subgroup in the test sample.
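
Returning to the NNTest described at the start of this passage, the following Python sketch shows one way the calculation might look. The letter states only that NNTest is a function of the difference in relative utilities multiplied by the probability of developing disease; the reciprocal form used here is an assumption consistent with that description, and references 5 and 6 should be consulted for the exact definition. The function name and the input values are hypothetical.

```python
def nn_test(ru_baseline, ru_with_marker, prevalence):
    """Assumed form: 1 / ((RU of Model 2 - RU of Model 1) * P).

    The product in the denominator is the gain in net benefit, in true
    positive equivalents per person tested, from adding the new marker.
    """
    gain = (ru_with_marker - ru_baseline) * prevalence
    if gain <= 0:
        # The marker adds no net benefit at this risk threshold, so no
        # finite number of tests per true positive would be worthwhile.
        return float("inf")
    return 1.0 / gain


# Hypothetical relative utilities at one risk threshold, with a 2% probability
# of developing disease.
print(round(nn_test(ru_baseline=0.10, ru_with_marker=0.15, prevalence=0.02)))
```

Under these assumed inputs, the marker would be worth ascertaining only if testing roughly 1000 persons is an acceptable trade for one additional true positive, which is plausible for an inexpensive, noninvasive marker and much less so for an invasive one.
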


1. Raji OY, Duffy SW, Agbaje OF, Baker SG, Christiani DC, Cassidy A, Field JK. Predictive accuracy of the Liverpool Lung Project risk model for stratifying patients for computed tomography screening for lung cancer: a case-control and cohort validation study. Ann Intern Med. 2012; 157:242-250.

2. Stokey E, Zeckhauser R. A primer for policy analysis. New York: W.W. Norton & Company; 1978.

3. Pauker SG, Kassirer JP. Therapeutic decision making: a cost-benefit analysis. N Engl J Med. 1975; 293:229-234.

4. Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med Inform Decis Mak. 2008; 8:53.

5. Baker SG. Putting risk prediction in perspective: relative utility curves. J Natl Cancer Inst. 2009; 101:1538-1542.

6. Baker SG, Van Calster B, Steyerberg EW. Evaluating a new marker for risk prediction using the test tradeoff: an update. Int J Biostat. 2012; 8:5.

7. Janes H, Pepe MS, Gu W. Assessing the value of risk predictions by using risk stratification tables. Ann Intern Med. 2008; 149:751-760.

8. Baldwin DR, Duffy SW, Wald NJ, Page R, Hansell DM, Field JK. UK Lung Screen (UKLS) nodule management protocol: modelling of a single screen randomised controlled trial of low-dose CT screening for lung cancer. Thorax. 2011; 66:308-313.

9. Freidlin B, Simon R. Adaptive signature design: an adaptive clinical trial design for generating and prospectively testing a gene expression signature for sensitive patients. Clin Cancer Res. 2005; 11:7872-7878.

10. Baker SG, Kramer BS, Sargent DJ, Bonetti M. Biomarkers, subgroup evaluation, and trial design. Discovery Medicine. 2012; 13:187-192.
