Academia and the Profession

When Is Measuring Sensitivity and Specificity Sufficient To Evaluate a Diagnostic Test, and When Do We Need Randomized Trials?

Sarah J. Lord, MBBS, MS; Les Irwig, MBBCh, PhD; and R. John Simes, MBBS, MS, MD

From The University of Sydney, Sydney, Australia.

Grant Support: The authors have received funding from the Australian Medical Services Advisory Committee for the development of guidelines for the assessment of diagnostic technologies and from National Health and Medical Research Council Program Grants (253602, 402764).

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: Sarah J. Lord, MBBS, MS, National Health and Medical Research Council Clinical Trials Centre, University of Sydney, 88 Mallett Street, Camperdown NSW, 2050 Australia; e-mail, sally.lord@ctc.usyd.edu.au.

Current Author Addresses: Drs. Lord and Simes: National Health and Medical Research Council Clinical Trials Centre, University of Sydney, 88 Mallett Street, Camperdown NSW, 2050 Australia.

Dr. Irwig: School of Public Health, A27, Edward Ford Building, University of Sydney, NSW 2006 Australia.

Ann Intern Med. 2006;144(11):850-855. doi:10.7326/0003-4819-144-11-200606060-00011

The clinical value of using a new diagnostic test depends on whether it improves patient outcomes beyond the outcomes achieved using an old diagnostic test. When can studies of diagnostic test accuracy provide sufficient information to infer clinical value, and when do clinicians need to wait for results from randomized trials? The authors argue that accuracy studies suffice if a new diagnostic test is safer or more specific than, but of similar sensitivity to, an old test. However, if a new test is more sensitive than an old test, it leads to the detection of extra cases of disease. Results from treatment trials that enrolled only patients detected by the old test may not apply to these extra cases. Clinicians need to wait for results from randomized trials assessing treatment efficacy in cases detected by the new diagnostic test, unless they can be satisfied that the new test detects the same spectrum and subtype of disease as the old test or that treatment response is similar across the spectrum of disease.
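The decision rule summarized above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the authors' framework in full: the class name, field names, and return strings are hypothetical, and the case of a new test that is neither more sensitive nor safer or more specific is not addressed in the summary, so the sketch flags it rather than guessing.

```python
from dataclasses import dataclass

ACCURACY = "accuracy studies suffice"
TRIALS = "randomized trials needed"
NOT_COVERED = "not addressed by this rule"  # assumption: outside the summary


@dataclass
class Comparison:
    """Hypothetical summary of a new test compared with the old test."""
    more_sensitive: bool              # new test detects extra cases
    safer_or_more_specific: bool      # new test is safer or more specific
    same_disease_spectrum: bool       # extra cases match old-test cases
    similar_treatment_response: bool  # response similar across the spectrum


def evidence_needed(c: Comparison) -> str:
    """Return the kind of evidence the summarized rule calls for."""
    if c.more_sensitive:
        # Extra cases are detected; old treatment trials may not apply
        # unless one of the two reassurances holds.
        if c.same_disease_spectrum or c.similar_treatment_response:
            return ACCURACY
        return TRIALS
    if c.safer_or_more_specific:
        # Similar sensitivity, so the same cases are treated as before.
        return ACCURACY
    return NOT_COVERED


if __name__ == "__main__":
    print(evidence_needed(Comparison(True, False, False, False)))
```

The ordering of the checks mirrors the argument: sensitivity is examined first because it determines whether the treated population changes.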


Figure 1.
Trial evidence versus linked evidence of test accuracy and treatment efficacy.

*Cases detected by the new and old test may not show similar response to treatment.

Figure 2.
Assessing new tests using evidence of test accuracy, given that treatment is effective for cases detected by the old test.

RCT = randomized, controlled trial. * New test = diagnostic strategies that include the new test; old test = standard diagnostic strategies that do not include the new test.

Comment for Lord
Posted on December 11, 2006
Gabe S Sonke
Netherlands Cancer Institute
Conflict of Interest: None Declared

A Philosophical Approach Solves The Controversy Between Accuracy And Patient Outcome Studies In Diagnostic Test Evaluation

TO THE EDITOR: Lord et al. propose a framework to decide when measuring diagnostic test accuracy is sufficient to evaluate a diagnostic test and when a randomized trial is needed (1). Their work, in accordance with most previous papers on this subject, considers accuracy and patient outcome studies as separate phases in the evaluation of a diagnostic test: First, accuracy studies show how well a diagnostic test identifies the true presence or absence of disease. Next, patient outcome studies show whether a patient classification based on the new test better predicts patient outcome. We feel that this phased evaluation of diagnostic tests lacks appreciation of the fact that, by definition, disease is a prognostic statement and, consequently, that accuracy studies cannot be interpreted independently of patient outcome.

The prognostic character of disease is intuitively attractive. All diseases, without exception, in some way influence a patient's ability to function according to his or her aims or goals. Having red hair clearly is not a disease, nor is the presence of variations on normal anatomy, such as an accessory pancreatic duct, or of genetic polymorphisms that have no bearing on the individual's functioning. This is what Christopher Boorse referred to when he defined a disease entity as the state of an individual that interferes with or prevents the normal function of some organ or system of organs (2). The normal function of an organ can be seen in relation to value-free goals such as the individual's survival and reproduction (also called an essentialist approach) or to value-laden goals such as welfare and quality of life (holistic approach) (3). Importantly, however, the realization of all these goals takes place in the future. Consequently, disease cannot be defined as a current state of being but must include a reference to a future state.

Accuracy studies correlate a diagnostic test or test regime with the current presence or absence of disease as measured with a gold standard or reference test. However, accepting that disease cannot be defined without reference to a future state logically implies that patient outcome is the ultimate gold standard test. In other words, the ideal diagnostic test perfectly predicts patient outcome and the ideal accuracy study is therefore a patient outcome study. A situation in which the evaluation of each new test requires lengthy follow-up, however, is not desirable. Accuracy studies may bypass the need for follow-up by using a reference test as an approximation of patient outcome. This approximation is only appropriate if the test's relation with patient outcome is known with acceptable certainty and can be quantitatively described.
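As a reminder of what an accuracy study reports, sensitivity and specificity can be computed from a two-by-two table of test results against the reference test. The sketch below is illustrative only; the counts are hypothetical and the function names are not taken from the article.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of reference-positive patients the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of reference-negative patients the test calls negative."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts: 100 reference-positive, 200 reference-negative.
    tp, fn, tn, fp = 90, 10, 160, 40
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 90 / 100
    print(f"specificity = {specificity(tn, fp):.2f}")  # 160 / 200
```

Both measures are defined relative to the reference test, which is why their meaning depends on how well that reference test stands in for patient outcome.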

In conclusion, we believe that accuracy and patient outcome studies have the same objective, i.e., determining how well a diagnostic test classifies patients according to patient outcome (instead of disease presence). Accuracy studies are a means to bypass the need for lengthy follow-up by using a less than perfect reference test that approximates patient outcome. This approximation is only appropriate if its relation with patient outcome can be quantitatively described. In all other circumstances, patient outcome studies must be performed.

Gabe S Sonke, MD, PhD Netherlands Cancer Institute Amsterdam, the Netherlands

André LM Verbeek, MD, PhD Radboud University Nijmegen Medical Centre Nijmegen, the Netherlands

Lambertus ALM Kiemeney, PhD Radboud University Nijmegen Medical Centre Nijmegen, the Netherlands


1. Lord SJ, Irwig L, Simes RJ. When is measuring sensitivity and specificity sufficient to evaluate a diagnostic test, and when do we need randomized trials? Ann Intern Med. 2006;144:850-855.

2. Boorse C. Health as a theoretical concept. Philosophy of Science. 1977;44:552-573.

3. Gammelgaard A. Evolutionary biology and the concept of disease. Med Health Care Philos. 2000;3:109-116.


In Response
Posted on January 10, 2007
Sarah J. Lord
NHMRC Clinical Trials Centre, The University of Sydney
Conflict of Interest: None Declared


We agree with Sonke et al that one purpose of test accuracy studies is to determine whether the test provides accurate information about longer term clinical outcomes, i.e., what is the prognostic value of the test information? We also agree that an ideal study to achieve this purpose is one in which the reference standard is a good proxy for patient outcome. In the absence of effective treatment, this information would be all that is required. However, if treatment is available, we also need to ask whether the test identifies patients whose outcomes would be improved by using this treatment, i.e., what is the therapeutic value of testing? This is the issue we addressed in our article. Assessing the therapeutic value of a test does not just involve asking whether it provides accurate prognostic information. We also need to consider whether it identifies patients whose prognosis will improve with treatment (Lijmer et al 2002). In some cases accuracy studies suffice because we already have evidence about effective treatments for the cases detected; in other situations we require new randomized controlled trials to assess the impact of test-and-treatment on patient outcomes (Lord et al 2006).

To illustrate the difference between these two questions, consider a new test, such as a more detailed ultrasound study for detecting acute deep venous thrombosis (DVT). As Sonke et al suggest, an accuracy study would suffice for conclusions about the prognostic value of the test if we are satisfied that the relationship between the spectrum of disease detected by an abnormal venogram (the reference standard) and subsequent patient morbidity and mortality has been adequately established. If not, we would need a patient outcome study to assess the new test. However, to assess the therapeutic value of the test, we still need to consider whether detection improves patient outcomes. If there is existing evidence that anticoagulant therapy is effective treatment for the spectrum of disease detected by the new test, a comparison of the accuracy and safety of the new test versus standard testing will suffice for conclusions about its therapeutic value. Now consider the use of D-dimer to identify patients at high risk of DVT recurrence who may benefit from a longer course of anticoagulant therapy. Patient outcome studies have shown the test is prognostic for DVT recurrence, but does this evidence suffice for conclusions about its therapeutic value? Neither simple prognostic studies nor existing treatment trials allow conclusions about whether using the test and subsequent treatment in this population would improve patient outcomes; a new randomized controlled trial is therefore required. Such a trial has been done and provides evidence about both the prognostic value of testing and the effectiveness of treatment in patients testing positive (Palareti et al 2006). Hence, sometimes new trials are required to determine the therapeutic value of testing, while in other circumstances linkage of accuracy studies to existing treatment trials may suffice.

1. Lijmer, J. G. 2002, 'Diagnostic testing and prognosis: the randomised controlled trial in diagnostic research', In: Knottnerus, J.A. (eds), The Evidence Base of Clinical Diagnosis. London: BMJ Books, 61-80.

2. Lord, S. J., Irwig, L., Simes, R. J., 2006. 'When is measuring sensitivity and specificity sufficient to evaluate a diagnostic test, and when do we need randomized trials?', Annals of Internal Medicine 144(11):850-5.

3. Palareti, G., Cosmi, B. et al, 2006. 'D-dimer testing to determine the duration of anticoagulation therapy', New England Journal of Medicine 355(17):1780-9.

