Academia and the Profession

When Should a New Test Become the Current Reference Standard?

Paul Glasziou, MB, BS, PhD; Les Irwig, MB, BCh, PhD; and Jonathan J. Deeks, PhD

From the University of Oxford, Oxford, United Kingdom; University of Sydney, Sydney, New South Wales, Australia; and University of Birmingham, Edgbaston, Birmingham, United Kingdom.

Acknowledgment: The authors thank Gordon Guyatt, Ajit Lalvani, Sally Lord, Jenny Doust, and Chris Hyde for their helpful comments on drafts.

Grant Support: In part by funding from a UK National Institute for Health Research program grant and from the Australian National Health and Medical Research Council Program grant 402764 to the Screening and Test Evaluation Program.

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: Paul Glasziou, MB, BS, PhD, Centre for Evidence-Based Medicine, Department of Primary Health Care, University of Oxford, Oxford OX3 7LF, United Kingdom; e-mail, paul.glasziou@dphpc.ox.ac.uk.

Current Author Addresses: Dr. Glasziou: Centre for Evidence-Based Medicine, Department of Primary Health Care, University of Oxford, Oxford OX3 7LF, United Kingdom.

Dr. Irwig: Screening and Test Evaluation Program, School of Public Health, University of Sydney, Sydney, New South Wales 2006, Australia.

Dr. Deeks: Unit of Public Health, Epidemiology, and Biostatistics; University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom.

Ann Intern Med. 2008;149(11):816-821. doi:10.7326/0003-4819-149-11-200812020-00009

The evaluation of claims that a new diagnostic test is better than the current gold standard test is hindered by the lack of a perfect reference judge. However, this problem may be sidestepped by focusing on the clinical consequences of the decision rather than on estimation of accuracy. Consequences can be assessed by use of a “fair umpire” test that is not perfect yet can discriminate between disease and nondisease cases and is not biased in favor of 1 test. This article discusses 3 principles to aid judgments about the value of new tests. First, the consequences are best examined in cases with disagreement between the current and new tests. Second, resolving these disagreements requires a fair, but not necessarily perfect, umpire test. Finally, umpire tests include consequences, such as prognosis and response to treatment, as well as causal exposures and other test results.
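
The first two principles can be illustrated with a minimal sketch. It assumes that each patient has a binary result from the current test, the new test, and an umpire (for example, a later clinical outcome). The function name, the record layout, and the toy records are hypothetical and are not taken from the article.

```python
from collections import Counter


def discordant_umpire_rates(records):
    """Summarise umpire results within the cases where the two tests disagree.

    records: iterable of (old_positive, new_positive, umpire_positive) booleans.
    Returns the umpire-positive proportion for 'new_only' (new+, old-) and
    'old_only' (old+, new-) cases, or None where a group is empty.
    """
    totals = Counter()
    umpire_pos = Counter()
    for old, new, umpire in records:
        if old == new:
            continue  # concordant cases do not help choose between the tests
        group = "new_only" if new else "old_only"
        totals[group] += 1
        umpire_pos[group] += int(umpire)
    return {
        group: (umpire_pos[group] / totals[group] if totals[group] else None)
        for group in ("new_only", "old_only")
    }


# Hypothetical toy data, for illustration only (not study results).
toy = [
    (True, True, True),     # both positive: concordant, ignored
    (False, False, False),  # both negative: concordant, ignored
    (False, True, True),    # new test only, umpire positive
    (False, True, True),    # new test only, umpire positive
    (False, True, False),   # new test only, umpire negative
    (True, False, False),   # old test only, umpire negative
    (True, False, True),    # old test only, umpire positive
]
rates = discordant_umpire_rates(toy)
```

Under these assumptions, a clearly higher umpire-positive proportion among the new-only positives than among the old-only positives would favor the new test, provided the umpire is not biased toward either test.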


Figure 1.
Results of 2 tests for tuberculosis, stratified by exposure to the index case.

The tuberculin skin test (TST) is the old test, and the enzyme-linked immunospot (ELI) assay is the new test. E+++ = class of index case; E++ = classes of students who regularly shared classes with the index case; E+ = students in 4 classes of the same year who did not regularly share classes with the index case; Non-E = students in different years. (Reproduced from Ewer and colleagues [20], with permission of The Lancet.)

Figure 2.
Possible comparison points for additional cases.




Forget about the gold standard
Posted on December 21, 2008
Yvo M Smulders
VU University Medical Center
Conflict of Interest: None Declared

We read with interest the paper by Glasziou and colleagues, addressing criteria for new diagnostic tests to become reference standards for specific diseases [1]. We believe the emphasis in the evaluation of new tests should be not only on how to incorporate them into the existing diagnostic framework, but also, and increasingly, on their potential to alter disease classification, in particular severity grading and taxonomy. The term 'reference (gold) standard' distracts from these opportunities, and its use is best increasingly avoided.

A new test can alter diagnosis in 3 ways. Firstly, it might diagnose exactly the same disease with better accuracy, in which case it qualifies as a new reference test. Secondly, the test may detect the disease in a different (usually earlier) stage, in which case it essentially reclassifies disease based on severity. Finally, a new test might detect abnormalities which change existing disease taxonomy. Arguably, the second and third effects of new tests are more important for progress in medicine than the first.

Traditionally, we think of disease as being absent or present. Based on this binary perception of diagnosis, we calculate properties for diagnostic tests, such as sensitivity and specificity. These properties enable us to use the Bayesian approach to calculate what we believe are precise probabilities of a specific disease being present. It is in this context that the traditional concepts of gold standard and reference tests apply. In reality, however, many if not most diseases are continuous spectra ranging from slightly abnormal to traditional classifying diagnoses. New diagnostic tests, which generally pick up milder abnormalities, reveal the continuous nature of disease, and our response should be not to calculate diagnostic properties of these novel tests, but rather to let go of our binary perception of disease. Many diseases could serve as an example of this. Myocardial infarction is still considered by many to be a binary diagnosis, but novel biochemical and imaging techniques have shown a continuous range of effects of ischemia on cardiomyocytes with a gradual decrease in reversibility. The same is true for almost every disease you can think of, ranging from asthma (a gradual increase in airway reactivity) to cancer (progressive dysplasia and invasive growth). Rather than quarrelling about diagnostic properties of new tests or calling for more precise nominal definitions of disease, a more challenging and fruitful approach would be to acknowledge that novel test information confronts us with the continuous nature of disease.
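
The Bayesian calculation referred to above can be sketched as follows. This is a minimal illustration of the standard binary-disease formula, not a method proposed in this letter; the function name and the example inputs are hypothetical.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of disease after a binary test result, by Bayes' theorem.

    pre_test: probability of disease before testing (0 to 1).
    sensitivity: P(test positive | disease).
    specificity: P(test negative | no disease).
    positive: True for a positive test result, False for a negative one.
    """
    for value in (pre_test, sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if positive:
        diseased = sensitivity * pre_test
        healthy = (1.0 - specificity) * (1.0 - pre_test)
    else:
        diseased = (1.0 - sensitivity) * pre_test
        healthy = specificity * (1.0 - pre_test)
    total = diseased + healthy
    if total == 0.0:
        raise ValueError("this test result has zero probability under the inputs")
    return diseased / total


# Hypothetical inputs, for illustration only.
after_positive = post_test_probability(0.2, 0.9, 0.8, positive=True)
after_negative = post_test_probability(0.2, 0.9, 0.8, positive=False)
```

The calculation presupposes that disease is either present or absent and that sensitivity and specificity are fixed properties of the test, which is precisely the binary framing questioned here.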

In addition, new diagnostic tests can alter our view on aetiology or pathogenesis of a disease. This sometimes leads to fundamental changes in disease taxonomy (e.g. peptic ulcer, reclassified from a stress-induced to an infectious disease by Helicobacter pylori testing).

If we think outside the box of traditional diagnostic convention, results of new tests, provided they are reproducible and prove to be prognostically or therapeutically relevant, in fact become new diagnoses themselves. The new diagnoses fill previously unexplored parts of the disease spectrum, or change existing disease taxonomy. By embracing this approach to interpretation of new test results, we slowly let go of old diagnoses. Consequently, we must be willing to say goodbye to the concept of the reference (gold) standard, since the reference disease may no longer exist. This puts us on the track laid out for us 15 years ago in this journal by Alvin Feinstein [2]. He argued that it is not our lack of technical progress, but rather our inflexibility in disease taxonomy and classification, which represents the main impediment to progress in medicine.


1. Glasziou P, Irwig L, Deeks JJ. When should a new test become the current reference standard? Ann Intern Med 2008;149:816-821

2. Feinstein AR. Clinical judgement revisited: the distraction of quantitative models. Ann Intern Med 1994;120:799-805


Keep the reference standard, but add spectrum
Posted on January 29, 2009
Paul Glasziou
University of Oxford
Conflict of Interest: None Declared

While we agree with Dr Smulders that researchers and clinicians should pay more attention to the changing spectrum that may occur with new tests, we think that the ideas of reference standard and diagnosis will remain useful. Some new tests do clearly alter the spectrum of illness, which is why we emphasized the need to consider the clinical consequences of such additional cases. Indeed, we have been particularly concerned about the detection of "inconsequential" disease, that is, test-detected disease that will have no impact on the patient during their life (1).

However, not all new tests purport to be more accurate or detect different spectrums of disease. Many are simply safer, less invasive, quicker or cheaper than the current best test. Hence we will always need to have a reference standard against which we should evaluate these tests. Discarding the idea of the "reference standard" would hinder our ability to evaluate these new tests. Discarding the idea of "disease" is also problematic. While disease may be a (multidimensional) spectrum, there is rarely a "continuous spectrum" of therapeutic options but a limited set of discrete options. Decisions require two elements: first the dichotomous diagnostic category, and second the subdivision into distinct groups. These are shortcuts to allow the best treatment allocations that maximise benefit whilst minimising harm. Interestingly, all of Dr Smulders' examples involve this initial disease dichotomy plus a spectrum as a subsidiary element. The middle ground here is to retain diagnostic categories, but to be more aware that disease is not homogeneous and requires subdivision or quantification of degree to guide decision making.


1. Irwig L, Houssami N, Armstrong B, Glasziou P. Evaluating new screening tests for breast cancer. BMJ. 2006;332(7543):678-9.

