
A Randomized Trial of Ways To Describe Test Accuracy: The Effect on Physicians' Post-Test Probability Estimates

Milo A. Puhan, MD; Johann Steurer, MD, MME; Lucas M. Bachmann, MD, PhD; and Gerben ter Riet, MD, PhD

From the University of Zurich, Zurich, Switzerland.

Acknowledgments: The authors thank Dr. Otto Brändli for the opportunity to conduct this study during the congress of the Zurich Lung League in Davos, Switzerland.

Grant Support: By the Helmut Horten Foundation. Dr. Bachmann's work was supported by the Swiss National Science Foundation (grants 3233B0-103182 and 3200B0-103183).

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: Milo A. Puhan, MD, Horten Centre, University Hospital of Zurich, Postfach Nord, 8091 Zurich, Switzerland; e-mail, milo.puhan@evimed.ch.

Current Author Addresses: Drs. Puhan and Steurer: Horten Centre, University Hospital of Zurich, Postfach Nord, 8091 Zurich, Switzerland.

Dr. Bachmann: University of Berne, Department of Social and Preventive Medicine, Finkenhubelweg 11, 3012 Berne, Switzerland.

Dr. ter Riet: Academic Medical Center, Department of General Practice, Room J2-118, 1105 AZ Amsterdam, the Netherlands.

Author Contributions: Conception and design: M.A. Puhan, J. Steurer, G. ter Riet.

Analysis and interpretation of the data: M.A. Puhan, L.M. Bachmann, J. Steurer, G. ter Riet.

Drafting of the article: M.A. Puhan.

Critical revision of the article for important intellectual content: L.M. Bachmann, J. Steurer, G. ter Riet.

Final approval of the article: M.A. Puhan, L.M. Bachmann, J. Steurer, G. ter Riet.

Provision of study materials or patients: M.A. Puhan.

Statistical expertise: M.A. Puhan, G. ter Riet.

Collection and assembly of data: M.A. Puhan.

Ann Intern Med. 2005;143(3):184-189. doi:10.7326/0003-4819-143-3-200508020-00004

We randomly assigned experienced specialists in family and internal medicine attending a continuing medical education conference to complete 1 of 3 self-administered questionnaires that presented vignettes representative of scenarios commonly encountered early in the diagnostic work-up. The vignettes differed only in how diagnostic test accuracy was presented. The questionnaires were sealed within 576 envelopes that had been randomized in 32 blocks of 18 to avoid order effects (6 different vignette orders for each of the 3 test accuracy presentation formats). We distributed the envelopes before participants entered the lecture hall so that we could not predict which participant received which envelope, thus concealing the randomization. The moderator of the lecture (which was unrelated to material presented in the vignettes) advised those present to complete the questionnaire during small breaks or at the end of the lecture. We collected 183 questionnaires 15 minutes after the lecture and excluded 5 questionnaires that were sent to us later.
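The blocked allocation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it assumes that each block of 18 contains every combination of the 3 presentation formats and 6 vignette orders exactly once (which matches 3 × 6 = 18 but is not stated explicitly), and the format labels, function name, and seed are hypothetical.

```python
import random

# Hypothetical labels for the 3 test accuracy presentation formats.
FORMATS = ["sensitivity_specificity", "likelihood_ratio", "graphic"]
N_ORDERS = 6    # vignette orders per format
N_BLOCKS = 32   # 32 blocks x 18 envelopes = 576 envelopes


def build_envelopes(seed=0):
    """Return a list of (block, format, vignette_order) tuples.

    Assumption: every block holds each format/order combination once,
    shuffled independently within the block.
    """
    rng = random.Random(seed)
    envelopes = []
    for block in range(N_BLOCKS):
        cells = [(fmt, order)
                 for fmt in FORMATS
                 for order in range(1, N_ORDERS + 1)]
        rng.shuffle(cells)  # random sequence within the block
        envelopes.extend((block, fmt, order) for fmt, order in cells)
    return envelopes


envelopes = build_envelopes()
print(len(envelopes))  # 576
```

Under this assumption, blocking keeps the three formats and six vignette orders balanced across any run of complete blocks.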

Against the odds
Posted on August 5, 2005
Daniel J. Brotman
Johns Hopkins Hospital
Conflict of Interest: None Declared

Puhan and colleagues deserve praise for their creative assessment of how well physicians interpret diagnostic test results, but I disagree with their conclusion that likelihood ratios are no more informative than sensitivity and specificity. The authors asked clinicians to estimate the post-test probabilities of various conditions based upon pre-test probabilities and diagnostic test results. The operating characteristics of each diagnostic test were provided in terms of likelihood ratios, sensitivity/specificity, or a graphic display. There are 2 problems with this approach.

First, familiarity with the method is required. When I finished medical school in the 1990s, I had been taught to think in probabilities (sensitivity, specificity and predictive values). In contrast, I was taught to think in odds (likelihood ratios) only later in my career, during biostatistical training. To assess the relative merits of likelihood ratios (versus sensitivity and specificity) on the basis of how well clinicians know how to use them is like determining the utility of the metric system based on whether New Englanders can more accurately estimate the length of their strides in inches versus centimeters.

Second, the scenarios the authors formulated failed to expose the most serious conceptual error surrounding sensitivity and specificity. Clinicians are often not taught that both sensitivity and specificity are needed to assess post-test probability (whether the test result is positive or negative). Indeed, many medical students are taught that high sensitivity "rules out" a diagnosis if the test is negative and that high specificity "rules in" the diagnosis when the test is positive. The limitations of this rule-of-thumb are exposed by an inexpensive and quite versatile laboratory test that I have created. It has 97.2% sensitivity for pulmonary emboli, myocardial infarctions, and even erectile dysfunction. It is called the 2-dice test. I roll 2 dice, each with 6 sides, and add together the values showing. Anything 3 or higher is a positive test result. The problem is that the specificity is only 2.8%. Had Puhan et al presented a hypothetical scenario in which a test has 97.2% sensitivity and 2.8% specificity, the pre-test probability of disease was 50%, and the test was negative, I suspect that many of the physicians would have deemed the diagnosis very unlikely. In contrast, present the same physicians with a test that has a negative likelihood ratio of 1.0, and they will not be fooled. These same physicians are out there misinterpreting negative D-dimer tests in critically ill patients (1).

1. Brotman DJ, Segal JB, Jani JT, Petty BG, Kickler TS. Limitations of D-dimer testing in unselected inpatients with suspected venous thromboembolism. Am J Med. 2003;114(4):276-82.
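The arithmetic behind the 2-dice test in the letter above can be checked with a short Python sketch. It enumerates the 36 outcomes of two fair dice, derives sensitivity and specificity, and converts the 50% pre-test probability to a post-test probability through odds. The function and variable names are illustrative assumptions, not part of the letter.

```python
from fractions import Fraction


def post_test_probability(pretest, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Enumerate both dice; a sum of 3 or more counts as a positive result.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
p_positive = Fraction(sum(1 for a, b in outcomes if a + b >= 3), len(outcomes))

# The dice ignore disease status, so the positive rate is the same
# in diseased and non-diseased patients.
sensitivity = p_positive        # 35/36, about 97.2%
specificity = 1 - p_positive    # 1/36, about 2.8%

negative_lr = (1 - sensitivity) / specificity
post = post_test_probability(Fraction(1, 2), negative_lr)

print(round(float(sensitivity) * 100, 1))  # 97.2
print(round(float(specificity) * 100, 1))  # 2.8
print(negative_lr)                         # 1
print(post)                                # 1/2
```

A negative likelihood ratio of 1 leaves the pre-test probability of 50% unchanged, which is the point of the example: high sensitivity alone does not make a negative result informative.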


Almost no room for improvement!
Posted on August 17, 2005
Mike Broce
CAMC Institute
Conflict of Interest: None Declared

We reviewed with interest the work published by Dr. Puhan and his group in the August 2, 2005 issue of this journal. We agree that it is essential for physicians to interpret the real value of diagnostic testing to confirm clinical suspicions for a better practice of medicine to occur. After reading their conclusions, we believe that replication of their results is needed, perhaps considering our suggestions. These suggestions are not intended to diminish the findings of what we consider an excellent study.

First, although a table for the clinical vignettes was provided, it was not clear if equations to calculate illness probability changes were provided to surveyed physicians. Perhaps physicians are less likely to remember complex equations not commonly used in clinical practice. If equations were not available to the physicians, then the authors could have been testing knowledge and recall of bio-statistical methods rather than the ability to calculate post-test probability.

Next, even though the researchers were able to determine if actual calculations had been made, we speculate that the authors were not able to determine the reason(s) for not providing the correct post-test probability. Was it because the physicians simply did not know how to do the calculations (therefore they guessed the answer), or was it because they did not agree with the logic of the diagnostic testing? If the latter is true, then it is likely that the physicians based their answers on what they think the post-test probability would be regardless of the testing.

Regarding the survey instrument, we suggest not mixing test results with the findings of physical exams or medical histories. The aim of this suggestion is simply to avoid confusing scenarios that could possibly influence the results of any post-test probability calculations. Furthermore, in order to reduce unexplained errors, we recommend selecting a team of medical experts (familiar with the medical conditions of interest) to help design and validate the instrument before implementation. Along these same lines, after survey construction, it could be wise to test the validity and reliability of the instrument before administering it to a survey group. Thus, any conclusions or generalizations about research findings would be sound.

Finally, for researchers who wish to replicate this or a similarly designed study, detailed information about the methods and procedures, especially the participant instructions and a copy of the actual survey instrument, would be most beneficial if included in the manuscript.


Reply to letters
Posted on October 1, 2005
Milo A. Puhan
Horten Centre, University Hospital of Zurich, Switzerland
Conflict of Interest: None Declared


Drs. Broce and Reyes state that it remains unclear how the surveyed physicians derived posttest probabilities. They wondered whether we provided the relevant equations and whether we recorded if participants calculated or guessed posttest probabilities. In fact, we did neither. Thus we could not determine how the physicians arrived at their posttest probabilities. However, our assumption, based on experience and research, was that the vast majority of physicians do not formally calculate posttest probabilities but use quantitative information about a test's informativeness in an inexact way (1, 2). Along the same line, we developed the inexact numerical graphical format. The setting of our trial, a lecture hall at a continuing medical education conference, was not conducive to studying the physicians' cognitive processes. However, we would welcome any studies investigating physicians' cognitive processes when they are confronted with quantitative information about a test's informativeness.

Dr. Brotman argues that our study did not test whether, at extreme combinations of sensitivity and specificity (for example, a sensitivity of 0.97 and a specificity of 0.03), the likelihood ratio (equal to 1) was the superior measure of association. His hypothesis might be correct, but in commonly encountered diagnostic situations these extreme values of sensitivity and specificity are rare. We decided to present vignettes of more common clinical scenarios. Nevertheless, in our vignettes 1 and 4, which are closest to the situation that Dr. Brotman would have preferred (sensitivity-specificity combinations of 0.93, 0.45 [LR = 1.7] and 0.40, 0.79 [LR = 0.8], respectively), the differences between the 2 numerical formats on posttest probability estimates were negligible.
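The likelihood ratios quoted in this reply can be reproduced from the stated sensitivities and specificities with the standard formulas. The sketch below is illustrative only: the reply does not say which ratio belongs to which test result, so it assumes that 1.7 is a positive likelihood ratio (vignette 1) and 0.8 a negative likelihood ratio (vignette 4), because those are the ratios that match the reported values. The function names are hypothetical.

```python
def positive_lr(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def negative_lr(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


# Vignette 1 (assumed positive result): sensitivity 0.93, specificity 0.45
lr_vignette_1 = round(positive_lr(0.93, 0.45), 1)

# Vignette 4 (assumed negative result): sensitivity 0.40, specificity 0.79
lr_vignette_4 = round(negative_lr(0.40, 0.79), 1)

# Dr. Brotman's extreme case: sensitivity 0.97, specificity 0.03
lr_extreme_pos = round(positive_lr(0.97, 0.03), 1)
lr_extreme_neg = round(negative_lr(0.97, 0.03), 1)

print(lr_vignette_1)                   # 1.7
print(lr_vignette_4)                   # 0.8
print(lr_extreme_pos, lr_extreme_neg)  # 1.0 1.0
```

In the extreme case both ratios equal 1, so neither a positive nor a negative result shifts the probability of disease.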

We agree that relevant experts should be involved in the design of survey instruments before administering them. Therefore we pilot tested and revised our vignettes with the help of 21 internists. We cannot exclude that a more sophisticated development process might have resulted in better vignettes. We agree that the vignettes' test-retest reliability could, and perhaps should, have been tested before use. However, we are not sure how the validity of the vignettes might be assessed beyond face validity through the eyes of experienced clinicians.

We welcome the suggestions for further studies aimed at refuting our findings while taking on board additional methodological aspects as pointed out by Drs. Broce, Reyes, and Brotman. Or in the spirit of Karl Popper: design carefully, aim to refute in order to be able to corroborate convincingly.

For anyone interested, a copy of our questionnaire is available from the corresponding author.

From the University Hospital of Zurich, Horten Centre, University Hospital, Postfach Nord, CH-8091 Zurich, Switzerland and the Department of General Practice at the Academic Medical Center, Amsterdam, The Netherlands. Potential Financial Conflicts of Interest: None disclosed.


(1) Reid MC, Lane DA, Feinstein AR. Academic calculations versus clinical judgments: practicing physicians' use of quantitative measures of test accuracy. Am J Med. 1998;104:374-80.

(2) Steurer J, Fischer JE, Bachmann LM, Koller M, ter Riet G. Communicating accuracy of tests to general practitioners: a controlled study. BMJ. 2002;324:824-26.

