Raising the Bar for the U.S. Preventive Services Task Force

Peter B. Bach, MD, MAPP

This article was published online first at www.annals.org on 31 December 2013.


From Memorial Sloan-Kettering Cancer Center, New York, New York.

Acknowledgment: The author thanks Geoffrey Schnorr, BS, from the Memorial Sloan-Kettering Cancer Center, who provided research, editorial, and administrative assistance.

Potential Conflicts of Interest: None disclosed. The form can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M13-2926.

Requests for Single Reprints: Peter B. Bach, MD, MAPP, 1275 York Avenue, New York, NY 10065.


Ann Intern Med. 2014;160(5):365-366. doi:10.7326/M13-2926

Since its inception in 1984, the U.S. Preventive Services Task Force (USPSTF) has acted as a highly credible opinion-rendering body. It has sometimes carried the day and sometimes been drowned out. It has never recommended prostate-specific antigen screening for prostate cancer (1), but physicians keep ordering the test; when it proposed reducing the frequency and increasing the age of mammography screening in 2009, it was vilified, then ignored.

A 2008 law gave Medicare the power (but not the requirement) to add new preventive services if the Task Force gave them an “A” or a “B” (2). In the 2010 Patient Protection and Affordable Care Act, those services that the Task Force recommended with an “A” or a “B” earned waivers of copayments and deductibles in Medicare and private insurance. New “A” or “B” recommendations will be included in the annually updated standards from the U.S. Department of Health and Human Services for private health plans. Medicare will not face the same mandate.

The USPSTF's expanded role since passage of the Affordable Care Act and this latest recommendation on lung cancer screening (3) provide an opportunity to take stock of the Task Force's processes. Many things are to be commended. The committee membership is broadly representative, and the evidence reviews that underlie the recommendations are comprehensive and unbiased. However, the Task Force could break out its recommendations and the grades that accompany them to the level of granularity that the available evidence enables. It could also be more cautious about relying on modeling data to fill in gaps in the evidence, particularly when the models do not match the empirical data that are available.

Today's grading system considers both the magnitude of the net benefit delivered by the service and the certainty of that estimate. An “A” only goes to those services for which there is high certainty that the net benefit is large. A “B” is earned when there is high certainty of a moderate net benefit, moderate certainty of a high net benefit, or even only moderate certainty of a moderate net benefit. Lung cancer screening fell into the last category.
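The grading rule described above can be written out as a small lookup. This is only a sketch of the combinations named in this paragraph, not the Task Force's full grading scheme: the function name and the string labels are hypothetical, "high" net benefit is assumed to mean the same thing as "large," and any combination the paragraph does not mention returns None rather than a guessed grade.

```python
from typing import Optional

# Combinations that earn a "B" according to the paragraph above.
B_COMBINATIONS = {
    ("high", "moderate"),      # high certainty of a moderate net benefit
    ("moderate", "large"),     # moderate certainty of a large net benefit
    ("moderate", "moderate"),  # moderate certainty of a moderate net benefit
}


def uspstf_grade(certainty: str, net_benefit: str) -> Optional[str]:
    """Return "A" or "B" for the cases described in the text, else None.

    Hypothetical helper: the labels "high", "moderate", and "large" are
    assumptions chosen to mirror the wording of the editorial.
    """
    certainty = certainty.lower()
    net_benefit = net_benefit.lower()
    if net_benefit == "high":  # assumption: "high" is used as a synonym for "large"
        net_benefit = "large"
    if (certainty, net_benefit) == ("high", "large"):
        return "A"
    if (certainty, net_benefit) in B_COMBINATIONS:
        return "B"
    return None  # not covered by the paragraph above


# Lung cancer screening: moderate certainty of a moderate net benefit.
print(uspstf_grade("moderate", "moderate"))
```

Written this way, it is easy to see that three quite different evidence situations collapse into the same "B."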

However, the expected degree of net benefit or level of certainty about the evidence is rarely uniform, even for selected populations. In lung cancer screening, even among persons who are deemed to be “high-risk” and were eligible for the NLST (National Lung Screening Trial), there is a predictable and broad spectrum of both anticipated benefit and anticipated benefit–harm tradeoff (what the Task Force would call “net benefit”) (4). Across quintiles of lung cancer risk within the NLST, the number of participants who needed to be screened to prevent a lung cancer death, which is a measure of the probability of benefit for a person, varied 33-fold from the lowest- to highest-risk group (5276 vs. 161 needed to screen). The number of false-positive results per prevented lung cancer death, which is a measure of the expected benefit–harm tradeoff for a person, varied 25-fold from 1648 false-positive results per prevented death to only 65 (5). Perhaps the highest-risk group should have qualified for an “A”; perhaps the lowest-risk group should get only a “C,” a service that should be only selectively offered.
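The fold differences quoted above follow directly from the four figures reported for the lowest- and highest-risk quintiles. The short calculation below reproduces them; the variable and function names are illustrative only, and the reciprocal of the number needed to screen is shown as the implied per-person chance of an averted lung cancer death, which is standard arithmetic rather than a figure taken from the cited study.

```python
# Figures quoted in the text for the lowest- and highest-risk NLST quintiles (5).
nns_lowest_risk = 5276   # number needed to screen, lowest-risk quintile
nns_highest_risk = 161   # number needed to screen, highest-risk quintile
fp_lowest_risk = 1648    # false positives per prevented death, lowest-risk quintile
fp_highest_risk = 65     # false positives per prevented death, highest-risk quintile


def fold_difference(larger: float, smaller: float) -> float:
    """Ratio of the larger value to the smaller one."""
    return larger / smaller


nns_fold = fold_difference(nns_lowest_risk, nns_highest_risk)
fp_fold = fold_difference(fp_lowest_risk, fp_highest_risk)

# Implied chance that screening one person averts a lung cancer death.
benefit_lowest = 1 / nns_lowest_risk
benefit_highest = 1 / nns_highest_risk

print(f"Number needed to screen differs {nns_fold:.1f}-fold")
print(f"False positives per prevented death differ {fp_fold:.1f}-fold")
print(f"Per-person chance of benefit: {benefit_lowest:.4%} vs. {benefit_highest:.4%}")
```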

Then there is the matter of the Task Force relying heavily on disease state models to extrapolate beyond the empirical data (6). On the basis of models, the Task Force chose to lengthen the duration of screening to a maximum of 26 years and increase the upper age of eligibility for screening to 80 years, even though NLST participants were screened for only 3 years and were ineligible to enroll if they were older than 74 years (only 8.8% of participants were aged 70 years or older at enrollment) (7). This may be appropriate, but here, too, the grading of this extrapolation should match the low level of evidence supporting it. The American College of Chest Physicians grades extrapolations outside of studied populations as a “C” (8). Most hierarchies of evidence would place modeling studies, even those with great rigor, in the category of expert opinion, the lowest level of evidence.

In this specific case, I found the Task Force's reliance on the modeling dismaying, particularly now that its “B” rating will be converted into insurance mandates. Lung cancer is a poorly understood and highly heterogeneous condition. Even the highly accomplished Cancer Intervention and Surveillance Modeling Network (CISNET) researchers who generated the models do not seem to have been able to generate models of lung cancer that parallel its natural history or simulate the empirical pattern of benefit seen from computed tomography screening in the NLST (6). A cumulative plot of the lung cancer mortality ratio between computed tomography and chest radiography screening (see Appendix Figure 2 in the article by de Koning and colleagues [6]) seen in the NLST is flat at an approximate 20% benefit. Some CISNET models predict increasing benefits, and others predict decreasing benefits. The models only match the data approximately at the 6-year time point, and this is because they were calibrated to do so.

Seeing this, the Task Force might have stopped short of relying on these models for extrapolation well beyond the empirical data. In addition, it might have considered how little is known about the net benefit of screening annually over many years. Benefits may increase, plateau, or decrease; the harms from false-positive results may decrease per year of screening, but overdiagnosis would be expected to compound (9, 10).

Likewise, using the models to estimate the magnitude of the benefit or benefit–harm tradeoff under different screening scenarios seems problematic. Between the models, an important estimate of benefit of computed tomography screening, the number of life-years gained per 100 000 persons, ranged from 2020 to 10 153 (6). An important measure of harm, the number of persons overdiagnosed with lung cancer, varied almost 6-fold (from 72 to 426) (6). The Task Force seems to have looked for findings where there was “consensus” between the models as a way of overcoming the heterogeneity between them. However, because they are starkly different on so many fronts, looking only for the overlap is reminiscent of the Texas sharpshooter and the fallacy that accompanies him. The sharpshooter shoots first at the barn and then draws the target around the greatest cluster of hits.
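To put the disagreement between the models in the same terms, the two ranges quoted above can be reduced to simple ratios. This is a minimal sketch using only the endpoints given in the text; the names are illustrative, and nothing here reproduces the CISNET models themselves.

```python
# Ranges across models as quoted in the text (6).
life_years_gained = (2020, 10153)   # per 100 000 persons, lowest and highest model
overdiagnosed_cases = (72, 426)     # lowest and highest model


def spread(low: float, high: float) -> float:
    """How many times larger the highest model estimate is than the lowest."""
    return high / low


life_years_spread = spread(*life_years_gained)
overdiagnosis_spread = spread(*overdiagnosed_cases)

print(f"Life-years gained differ {life_years_spread:.1f}-fold between models")
print(f"Overdiagnosed cases differ {overdiagnosis_spread:.1f}-fold between models")
```

On these endpoints alone, the benefit estimate varies about 5-fold and the harm estimate almost 6-fold, which is the heterogeneity that a search for “consensus” has to paper over.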

References

1. Moyer VA, U.S. Preventive Services Task Force. Screening for prostate cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med. 2012;157:120-34.
2. Pub L No. 110-275, 122 Stat 2494.
3. Moyer VA, U.S. Preventive Services Task Force. Screening for lung cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med. 2014.
4. Bach PB, Gould MK. When the average applies to no one: personalized decision making about potential benefits of lung cancer screening. Ann Intern Med. 2012;157:571-3.
5. Kovalchik SA, Tammemagi M, Berg CD, Caporaso NE, Riley TL, Korch M, et al. Targeting of low-dose CT screening according to the risk of lung-cancer death. N Engl J Med. 2013;369:245-54.
6. de Koning HJ, Meza R, Plevritis SK, ten Haaf K, Munshi VN, Jeon J, et al. Benefits and harms of computed tomography lung cancer screening strategies: a comparative modeling study for the U.S. Preventive Services Task Force. Ann Intern Med. 2014.
7. Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, Fagerstrom RM, et al, National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365:395-409.
8. Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al, GRADE Working Group. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches The GRADE Working Group. BMC Health Serv Res. 2004;4:38.
9. Bach PB, Mirkin JN, Oliver TK, Azzoli CG, Berry DA, Brawley OW, et al. Benefits and harms of CT screening for lung cancer: a systematic review. JAMA. 2012;307:2418-29.
10. Patz EF Jr, Pinsky P, Gatsonis C, Sicks JD, Kramer BS, Tammemägi MC, et al, for the NLST Overdiagnosis Manuscript Writing Team. Overdiagnosis in low-dose computed tomography screening for lung cancer. JAMA Intern Med. 2013.
 


Comments

Reply to Bach’s editorial
Posted on February 7, 2014
Harry J. de Koning, MD, PhD
Erasmus MC
Conflict of Interest: None Declared
The recent U.S. Preventive Services Task Force recommendation on lung cancer screening represents a major synthesis of trial evidence, model-based outcomes, and expert judgment to quantify the trade-offs of CT screening for the millions of people at high risk for lung cancer (1, 2). Dr. Bach, however, states in his editorial that one could have been more cautious about relying on modeling for extrapolation well beyond the empirical data to fill in gaps in the evidence (3). He questions the net benefit of screening annually over many years. Does he mean there is only evidence to screen 3 times, as was done in the NLST? Does he mean women should only receive 5 breast cancer screens, since this was the average number of screens in the breast-screening trials?
Randomized trials are set up to prove efficacy of an intervention. In translating that evidence to public health, much more is needed, especially to estimate the long-term benefits and harms for the target population. In fact many would argue that modeling is essential to make that translation from trials to population guidelines (4), particularly as we face an ever-increasing pace of technology where questions far outpace our ability to conduct multiple trials.
Our model-based analyses (2) required a joint consideration of numerous factors, including smoking-dose response, and age-specific incidence and other cause mortality by smoking behavior and birth cohort. These factors were superimposed onto over 1,000 schedules of screening examinations using the NLST as a guide, something too complex to evaluate without aid of a model. Dr. Bach, however, states that we were unable to generate models that parallel the natural history of lung cancer and that our models produced inconsistent mortality benefits in reproducing the early years of NLST. It should not be surprising that model variability would produce results that differ in the early years when event rates are low and variability large. Even Data Monitoring Committees place low value on the early years. Dr. Bach also points out that the models differed in their predictions of the absolute number of cases and deaths prevented. Absolute counts have considerable natural variability, and are more difficult to estimate accurately, but in the ranking of competing scenarios, all five models rank the 27 scenarios consistently. Moreover, the models reproduce the outcomes observed in the trials (5), and we showed the range of absolute effects in the table on harms and benefits of the advantageous scenario.
At the end Dr. Bach mentions the term sharpshooter. Although it is a dismaying example, the analogue with the sharpshooter is striking. With the models, we indeed draw the target around the greatest cluster of data: based on 200,000 persons enrolled in the screening trials, we can give the best estimate about the screen-detectable preclinical period, test sensitivity, and improvements in prognosis by screening and early treatment, by gender, age and histology. We therefore hope that clinical researchers will engage more closely with modelers and contribute to deliberations about the best use of models, with a deeper understanding of the model development and validation process.

Harry J. de Koning, R. Meza & S.K. Plevritis

References
1. Moyer VA, on behalf of the U.S. Preventive Services Task Force. Screening for Lung Cancer: U.S. Preventive Services Task Force Recommendation Statement. Ann Intern Med. Published online 31 December 2013 doi:10.7326/M13-2771
2. de Koning HJ, Meza R, Plevritis SK, ten Haaf K, Munshi VN, Jeon J, Erdogan SA, Kong CY, Han SS, van Rosmalen J, Choi SE, Pinsky PF, Berrington de Gonzalez A, Berg CD, Black WC, Tammemägi MC, Hazelton WD, Feuer EJ, McMahon P. Benefits and Harms of Computed Tomography Lung Cancer Screening Strategies: A Comparative Modeling Study for the U.S. Preventive Services Task Force. Ann Intern Med. Published online 31 December 2013 doi:10.7326/M13-2316
3. Bach PB. Editorial. Raising the Bar for the U.S. Preventive Services Task Force. Ann Intern Med. Published online 31 December 2013 doi:10.7326/M13-2926
4. Heijnsdijk EA, Wever EM, Auvinen A, Hugosson J, Ciatto S, Nelen V, Kwiatkowski M, Villers A, Páez A, Moss SM, Zappa M, Tammela TL, Mäkinen T, Carlsson S, Korfage IJ, Essink-Bot ML, Otto SJ, Draisma G, Bangma CH, Roobol MJ, Schröder FH, de Koning HJ. Quality-of-life effects of prostate-specific antigen screening. N Engl J Med. 2012 Aug 16;367(7):595-605. doi: 10.1056/NEJMoa1201637.
5. Meza R, ten Haaf K, Kong CY, Erdogan A, Hazelton WD, Black W, Tammemagi M, Choi S, Jeon J, Han S, Munshi V, van Rosmalen J, Pinsky P, McMahon PM, de Koning H, Feuer EJ, Plevritis SK. Comparative analysis of five lung cancer natural history and screening models that reproduce outcomes of the NLST and PLCO trial. Cancer, in press.
