
Cost-Effectiveness of Alternative Management Strategies for Patients with Solitary Pulmonary Nodules

Michael K. Gould, MD, MS; Gillian D. Sanders, PhD; Paul G. Barnett, PhD; Chara E. Rydzak, BA; Courtney C. Maclean, BA; Mark B. McClellan, MD, PhD; and Douglas K. Owens, MD, MS

From Veterans Affairs Palo Alto Health Care System, Palo Alto, California, and Stanford University, Stanford, California.


Disclaimer: The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.

Acknowledgments: The authors thank Alan M. Garber, MD, PhD, and James Jett, MD, for reviewing previous versions of this manuscript.

Grant Support: Drs. Gould and Owens received Career Development Awards from the Department of Veterans Affairs, Veterans Health Administration, Health Services Research and Development Service. This research was also supported by the Veterans Affairs Cooperative Studies Program, project no. 27: “18-Fluorodeoxyglucose (FDG) Positron Emission Tomography (PET) Imaging in the Management of Patients with Solitary Pulmonary Nodules.”

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: Michael K. Gould, MD, MS, Pulmonary Section (111P), Veterans Affairs Palo Alto Health Care System, 3801 Miranda Avenue, Palo Alto, CA 94304; e-mail, gould@stanford.edu.

Current Author Addresses: Dr. Gould: Pulmonary Section (111P), Veterans Affairs Palo Alto Health Care System, 3801 Miranda Avenue, Palo Alto, CA 94304.

Drs. Sanders, McClellan, and Owens and Ms. Rydzak: Center for Primary Care and Outcomes Research/Center for Health Policy, Stanford University, 117 Encina Commons, Stanford, CA 94305-6019.

Dr. Barnett: Veterans Affairs Palo Alto Health Care System, 795 Willow Road (152), Menlo Park, CA 94025.

Ms. Maclean: 2614 Cedar Creek Drive, Durham, NC 27705.


Ann Intern Med. 2003;138(9):724-735. doi:10.7326/0003-4819-138-9-200305060-00009

Background: Positron emission tomography (PET) with 18-fluorodeoxyglucose (FDG) is a potentially useful but expensive test to diagnose solitary pulmonary nodules.

Objective: To evaluate the cost-effectiveness of strategies for pulmonary nodule diagnosis and to specifically compare strategies that did and did not include FDG-PET.

Design: Decision model.

Data Sources: Accuracy and complications of diagnostic tests were estimated by using meta-analysis and literature review. Modeled survival was based on data from a large tumor registry. Cost estimates were derived from Medicare reimbursement and other sources.

Target Population: All adult patients with a new, noncalcified pulmonary nodule seen on chest radiograph.

Time Horizon: Patient lifetime.

Perspective: Societal.

Intervention: 40 clinically plausible combinations of 5 diagnostic interventions, including computed tomography, FDG-PET, transthoracic needle biopsy, surgery, and watchful waiting.

Outcome Measures: Costs, quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios.

Results of Base-Case Analysis: The cost-effectiveness of strategies depended critically on the pretest probability of malignancy. For patients with low pretest probability (26%), strategies that used FDG-PET selectively when computed tomography results were possibly malignant cost as little as $20 000 per QALY gained. For patients with high pretest probability (79%), strategies that used FDG-PET selectively when computed tomography results were benign cost as little as $16 000 per QALY gained. For patients with intermediate pretest probability (55%), FDG-PET strategies cost more than $220 000 per QALY gained because they were more costly but only marginally more effective than computed tomography-based strategies.

Results of Sensitivity Analysis: The choice of strategy also depended on the risk for surgical complications, the probability of nondiagnostic needle biopsy, the sensitivity of computed tomography, and patient preferences for time spent in watchful waiting. In probabilistic sensitivity analysis, FDG-PET strategies were cost saving or cost less than $100 000 per QALY gained in 76.7%, 24.4%, and 99.9% of computer simulations for patients with low, intermediate, and high pretest probability, respectively.

Conclusions: FDG-PET should be used selectively when pretest probability and computed tomography findings are discordant or in patients with intermediate pretest probability who are at high risk for surgical complications. In most other circumstances, computed tomography-based strategies result in similar quality-adjusted life-years and lower costs.

The solitary pulmonary nodule is a single, well-circumscribed, spherical radiographic opacity that measures less than 3 to 4 cm in diameter and is surrounded completely by aerated lung (1). There is no associated atelectasis, hilar enlargement, or pleural effusion. Most pulmonary nodules are discovered incidentally on chest radiographs, and 15% to 75% of such nodules are malignant, depending on the population studied (2, 3). Patients with pulmonary nodules and their physicians confront difficult decisions about the risks and rewards of different management strategies. When present, malignancy must be promptly identified to permit timely resection.

Pulmonary nodule evaluation typically begins with imaging studies. Computed tomography (CT) localizes the nodule within the lung parenchyma, and CT density characteristics sometimes indicate occult calcification that suggests a benign cause (4). Other CT findings, such as spiculation, are strongly associated with malignancy (5). Positron emission tomography (PET) with the glucose analogue 18-fluorodeoxyglucose (FDG) identifies malignant tumors on the basis of their increased metabolic rate. The use of FDG-PET is rapidly gaining acceptance in clinical oncology to diagnose tumors, stage disease, and evaluate treatment response (6, 7). Because FDG-PET is believed to be highly sensitive for identifying malignant nodules, proponents argue that observation with serial chest radiographs is safe when PET results are negative (8).

Management alternatives for patients with pulmonary nodules include surgical resection, transthoracic needle biopsy, and watchful waiting (9). Surgery is the diagnostic gold standard and the definitive treatment for malignant nodules that are resectable, but surgery should be avoided in patients with benign nodules. Needle biopsy often establishes a specific malignant or benign diagnosis, but biopsy is invasive, potentially risky, and frequently nondiagnostic. Watchful waiting avoids unnecessary surgery for benign nodules but may delay diagnosis and treatment of malignant nodules.

We developed a decision analytic model to identify the most effective approaches to diagnose and manage solitary pulmonary nodules. We performed a cost-effectiveness analysis to quantify the health effects and economic costs associated with various management strategies. Because of the considerable recent interest in the use of FDG-PET, we specifically compared diagnostic strategies that used FDG-PET with strategies that did not include this potentially useful but expensive test.

We performed a cost-effectiveness analysis by following the recommendations of the Panel on Cost-Effectiveness in Health and Medicine for conducting and reporting a reference-case analysis (10). The analysis adopted a societal perspective that would permit comparisons across different health care interventions. We expressed our results in terms of costs, quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios. All costs and health effects were discounted at an annual rate of 3%. Additional details on our methods, data sources, and results can be found in the Appendix. An electronic decision aid that is based on our results will be available at http://www.annals.org in July 2003.
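Discounting converts costs and QALYs that accrue later into present values. As a minimal sketch, assuming the monthly cycle length used by the Markov model and discounting each month from the start of the analysis (implementation details the text does not specify), a 3% annual rate could be applied as follows; `discounted_total` is a hypothetical helper, not the authors' software.

```python
def discounted_total(monthly_values, annual_rate=0.03):
    """Present value of a monthly stream of costs or QALYs.

    Month 0 is undiscounted; the value in month t is divided by
    (1 + annual_rate) ** (t / 12), i.e., the annual rate applied continuously
    across months.
    """
    return sum(
        value / (1 + annual_rate) ** (month / 12)
        for month, value in enumerate(monthly_values)
    )


# Example: one year of full health credited as 1/12 QALY per month.
print(round(discounted_total([1 / 12] * 12), 4))
```

Because the same function is applied to both costs and QALYs, both are discounted at the same rate, as the reference-case recommendations require.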

Clinical Problem

The target population for this analysis was all adult patients with a new, noncalcified solitary pulmonary nodule on chest radiograph and no known extrathoracic malignancy. Our base-case analysis considered a hypothetical cohort of 62-year-old men and women.

Decision Model Structure and Assumptions

We considered 40 clinically plausible sequences of five diagnostic interventions: CT, FDG-PET, transthoracic needle biopsy, surgery, and watchful waiting (Appendix Figures 1 and 2). We assumed that CT and FDG-PET were never performed after needle biopsy or surgery because performing an imaging test after an invasive diagnostic procedure is unusual. Similarly, needle biopsy and observation were never performed after surgery.

Needle biopsy was considered to be nondiagnostic unless a specific benign or malignant diagnosis was obtained. We assumed that surgery would be performed if the biopsy revealed malignancy. If the biopsy revealed a specific benign diagnosis, the patient would be managed accordingly. After a nondiagnostic needle biopsy, either surgery or watchful waiting could be selected as the next diagnostic intervention.

A final diagnosis was established at the time of surgery or, alternatively, after 24 months of observation. In the watchful waiting strategy, serial chest radiographs were obtained at 1, 2, 4, and 6 months and every 3 months thereafter. We assumed that surgery would be performed if nodule growth was observed at any time. If no growth was observed after 24 months, we assumed that the nodule was benign.
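The follow-up schedule for watchful waiting can be made concrete with a short sketch. It assumes that "every 3 months thereafter" means months 9, 12, 15, and so on through month 24, and that growth is detected at the first scheduled radiograph on or after the month it becomes visible; both helpers are hypothetical illustrations.

```python
def radiograph_schedule(horizon_months=24):
    """Follow-up chest radiograph months: 1, 2, 4, and 6, then every 3 months."""
    months = [1, 2, 4, 6]
    next_month = 9
    while next_month <= horizon_months:
        months.append(next_month)
        next_month += 3
    return months


def detection_month(growth_visible_month, schedule):
    """First scheduled radiograph at or after the month growth becomes visible.

    Returns None when growth would not be seen within the schedule; under the
    stated assumptions the nodule is then classified as benign at 24 months.
    """
    return next((m for m in schedule if m >= growth_visible_month), None)


schedule = radiograph_schedule()
print(schedule)                      # [1, 2, 4, 6, 9, 12, 15, 18, 21, 24]
print(detection_month(7, schedule))  # 9
```

Under this reading, a patient managed by watchful waiting receives 10 radiographs over 24 months, and growth that becomes visible between scheduled films is found up to 3 months later.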

Modeling Long-Term Costs and Clinical Outcomes

We developed a Markov model to estimate long-term outcomes and costs for patients with malignant and benign pulmonary nodules (Appendix Figure 3). The model followed patients in the hypothetical study cohort over their remaining life span. We estimated the monthly probability of cancer recurrence after surgical treatment for patients with malignant nodules by using survival data from the Surveillance, Epidemiology and End Results (SEER) tumor registry (11). We used SEER data and a model of the natural history of untreated lung cancer to estimate the probability of disease progression in patients with malignant nodules who were managed by watchful waiting.
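Registry survival data are usually summarized over multiyear intervals, whereas a Markov model with monthly cycles needs a monthly transition probability. One common conversion, sketched below under the assumption of a constant hazard, solves 1 - (1 - p)^n = P for the monthly probability p. The 30% five-year recurrence figure is purely illustrative, not a SEER estimate, and the function names are hypothetical.

```python
def monthly_probability(cumulative_probability, months):
    """Constant monthly probability that reproduces a cumulative probability.

    Assumes a constant hazard, so the event-free fraction after n months
    is (1 - p) ** n.
    """
    return 1 - (1 - cumulative_probability) ** (1 / months)


def recurrence_free_trace(p_month, n_cycles):
    """Fraction of a cohort still recurrence-free after each monthly cycle."""
    trace = []
    recurrence_free = 1.0
    for _ in range(n_cycles):
        recurrence_free *= 1 - p_month
        trace.append(recurrence_free)
    return trace


# Hypothetical input: 30% of surgically treated patients recur within 5 years.
p = monthly_probability(0.30, 60)
trace = recurrence_free_trace(p, 60)
print(round(1 - trace[-1], 3))  # 0.3, the cumulative recurrence after 60 cycles
```

Running the cohort forward for 60 monthly cycles recovers the 5-year input, which is a quick check that the conversion is consistent.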

Data and Assumptions

We derived estimates for patient and nodule characteristics, diagnostic testing variables, costs and utilities from clinical and administrative sources (Appendix Table 1).

Patient and Nodule Characteristics

In the base-case analysis, we assumed that the pulmonary nodule measured 2 cm in diameter. We assumed that 12.5% of patients with malignant nodules would have regional lymph node involvement, the median prevalence of mediastinal metastases in eight studies of CT for staging in patients with T1 tumors (12–19). We derived the distribution of tumor growth rates for patients with malignant nodules from the Veterans Administration–Armed Forces Cooperative Study on Asymptomatic Pulmonary Nodules (20).

Pretest Probability

We performed separate analyses for representative patients with low (26%), intermediate (55%), and high (79%) pretest probabilities of malignancy. Although most clinicians assess this intuitively, investigators have developed quantitative models to estimate the probability of cancer. One model that has undergone preliminary validation used logistic regression to identify six independent predictors of malignancy: age, smoking status, history of cancer, nodule diameter, spiculation, and upper lobe location (21). Additional information about this model, including the prediction equation, can be found in the Appendix.
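A logistic regression model turns predictor values into a probability through the logistic function. The sketch below shows only the structure of such a model for the six predictors named above. The coefficients are hypothetical placeholders, not the published values (which appear with the prediction equation in the Appendix), and the units (age in years, diameter in millimeters) are assumptions.

```python
import math

# Hypothetical placeholder coefficients for illustration only; use the
# published prediction equation from the Appendix for any real estimate.
PLACEHOLDER_COEFFICIENTS = {
    "intercept": -6.5,
    "age_years": 0.04,
    "smoker": 0.8,
    "history_of_cancer": 1.3,
    "diameter_mm": 0.13,
    "spiculation": 1.0,
    "upper_lobe": 0.8,
}


def probability_of_malignancy(age_years, smoker, history_of_cancer, diameter_mm,
                              spiculation, upper_lobe,
                              coef=PLACEHOLDER_COEFFICIENTS):
    """Logistic model: P(cancer) = 1 / (1 + exp(-x)), with x a linear predictor."""
    x = (coef["intercept"]
         + coef["age_years"] * age_years
         + coef["smoker"] * int(smoker)
         + coef["history_of_cancer"] * int(history_of_cancer)
         + coef["diameter_mm"] * diameter_mm
         + coef["spiculation"] * int(spiculation)
         + coef["upper_lobe"] * int(upper_lobe))
    return 1 / (1 + math.exp(-x))


# A 62-year-old smoker with a 20-mm spiculated upper-lobe nodule (placeholder coefficients).
print(round(probability_of_malignancy(62, True, False, 20, True, True), 2))
```

With positive coefficients, each risk factor raises the linear predictor and therefore the estimated probability; the output of this sketch is meaningful only as an illustration of the mechanics.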

Diagnostic Test Performance

We performed a meta-analysis to estimate the diagnostic accuracy of FDG-PET; our methods and results have been published elsewhere (22). We identified 13 studies of FDG-PET that enrolled 450 patients with pulmonary nodules (23–35). We used the method of Moses and colleagues (36, 37) to construct a summary receiver-operating characteristic (ROC) curve for FDG-PET. For our base-case estimates, we selected an operating point on the ROC curve that corresponded to the median specificity of FDG-PET in the 13 studies. At this point on the curve, sensitivity and specificity for identifying malignancy were 94.2% and 83.3%, respectively.
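The Moses approach regresses the log diagnostic odds ratio D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR) across studies and back-transforms the fitted line into an ROC curve. The sketch below is a minimal, unweighted version with a default 0.5 continuity correction (the weighting and correction used in the meta-analysis are not stated here), plus a helper that reads off sensitivity at the median observed specificity, mirroring how the base-case operating point was chosen. All function names are hypothetical.

```python
import math
from statistics import median


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def fit_sroc(studies, correction=0.5):
    """Fit D = a + b*S by unweighted least squares.

    studies: list of (tp, fp, fn, tn) counts, one tuple per study.
    The continuity correction keeps logits finite when a cell count is zero.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in studies:
        tpr = (tp + correction) / (tp + fn + 2 * correction)
        fpr = (fp + correction) / (fp + tn + 2 * correction)
        d_vals.append(logit(tpr) - logit(fpr))  # log diagnostic odds ratio
        s_vals.append(logit(tpr) + logit(fpr))  # proxy for positivity threshold
    s_mean = sum(s_vals) / len(s_vals)
    d_mean = sum(d_vals) / len(d_vals)
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sensitivity_at(a, b, specificity):
    """Sensitivity on the summary ROC curve at a chosen specificity.

    Solving D = a + b*S for logit(TPR) gives
    logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b).
    """
    return inv_logit((a + (1 + b) * logit(1 - specificity)) / (1 - b))


def median_specificity_operating_point(studies, correction=0.5):
    """(sensitivity, specificity) at the median observed specificity."""
    a, b = fit_sroc(studies, correction)
    spec = median(tn / (tn + fp) for _, fp, _, tn in studies)
    return sensitivity_at(a, b, spec), spec
```

Choosing the operating point at the median specificity anchors the summary estimate to a specificity that the included studies actually achieved, rather than to an arbitrary point on the curve.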

To estimate diagnostic performance for CT and transthoracic needle biopsy, we searched MEDLINE and applied Moses and colleagues' method to construct summary ROC curves for these tests. For CT, base-case estimates of sensitivity and specificity for identifying malignancy were 96.5% and 55.8%, respectively (24, 38–47). In this report, the terms "possibly malignant" and "benign" describe CT results that are positive and negative for malignancy, respectively.

We estimated that CT-guided needle biopsy would not reveal a specific diagnosis in 8% of patients with malignant nodules and 44% of patients with benign nodules (27, 48–55). When fluoroscopic guidance was used, we assumed that the frequency of nondiagnostic biopsy results would be 10% higher (56–58). When needle biopsy revealed a specific benign or malignant diagnosis, we estimated that the false-negative and false-positive rates were 3.7% and 2.0%, respectively (27, 48–55). Base-case estimates of the probability of minor pneumothorax and major pneumothorax requiring chest tube drainage were 24% and 5%, respectively (27, 48–55).
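Because nondiagnostic results are far more common with benign nodules (44%) than with malignant ones (8%), a nondiagnostic biopsy lowers the probability of cancer. The sketch below applies Bayes' rule to these base-case CT-guided rates, assuming they are conditional on true nodule status; it ignores the fluoroscopic-guidance adjustment, and its input should be whatever probability of malignancy applies just before biopsy. The function names are hypothetical.

```python
def nondiagnostic_probability(pretest, p_nd_malignant=0.08, p_nd_benign=0.44):
    """Overall chance that a CT-guided needle biopsy is nondiagnostic."""
    return pretest * p_nd_malignant + (1 - pretest) * p_nd_benign


def probability_cancer_after_nondiagnostic(pretest, p_nd_malignant=0.08,
                                           p_nd_benign=0.44):
    """Bayes' rule: probability of malignancy given a nondiagnostic biopsy."""
    return (pretest * p_nd_malignant
            / nondiagnostic_probability(pretest, p_nd_malignant, p_nd_benign))


print(round(nondiagnostic_probability(0.55), 3))               # 0.242
print(round(probability_cancer_after_nondiagnostic(0.55), 3))  # 0.182
```

For a nodule with a 55% probability of cancer before biopsy, about one biopsy in four would be nondiagnostic, and such a result would lower the probability of cancer to roughly 18%.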

We assumed that video-assisted thoracoscopy would be used to perform surgical biopsy and that the procedure would be converted to a thoracotomy with lobectomy if the frozen section revealed malignancy. We derived estimates for probabilities of fatal and nonfatal surgical complications from sources in the clinical literature (59–67).

Costs

We converted all costs to 2001 U.S. dollars by using the gross domestic product deflator (68, 69). To derive costs for imaging tests and needle biopsy, we added procedure costs and professional fees that were based on Medicare reimbursement rates (70, 71). To estimate costs for surgical procedures and complications, we added professional fees (70) and median cost-adjusted charges from the 1996 Health Care Utilization Project database (72). To estimate long-term costs for patients with local, regional, and distant-stage lung cancer, we analyzed Medicare claims files linked with data from the SEER tumor registry for the years 1990 to 1993 (73). To estimate health care costs for patients with benign nodules and for patients with malignant nodules who survived more than 5 years after diagnosis, we used age-specific, annual health care expenditures from the Consumer Expenditures Study (74).
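Converting a cost to constant 2001 dollars with a price index amounts to multiplying it by the ratio of the index in 2001 to the index in the year the cost was incurred. The index values below are hypothetical placeholders, not published GDP deflator figures, and the function name is illustrative.

```python
def to_2001_dollars(amount, index_in_cost_year, index_in_2001):
    """Rescale a cost by the ratio of price index values (2001 / cost year)."""
    return amount * index_in_2001 / index_in_cost_year


# Hypothetical index values for illustration, not published deflator figures.
print(round(to_2001_dollars(1000.0, 95.0, 100.0), 2))  # 1052.63
```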

Utilities

We adjusted life expectancy for quality of life by using age- and sex-specific utilities (preference-based weights for health states) from the Beaver Dam Health Outcomes study (75) and available data to estimate reductions in utility associated with regional and distant-stage lung cancer (76). We also adjusted life expectancy for time spent in the hospital and time spent having diagnostic procedures. When possible, we used data on average length of hospital stay to make these adjustments (71). Because we could not identify studies that measured utilities in patients with undiagnosed pulmonary nodules, we assumed that the relative utility for time spent during observation was normal and we used age- and sex-specific values. To account for the possibility that some patients might be uncomfortable not knowing whether a nodule was benign or malignant, we tested lower values in a sensitivity analysis.

Sensitivity Analysis

One-way, multiway, and probabilistic sensitivity analyses were performed to identify important model uncertainties. When possible, ranges for variables were based on reported or calculated 95% CIs for means and interquartile ranges for medians. For diagnostic accuracy, several points on summary ROC curves and their 95% confidence intervals were evaluated. Clinical judgment was used to determine ranges for utilities. For costs, ranges were determined by adding or subtracting 25% from the base-case estimate. To determine ranges for transition probabilities in the Markov model, 50% was added or subtracted from the base-case value because these estimates were highly uncertain.

We performed probabilistic sensitivity analysis by stratifying patients according to pretest probability and risk for surgical complications. We assigned logit-normal distributions to all costs and probabilities for all diagnostic test variables by using the method of Doubilet and colleagues (77) and performed 10 000 simulations by randomly sampling values from these distributions. We then recorded the number of simulations in which the strategy under consideration was cost saving (more effective and less expensive than the alternative) or economically attractive (more effective and with an incremental cost < $100 000 per QALY gained). For a description of the software used in our analysis, see the Appendix.
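The bookkeeping for a probabilistic sensitivity analysis can be sketched in two pieces: drawing a probability from a logit-normal distribution, and classifying each simulation's incremental cost and QALYs against the comparator using the definitions above. This is a generic logit-normal sampler, not necessarily the exact parameterization of Doubilet and colleagues; the spread (0.5 on the logit scale) is hypothetical, and the decision model that would consume the draws is omitted.

```python
import math
import random


def sample_logit_normal(median_prob, sd_logit, rng):
    """Draw a probability whose logit is normal with mean logit(median_prob)."""
    mu = math.log(median_prob / (1 - median_prob))
    return 1 / (1 + math.exp(-rng.gauss(mu, sd_logit)))


def classify(delta_cost, delta_qaly, threshold=100_000):
    """Label one simulation's incremental cost and QALYs versus the comparator."""
    if delta_qaly > 0 and delta_cost < 0:
        return "cost saving"
    if delta_qaly > 0 and delta_cost / delta_qaly < threshold:
        return "economically attractive"
    return "neither"


def share_favorable(results, threshold=100_000):
    """Fraction of (delta_cost, delta_qaly) pairs that are cost saving or attractive."""
    favorable = sum(classify(c, q, threshold) != "neither" for c, q in results)
    return favorable / len(results)


rng = random.Random(2003)
# Hypothetical spread around the base-case FDG-PET sensitivity of 0.942;
# in a full analysis each draw would be fed through the decision model.
draws = [sample_logit_normal(0.942, 0.5, rng) for _ in range(10_000)]
print(min(draws) > 0, max(draws) < 1)
```

Sampling on the logit scale keeps every draw strictly between 0 and 1, which is why this family is convenient for probabilities such as sensitivity and specificity.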

Role of the Funding Source

The funding source had no role in the design, conduct, or reporting of the study or in the decision to publish the manuscript.

Because of the complexity of the analysis, we begin by summarizing our major findings. First, we found that the effectiveness and cost-effectiveness of management strategies depended critically on the pretest probability of malignancy and, to a lesser extent, the risk for surgical complications. Second, we found that CT was recommended as the initial test in nearly all circumstances, except when pretest probability was extremely high. Third, while nonselective use of FDG-PET was highly effective for pulmonary nodule diagnosis, we found that it was most cost-effective to use FDG-PET selectively, typically when pretest probability and CT results were discordant. Finally, we found that it was both highly effective and highly cost-effective to use surgery and needle biopsy aggressively once the results of imaging tests were known.

In patients with low pretest probability (26%), watchful waiting was the least effective and least expensive strategy (Table 1). A strategy that used CT but not FDG-PET was much more effective and cost less than $11 000 per QALY gained relative to watchful waiting. However, two strategies that used FDG-PET selectively were even more effective and cost less than $50 000 per QALY gained. In both of these strategies, CT was performed as the initial test and FDG-PET was used when CT results were possibly malignant. Surgery was recommended when the results of FDG-PET were positive, and needle biopsy was recommended when FDG-PET results were negative. Another strategy that used CT and FDG-PET nonselectively in all patients was most effective, but it cost almost $300 000 per QALY gained.

Table 1. Expected Costs, Quality-Adjusted Life-Years, and Incremental Cost-Effectiveness Ratios for Nondominated Strategies in Patients with Low, Intermediate, and High Pretest Probability of Malignancy

In patients with intermediate pretest probability (55%), watchful waiting was the least effective and least expensive approach (Table 1). Three strategies that used CT without FDG-PET cost less than $20 000 per QALY gained. In the most effective of these strategies, surgery was performed when CT results were possibly malignant and needle biopsy was performed when CT results were benign. Two strategies that included FDG-PET were more expensive and only marginally more effective than CT-based approaches and therefore cost more than $220 000 per QALY gained.

In patients with high pretest probability (79%), watchful waiting was again the least effective and least expensive approach (Table 1). A strategy that used CT but not FDG-PET was much more effective and cost less than $7000 per QALY gained relative to watchful waiting. Three strategies that used FDG-PET selectively were even more effective and cost less than $70 000 per QALY gained. In all three strategies, surgery was recommended when CT results were possibly malignant and FDG-PET was recommended when CT results were benign. The most effective of these strategies used surgery when FDG-PET results were positive and needle biopsy when FDG-PET results were negative.
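Table 1 lists only nondominated strategies, each compared with the next less costly nondominated strategy. A minimal sketch of that standard procedure follows: strategies that cost more but are no more effective are removed (strong dominance), and then strategies whose ICER exceeds that of the next more effective strategy are removed (extended dominance). The strategy names, costs, and QALYs are hypothetical, and this textbook version is not the software used in the analysis.

```python
def efficient_frontier(strategies):
    """Return [(name, cost, qaly, icer)] for nondominated strategies.

    strategies: list of (name, cost, qaly). The first frontier strategy has
    icer None; each later ICER is relative to the previous frontier strategy.
    """
    # Sort by cost; at equal cost, put the more effective strategy first.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))
    # Strong dominance: drop any strategy no more effective than a cheaper one.
    frontier = []
    for s in ordered:
        if frontier and s[2] <= frontier[-1][2]:
            continue
        frontier.append(s)
    # Extended dominance: ICERs must rise along the frontier.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            prev, cur, nxt = frontier[i - 1], frontier[i], frontier[i + 1]
            icer_in = (cur[1] - prev[1]) / (cur[2] - prev[2])
            icer_out = (nxt[1] - cur[1]) / (nxt[2] - cur[2])
            if icer_in > icer_out:
                del frontier[i]
                changed = True
                break
    result = []
    for i, (name, cost, qaly) in enumerate(frontier):
        icer = None if i == 0 else (cost - frontier[i - 1][1]) / (qaly - frontier[i - 1][2])
        result.append((name, cost, qaly, icer))
    return result


strategies = [  # hypothetical (name, cost in dollars, QALYs)
    ("A", 1000.0, 10.00),
    ("B", 2000.0, 10.10),
    ("C", 3000.0, 10.30),
    ("D", 2500.0, 10.05),
]
for name, cost, qaly, icer in efficient_frontier(strategies):
    print(name, cost, qaly, icer)
```

In this toy example, D is strongly dominated by B, and B is removed by extended dominance because moving from B to C buys QALYs more cheaply than moving from A to B.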

Sensitivity Analysis

Figure 1 shows the importance of pretest probability in greater detail. Use of FDG-PET cost less than $100 000 per QALY gained when pretest probability was low (10% to 50%) and CT results were possibly malignant or when pretest probability was high (77% to 89%) and CT results were benign. Both of these situations resulted in intermediate post-test probabilities (20% to 69%). Surgery was favored at higher post-test probabilities (≥ 70%), biopsy was preferred at lower post-test probabilities (2% to 20%), and watchful waiting was preferred only at very low post-test probabilities (<2%). Surgery without any preliminary diagnostic testing was preferred when pretest probability was at least 90%. In patients at high risk for surgical complications, FDG-PET strategies cost less than $100 000 per QALY gained when post-test probability ranged between 35% and 84%.
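The post-test probabilities above follow from Bayes' rule. Treating CT as a simple positive ("possibly malignant") or negative ("benign") test with the base-case sensitivity of 96.5% and specificity of 55.8%, and ignoring everything else the model tracks, a possibly malignant CT result moves a pretest probability of 10% to about 20% and one of 50% to about 69%, which matches the intermediate range in which FDG-PET was favored. The function name is hypothetical.

```python
def post_test_probability(pretest, sensitivity, specificity, test_positive):
    """Bayes' rule for a dichotomous test result."""
    if test_positive:
        true_pos = pretest * sensitivity
        false_pos = (1 - pretest) * (1 - specificity)
        return true_pos / (true_pos + false_pos)
    false_neg = pretest * (1 - sensitivity)
    true_neg = (1 - pretest) * specificity
    return false_neg / (false_neg + true_neg)


CT_SENSITIVITY, CT_SPECIFICITY = 0.965, 0.558  # base-case values from the text

for pretest in (0.10, 0.50):
    post = post_test_probability(pretest, CT_SENSITIVITY, CT_SPECIFICITY, True)
    print(f"pretest {pretest:.0%} -> post-test {post:.0%}")  # 20%, then 69%
```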

Figure 1.
Recommended sequence of diagnostic testing in patients who are at average risk for surgical complications, according to pretest probability and the results of computed tomography (CT).

The recommended sequence of tests is shown separately for possibly malignant and benign CT results. Subsequent test selection is shown to be a function of pretest probability and the corresponding post-test probability once the results of CT are known. Note that surgery is preferred when positron emission tomography (PET) results are positive, biopsy is preferred when PET results are negative, and watchful waiting is preferred when biopsy results are nondiagnostic. Recommendations are based on the assumption that society is willing to pay $100 000 per quality-adjusted life-year gained. Results were very similar when willingness to pay was assumed to be $25 000 or $50 000 per quality-adjusted life-year gained. FDG-PET = positron emission tomography with 18-fluorodeoxyglucose.


Several other variables affected the choice of strategy for patients with intermediate pretest probability, including the sensitivity (but not the specificity) of CT, the probability of nondiagnostic needle biopsy in patients with malignant nodules, and patient preferences for time spent under observation. Uncertainty exists about the sensitivity of CT, in part because there are no widely accepted CT criteria for determining whether a nodule is possibly malignant or benign. When we assumed that the true sensitivity of CT was less than 92.5% (compared with the base-case value of 96.5%), a strategy that used FDG-PET selectively cost less than $70 000 per QALY gained. In this strategy, FDG-PET was used when CT results were benign and surgery was recommended when CT results were possibly malignant.

There is also uncertainty about the diagnostic yield of needle biopsy (because this depends on operator experience) and patient preferences for time spent in observation. Selective use of FDG-PET (when CT results were benign) cost less than $25 000 per QALY gained when we assumed that the probability of nondiagnostic needle biopsy in patients with malignant nodules was 19% (base-case value of 8%), or when the relative utility of the time spent in observation was assumed to be 0.97 or less (base-case value of 1.00). A relative utility of 0.97 implies that an individual would accept a 3% risk for instant death in order to know whether the nodule was malignant or benign.
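The 3% figure follows from the standard-gamble interpretation of utility, assumed here: relative utilities are scaled so that normal health is 1 and death is 0, and a patient is indifferent between certain observation (utility U) and a gamble that resolves the uncertainty with probability 1 - p and causes instant death with probability p.

```latex
\begin{aligned}
U_{\text{observation}} &= (1 - p)\cdot U_{\text{normal health}} + p\cdot U_{\text{death}} \\
                       &= (1 - p)\cdot 1 + p\cdot 0 = 1 - p \\
U_{\text{observation}} = 0.97 \;&\Rightarrow\; p = 1 - 0.97 = 0.03
\end{aligned}
```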

The choice of strategy was not affected by varying other model parameters within the ranges tested, including the discount rate, the diagnostic accuracy of FDG-PET, or the cost of diagnostic tests (including FDG-PET). Similarly, the choice of strategy was not affected when we assumed that FDG-PET was 25% less sensitive and 16% more specific when CT results were benign, as has been observed in studies of FDG-PET and CT for mediastinal staging in patients with non–small-cell lung cancer (78).

Probabilistic sensitivity analysis showed that in patients with low and high pretest probability, FDG-PET strategies were cost saving or cost less than $100 000 per QALY gained in 76.7% and 99.9% of all simulations, respectively. For patients with intermediate pretest probability, FDG-PET strategies were cost saving or economically attractive in fewer than 25% of all simulations.

Clinical Recommendations

Figure 2 outlines a clinical algorithm for managing patients with new, noncalcified pulmonary nodules that is based on our results. In patients with low pretest probability (10% to 50%), FDG-PET should be used selectively when CT results are possibly malignant. When FDG-PET results are positive, surgery is both highly cost-effective and slightly more effective than needle biopsy. When FDG-PET results are negative, needle biopsy is more effective than observation. This is because, although uncommon, false-negative results of FDG-PET have potentially serious consequences (such as delayed diagnosis and missed opportunities for curative surgery). When CT results are benign, observation or needle biopsy should be used. Our analysis suggests that the latter approach is slightly more effective.

Figure 2.
Suggested algorithm for clinical management of patients with solitary pulmonary nodules who are at average risk for surgical complications.

The algorithm pertains to patients with low (10% to 50%), intermediate (51% to 76%), and high (77% to 90%) pretest probability of malignancy. Note that in patients with very low pretest probability (<10%), biopsy is preferred when computed tomography (CT) results are possibly malignant and watchful waiting is preferred when CT results suggest a benign diagnosis. In patients with very high pretest probability (>90%), surgery without diagnostic testing is the preferred strategy. FDG-PET = positron emission tomography with 18-fluorodeoxyglucose.


For patients with intermediate pretest probability (51% to 76%), we recommend surgery or needle biopsy when CT results are possibly malignant and needle biopsy or observation when CT results are benign (Figure 2). More aggressive use of surgery and needle biopsy results in slightly better health outcomes and slightly higher costs. The choice between more or less aggressive approaches should depend on factors such as the risk for surgical complications, the expected yield of needle biopsy, and patient preferences. For example, in patients who have severe comorbid conditions that increase the risk of surgery, it may be preferable to establish a malignant diagnosis with needle biopsy before sending the patient to surgery.

For patients with high pretest probability (77% to 90%), we recommend surgery when CT results are possibly malignant, unless the patient is at very high risk for operative complications (Figure 2). Patients should undergo FDG-PET when CT results are benign. When FDG-PET results are positive, surgery should be performed. When FDG-PET results are negative, needle biopsy is marginally more effective than watchful waiting, although some clinicians might prefer observation in this situation.
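The recommendations above and the Figure 2 algorithm can be condensed into a lookup for patients at average surgical risk. This is a hypothetical sketch: how values between the printed cutoffs (for example, 76.5%) are assigned is an assumption, and where the text offers a choice ("surgery or needle biopsy"), the choice depends on surgical risk, expected biopsy yield, and patient preferences, none of which this sketch models.

```python
def recommend_next_steps(pretest, ct_possibly_malignant):
    """Suggested next steps for a patient at average risk for surgical complications.

    Cutoffs follow the text: very low (<10%), low (10% to 50%),
    intermediate (51% to 76%), high (77% to 90%), very high (>90%).
    For very high pretest probability, the CT result is not used.
    """
    if pretest > 0.90:
        return "surgery without preliminary diagnostic testing"
    if pretest < 0.10:
        return "needle biopsy" if ct_possibly_malignant else "watchful waiting"
    if pretest <= 0.50:
        if ct_possibly_malignant:
            return "FDG-PET; surgery if positive, needle biopsy if negative"
        return "observation or needle biopsy"
    if pretest < 0.77:
        if ct_possibly_malignant:
            return "surgery or needle biopsy"
        return "needle biopsy or observation"
    if ct_possibly_malignant:
        return "surgery (unless operative risk is very high)"
    return "FDG-PET; surgery if positive, needle biopsy or observation if negative"


for pretest in (0.26, 0.55, 0.79):
    for ct_result in (True, False):
        print(pretest, ct_result, recommend_next_steps(pretest, ct_result))
```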

Pulmonary nodule diagnosis is challenging because the clinician and patient must consider many factors when discussing management options, including the risks and benefits of several possible diagnostic tests, patient preferences for invasive and noninvasive procedures, and the uncertain consequences of delayed diagnosis when watchful waiting is used as a management strategy. In this analysis, we used quantitative methods to synthesize these and other factors.

Pulmonary nodule diagnosis should always begin with a careful review of the chest radiograph and comparison with previous radiographs. Most experts agree that if a central, laminated, diffuse, or popcorn pattern of calcification is seen, a benign diagnosis is likely and observation with serial radiographs is appropriate (1, 79). Likewise, because doubling times for malignant nodules rarely exceed 700 days, 2-year radiographic stability strongly implies a benign cause. In the absence of benign calcification or documented radiographic stability, the clinician should estimate the pretest probability of malignancy. Once these steps have been taken, our findings can be used to guide subsequent management.
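The 2-year rule rests on the relation between volume doubling time and diameter. Assuming a spherical nodule growing exponentially, volume scales with the cube of diameter, so one volume doubling increases the diameter by only about 26%; a malignant nodule with a 700-day doubling time would undergo roughly one doubling in 2 years. The helper below is a hypothetical sketch of the standard exponential-growth doubling-time formula.

```python
import math


def volume_doubling_time(days_between, first_diameter_mm, second_diameter_mm):
    """Volume doubling time of a spherical nodule under exponential growth.

    Volume scales with diameter cubed, so the volume ratio is (d2 / d1) ** 3.
    Returns math.inf when no growth is measured.
    """
    if second_diameter_mm <= first_diameter_mm:
        return math.inf
    volume_ratio = (second_diameter_mm / first_diameter_mm) ** 3
    return days_between * math.log(2) / math.log(volume_ratio)


# A single volume doubling raises the diameter by only about 26%.
print(round(2 ** (1 / 3), 2))  # 1.26
```

For example, a 20-mm nodule that reaches about 25.2 mm after 700 days has a volume doubling time of about 700 days.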

We confirmed that CT should be the initial test in the management of nearly all patients with pulmonary nodules. Computed tomography is inexpensive and noninvasive and may be highly specific for identifying some benign nodules. In contrast, FDG-PET as the initial test is never economically attractive. We also found that the choice of subsequent tests depends most critically on the pretest probability of malignancy and the risk for surgical complications. Other potentially important factors include the sensitivity of CT, the probability of nondiagnostic needle biopsy, and patient preferences for time spent in watchful waiting.

Although FDG-PET strategies were highly effective over a wide range of pretest probabilities, these strategies were not necessarily cost-effective. Our results indicate that FDG-PET should be used selectively when pretest probability and CT results are discordant; in these cases, post-test probability will be intermediate. Selective use of FDG-PET limits costs by reducing the total number of FDG-PET studies performed and ensures that FDG-PET is used when the diagnosis is most in doubt. Table 2 summarizes specific recommendations on when to use CT, FDG-PET, watchful waiting, needle biopsy, and surgery.

Table 2. Recommendations on the Use of Computed Tomography, Positron Emission Tomography with 18-Fluorodeoxyglucose, Watchful Waiting, Transthoracic Needle Biopsy, and Surgery

Our results support and extend the findings of other studies. In a decision analysis that did not consider costs, Cummings and colleagues (9) found that the choice of strategy depended on the pretest probability of malignancy. Watchful waiting was preferred over biopsy when the probability of cancer was less than 3%, and surgery was preferred over biopsy when the probability of cancer was greater than 68%. Our threshold values for watchful waiting and surgery were very similar to theirs, despite the fact that they made less pessimistic assumptions about the consequences of delayed diagnosis in patients with malignant nodules who were managed by observation. We extend their work by demonstrating that FDG-PET replaces needle biopsy when the post-test probability of malignancy ranges from 20% to 69%. In a recent cost-effectiveness analysis based on reimbursement rates in Germany, Dietlein and colleagues (80) reported that their threshold for performing surgery instead of FDG-PET occurred at a similar probability of 75%. In another cost-effectiveness analysis, results for patients with low pretest probability were similar to ours, although we recommend strategies in patients with intermediate and high pretest probability that these authors did not consider (81).

Our analysis has several limitations. First, the natural history of untreated malignant pulmonary nodules is not known. We modeled the consequences of delayed diagnosis when watchful waiting was used in patients with malignant nodules. Few empirical data exist to validate our model. However, our results were very similar when we adopted Cummings and colleagues' less pessimistic assumptions about the consequences of delayed diagnosis.

Second, our base-case analysis assumed that test performance was conditionally independent, or that the sensitivity and specificity of FDG-PET did not differ depending on the results of CT. Although FDG-PET and CT identify malignant nodules by different mechanisms, their results might be correlated. For example, our group and others have observed that, when used for mediastinal staging in patients with non–small-cell lung cancer, FDG-PET is less sensitive and more specific when CT findings are negative (78, 82). We are not aware of any data on the conditional performance of these tests for pulmonary nodule diagnosis. In fact, most studies of FDG-PET limited enrollment to participants with possibly malignant findings on CT. Thus, our base-case estimates of sensitivity and specificity for FDG-PET best reflect its performance when CT results are possibly malignant, and concerns about conditional test performance do not compromise our recommendation to use FDG-PET selectively when pretest probability is low and CT results are possibly malignant. Still, if FDG-PET is less sensitive and more specific than our base-case estimates when CT results are benign, this could raise doubts about our recommendation to use FDG-PET when pretest probability is high. However, sensitivity analysis showed that our results did not change when we assumed that the sensitivity of FDG-PET was as low as 50% in patients with benign results on CT. Improving the specificity of FDG-PET would only strengthen the recommendation to use it in this setting.

Our analysis did not consider several other potential benefits of FDG-PET imaging. FDG-PET is more accurate than CT for detecting regional lymph node metastases (82–83), which occur in approximately 12% of patients with T1 lung cancer (12–19), and it may also detect occult distant metastases (82). Finally, future advances in FDG-PET technology may improve accuracy or reduce costs. Several groups have recently described an FDG imaging technique that does not require a dedicated PET scanner but instead performs coincidence imaging with a modified dual-detector γ camera (84–87). Although this technique is less expensive than FDG imaging with a dedicated PET scanner, its diagnostic accuracy has not been evaluated in large, well-designed studies.

We conclude that patients with new, noncalcified pulmonary nodules should first be classified according to pretest probability and the risk for surgical complications, after which CT should be performed. For patients who are at average risk for surgical complications, FDG-PET should be used selectively when pretest probability and CT results are discordant. For patients at high risk for surgical complications with low or intermediate pretest probability, FDG-PET should be used when CT results are possibly malignant. In most other circumstances, CT-based strategies result in similar quality-adjusted life expectancy and lower costs.

Appendix

In this appendix, we describe in greater detail the methods and results of our analysis of alternative management strategies for patients with solitary pulmonary nodules. Readers should consult the print version of the manuscript for background information, results of the base-case analysis, results of one-way sensitivity analysis, and a discussion of the results. We focus on describing the assumptions of the Markov model that we used to estimate long-term costs and outcomes for patients with malignant pulmonary nodules. In addition, we provide detailed information regarding studies of diagnostic test performance and a critique of their methods, as well as more information regarding our sources of data for cost and utility estimates. Finally, we present selected results that did not appear in the print version of the manuscript because of space limitations, including complete results of probabilistic sensitivity analysis.

Methods

The target population for this analysis was all adult patients found to have a new, noncalcified solitary pulmonary nodule on chest radiograph and no known extra-thoracic malignancy. We assumed that there was no absolute contraindication to invasive biopsy or surgery, because patients with such contraindications would not be likely to undergo aggressive diagnostic evaluation.

Decision Model Structure and Assumptions

Appendix Figures 1 and 2 illustrate the structure of the decision model. The model compared 40 clinically plausible sequences of five diagnostic interventions: CT, FDG-PET, transthoracic needle biopsy, surgery, and watchful waiting. Appendix Table 2 is a complete list of strategies. We evaluated all plausible sequences of diagnostic tests to compare strategies with the next most effective alternative when making cost-effectiveness comparisons. Comparing an intervention with a suboptimal alternative may result in overestimating the intervention's true cost-effectiveness (10). In addition, strategies that might seem to be counterintuitive often prove to be highly cost-effective under certain conditions.

Appendix Figure 1. Decision model. The square decision node (A) indicates that computed tomography (CT), observation (watchful waiting), surgery, positron emission tomography with 18-fluorodeoxyglucose (FDG-PET), or transthoracic needle biopsy may be selected as the initial diagnostic test. If CT is selected first, the result may be possibly malignant or benign. Observation, surgery, biopsy, or FDG-PET may be the next diagnostic test, depending on the results of CT (B). If FDG-PET is selected as the first test, results may be positive or negative; CT, observation, surgery, or biopsy may be selected as the next test, depending on results (C). Observation, surgery, or biopsy may be selected after both CT and FDG-PET have been performed (D).
Appendix Figure 2. Decision model subtrees. Needle biopsy may result in fatal or nonfatal complications or no complications (biopsy subtree). If no fatal complications occur, the biopsy may be diagnostic or nondiagnostic, depending on whether it yields a specific malignant or benign diagnosis. If the biopsy reveals malignancy, we assumed that surgery would be performed. If the biopsy reveals a specific benign diagnosis, we assumed that the patient would be treated accordingly and monitored with serial chest radiographs. After a nondiagnostic biopsy, surgery or observation may be selected as the next diagnostic option. Surgery may result in fatal or nonfatal complications or no complications (surgery subtree). At surgery, most malignant nodules will be local-stage lung cancer, but metastases to regional lymph nodes may be detected in some cases. Some nodules will be benign, depending on the prevalence of benign disease in the target population.
Appendix Table 2. Alternative Strategies for Management of Patients with Solitary Pulmonary Nodules

The order of possible test sequences was unconstrained, with two exceptions. First, CT and FDG-PET were never performed after needle biopsy or surgery, because performing an imaging test after an invasive diagnostic procedure would be unusual. Second, needle biopsy and observation were never performed after surgery. Computed tomography, FDG-PET, needle biopsy, surgery, or watchful waiting could be selected as the initial diagnostic intervention (Appendix Figure 1).

We assumed that most biopsies were performed under CT guidance, but fluoroscopic guidance was used in test sequences that did not include CT. We considered needle biopsy to be nondiagnostic unless a specific benign or malignant diagnosis was obtained. We assumed that surgery would be performed if the biopsy revealed malignancy. If the biopsy revealed a specific benign diagnosis, we assumed that the patient would be managed accordingly. After a nondiagnostic needle biopsy, either surgery or watchful waiting could be selected as the next diagnostic intervention (Appendix Figure 2).

A final diagnosis was established at the time of surgery or, alternatively, after 24 months of observation. In the observation (watchful waiting) strategy, serial chest radiographs were obtained at 1, 2, 4, and 6 months, and every 3 months thereafter. We assumed that surgery would be performed if nodule growth was detected at any time. If no growth was observed after 24 months, we assumed that the nodule was benign. The optimal timing of serial radiographs has not been determined. However, our protocol called for more frequent imaging than the protocol recommended by the Early Lung Cancer Action Project (ELCAP) investigators, who recommended that CT be performed at 3, 6, 12, and 24 months after identification of nodules that measured less than 1 cm in diameter (88). Furthermore, we assumed that chest radiography had a sensitivity of 100% for detecting growth, defined as one doubling in tumor volume or a change in nodule size from 2 cm to 2.5 cm in diameter. We believe that, under this assumption, the modeled performance of chest radiography compares favorably with the actual performance of CT in everyday practice. In addition, we suspect that chest radiography is more widely used than CT in practice for watchful waiting in patients with nodules that measure 2 cm in diameter. However, because CT has better spatial resolution and growth is difficult to detect in small pulmonary nodules, we believe that CT should be used for watchful waiting in patients with nodules that measure less than 1 cm to 1.5 cm in diameter.
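Because volume scales with the cube of diameter, one doubling in volume multiplies the diameter by $2^{1/3} \approx 1.26$. This is why growth from 2 cm to about 2.5 cm corresponds to a single doubling. The minimal Python sketch below checks this arithmetic under the assumption of a spherical nodule. The function names are illustrative and are not part of the authors' model.

```python
import math


def diameter_after_doublings(d0_cm, n_doublings):
    """Diameter of a spherical nodule after n volume doublings.

    Volume scales with diameter cubed, so each doubling multiplies the
    diameter by 2 ** (1/3).
    """
    return d0_cm * 2 ** (n_doublings / 3)


def doublings_between(d0_cm, d1_cm):
    """Volume doublings needed for a spherical nodule to grow from d0 to d1."""
    return 3 * math.log2(d1_cm / d0_cm)


print(round(diameter_after_doublings(2.0, 1), 2))  # 2.52
print(round(doublings_between(2.0, 2.5), 2))       # 0.97
```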

Modeling Long-Term Costs and Clinical Outcomes

We developed a state-transition (Markov) model to estimate long-term outcomes and costs for patients with malignant and benign pulmonary nodules (Appendix Figure 3). The model followed individual patients in the hypothetical study cohort over their remaining life span. Individuals were assumed to make transitions from one health state to another over time. Before the time of diagnosis, all patients were considered to be in the “unknown” health state, reflecting the unknown nature of the diagnosis. Within this state, patients with malignant nodules who were managed by watchful waiting were at risk for disease progression from local to regional disease and from regional to distant disease during the observation period. At the time of diagnosis, patients were assumed to transition from the “unknown” state to one of three other health states (“benign,” “local,” or “regional”), depending on the diagnosis and stage of disease. We assumed that all patients with malignant nodules eventually underwent surgery, although surgery was inevitably delayed in patients whose management strategy included watchful waiting. After surgery, all patients with local-stage malignant disease were at risk for recurrence. Similarly, patients with regional-stage disease were at risk for disease progression. Patients who remained in the “local” and “regional” health states for 5 years after the time of diagnosis were considered to be free of cancer (89). After this time, we assumed that life expectancy was normal for age. We assumed that all patients with distant-stage lung cancer eventually died of their cancer, if they did not die first of some other cause.
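The state-transition structure described above can be sketched as a monthly Markov cohort simulation. This is a minimal illustration, not the authors' model, and it makes several simplifications:

- It starts at diagnosis, omitting the "unknown" state and the benign branch.
- It uses a constant other-cause mortality rather than age-specific life tables.
- Every transition probability is a hypothetical placeholder.

Only the 12.5% share with regional disease and the 5-year cure rule come from the text.

```python
# Minimal Markov cohort sketch (monthly cycles). All probabilities are
# hypothetical placeholders, not the paper's estimates.
P_RECUR = 0.01        # local -> distant (recurrence after surgery)
P_PROGRESS = 0.03     # regional -> distant
P_DIE_DISTANT = 0.08  # distant -> death from lung cancer
P_DIE_OTHER = 0.003   # any living state -> death from other causes (constant here)
CURE_MONTH = 60       # local/regional patients still in state at 5 years are cured

STATES = ("local", "regional", "distant", "cured", "dead")


def step(dist, month):
    """Advance the cohort distribution by one monthly cycle."""
    new = dict.fromkeys(STATES, 0.0)
    for state, mass in dist.items():
        if state == "dead":
            new["dead"] += mass
            continue
        # Other-cause death is applied first; disease transitions act on survivors.
        new["dead"] += mass * P_DIE_OTHER
        alive = mass * (1 - P_DIE_OTHER)
        if state == "cured":
            new["cured"] += alive
        elif state == "distant":
            new["dead"] += alive * P_DIE_DISTANT
            new["distant"] += alive * (1 - P_DIE_DISTANT)
        else:  # local or regional
            p = P_RECUR if state == "local" else P_PROGRESS
            new["distant"] += alive * p
            # Reaching 5 years without recurrence or progression moves to "cured".
            target = "cured" if month + 1 >= CURE_MONTH else state
            new[target] += alive * (1 - p)
    return new


def run_cohort(months=600, frac_regional=0.125):
    """Return monthly survival and undiscounted life expectancy in years."""
    dist = dict.fromkeys(STATES, 0.0)
    dist["local"] = 1 - frac_regional
    dist["regional"] = frac_regional
    survival = []
    for m in range(months):
        dist = step(dist, m)
        survival.append(1 - dist["dead"])
    # Sum of monthly survival / 12 (no half-cycle correction).
    return survival, sum(survival) / 12


survival, life_years = run_cohort()
print(f"5-year survival: {survival[59]:.3f}; life expectancy: {life_years:.1f} years")
```

Tracking a probability distribution over states, rather than simulating individuals, gives the same expected values for a cohort model like this one and keeps the sketch deterministic.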

Determining Markov Model Transition Probabilities

To estimate the monthly probability of cancer recurrence for patients with malignant pulmonary nodules, we constructed survival curves for 1207 Medicare beneficiaries with surgically treated, local-stage, malignant pulmonary nodules (T1N0M0) from the SEER tumor registry for 1990 to 1993 (11). We assumed that some patients would have recurrent disease and then die of cancer and that the remainder would eventually die of other causes. We estimated the monthly probability of death from recurrent lung cancer by using SEER tumor registry data for 10 835 Medicare beneficiaries with distant-stage disease. We determined the monthly probability of death from other causes by using age-specific values from 1996 U.S. life tables (90). We assumed that the probability of recurrence decreased gradually over time. We then identified monthly probabilities of recurrence that produced a modeled survival curve that most closely approximated observed survival in the SEER cohort. We fit this curve by minimizing the sum of the squared differences between points representing the probability of survival at years 1 through 5. Appendix Figure 4 shows the survival curve for the observed cohort and modeled survival when probabilities of recurrence were set at our base-case values (Appendix Table 1).
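The calibration step can be illustrated with a small grid search: monthly recurrence probabilities are chosen so that modeled survival at years 1 through 5 matches observed survival by least squares. This sketch rests on several assumptions:

- Recurrence risk declines exponentially, governed by a single hypothetical parameter `p0`.
- The death probabilities are hypothetical.
- The "observed" curve is synthetic and is generated by the model itself, so the fit can be checked.

It does not use SEER data and is not the authors' fitting code.

```python
import math

# Hypothetical inputs for illustration only (not SEER-derived values).
P_DIE_DISTANT = 0.08  # monthly probability of cancer death once disease is distant
P_DIE_OTHER = 0.003   # monthly probability of death from other causes (constant here)
DECAY = 0.02          # assumed monthly decay of recurrence risk


def survival_curve(p0, n_years=5):
    """Overall survival at years 1..n_years for a surgically treated local cohort.

    Recurrence probability in month m is p0 * exp(-DECAY * m), so it
    declines gradually over time.
    """
    local, distant, dead = 1.0, 0.0, 0.0
    out = []
    for m in range(12 * n_years):
        # Other-cause deaths first, then disease transitions among survivors.
        dead += (local + distant) * P_DIE_OTHER
        local *= 1 - P_DIE_OTHER
        distant *= 1 - P_DIE_OTHER
        dead += distant * P_DIE_DISTANT
        p_rec = p0 * math.exp(-DECAY * m)
        local, distant = local * (1 - p_rec), distant * (1 - P_DIE_DISTANT) + local * p_rec
        if (m + 1) % 12 == 0:
            out.append(1 - dead)
    return out


def fit_p0(observed, grid):
    """Grid value of p0 minimizing the sum of squared survival differences."""
    def sse(p0):
        return sum((s - o) ** 2 for s, o in zip(survival_curve(p0), observed))
    return min(grid, key=sse)


# Synthetic "observed" curve generated with p0 = 0.012, then recovered by the fit.
observed = survival_curve(0.012)
grid = [i * 0.0005 for i in range(1, 101)]
print(f"fitted p0 = {fit_p0(observed, grid):.4f}")
```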

Appendix Figure 4. Observed and modeled survival for patients with local, regional, and distant lung cancer. Survival curves for patients with pathologically staged local lung cancer (T1N0M0), pathologically staged regional lung cancer (any T N1–3 M0), and distant lung cancer (any T any N M1) are from the linked Medicare claims–Surveillance, Epidemiology, and End Results (SEER) tumor registry. Modeled survival was based on Markov transition probabilities. For patients with local and regional lung cancer, modeled survival closely approximated observed survival.

We used an identical procedure to estimate the monthly probability of progression from regional to distant-stage lung cancer. We approximated a survival curve for a cohort that included 1954 Medicare enrollees with pathologically confirmed, regional-stage lung cancer (any T N1–3 M0) from the SEER registry. Appendix Figure 4 shows observed and modeled survival when probabilities of progression from regional to distant disease were set at our base-case values (Appendix Table 1).

Determining the Probability of Disease Progression during Watchful Waiting

We used similar methods to model the monthly probability of disease progression during watchful waiting. We assumed that monthly probabilities for disease progression depended on the doubling time of the nodule, a measure of the tumor growth rate. We used data from the Veterans Administration–Armed Forces Cooperative Study on Asymptomatic Pulmonary Nodules to estimate the distribution of doubling times for malignant nodules (20). The mean doubling time was 5.24 months (median, 4 months).

To predict life expectancy for patients with malignant nodules with different doubling times, we adopted a simple model of the natural history of lung cancer (91–93). The model assumes that a tumor starts as a single cell that measures 10 microns in diameter and doubles in volume at a constant rate. Under these assumptions, a nodule that measures 2 cm in diameter has doubled in volume 33 times. It is further assumed that death occurs, on average, after 40 tumor doublings when the diameter of the tumor measures 10 cm. The model predicts that in the absence of treatment, life expectancy is 36.7 months for a patient with a 2-cm nodule that doubles in volume every 5.24 months. Because empirical data on the natural history of untreated malignant nodules are lacking, we validated this prediction by surveying a group of academic clinicians for their expert opinion. We asked a convenience sample of 21 internists, pulmonary specialists, and thoracic surgeons to estimate the life expectancy of an otherwise healthy, 62-year-old man with a 2-cm malignant pulmonary nodule who declined treatment, assuming that the growth rate of the nodule was average. Estimated mean life expectancy (±SD) was 35.12 ± 16.33 months (median, 32 months), which closely agreed with what the model predicted (Gould MK. Unpublished data).
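The doubling arithmetic in the natural history model can be reproduced directly. The number of volume doublings from a 10-micron cell to a nodule of diameter $d$ is $3\log_2(d / 10\,\mu\text{m})$. Rounding to whole doublings, as the text does, gives 33 doublings at 2 cm and 40 at 10 cm. The 7 remaining doublings at 5.24 months each then yield about 36.7 months. Without rounding, the remaining doublings are $3\log_2 5 \approx 6.97$, which gives about 36.5 months. The function names are illustrative.

```python
import math

CELL_DIAMETER_CM = 0.001  # a single 10-micron cell


def doublings_to_diameter(d_cm):
    """Volume doublings for a 10-micron cell to reach diameter d_cm,
    rounded to whole doublings as in the text."""
    return round(3 * math.log2(d_cm / CELL_DIAMETER_CM))


def untreated_life_expectancy_months(d_cm, doubling_time_months, lethal_d_cm=10.0):
    """Months until the tumor reaches the lethal diameter at a constant doubling time."""
    remaining = doublings_to_diameter(lethal_d_cm) - doublings_to_diameter(d_cm)
    return remaining * doubling_time_months


print(doublings_to_diameter(2.0))   # 33
print(doublings_to_diameter(10.0))  # 40
print(round(untreated_life_expectancy_months(2.0, 5.24), 1))  # 36.7
```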

We used the declining exponential approximation of life expectancy (DEALE) to construct survival curves for patients who had nodules with different doubling times (94–95). The DEALE assumes that survival is approximated by a simple declining exponential function. Under the assumptions of the DEALE, life expectancy is the reciprocal of the average compound mortality rate. A life expectancy of 36.7 months corresponds to a constant mortality rate of 0.327 per year. Predicted survival $S$ at time $t$ is given by the formula $S(t) = e^{-rt}$, where $r$ is the mortality rate and $t$ is measured in years. For example, when $r = 0.327$, the 1-year survival rate is 72.1%, and the 5-year survival rate is 19.5%.
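The DEALE calculations in this paragraph can be checked with a few lines of Python. The function names are illustrative. The only inputs are the 36.7-month life expectancy and the formula $S(t) = e^{-rt}$ stated in the text.

```python
import math


def deale_rate(life_expectancy_years):
    """DEALE: the constant mortality rate is the reciprocal of life expectancy."""
    return 1.0 / life_expectancy_years


def deale_survival(rate_per_year, t_years):
    """Declining exponential survival, S(t) = exp(-r * t)."""
    return math.exp(-rate_per_year * t_years)


r = deale_rate(36.7 / 12)                    # 36.7 months expressed in years
print(round(r, 3))                           # 0.327
print(round(deale_survival(r, 1) * 100, 1))  # 72.1
print(round(deale_survival(r, 5) * 100, 1))  # 19.5
```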

To determine the monthly probability of disease progression, we assumed that untreated lung cancer progresses sequentially from local to regional to distant disease and then to death. We assumed that the transition probabilities for local to regional disease and regional to distant disease were equal. We determined the monthly probability of death from distant lung cancer and the probability of death from other causes by using data from the SEER tumor registry and U.S. life tables, respectively. We then identified transition probabilities that produced survival curves that most closely approximated the curves that we obtained by using the natural history model and the assumptions of the DEALE. To fit these curves, we minimized the sum of the squared differences between points representing the probability of survival at years 1 through 5. In the case of a malignant nodule with a doubling time of 5.24 months, we calculated that the monthly probability of progression from local to regional disease during the observation period was 8.4%. Appendix Figure 5 shows the distribution of doubling times for malignant pulmonary nodules and the corresponding transition probabilities that we derived.
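The same least-squares idea can be applied to the untreated natural history. The goal is to choose a single monthly probability $p$, shared by the local-to-regional and regional-to-distant transitions, so that modeled survival at years 1 through 5 matches the DEALE curve with $r = 0.327$ per year. In this sketch, the death probabilities for distant disease and for other causes are hypothetical placeholders rather than the SEER and life-table values the authors used. The fitted value should therefore not be expected to equal the 8.4% reported in the paragraph above.

```python
import math

# Hypothetical inputs for illustration only (the paper used SEER and U.S. life-table data).
P_DIE_DISTANT = 0.08  # monthly probability of cancer death once disease is distant
P_DIE_OTHER = 0.003   # monthly probability of death from other causes
DEALE_RATE = 0.327    # per year, from a 36.7-month untreated life expectancy


def chain_survival(p, n_years=5):
    """Survival at years 1..n_years when untreated cancer moves
    local -> regional -> distant -> dead, with the same monthly
    probability p for both progression steps."""
    local, regional, distant, dead = 1.0, 0.0, 0.0, 0.0
    out = []
    for m in range(12 * n_years):
        # Other-cause deaths first, then disease transitions among survivors.
        dead += (local + regional + distant) * P_DIE_OTHER
        local, regional, distant = (x * (1 - P_DIE_OTHER) for x in (local, regional, distant))
        dead += distant * P_DIE_DISTANT
        local, regional, distant = (
            local * (1 - p),
            regional * (1 - p) + local * p,
            distant * (1 - P_DIE_DISTANT) + regional * p,
        )
        if (m + 1) % 12 == 0:
            out.append(1 - dead)
    return out


# DEALE target curve S(t) = exp(-r t) at years 1 through 5.
target = [math.exp(-DEALE_RATE * t) for t in range(1, 6)]


def sse(p):
    """Sum of squared differences between modeled and target survival."""
    return sum((s - o) ** 2 for s, o in zip(chain_survival(p), target))


grid = [i / 1000 for i in range(1, 501)]
p_fit = min(grid, key=sse)
print(f"fitted monthly progression probability: {p_fit:.3f}")
```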

Appendix Figure 5. Distribution of tumor doubling times and corresponding probabilities of disease progression during the observation period. The frequency plot demonstrates the distribution of observed doubling times for 67 pulmonary nodules and mass lesions from the Veterans Administration–Armed Forces Cooperative Study on Asymptomatic Pulmonary Nodules (20). The monthly probability of disease progression (black circles) was assumed to be a function of the tumor doubling time.

Other investigators have used a different approach to estimate the negative consequences of delayed diagnosis in patients with malignant nodules who were managed by observation. Cummings and colleagues (9) observed that there was a linear relationship between tumor size and survival after resection of malignant lung tumors. The relationship that they observed was described by the equation $r = 0.039d - 0.0145$, where $r$ is the disease-specific mortality rate (which is assumed to be constant over time) and $d$ is the diameter of the nodule in centimeters. Like us, they assumed that growth would be detected when the nodule doubled once in volume (for example, in a patient with a nodule that measured 2 cm in diameter, growth would be detected when the nodule measured 2.5 cm in diameter). Under these assumptions, they estimated that 5-year survival would be reduced by 5% in patients who were managed by observation at some point in their evaluation. Using the same set of assumptions, Gambhir and colleagues (81) calculated that life expectancy for a 64-year-old man with a 2.5-cm nodule would be reduced by 14%, from 6.62 years to 5.67 years, if diagnosis and treatment were delayed by one doubling time. The main limitation of these previous analyses is that the linear relationship between tumor size and survival was derived from studies in which the definition of pulmonary nodules included lesions that measured up to 6 cm in diameter (96–97). In one of the studies, almost 40% of patients had pulmonary masses 3.5 cm to 6 cm in diameter (98). Recent evidence suggests that nodule diameter may not predict survival within the subgroup of patients with malignant nodules that measure no more than 3 cm in diameter, who are the focus of our analysis (99).
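To make the Cummings relationship concrete, the sketch below evaluates $r = 0.039d - 0.0145$ for a 2-cm nodule and for the same nodule after one volume doubling (2.5 cm). Converting $r$ to a 5-year disease-specific survival requires an assumption about units and time scale. Here we assume, for illustration only, that $r$ is an annual rate applied as a constant hazard. The result is not meant to reproduce the 5% reduction in 5-year survival reported by Cummings and colleagues, which depended on modeling assumptions not described here.

```python
import math


def cummings_rate(d_cm):
    """Disease-specific mortality rate as a linear function of nodule diameter:
    r = 0.039 * d - 0.0145 (Cummings and colleagues)."""
    return 0.039 * d_cm - 0.0145


def five_year_survival(d_cm):
    """Disease-specific 5-year survival, ASSUMING r is an annual constant hazard."""
    return math.exp(-5 * cummings_rate(d_cm))


for d in (2.0, 2.5):
    print(f"d = {d} cm: r = {cummings_rate(d):.4f}, "
          f"5-year disease-specific survival = {five_year_survival(d):.3f}")
```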

Data and Assumptions

We derived estimates for patient and nodule characteristics, diagnostic testing variables, costs, and utilities from clinical and administrative data sources (Appendix Table 1). Estimates of the diagnostic accuracy of FDG-PET imaging were obtained from a published meta-analysis, whose methods and results are summarized below (22). A single reviewer evaluated most studies of CT, needle biopsy, and surgery. For these tests, we used a descriptive approach to highlight aspects of study quality.

Patient and Nodule Characteristics. Our base-case analysis considered a hypothetical cohort of 62-year-old men and women because in eight recent studies of FDG-PET for pulmonary nodule diagnosis, roughly 60% of the participants were men and the mean age was 61.8 years (25–28, 30–31, 33, 35). The base-case analysis assumed that the pulmonary nodule measured 2 cm in diameter. We assumed that 12.5% of patients with malignant nodules would have regional lymph node involvement, based on the median prevalence of mediastinal metastases in eight studies of CT for mediastinal staging in patients with stage T1 bronchogenic carcinoma (12–19). The probability that a benign nodule would grow during the first month was 28%, based on the median proportion of benign nodules that were caused by an acute or subacute inflammatory process in 10 recent studies of FDG-PET for pulmonary nodule diagnosis (23, 25–28, 30, 32–35). Because most benign nodules grow rapidly or not at all, the probability that a benign nodule would grow in each subsequent month was assumed to be 0.5%.
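Together, the two growth probabilities for benign nodules imply a cumulative probability of growth over the 24-month observation period. Under the rule that detected growth prompts surgery, this approximates the share of benign nodules that would be referred for resection during watchful waiting. The arithmetic below is our own illustration, not a result reported in the paper, and it assumes that growth is detected in the month it occurs.

```python
def prob_benign_growth(months, p_first=0.28, p_later=0.005):
    """Cumulative probability that a benign nodule grows within `months`,
    with p_first in the first month and p_later in each later month."""
    if months <= 0:
        return 0.0
    no_growth = (1 - p_first) * (1 - p_later) ** (months - 1)
    return 1 - no_growth


print(round(prob_benign_growth(24), 3))
```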

Diagnostic Test Performance: FDG-PET. To determine the diagnostic accuracy of FDG-PET, we performed a meta-analysis. Although our methods and results have been published elsewhere (22), we summarize them below. Our computerized search strategy is outlined in Appendix Table 3. We identified 13 studies of FDG-PET that enrolled 450 patients with pulmonary nodules (23–35). The number of participants ranged between 19 and 100, and the mean age of participants ranged between 58 and 71 years (Appendix Table 4). The median prevalence of malignancy was 65.8% (range, 46% to 79%). Six studies limited enrollment to participants with pulmonary nodules (25, 30–32, 34–35). Seven other studies enrolled more heterogeneous groups of patients with pulmonary nodules and larger mass lesions but provided separate results for participants with pulmonary nodules (23–24, 26–29, 33). We used only data from participants with pulmonary nodules to perform quantitative analyses.

Appendix Table 3. MEDLINE Search for Studies of Positron Emission Tomography with 18-Fluorodeoxyglucose
Appendix Table 4. Studies of Positron Emission Tomography with 18-Fluorodeoxyglucose for Pulmonary Nodule Diagnosis

To identify high-quality studies, we adapted criteria for methodologic quality proposed by Kent and colleagues (100), who evaluated imaging tests to diagnose lumbar spinal stenosis. These criteria have also been used to assess the quality of studies of polymerase chain reaction to diagnose HIV infection (37, 101). The revised criteria cover seven dimensions: technical quality of the index test, technical quality of the reference test, independence of test interpretation, description of the study sample, cohort assembly, sample size, and unit of data analysis. Eleven of 13 studies satisfied at least 70% of our study quality criteria (24–33, 35). The other two studies satisfied between 50% and 69% of our criteria (23, 34). Most studies met our criteria for the technical quality of FDG-PET, although three studies administered doses of FDG that were lower than recommended (34–35, 102), and three studies did not specify whether participants were examined in the fasting state (23, 25, 27). All studies but one (23) adequately described the reference tests that were used to confirm the presence of malignancy or to establish a benign diagnosis. Three studies did not report whether FDG-PET readers were blinded to the results of the reference test (23, 30, 34), and several studies did not indicate whether FDG-PET readers were blinded to the clinical characteristics of the participants and other radiographic data. All studies prospectively enrolled a relevant cohort of participants, and all studies except two (27, 35) indicated that the individual patient was the unit of analysis.

To quantitatively summarize study results, we used a meta-analytic method to construct a summary ROC curve for FDG-PET (36–37). The ROC curves illustrate the tradeoff between sensitivity and specificity as the threshold for defining a positive test result varies from most stringent to least stringent. Our meta-analytic method rests on the assumption that individual study estimates of sensitivity and specificity represent unique points on a common ROC curve.

For each study, we constructed 2 × 2 contingency tables in which all participants were classified as being FDG-PET–positive or FDG-PET–negative and as having a malignant or benign pulmonary nodule. We calculated the true-positive rate (TPR = sensitivity), the false-positive rate (FPR = 1 − specificity), and the log odds ratio (log odds TPR − log odds FPR). The log odds ratio is a measure of diagnostic test performance that accounts for the fact that the TPR and FPR are positively correlated. Next, we logistically transformed the TPR and FPR and fit a summary ROC curve with linear regression, by using the log odds ratio as the dependent variable and an implied function of the test threshold (log odds TPR + log odds FPR) as the independent variable (36).
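The regression step of the Moses method can be sketched as follows. Each study's 2 × 2 table is converted to $D = \text{logit(TPR)} - \text{logit(FPR)}$ and $S = \text{logit(TPR)} + \text{logit(FPR)}$, and $D = a + bS$ is then fit by unweighted ordinary least squares. Several choices here are assumptions:

- The tables are hypothetical.
- The 0.5 correction factor is a common convention, applied only to tables with a zero cell.
- The significance test on the slope is omitted.

Treat this as an illustration rather than the authors' analysis code.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def moses_points(tables, correction=0.5):
    """Convert 2x2 tables (tp, fn, fp, tn) into (S, D) pairs, where
    D = logit(TPR) - logit(FPR) is the log odds ratio and
    S = logit(TPR) + logit(FPR) is an implied function of the threshold."""
    points = []
    for tp, fn, fp, tn in tables:
        if 0 in (tp, fn, fp, tn):
            # Correction factor, needed only when a cell is zero.
            tp, fn, fp, tn = (x + correction for x in (tp, fn, fp, tn))
        lt = logit(tp / (tp + fn))  # log odds of the true-positive rate
        lf = logit(fp / (fp + tn))  # log odds of the false-positive rate
        points.append((lt + lf, lt - lf))
    return points


def fit_line(points):
    """Unweighted ordinary least-squares fit of D = a + b * S; returns (a, b)."""
    n = len(points)
    mean_s = sum(s for s, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((s - mean_s) ** 2 for s, _ in points)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in points)
    b = sxy / sxx
    return mean_d - b * mean_s, b


# Hypothetical 2x2 tables (tp, fn, fp, tn); not data from the reviewed studies.
studies = [(45, 2, 4, 20), (30, 1, 3, 12), (60, 5, 2, 25), (25, 0, 5, 14)]
a, b = fit_line(moses_points(studies))
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

A slope near zero corresponds to the symmetric case described in the next paragraph, where a single summary log odds ratio describes the curve.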

A limitation of this method is that the transformation requires the use of a correction factor when the 2 × 2 table for a study contains one or more zero values (that is, when reported sensitivity or specificity is perfect). An advantage of the method is that it provides a statistical test of the hypothesis that the variance in the group with malignant disease and the variance in the group without malignant disease are equal. The variances are not equal when the slope of the regression line is significantly different from zero. When the slope is not significantly different from zero, the resulting ROC curve is symmetrical and can be described by a common or summary log odds ratio. When this condition was met, we used the Mantel–Haenszel method for pooling odds ratios because this method does not require a correction factor (103). The two methods produced nearly identical results. Estimates of uncertainty were derived by using the Mantel–Haenszel method and were expressed in terms of 95% CIs.
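The Mantel–Haenszel pooled odds ratio needs no correction factor, because zero cells simply contribute zero to its sums. The sketch below computes it for hypothetical 2 × 2 tables and attaches a 95% CI using the Robins–Breslow–Greenland variance of the log odds ratio. The text does not say which variance estimator was used, so that choice is our assumption.

```python
import math


def mantel_haenszel(tables, z=1.96):
    """Mantel-Haenszel pooled diagnostic odds ratio with a Robins-Breslow-Greenland CI.

    Each table is (tp, fn, fp, tn). Zero cells need no correction: they just
    contribute zero to the sums (at least one table must have fn * fp > 0).
    Returns (pooled OR, lower limit, upper limit, variance of log OR).
    """
    r_sum = s_sum = pr_sum = psqr_sum = qs_sum = 0.0
    for tp, fn, fp, tn in tables:
        n = tp + fn + fp + tn
        r = tp * tn / n    # R_i
        s = fn * fp / n    # S_i
        p = (tp + tn) / n  # P_i
        q = (fn + fp) / n  # Q_i
        r_sum += r
        s_sum += s
        pr_sum += p * r
        psqr_sum += p * s + q * r
        qs_sum += q * s
    pooled = r_sum / s_sum
    var_log = (pr_sum / (2 * r_sum ** 2)
               + psqr_sum / (2 * r_sum * s_sum)
               + qs_sum / (2 * s_sum ** 2))
    half_width = z * math.sqrt(var_log)
    lo = math.exp(math.log(pooled) - half_width)
    hi = math.exp(math.log(pooled) + half_width)
    return pooled, lo, hi, var_log


# Hypothetical 2x2 tables (tp, fn, fp, tn); the last one contains a zero cell.
studies = [(45, 2, 4, 20), (30, 1, 3, 12), (60, 5, 2, 25), (25, 0, 5, 14)]
pooled, lo, hi, _ = mantel_haenszel(studies)
print(f"pooled DOR = {pooled:.1f} (95% CI, {lo:.1f} to {hi:.1f})")
```

For a single table, this variance reduces to the familiar Woolf formula $1/\text{TP} + 1/\text{FN} + 1/\text{FP} + 1/\text{TN}$, which is a convenient sanity check.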

In the 13 studies, the mean sensitivity and specificity were 93.9% and 85.8%, and the median sensitivity and specificity were 98.0% (interquartile range, 90% to 100%) and 83.3% (interquartile range, 80% to 100%). Appendix Figure 6 displays the summary ROC curve for FDG-PET. For our base-case estimates, we selected an operating point on the ROC curve that corresponded to the median specificity of FDG-PET in the 13 studies. We used the median specificity because we wanted to estimate where FDG-PET operates in current practice. Other approaches are possible, but not necessarily better. At this point on the ROC curve, sensitivity and specificity were 94.2% (95% CI, 89.1% to 97.0%) and 83.3%, respectively (22).
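Given a summary ROC line $D = a + bS$, the sensitivity at any chosen specificity follows by solving for logit(TPR): $\text{logit(TPR)} = [a + (1 + b)\,\text{logit(FPR)}]/(1 - b)$. The sketch below implements that relationship. In the usage example, the intercept of a symmetric curve ($b = 0$) is back-calculated from the reported operating point (sensitivity 94.2%, specificity 83.3%). This is done purely to illustrate the round trip; the value is not taken from the authors' fitted model. The same function applies to an asymmetric curve, such as the one for CT, by supplying a nonzero slope.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def sensitivity_at_specificity(a, b, specificity):
    """Sensitivity on the summary ROC curve D = a + b*S at a chosen specificity.

    With D = logit(TPR) - logit(FPR) and S = logit(TPR) + logit(FPR),
    solving for logit(TPR) gives (a + (1 + b) * logit(FPR)) / (1 - b).
    When b = 0 the curve is symmetric and a is the common log odds ratio.
    """
    lf = logit(1 - specificity)
    return expit((a + (1 + b) * lf) / (1 - b))


# Illustrative symmetric curve passing through (sensitivity 0.942, specificity 0.833).
a = logit(0.942) - logit(1 - 0.833)
print(round(sensitivity_at_specificity(a, 0.0, 0.833), 3))  # 0.942
print(round(sensitivity_at_specificity(a, 0.0, 0.90), 3))   # lower sensitivity
```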

Appendix Figure 6. Summary receiver-operating characteristic (ROC) curve for positron emission tomography with 18-fluorodeoxyglucose (FDG-PET). The ROC curves illustrate the tradeoff between sensitivity and specificity as the threshold that defines a positive test result varies from most stringent to least stringent. The ROC curve for FDG-PET is shown with 95% CIs (dotted lines). Black diamonds represent individual study estimates of sensitivity and specificity. Four studies reported perfect sensitivity and specificity (black square). The point on the summary ROC curve that corresponds to the median specificity reported in 13 studies of FDG-PET for pulmonary nodule diagnosis is shown (black circle). At this point, sensitivity and specificity were 94.2% and 83.3%, respectively.

Diagnostic Test Performance: CT. To determine the diagnostic accuracy of CT, we searched MEDLINE for English-language studies published before January 2000 by combining the MeSH term "tomography, X-ray computed" with a list of MeSH terms and keywords for lung cancer and pulmonary nodules. In addition, we scanned the reference lists of retrieved studies and review articles. We updated this literature search in November 2001. We identified 18 studies of CT for the diagnosis of focal pulmonary lesions (Appendix Tables 5, 6, and 7) (2–4, 38–47, 104–108). Interpreting the literature on CT is challenging because several different techniques have been described, including noncontrast CT, CT densitometry, high-resolution CT, and CT with nodule enhancement. In addition, many of the studies were performed 10 to 20 or more years ago.

Appendix Table 5. Studies of Computed Tomography Densitometry for Diagnosis of Pulmonary Nodules and Mass Lesions
Appendix Table 6. Studies of High-Resolution Computed Tomography for Pulmonary Nodule Diagnosis
Appendix Table 7. Studies of Dynamic Computed Tomography with Nodule Enhancement

We identified nine studies that evaluated noncontrast CT or CT densitometry (4, 38–45), two studies that evaluated high-resolution CT (46–47), and one study that evaluated both CT densitometry and high-resolution CT (2) (Appendix Tables 5 and 6). Computed tomography densitometry attempts to identify benign nodules on the basis of increased density characteristics that suggest occult calcification. In past years, a reference "phantom" was used to account for inter- and intrascanner differences in measuring nodule density, but this is no longer used in clinical practice. More recent studies of thin-section (high-resolution) CT have used different criteria to distinguish benign from malignant nodules (Appendix Table 6). One study used a set of criteria that proved to be sensitive but not specific for identifying malignancy (47), while another study used criteria that were more specific and less sensitive (46).

Studies of noncontrast CT and high-resolution CT enrolled between 35 and 720 participants, and most pulmonary lesions measured 3 cm or less in diameter (Appendix Tables 5 and 6). The prevalence of malignancy varied greatly, ranging between 15% and 78%. All but two studies reported the technical characteristics of the CT examinations in detail (43, 45). However, only three studies reported prospective enrollment of participants (2, 45, 47), and only one study reported that CT readers were blinded to the final diagnosis (47). In the other studies, these aspects of study design were not mentioned. All studies used acceptable reference tests to confirm a diagnosis of malignancy. Most studies required at least 18 months of clinical and radiographic follow-up for confirmation of benign disease without histologic proof, but four studies permitted shorter follow-up periods (38–40, 45).

Several studies have examined functional or dynamic imaging with CT (3, 104–108). In these studies, dynamic enhancement of lung nodules with iodinated contrast material is thought to identify increased vascularity that is strongly associated with malignancy (Appendix Table 7). In a recent multicenter study involving 356 participants, Swensen and colleagues (108) found that absence of enhancement strongly predicted a benign cause. The sensitivity and specificity of dynamic CT for identifying malignancy were 98% and 58%, respectively. Although this test is extremely promising, it is not widely used outside of research settings. In addition, it is used only when the nodule is radiographically indeterminate (for example, when thin-section CT shows no evidence of calcification). We chose to derive estimates for the sensitivity and specificity of CT from studies of CT densitometry and high-resolution CT, despite their limitations, because we aimed to examine the incremental benefits and costs of adding FDG-PET imaging to diagnostic strategies in current use. However, we also examined the potential role of dynamic CT with nodule enhancement in a sensitivity analysis.

In 12 studies of noncontrast CT and high-resolution CT, the mean sensitivity and specificity were 92.5% and 53.6% and the median sensitivity and specificity were 99.2% (interquartile range, 91% to 100%) and 55.8% (interquartile range, 45% to 60%). To derive base-case estimates of sensitivity and specificity, we used the same meta-analytic method that we used for studies of FDG-PET to construct a summary ROC curve for CT (Appendix Figure 7). Because the variance in the group with malignant disease and the variance in the group with benign disease were not equal, we used the method of Moses and colleagues (36) to obtain our base-case values and estimates of uncertainty. For our base-case estimates, we selected an operating point on the ROC curve that corresponded to the median specificity in the studies. At this point on the ROC curve, the sensitivity and specificity for identifying malignant nodules were 96.5% (CI, 80.9% to 99.5%) and 55.8%, respectively. We obtained similar results when we restricted our analysis to more recent studies of high-resolution CT. In this analysis, sensitivity and specificity were 96.5% (CI, 82.9% to 99.4%) and 58.5%, respectively.

Appendix Figure 7. Summary receiver-operating characteristic (ROC) curve for computed tomography (CT). The ROC curve for CT is shown with 95% CIs (dotted lines). Black diamonds represent individual study estimates of sensitivity and specificity. The black circle represents the point on the summary ROC curve that corresponds to the median specificity reported in 12 studies of noncontrast CT and high-resolution CT for pulmonary nodule diagnosis. At this point, sensitivity and specificity were 96.5% and 55.8%, respectively. Note that the summary ROC curve for CT is not symmetrical.
Grahic Jump Location

In the radiology literature, the term “indeterminate” designates a positive CT result and the term “benign” designates a negative CT result. Because “indeterminate” is a potentially confusing term, we use the phrase “possibly malignant” to describe indeterminate, or positive, results. Benign nodules typically have smooth borders and diffusely increased density, suggesting the presence of calcification. It is important to note that this definition implies that most pulmonary nodules will be indeterminate (or possibly malignant) by CT criteria.

Conditional Performance of FDG-PET and CT. In the base-case analysis, we assumed that the results of FDG-PET and CT were independent, or that the sensitivity and specificity of FDG-PET were the same regardless of the results of CT. This implies that the test results are not correlated. This is plausible because CT and FDG-PET characterize nodules by distinct mechanisms: FDG-PET identifies malignant nodules on the basis of increased glucose uptake and metabolism, whereas CT sometimes identifies benign lesions on the basis of density characteristics that suggest calcification. However, in previous work that examined imaging tests for mediastinal lymph node staging in patients with non–small-cell lung cancer, we found that the sensitivity and specificity of FDG-PET for identifying mediastinal metastases depended on the results of CT (78). More specifically, FDG-PET was 25% less sensitive and 16% more specific when CT revealed no lymph node enlargement than when CT detected enlarged nodes. Nevertheless, this relationship may not hold for pulmonary nodule diagnosis, because CT identifies lymph node metastases on the basis of size, whereas it characterizes pulmonary nodules on the basis of density.

Although data on the conditional performance of FDG-PET and CT for mediastinal staging have been published, we could not identify any studies that examined the conditional performance of FDG-PET and CT to diagnose pulmonary nodules. In fact, most studies of FDG-PET for pulmonary nodule diagnosis excluded participants with nodules that appeared benign on CT and limited enrollment to participants with CT findings that were possibly malignant. Thus, our base-case estimates of sensitivity and specificity apply directly when CT results are indeterminate and less well when CT results are benign. However, to explore the potential importance of the independent test assumption, we performed a sensitivity analysis in which we assumed that the conditional performance of FDG-PET and CT for pulmonary nodule diagnosis was similar to their conditional performance for mediastinal staging. We tested even more extreme values of the sensitivity and specificity of FDG-PET when CT results were benign.
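The independence assumption can be made concrete with likelihood ratios. The Python sketch below is an illustration, not the model used in this analysis: it updates a pretest probability of malignancy with a CT result and then with an FDG-PET result, and because the tests are assumed to be conditionally independent, each update uses the same sensitivity and specificity regardless of the other test's result. The CT values are the base-case estimates reported above; the FDG-PET values and the function name are hypothetical placeholders.

```python
def post_test_probability(pretest, sens, spec, positive):
    """Update a probability of malignancy with one dichotomous test result."""
    if positive:
        lr = sens / (1.0 - spec)        # positive likelihood ratio
    else:
        lr = (1.0 - sens) / spec        # negative likelihood ratio
    odds = pretest / (1.0 - pretest) * lr
    return odds / (1.0 + odds)


CT = (0.965, 0.558)    # base-case CT sensitivity and specificity (from the text)
PET = (0.95, 0.80)     # hypothetical placeholder values for FDG-PET

pretest = 0.55
after_ct = post_test_probability(pretest, *CT, positive=True)       # "possibly malignant" CT
after_both = post_test_probability(after_ct, *PET, positive=False)  # then a negative FDG-PET
print(round(after_ct, 3), round(after_both, 3))
```

Under independence the order of the two updates does not matter. Relaxing the assumption, as in the sensitivity analysis described above, would amount to supplying FDG-PET sensitivity and specificity values that depend on whether the CT result was benign or possibly malignant.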

Diagnostic Test Performance: Needle Biopsy. To determine the diagnostic accuracy of CT-guided needle biopsy and the risk for biopsy-related complications, we searched MEDLINE for English-language studies published before January 2000 by combining the MeSH term biopsy, needle with a list of MeSH terms and keywords for lung cancer and pulmonary nodules. We also scanned the reference lists of retrieved studies and review articles. We updated this literature search in November 2001. We included studies that limited enrollment to participants with pulmonary nodules that measured no more than 4 cm in diameter, as well as studies that reported results separately for participants with pulmonary nodules. We excluded studies that used means other than CT guidance for localizing nodules in more than 10% of participants. We identified nine studies that met these criteria (27, 48–55) (Appendix Table 8). Another study was excluded because we strongly suspected that it presented previously reported data (109).

Appendix Table 8. Studies of Computed Tomography–Guided Needle Biopsy for Pulmonary Nodule Diagnosis

The studies enrolled between 22 and 220 participants with pulmonary nodules. Two studies included patients with pulmonary nodules that measured up to 4 cm in diameter (48, 49), while the remaining studies reported results for patients with nodules that measured less than 1.5 to 3 cm in diameter. The prevalence of malignancy was very high, ranging from 62% to 85%. All studies except one (27) described in detail the technical aspects of how needle biopsies were performed. There was little heterogeneity across studies regarding biopsy methods. Specimens were obtained for cytologic examination in all of the studies, but core biopsies were also performed in two studies (54, 55). In four studies, benign diagnoses that were not verified by surgery or autopsy were confirmed by clinical follow-up of at least 15 to 24 months, and malignant diagnoses were established by surgery, autopsy, or, in the case of nodules due to metastasis, cytologic characteristics that were similar to those of the primary tumor (50–52, 55). The remaining studies described less optimal reference standards. Three studies reported that consecutive patients were enrolled prospectively (51, 54, 55). In one study, 105 eligible patients were excluded from the analysis because the final diagnosis could not be established (50). None of the studies mentioned whether the pathologist who was responsible for interpreting the biopsy results was blinded to the final diagnosis. We used data from all nine studies to obtain base-case estimates of diagnostic accuracy and the risk for complications because results across studies were remarkably consistent (Appendix Table 8). Results of studies that used optimal reference standards were not significantly different from results of studies that used less optimal reference standards (P > 0.2).

We collected data on the percentage of biopsies that were diagnostic or nondiagnostic. Diagnostic biopsies were defined as those that yielded a specific malignant or benign diagnosis. Nondiagnostic biopsies included biopsy results that were atypical or suspicious for malignancy, as well as those that were described as “nonspecific benign.” The median percentage of diagnostic biopsies in the nine studies was 80% (interquartile range, 75% to 87%). The median percentage of diagnostic biopsies in nodules that proved to be malignant was 92% (interquartile range, 81% to 94%), and the median percentage of diagnostic biopsies in nodules that proved to be benign was 56% (interquartile range, 47% to 77%). Thus, nondiagnostic biopsies provided some diagnostic information because nondiagnostic results were much more likely to occur in benign lesions.

Because we thought it was important to account for nondiagnostic biopsies in our analysis, we modeled the diagnostic accuracy of transthoracic needle biopsy in two steps. The initial step determined whether the biopsy was diagnostic or nondiagnostic. If the biopsy was nondiagnostic, the model calculated the post-test probability that the nodule was benign or malignant, based on the conditional probability of these diagnoses after a nondiagnostic biopsy. If the biopsy yielded a specific benign or malignant diagnosis, a second step determined whether that diagnosis was correct (true-positive or true-negative) or not (false-positive or false-negative). Thus, estimates of sensitivity and specificity for needle biopsy were conditioned on obtaining a specific benign or malignant diagnosis.
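The two-step structure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' model code: the diagnostic yields of 92% (malignant nodules) and 56% (benign nodules) are the medians reported in the preceding paragraph, the sensitivity and specificity of 96.3% and 98% are the base-case values for diagnostic biopsies reported in this section, and the function name is hypothetical.

```python
def needle_biopsy_outcomes(pretest, p_diag_malignant=0.92, p_diag_benign=0.56,
                           sens=0.963, spec=0.98):
    """Two-step sketch of a transthoracic needle biopsy result.

    Step 1 decides whether the biopsy is diagnostic; step 2 decides whether a
    diagnostic result is correct. Returns the probability of each outcome.
    """
    p_malignant = pretest
    p_benign = 1.0 - pretest

    # Step 1: nondiagnostic results, split by true disease state
    nondiag_malignant = p_malignant * (1.0 - p_diag_malignant)
    nondiag_benign = p_benign * (1.0 - p_diag_benign)
    p_nondiag = nondiag_malignant + nondiag_benign

    # Step 2: diagnostic results, split into correct and incorrect diagnoses
    diag_malignant = p_malignant * p_diag_malignant
    diag_benign = p_benign * p_diag_benign

    return {
        "nondiagnostic": p_nondiag,
        # Bayes' rule: probability of malignancy given a nondiagnostic result
        "p_malignant_if_nondiagnostic": nondiag_malignant / p_nondiag,
        "true_positive": diag_malignant * sens,
        "false_negative": diag_malignant * (1.0 - sens),
        "true_negative": diag_benign * spec,
        "false_positive": diag_benign * (1.0 - spec),
    }


outcomes = needle_biopsy_outcomes(pretest=0.55)
print(round(outcomes["p_malignant_if_nondiagnostic"], 2))
```

With a pretest probability of 55%, a nondiagnostic result lowers the probability of malignancy to about 18% in this sketch, which is the sense in which nondiagnostic biopsies still carry diagnostic information.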

We used the same meta-analytic method that we used for studies of FDG-PET to construct a summary ROC curve for CT-guided needle biopsy when a specific benign or malignant diagnosis was revealed. Because this ROC curve was symmetrical, we used the Mantel–Haenszel method to obtain base-case values and estimates of uncertainty. For our base-case estimates, we selected an operating point on the ROC curve that corresponded to the mean sensitivity of needle biopsy in the nine studies. At this point on the ROC curve, sensitivity and specificity were 96.3% (CI, 82.4% to 99.3%) and 98%, respectively. We used the mean rather than the median sensitivity because the median sensitivity and median specificity were both 100%; on the summary ROC curve, an operating point with 100% sensitivity would have implied a specificity of 0% (and vice versa).
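The text does not spell out the summary ROC formula, so the Python sketch below assumes the standard Moses–Littenberg parameterization, D = a + b·S, where D = logit(sensitivity) − logit(1 − specificity) and S = logit(sensitivity) + logit(1 − specificity). A symmetrical curve, as for needle biopsy, corresponds to b = 0; the asymmetrical CT curve would need a nonzero b, which is not reported in this section. The helper names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sroc_sensitivity(specificity, a, b=0.0):
    """Sensitivity on a Moses-Littenberg summary ROC curve at a given specificity.

    Solving D = a + b*S for logit(TPR) gives
    logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b).
    """
    fpr = 1.0 - specificity
    return expit((a + (1.0 + b) * logit(fpr)) / (1.0 - b))


# Symmetrical curve (b = 0) anchored at the base-case needle biopsy operating point
a = logit(0.963) - logit(1.0 - 0.98)
print(round(sroc_sensitivity(0.98, a), 3))  # recovers the anchor sensitivity
print(round(sroc_sensitivity(0.50, a), 4))  # sensitivity rises as specificity falls
```

On such a curve, sensitivity reaches 100% only as specificity falls to 0%, which is why a 100% median could not serve as the operating point.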

Our literature search yielded three studies of needle biopsy performed under fluoroscope guidance in which enrollment was limited to participants with pulmonary nodules (56–58). In one study with a very high prevalence of malignancy (80%), the diagnostic yield of fluoroscope-guided needle biopsy was 84% for nodules that measured 2 to 4 cm in diameter (56). However, in two other studies with a lower prevalence of malignancy, the diagnostic yield was only 36% to 43% (57, 58). We assumed that the diagnostic yield of fluoroscope-guided needle biopsy was 10% lower than the yield of CT-guided biopsy, but that sensitivity and specificity were the same as for CT-guided biopsy, provided that the biopsy yielded a specific benign or malignant diagnosis.

The most important potential complication of transthoracic needle biopsy is pneumothorax. In the nine studies of CT-guided biopsy, the median prevalence of minor pneumothorax was 24% (interquartile range, 20% to 35%) and the median prevalence of major pneumothorax requiring chest tube drainage was 5% (interquartile range, 4% to 11%). We assumed that the probability of death from needle biopsy was 0.05% because no deaths were reported in more than 750 procedures.

Diagnostic Test Performance: Surgery. Because excisional biopsy is the gold standard for determining whether a lung nodule is benign or malignant, we assumed that video-assisted thoracoscopic surgery (VATS) was 100% sensitive and specific for pulmonary nodule diagnosis. This implies that our analysis is limited to patients with peripheral nodules that are amenable to VATS. We assumed that VATS would be converted to a full thoracotomy with lobectomy for resection of malignant nodules, as is standard practice in most centers.

To estimate the rates of fatal and nonfatal complications for surgical procedures, we searched MEDLINE for English-language studies published before January 2000 by combining the MeSH headings thoracotomy and thoracoscopy with a list of MeSH terms and keywords for pulmonary nodules and lung cancer. We identified 18 studies of VATS. We excluded 10 studies that described major lung resections (usually lobectomy) that were performed through VATS, because this procedure is performed only in specialized centers by highly experienced thoracic surgeons and because the risk for complications associated with these more extensive procedures is likely to be higher (110–119). We also excluded three studies that enrolled substantial numbers of participants who underwent VATS for indications other than pulmonary nodule diagnosis, such as pulmonary fibrosis, empyema, and recurrent pneumothorax (120–122), and two studies that did not report rates of specific complications (123, 124). The remaining four studies enrolled between 30 and 300 patients who underwent VATS wedge resection for pulmonary nodule diagnosis (59, 63, 65, 67). One study specified that consecutive patients were enrolled (59), two studies were retrospective case series, and one study involved voluntary reporting to a French registry (65). The mean age of participants ranged between 54 and 69 years.

In the four studies, the probability of serious, nonfatal complications ranged between 3% and 13%. Because sample size varied considerably across studies, we calculated the weighted mean probability of a nonfatal complication by first performing an arcsine transformation to stabilize the variance and then weighting studies by the inverse of the variance. We used the variance-stabilizing procedure because the variance of an observed proportion under 0.5 decreases as the magnitude of the proportion decreases, which might have biased our weighted estimate (125). The weighted mean complication rate was 6.5%. The probability of fatal complications ranged between 0% and 1.6%. The weighted mean fatal complication rate was 0.5%, which roughly corresponded to the reported frequency in the voluntary French registry.
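The pooling step can be sketched in Python as follows. The sketch assumes the usual variance approximation for the arcsine square-root transform, Var[asin(√p)] ≈ 1/(4n), so each study's inverse-variance weight is 4n; the complication counts are hypothetical and are not the data from the four studies, and the function name is illustrative.

```python
import math


def pooled_proportion_arcsine(events, totals):
    """Inverse-variance weighted mean proportion on the arcsine scale.

    theta = asin(sqrt(p)) has approximate variance 1 / (4n), which does not
    depend on p, so each study's weight is simply 4n.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for k, n in zip(events, totals):
        theta = math.asin(math.sqrt(k / n))  # variance-stabilizing transform
        weight = 4.0 * n                     # inverse of the approximate variance
        weighted_sum += weight * theta
        total_weight += weight
    # Back-transform the pooled value to the proportion scale
    return math.sin(weighted_sum / total_weight) ** 2


# Hypothetical complication counts for four studies (not the published data)
events = [4, 3, 9, 15]
totals = [30, 100, 150, 300]
print(round(pooled_proportion_arcsine(events, totals), 3))
```

Weighting on the transformed scale avoids giving extra weight to studies with very low observed rates simply because their binomial variance, p(1 − p)/n, is smaller.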

To estimate the risk for fatal and nonfatal complications after thoracotomy, we identified two studies that examined postoperative outcomes in community settings. One observational study used discharge abstracts from California hospitals to identify 12 439 patients who underwent lung resection for cancer, 6569 of whom underwent lobectomy (62). In this study, the probability of in-hospital death from lobectomy was 4.2%, which we used as our base-case estimate. Lower postoperative mortality rates for lobectomy (1.6% to 3%) have been reported in tertiary care settings (60, 61, 64). Another study compared rates of serious, nonfatal complications in lung resections that were performed by thoracic surgeons and general surgeons in South Carolina nonfederal hospitals (66). In this study, the probability of serious, nonfatal postoperative complications after lobectomy performed by thoracic surgeons was 8.4%, which we used as our base-case estimate. The complication rate after lobectomy performed by general surgeons was 11%. In tertiary care settings, recent studies reported rates of nonfatal complications between 4.8% and 9%, but these reports did not distinguish between complications that occurred after lobectomy, pneumonectomy, or wedge resection (64, 126).

Costs. We converted all costs to 2001 U.S. dollars by using the gross domestic product deflator. Unlike the medical component of the Consumer Price Index, the gross domestic product deflator accounts for improvements in productivity. An index that does not account for technological change can seriously overestimate the true rate of inflation (127). In addition, the gross domestic product deflator considers all domestic economic activity, while the Consumer Price Index only includes spending by households.
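Whatever index is chosen, the conversion itself is a ratio of index values. The Python sketch below shows that arithmetic; the index values are hypothetical and are not official gross domestic product deflator figures, and the function name is illustrative.

```python
def to_2001_dollars(cost, year, deflator):
    """Convert a nominal cost from `year` into 2001 dollars.

    `deflator` maps year -> price index (any base year); only ratios matter.
    """
    return cost * deflator[2001] / deflator[year]


# Hypothetical index values for illustration only (not official GDP deflator data)
example_index = {1993: 90.0, 1996: 95.0, 2001: 105.0}
print(round(to_2001_dollars(1000.0, 1996, example_index), 2))
```

Substituting a faster-rising index, such as one that ignores productivity gains, would inflate older costs more, which is the overestimation the paragraph above describes.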

Cost estimates for imaging tests, needle biopsy, and physician and pathologist services were based on Medicare reimbursement rates (70, 71), which are believed to be a good proxy for true opportunity costs (10). More specific information on hospital costs for surgical procedures was obtained from the Health Care Utilization Project (HCUP), which included data from 6.2 million discharges in 22 states in 1996 (72). Base-case estimates for costs are listed in Appendix Table 1.

Costs for noncontrast CT were $286 (Current Procedural Terminology [CPT] 71250), while costs for FDG-PET were $1980 (CPT G0125). To calculate costs for patients who were managed by watchful waiting, we added $34 for chest radiography (CPT 71020) and $38 for an outpatient office visit (CPT 99213) for each observation period. To calculate the cost of CT-guided needle biopsy ($583), we added procedure costs for percutaneous needle biopsy (CPT 32405), costs for CT guidance (CPT 76360), and pathology fees (CPT 88171). Costs for fluoroscope-guided needle biopsy were lower ($283). Costs for minor pneumothorax after needle biopsy were $72, including $34 for chest radiography and $38 for an outpatient office visit. Costs for chest tube drainage were $2566, including $2394 for 4 days of hospital care for iatrogenic pneumothorax (diagnosis-related group [DRG] 095) and $171 for physician services, including an initial evaluation (CPT 99221) and 3 days of subsequent hospital care (CPT 99231).
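To show how these unit costs combine with the pneumothorax rates reported for CT-guided biopsy (24% minor, 5% requiring chest tube drainage), the Python sketch below computes an expected cost per biopsy. It is a simplified illustration, not the model's accounting: it treats minor and major pneumothorax as separate events, ignores procedure-related death and the downstream costs of nondiagnostic results, and the function name is hypothetical.

```python
def expected_biopsy_cost(procedure=583.0,
                         p_minor=0.24, cost_minor=72.0,
                         p_major=0.05, cost_major=2566.0):
    """Expected cost of one needle biopsy including pneumothorax care (2001 dollars)."""
    return procedure + p_minor * cost_minor + p_major * cost_major


print(round(expected_biopsy_cost(), 2))
```

Under these assumptions, complications add roughly $146 to the $583 procedure cost. Passing `procedure=283.0` gives the analogous figure for fluoroscope-guided biopsy only if one also assumes the same complication rates, which this section does not state.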

To calculate the cost of surgery for patients with malignant and benign nodules, we added hospital and professional fees. For patients with malignant nodules, we added the median cost-adjusted hospital charges for 3836 patients with lung cancer who underwent lobectomy from the HCUP database, and Medicare reimbursement for surgical lobectomy (CPT 32480) and pathologic interpretation (CPT 88309). Total surgical costs for patients with malignant nodules were $14 875. For patients with benign nodules, we added the median cost-adjusted hospital charges for 73 patients who underwent thoracoscopy for a benign diagnosis and Medicare reimbursement for surgical thoracoscopy with biopsy (CPT 32602) and pathologic interpretation (CPT 88307). Total surgical costs for patients with benign nodules were $11 625. To estimate costs for major surgical complications, we subtracted the mean Medicare reimbursement for lung biopsy without complications (DRG 077) from the mean reimbursement for lung biopsy with complications (DRG 076).

To estimate long-term health care costs for patients with surgically treated local lung cancer, surgically staged regional lung cancer, and distant-stage lung cancer, we used Medicare claims data for the years 1990 to 1993 (73). Average monthly costs for the first year after diagnosis are illustrated in Appendix Figure 8. To avoid counting costs for surgery twice, we subtracted surgical costs from the total costs for months 1 through 3. For years 2 to 5 after diagnosis, average monthly costs were $762, $934, and $1425 for patients with local, regional, and distant disease, respectively. To estimate health care costs for patients with benign nodules and for patients with malignant nodules more than 5 years after diagnosis, we used age-specific, average, annual health care expenditures from the U.S. Bureau of Labor Statistics (74).

Appendix Figure 8. Health care costs for patients with malignant pulmonary nodules. Average monthly Medicare expenditures for 1207 patients with surgically treated, T1N0M0 lung cancer (black circles); 1954 patients with surgically staged, regional lung cancer (black squares); and 10 835 patients with distant lung cancer (black triangles), from the linked Medicare claims–Surveillance, Epidemiology and End Results tumor registry database. SPN = solitary pulmonary nodule.

Utilities. To identify studies that measured utilities (preference-based weights for health states) in patients with undiagnosed pulmonary nodules, we searched MEDLINE for English-language studies published before January 2000 by combining the MeSH terms and keywords health status, health status indicators, quality of life, standard gamble, and time tradeoff with a list of MeSH terms and keywords for lung cancer and pulmonary nodules. We updated the literature search in November 2001. We also scanned a comprehensive library of 1000 health-related quality-of-life estimates (128). Because we could not identify studies that examined the preferences of patients with pulmonary nodules who were managed by watchful waiting, we assumed that time spent in observation carried no utility decrement (a relative utility of 1.0) and therefore used age- and sex-specific values from the Beaver Dam Health Outcomes study (75). To account for the possibility that some patients might be uncomfortable not knowing whether a nodule was benign or malignant, we tested lower utility values in a sensitivity analysis. We believe it is likely that preferences will vary between patients, depending on factors such as the probability of malignancy, the duration of time spent in observation, and the patient's risk attitude.

We used age- and sex-specific values to estimate utilities for patients with local-stage malignant nodules because in two studies of outcomes after thoracotomy for lung cancer, health-related quality of life returned to baseline within 3 months of surgery (129, 130). To obtain utilities for patients with regional-stage, distant-stage, or recurrent lung cancer, we multiplied the age- and sex-specific value by a relative utility of 0.7. This value was taken from an economic evaluation that compared different chemotherapy regimens in patients with metastatic non–small-cell lung cancer, in which 14 oncology physicians and nurses reported that their utility for chemotherapy was 0.7 (76).

We adjusted quality of life for time spent in the hospital and time spent having diagnostic procedures. When possible, we used data on average length of hospital stay to make these adjustments (71).

Time Preference

We discounted all costs and health effects at an annual rate of 3% and tested discount rates between 0% and 5% in sensitivity analysis (10).
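The discounting step can be sketched in Python as follows, applied both to a cost stream and to a stream of quality-adjusted life-years. The sketch assumes annual discounting with the first year undiscounted. It uses the $934 monthly cost for regional disease in years 2 to 5 and the relative utility of 0.7 described above; the age- and sex-specific utility of 0.85 is a hypothetical value, and year 1 is set to zero for simplicity.

```python
def present_value(annual_amounts, rate=0.03):
    """Discount a stream of annual amounts; index 0 (year 1) is not discounted."""
    return sum(a / (1.0 + rate) ** t for t, a in enumerate(annual_amounts))


# Regional disease, years 2 to 5 after diagnosis; year 1 omitted in this illustration
costs = [0.0] + [934.0 * 12] * 4
# Hypothetical age- and sex-specific utility of 0.85, times the relative utility of 0.7
qalys = [0.0] + [0.85 * 0.7] * 4

for rate in (0.0, 0.03, 0.05):  # base case and the range tested in sensitivity analysis
    print(rate, round(present_value(costs, rate)), round(present_value(qalys, rate), 3))
```

Because costs and health effects are discounted at the same rate, their ratio is not distorted by differences in timing.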

Analytical Plan

We stratified the hypothetical study cohort according to the pretest probability of malignancy and the risk for postoperative complications. Postoperative complications included both fatal and nonfatal complications after either VATS biopsy or lobectomy. We performed separate analyses for six combinations of pretest probability (low, intermediate, and high) and surgical risk (average and high). We selected probabilities of 26%, 55%, and 79% to represent patients with low, intermediate, and high pretest probability, respectively. We used the base-case values and upper ranges to represent patients at average and high risk for surgical complications, respectively. Because the probabilities of fatal and nonfatal surgical complications are likely to be highly correlated, we did not vary them independently. Instead, we assumed that they were completely correlated and bundled them together in groups.

Several quantitative models exist for estimating pretest probability (5, 21, 131, 132). Readers who wish to apply our results may benefit from using such a model to estimate pretest probability in individual patients. The best available model used multivariable logistic regression to identify six independent predictors of malignancy: age, smoking status, history of cancer, nodule diameter, spiculation, and upper lobe location (21). This clinical prediction rule was derived from a cohort of 629 patients with newly discovered pulmonary nodules measuring 4 to 20 mm in diameter who were evaluated at the Mayo Clinic between 1 January 1984 and 1 May 1986. The prevalence of malignancy was 26.4%. Two thirds of the observations were used to develop the model and the remaining observations were used as a validation set.

These equations can be used to calculate the pretest probability of malignancy:

where age is the patient's age in years, smoking = 1 if the patient is a current or former smoker (otherwise 0), cancer = 1 if the patient has a history of an extrathoracic cancer that was diagnosed more than 5 years ago (otherwise 0), diameter is the diameter of the nodule in millimeters, spiculation = 1 if the edge of the nodule has spicules (otherwise 0), and upper = 1 if the nodule is located in an upper lobe (otherwise 0). To calculate pretest probability in individual patients, equations (1) and (2) can be entered into a personal digital assistant for use in the office or clinic. Unfortunately, it is not possible to calculate CIs for the predicted pretest probability based on the information provided because the developers of the model did not provide the covariance matrix for parameter estimates. Of note, age and history of cancer were not found to be independent predictors of malignancy in the validation set.
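The model combines the six predictors in a logistic equation: a linear predictor x is converted to a probability as e^x/(1 + e^x). The Python sketch below assumes that form and uses the coefficients of the Mayo Clinic model as they are commonly reproduced from reference 21; these values come from outside this section and should be checked against the original publication before any use. The function and constant names are hypothetical.

```python
import math

# Mayo Clinic model coefficients as commonly reproduced (verify against reference 21)
INTERCEPT = -6.8272
COEF = {"age": 0.0391, "smoking": 0.7917, "cancer": 1.3388,
        "diameter": 0.1274, "spiculation": 1.0407, "upper": 0.7838}


def pretest_probability(age, smoking, cancer, diameter, spiculation, upper):
    """Logistic prediction of malignancy; indicator variables are 0 or 1, diameter in mm."""
    x = (INTERCEPT
         + COEF["age"] * age
         + COEF["smoking"] * smoking
         + COEF["cancer"] * cancer
         + COEF["diameter"] * diameter
         + COEF["spiculation"] * spiculation
         + COEF["upper"] * upper)
    return math.exp(x) / (1.0 + math.exp(x))


# 65-year-old former smoker, no cancer history, 15-mm nonspiculated upper-lobe nodule
p = pretest_probability(age=65, smoking=1, cancer=0, diameter=15, spiculation=0, upper=1)
print(round(p, 2))
```

For this hypothetical patient the sketch gives a pretest probability of roughly 31%, between the low (26%) and intermediate (55%) strata used in this analysis. As the text notes, no CI can be attached to this estimate.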

The ROC curve analysis indicated that the model had very good predictive accuracy in both the derivation set (area under the curve, 0.83) and the validation set (area under the curve, 0.80). An area under the curve of 1.0 indicates perfect accuracy in prediction, while an area under the curve of 0.5 indicates that the prediction is no more accurate than a coin flip. The goodness-of-fit statistics for the model derivation and validation sets indicated that the observed proportion of patients with malignancy did not differ from the predicted proportion (chi-square = 5.085, P = 0.75; and chi-square = 6.221, P = 0.62; respectively). A calibration curve indicated that the numbers of observed and predicted cases of cancer were the same across the entire range of the probability of malignancy.
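The area under the curve also has a direct probabilistic reading: it is the probability that a randomly chosen patient with a malignant nodule receives a higher predicted probability than a randomly chosen patient with a benign nodule, with ties counted as one half. The Python sketch below computes the area this way, which is equivalent to the Mann–Whitney statistic; the predicted probabilities are hypothetical.

```python
def auc(scores_malignant, scores_benign):
    """Area under the ROC curve as a pairwise concordance probability."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0   # malignant case ranked above benign case
            elif m == b:
                wins += 0.5   # ties count one half
    return wins / (len(scores_malignant) * len(scores_benign))


# Hypothetical predicted probabilities
print(round(auc([0.9, 0.7, 0.4], [0.2, 0.5, 0.1]), 3))
```

An area of 0.80, as in the validation set, therefore means that the model ranks a malignant nodule above a benign one in about four of five such pairs.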

Knowing that most physicians rely on clinical intuition and judgment to estimate pretest probability, the developers of the model used ROC analysis to compare estimates of the probability of cancer obtained by using the prediction equation with those made by four experienced clinicians at the Mayo Clinic, including a radiologist, pulmonologist, general internist, and thoracic surgeon (133). Areas under the ROC curves for each of the physicians were slightly greater than the area under the curve for the prediction equation, but these differences were not statistically significant. Relative to the prediction equation, physicians tended to overestimate pretest probability, particularly at lower values of predicted probability. Of interest, the four clinicians recommended a management strategy of observation in a very high percentage of cases (37%, 57%, 8%, and 35%, respectively) despite overestimating pretest probability. Predictions of pretest probability made by less experienced clinicians may not be as accurate as those made by these experts.

Two other quantitative models have been developed by using less rigorous methods and have not been validated (5, 131, 132). In addition, several groups have developed neural networks to assist in pulmonary nodule diagnosis. We are not aware of any published models that help to estimate the risk for complications for patients undergoing VATS biopsy or lobectomy.

Calculation of Incremental Cost-Effectiveness Ratios

We calculated incremental cost-effectiveness ratios by comparing strategies with the next most effective alternative, after eliminating strategies that were dominated. We considered all clinically plausible combinations of diagnostic tests because the cost-effectiveness of an intervention can be overestimated by comparing it with a suboptimal alternative. We eliminated strategies by strict dominance when an alternative strategy was both less costly and more effective. Strategies were eliminated by extended dominance when a more effective strategy had a more favorable incremental cost-effectiveness ratio (134). To calculate incremental cost-effectiveness ratios, we used the formula (C1 − C2)/(E1 − E2), where C1 and C2 represent total discounted lifetime health care costs associated with two different strategies for pulmonary nodule diagnosis in 2001 dollars, and E1 and E2 are outcomes associated with the two strategies, measured in discounted QALYs.
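The elimination steps can be sketched in Python as follows. This is a generic illustration of dominance and extended dominance, not the authors' software: strategies are ordered by cost, each surviving strategy is compared with the adjacent, less costly strategy on the frontier, and the strategy names, costs, and QALY values are hypothetical.

```python
import math


def icer_frontier(strategies):
    """Remove dominated strategies; return (name, ICER) pairs ordered by cost.

    `strategies` is a list of (name, cost, effectiveness) tuples, with
    effectiveness in QALYs. The least costly surviving strategy has ICER nan.
    """
    # Order by cost; among equal costs, put the more effective strategy first
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))

    # Dominance: drop a strategy if a cheaper (or equally costly) one is at least as effective
    frontier = []
    for s in ordered:
        if frontier and s[2] <= frontier[-1][2]:
            continue
        frontier.append(s)

    def icer(lower, higher):
        return (higher[1] - lower[1]) / (higher[2] - lower[2])

    # Extended dominance: drop a strategy whose ICER exceeds that of the next
    # more effective strategy, then recheck the neighbors until none remain
    i = 1
    while i < len(frontier) - 1:
        if icer(frontier[i - 1], frontier[i]) > icer(frontier[i], frontier[i + 1]):
            del frontier[i]
            i = max(i - 1, 1)
        else:
            i += 1

    result = [(frontier[0][0], math.nan)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur[0], icer(prev, cur)))
    return result


# Hypothetical strategies: (name, discounted cost in dollars, discounted QALYs)
example = [("A", 1000.0, 10.00), ("B", 1500.0, 10.01),
           ("C", 2000.0, 10.05), ("D", 1800.0, 9.90)]
print(icer_frontier(example))
```

Here D is strictly dominated by A, and B is eliminated by extended dominance because its ratio relative to A ($50 000 per QALY) exceeds the ratio of C relative to B ($12 500 per QALY); C is then compared directly with A.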

Sensitivity Analysis

We performed one-way and multiway sensitivity analyses to identify important model uncertainties. When possible, ranges for variables were based on reported or calculated 95% CIs for means and interquartile ranges for medians. For diagnostic accuracy, we tested several points on summary ROC curves and their 95% CIs. We used clinical judgment to determine ranges for utilities. For costs, we determined ranges by adding or subtracting 25% from the base-case estimate. To determine ranges for transition probabilities in the Markov model, we added or subtracted 50% from the base-case estimate because these estimates were more uncertain.

We performed probabilistic sensitivity analysis by stratifying patients according to pretest probability and the risk for surgical complications. We assigned logit-normal distributions to all probabilities and costs for all diagnostic test variables by using the method of Doubilet and colleagues (77) and then performed 10 000 simulations by randomly sampling values from these distributions. Logit-normal distributions have two main advantages: The full distribution can be approximated from a specified mean (base-case value) and an upper or lower bound, and the mean of the distribution function equals the base-case value. Thus, the mean value of the expected utility of each strategy in probabilistic analysis will ultimately converge to its expected value in the corresponding deterministic model (77). To account for correlation between sensitivity and specificity, we sampled values from a logit-normal distribution that we assigned to describe the specificity of a diagnostic test and used a formula to identify the corresponding sensitivity on the summary ROC curve.
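The sampling step can be sketched in Python as follows. The sketch assumes a normal distribution on the logit scale centered at logit(base case), with the specified upper bound treated as its 97.5th percentile (z = 1.96). With this parameterization the median of the draws equals the base-case value exactly, and the mean is close to it when the spread is small. The step that maps each sampled specificity to a sensitivity on the summary ROC curve is not shown, and the inputs and function names are hypothetical.

```python
import math
import random
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_logit_normal(base, upper, n, z=1.96, seed=None):
    """Draw n probabilities from a logit-normal distribution.

    The normal distribution on the logit scale is centered at logit(base), so
    the median of the draws equals the base-case value; `upper` is treated as
    the 97.5th percentile (z = 1.96), which sets the spread.
    """
    mu = logit(base)
    sigma = (logit(upper) - mu) / z
    rng = random.Random(seed)
    return [expit(rng.gauss(mu, sigma)) for _ in range(n)]


# Illustrative inputs: base-case specificity 0.558 with an upper bound of 0.60
draws = sample_logit_normal(0.558, 0.60, n=10_000, seed=1)
print(round(statistics.median(draws), 3), round(statistics.mean(draws), 3))
```

Working on the logit scale keeps every sampled probability strictly between 0 and 1, which a normal distribution on the probability scale would not guarantee.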

For the purpose of the probabilistic analyses, we assumed that the hypothetical study cohort was infinitely large. Thus, while the probabilistic analysis accounted for second-order uncertainty (uncertainty in the values of model parameters), it did not account for first-order uncertainty (uncertainty related to sampling variability that would occur in a finite-sized population). This type of uncertainty can affect the magnitude of cost-effectiveness ratios when sampling from populations that are finite in size (135). While this source of uncertainty is not likely to be important for policy decisions made at a national level, it may be important for decisions that are made at the hospital or departmental level. In these settings, the cost-effectiveness ratios might differ because of sampling variation.

To account for uncertainty in the natural history of lung cancer, we performed a separate set of probabilistic analyses stratified by pretest