Steve Goodacre, MB, ChB, FFAEM, MSc, PhD; Alex J. Sutton, BSc, MSc, PhD; Fiona C. Sampson, BA, MSc
Disclaimer: The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the United Kingdom Department of Health.
Acknowledgments: The authors thank Vanja Dukic for her assistance with the meta-regression analysis and Angie Ryan for her help with the literature searches.
Grant Support: The United Kingdom Health Technology Assessment R&D Programme funded this project (reference no. 02/03/01).
Potential Financial Conflicts of Interest: None disclosed.
Requests for Single Reprints: Steve Goodacre, MB, ChB, FFAEM, MSc, PhD, Medical Care Research Unit, University of Sheffield, Regent Court, 30 Regent Street, Sheffield, S1 4DA, United Kingdom; e-mail, email@example.com.
Current Author Addresses: Dr. Goodacre and Ms. Sampson: Medical Care Research Unit, University of Sheffield, Regent Court, 30 Regent Street, Sheffield, S1 4DA, United Kingdom.
Dr. Sutton: Department of Health Sciences, University of Leicester, 22-28 Princess Road West, Leicester, LE1 6TP, United Kingdom.
Goodacre S., Sutton A., Sampson F.; Meta-Analysis: The Value of Clinical Assessment in the Diagnosis of Deep Venous Thrombosis. Ann Intern Med. 2005;143:129-139. doi: 10.7326/0003-4819-143-2-200507190-00012
Published: Ann Intern Med. 2005;143(2):129-139.
Appendix: Details of the Summary ROC Meta-Regression Exploring the Influence of Study-Level Covariates on Diagnostic Accuracy of the Wells Score
Clinical assessment of suspected deep venous thrombosis (DVT) should be based on systematically evaluated evidence.
To determine whether clinical findings, risk scores, and physicians' empirical judgments affect the likelihood of detecting DVT on definitive testing.
MEDLINE, EMBASE, CINAHL, Web of Science, Cochrane Database of Systematic Reviews, Cochrane Controlled Trials Register, Database of Reviews of Effectiveness, ACP Journal Club, and citation lists (1966 to January 2005).
Cohort studies published in English, French, Spanish, or Italian that compared clinical assessment with a reference standard.
The authors extracted standardized data, including setting, exclusions, population characteristics, reference standard, and results, and assessed quality against validated criteria.
The authors combined data by using random-effects meta-analysis and, if appropriate, used meta-regression to identify covariates that predicted diagnostic accuracy. Only malignancy (likelihood ratio [LR], 2.71), previous DVT (LR, 2.25), recent immobilization (LR, 1.98), difference in calf diameter (LR, 1.80), and recent surgery (LR, 1.76) were useful for ruling in DVT, while only absence of calf swelling (LR, 0.67) or difference in calf diameter (LR, 0.57) was useful for ruling out DVT. The Wells clinical score was more valuable than the individual characteristics; it stratified patients into groups with high (LR, 5.2), intermediate, and low (LR, 0.25) probability of DVT. The Wells score seemed able to stratify patients by risk only for proximal DVT, and it performed better in cohorts that were younger or excluded patients with previous thromboembolism.
Pooled estimates were subject to substantial heterogeneity. This may limit extrapolation between observers and settings. Only published studies were included, so findings may be subject to publication bias.
Individual clinical features are of limited value in diagnosing DVT. Overall assessment of clinical probability by using the Wells score is more useful.
Which clinical findings most affect the probability of deep venous thrombosis (DVT)?
This systematic review of 54 cohort studies found that previous DVT and malignant disease modestly increased the probability of DVT (positive likelihood ratios, 2.25 and 2.71), followed by recent immobilization, difference in calf diameter, and recent surgery (positive likelihood ratios, 1.75 to 1.98). Wells scores, based on 9 items, stratified patients' probability of proximal DVT much better than did individual findings, particularly in younger patients and in patients without previous DVT.
Estimating the probability of DVT is best accomplished by assessing and scoring multiple findings.
Suspected deep venous thrombosis (DVT) is a common cause of emergency hospitalization (1). Many technologies can be used to diagnose DVT, varying from the cheap and simple but inaccurate (d-dimer testing) to the accurate but expensive and technically challenging (venography). Clinical assessment can be used to select patients for an appropriate diagnostic test. This may involve using individual clinical features to estimate the likelihood of DVT or using standardized clinical assessment to derive a pretest probability based on a clinical score. The Wells clinical score is a widely used instrument that categorizes patients into high, intermediate, and low risk for DVT (2).
It is increasingly being recognized that clinical diagnosis should be based on systematic evaluation of the scientific evidence (3). Investigations of the clinical diagnosis of DVT have been published over more than 4 decades (4, 5). We aimed to systematically review the literature to determine whether physicians' empirical judgments, clinical findings, and risk scores affect the likelihood of detecting thrombosis with venography, ultrasonography, or plethysmography in adults with suspected DVT.
We sought to identify all diagnostic cohort studies of patients with suspected DVT that recorded physicians' empirical judgments, clinical findings, or a clinical score and then undertook diagnostic testing for DVT. We searched the following electronic sources (1966 to January 2005): MEDLINE, EMBASE, CINAHL, Web of Science, Cochrane Database of Systematic Reviews, Cochrane Controlled Trials Register, Database of Reviews of Effectiveness, and ACP Journal Club. We scanned the bibliographies of all retrieved articles for potentially relevant articles that were not identified by the original search.
Two reviewers screened the titles and abstracts of all articles identified by the search strategy and independently determined whether the article could potentially be reporting a cohort study that measured the diagnostic performance of physicians' empirical judgments, clinical findings, or a clinical score compared with a reference standard test (venography, ultrasonography, or plethysmography). Full copies of all selected articles were retrieved. The same 2 reviewers then independently reviewed the full articles to determine whether they did meet the criteria outlined earlier. A κ score was calculated for agreement between the 2 reviewers at both stages of the selection process, and disagreements were resolved by discussion.
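As an illustration of the agreement statistic mentioned above, the following minimal sketch computes Cohen's κ for two reviewers making include/exclude decisions. The counts are hypothetical and are not taken from the review; the function name and arguments are assumptions made for this example.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    # Proportion of articles on which the reviewers agreed
    observed = (both_include + both_exclude) / n
    # Marginal inclusion rates for each reviewer
    a_inc = (both_include + only_a) / n
    b_inc = (both_include + only_b) / n
    # Agreement expected by chance alone, given those marginal rates
    expected = a_inc * b_inc + (1 - a_inc) * (1 - b_inc)
    return (observed - expected) / (1 - expected)


# Hypothetical screening counts, for illustration only
kappa = cohen_kappa(both_include=40, only_a=5, only_b=5, both_exclude=150)
print(f"kappa = {kappa:.2f}")
```

κ corrects raw agreement for the agreement expected by chance, which matters in screening because most articles are excluded by both reviewers.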
We specifically excluded the following: studies that measured the risk for developing DVT after recording clinical characteristics rather than measuring the probability that DVT was present at the time of assessment; case-control studies, in which patients were selected on the basis of having or not having DVT; and studies with fewer than 10 patients. We included studies published in English, French, Spanish, or Italian and excluded studies published in other languages. If a study was published as an abstract, we contacted the authors to ask for full details of the data. If we could not extract the necessary data from the published report, we contacted the authors for clarification, provided the study was published in the past 10 years.
We assessed study quality by determining whether the reference standard was applied independently of the findings of the clinical assessment, whether observers blinded to the reference standard result undertook clinical assessment, and whether observers blinded to the results of clinical assessment interpreted the reference standard. Empirical evidence suggests that failure to meet these criteria is associated with overestimation of diagnostic accuracy (6).
We extracted the following data from each article: the setting for recruitment; groups excluded from the study; population characteristics (mean or median age, sex balance); prevalence of DVT (proximal/above knee and distal/below knee); whether clinical data were extracted from clinical notes or collected on a standardized form by the clinician; the person who recorded the clinical data; the reference standard used; the number of true-positive results (proximal and distal DVT), true-negative results, and false-positive and false-negative results (proximal and distal DVT) for each clinical feature; and the number of case-patients with and without DVT for each clinical score (either as reported or calculated from the reported data).
We used a random-effects model, as implemented by MetaDiSc statistical software (7), to estimate pooled likelihood ratios for the presence and absence of each clinical feature (8). A chi-square test for heterogeneity is reported for each clinical feature. Although considerable heterogeneity existed for a proportion of the outcomes, we did not undertake meta-regression of individual clinical features because of the relatively small numbers of studies available for most meta-analyses (9).
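The pooling step can be sketched as follows. This is a minimal illustration, assuming the DerSimonian-Laird estimator applied to log positive likelihood ratios; the text does not state which random-effects estimator the software applied, and the 2×2 counts below are hypothetical.

```python
import math


def log_lr_positive(tp, fn, fp, tn):
    """Log positive likelihood ratio and its approximate variance.

    Assumes no zero cells; a continuity correction would be needed otherwise.
    """
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    var = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return math.log(lr), var


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled effect, standard error, Cochran Q, and tau^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, q, tau2


# Hypothetical (tp, fn, fp, tn) counts for one clinical feature in 3 studies
studies = [(30, 70, 20, 180), (25, 50, 30, 150), (40, 60, 15, 185)]
pairs = [log_lr_positive(*s) for s in studies]
pooled, se, q, tau2 = pool_random_effects([p[0] for p in pairs],
                                          [p[1] for p in pairs])
pooled_lr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(f"pooled LR+ = {pooled_lr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, Q = {q:.2f}")
```

Under the hypothesis of no heterogeneity, Q is compared with a chi-square distribution on k − 1 degrees of freedom, where k is the number of studies.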
Clinical scores are usually reported as the prevalence of DVT in each risk category. This is similar to reporting the predictive values of a diagnostic test and will vary according to the population prevalence of DVT. We therefore analyzed the data by examining how the scores categorized patients with and without DVT. This approach is similar to analyzing and reporting sensitivity and specificity.
Meta-analyses of the Wells score and empirical estimates were necessarily more complex since individuals were categorized into 3 groups (high, intermediate, and low risk for DVT). We carried out ordinal logistic regression, including a random study effect coefficient, using the software WinBUGS (MRC Biostatistics Unit, Cambridge, United Kingdom) (10) to estimate the probability of being categorized as having high, intermediate, and low risk; we used separate models for persons with any DVT, those with proximal DVT, those with distal DVT, and those without DVT. From this analysis, we could estimate sensitivity and specificity for 2 possible decision thresholds for all cases of DVT: high versus intermediate and low, and high and intermediate versus low. We estimated pooled likelihood ratios for high and low categories using the random-effects model implemented by MetaDiSc statistical software.
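To make the two decision thresholds concrete, the sketch below derives sensitivity, specificity, and category-specific likelihood ratios from the counts of a single hypothetical cohort. It is not the random-effects ordinal regression described above, only the per-study arithmetic that such a model summarizes; the function names and counts are assumptions for illustration.

```python
def threshold_accuracy(dvt_counts, no_dvt_counts):
    """Sensitivity and specificity at both thresholds of a 3-category score.

    Each argument is a (high, intermediate, low) tuple of patient counts.
    """
    d_high, d_mid, d_low = dvt_counts
    n_high, n_mid, n_low = no_dvt_counts
    d_total = d_high + d_mid + d_low
    n_total = n_high + n_mid + n_low
    return {
        # Only high-risk patients are treated as test positive
        "high vs intermediate+low": (d_high / d_total,
                                     (n_mid + n_low) / n_total),
        # High- and intermediate-risk patients are treated as test positive
        "high+intermediate vs low": ((d_high + d_mid) / d_total,
                                     n_low / n_total),
    }


def category_likelihood_ratios(dvt_counts, no_dvt_counts):
    """Likelihood ratio for each category: P(category | DVT) / P(category | no DVT)."""
    d_total = sum(dvt_counts)
    n_total = sum(no_dvt_counts)
    return tuple((d / d_total) / (n / n_total)
                 for d, n in zip(dvt_counts, no_dvt_counts))


# Hypothetical cohort: counts in (high, intermediate, low) order
with_dvt = (60, 30, 10)
without_dvt = (30, 120, 150)

accuracy = threshold_accuracy(with_dvt, without_dvt)
lr_high, lr_mid, lr_low = category_likelihood_ratios(with_dvt, without_dvt)
for name, (sens, spec) in accuracy.items():
    print(f"{name}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(f"LR high {lr_high:.2f}, intermediate {lr_mid:.2f}, low {lr_low:.2f}")
```

Moving the threshold from "high only" to "high or intermediate" raises sensitivity at the cost of specificity, which is why each study contributes two points to a plot of sensitivity against specificity.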
We used meta-regression to explore the influence of study-level covariates on diagnostic performance of the Wells score and potentially explain a proportion of the between-study heterogeneity. To do this, we extended the ordinal regression model to fit a fixed-effects summary receiver-operating characteristic (ROC) curve through the data and to explore the influence of adding covariates into the model on the shape of the curve (11). Results are reported along with an indication of which covariates were statistically significant at the 5% level. The NLMIXED procedure in SAS (SAS Institute, Inc., Cary, North Carolina) (12) was used for the analysis. The Appendix further describes the details of this analysis.
The United Kingdom Health Technology Assessment R&D Programme funded this project (reference no. 02/03/01). The funding source had no role in the design, conduct, and reporting of the study or in the decision to submit the report for publication.
Figure 1 outlines the flow of articles considered for the review. The 51 articles included in the meta-analysis reported data from 54 cohorts: 29 cohorts evaluated individual clinical features, 25 cohorts evaluated the Wells clinical score, 7 cohorts developed or evaluated other scores, and 8 cohorts evaluated physicians' empirical judgments. Appendix Table 1 describes the characteristics of the cohorts. In most studies, the reference standard was applied independently of the results of clinical assessment. The exceptions were studies that augmented an ultrasonography reference standard with further testing based on clinical probability. Reporting of blinding of clinical assessment and the reference standard was generally poor; in most studies, it was unclear whether assessments were blinded or not.
*κ = 0.85. †κ = 0.86.
We undertook meta-analyses of 13 different clinical features, 1 clinical score (the Wells score), and 2 approaches to physicians' empirical judgments. Table 1 summarizes the characteristics of the studies included in each meta-analysis. Studies included in meta-analyses of individual clinical features were more likely to use venography as the reference standard, whereas studies included in the meta-analysis of the Wells score were more likely to use ultrasonography. More studies in the meta-analysis of the Wells score used a reference standard that depended on the results of clinical assessment.
Each study of clinical features reported only a selection of the 13 features evaluated. Table 2 shows which cohorts studied which clinical features and outlines the likelihood ratios from these studies. In most cases, when a particular feature was not reported it was unclear whether it had been examined but not reported or simply not examined. A few studies reported clinical features in such a way that the relevant data could not be extracted (for example, by combining features such as edema and swelling). We recorded these as “unable to extract relevant data” in Table 2.
Figure 2 shows the results of meta-analysis of individual clinical features. If a likelihood ratio greater than 2 is considered useful for ruling in DVT and a ratio less than 0.5 is useful for ruling out DVT, then only a history of DVT and malignancy are useful for ruling in DVT (based on the point estimates) and no individual feature is useful for ruling out DVT. Recent immobilization, recent surgery, or a difference in calf diameter is of borderline value in ruling in DVT, while absence of calf swelling or a difference in calf diameter is of borderline value in ruling out DVT.
Three studies of the Wells score used dichotomized versions of the score; we did not include them in this analysis (13-15). Table 3 shows the results of meta-analysis of the remaining 22 studies of the Wells score. A high Wells score markedly increases the probability of DVT (likelihood ratio, 5.2), whereas a low Wells score markedly reduces the probability of DVT (likelihood ratio, 0.25). Figure 3 shows how the Wells score performs as a function of the pretest probability of DVT (the population prevalence of DVT), assuming that the Wells scores are applied in a Bayesian manner. Most populations with suspected DVT have a prevalence of 10% to 40%. A population with a DVT prevalence of 24% (the median prevalence for studies included in the meta-analysis) would be categorized as follows: Twenty-three percent would have a high Wells score with a DVT prevalence of 62%, 39% would have an intermediate score with a prevalence of 21%, and 38% would have a low score with a prevalence of 7%. Six studies reported proximal and distal DVT separately (2, 16-20), showing that risk stratification was more accurate for proximal DVT than distal DVT in those studies.
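The Bayesian application of the pooled likelihood ratios can be reproduced with a short calculation: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back. The sketch below uses the likelihood ratios (5.2 and 0.25) and the median prevalence (24%) reported above; the function name is an assumption for illustration.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


LR_HIGH = 5.2   # pooled likelihood ratio for a high Wells score
LR_LOW = 0.25   # pooled likelihood ratio for a low Wells score

prevalence = 0.24  # median prevalence in the included studies
high = post_test_probability(prevalence, LR_HIGH)  # about 0.62
low = post_test_probability(prevalence, LR_LOW)    # about 0.07
print(f"prevalence 24%: high score {high:.0%}, low score {low:.0%}")

# The same calculation across the range of prevalence typical of suspected DVT
for p in (0.10, 0.20, 0.30, 0.40):
    print(f"prevalence {p:.0%}: "
          f"high score {post_test_probability(p, LR_HIGH):.0%}, "
          f"low score {post_test_probability(p, LR_LOW):.0%}")
```

At a prevalence of 24%, the calculation gives post-test probabilities of about 62% and 7%, matching the high-score and low-score figures quoted in the text.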
Appendix Table 1.
Heterogeneity is difficult to measure when the test under evaluation has 3 diagnostic categories, but it can be demonstrated graphically. Figure 4 shows the sensitivity and specificity of the Wells score for diagnosing all cases of DVT in each study.
Two results are plotted from each study of the Wells score on the receiver-operating characteristic plane. Circles represent use of a high versus intermediate and low decision threshold (that is, only persons categorized as at high risk receive a diagnosis of deep venous thrombosis). Triangles represent a high and intermediate versus low decision threshold (that is, persons categorized as at high or intermediate risk receive a diagnosis of deep venous thrombosis). The point estimates and 95% CIs for pooled sensitivity and specificity for the 2 thresholds are also plotted as boxes.
By fitting a summary ROC curve to the data in Figure 4 and examining the influence of study-level covariates using meta-regression, we identified younger mean patient age (P = 0.011) and exclusion of persons with a history of thromboembolism (P = 0.020) as potentially important covariates associated with improved diagnostic performance. The association between improved diagnostic performance and assessment of the reference standard that was blinded to the results of clinical assessment fell just short of statistical significance (P = 0.056). Covariates examined but not statistically significant were setting for recruitment (for all settings, P > 0.3); exclusion of patients with suspicion of pulmonary embolus (P > 0.2); exclusion of pregnant women (P > 0.2); percentage of male participants (P > 0.2); whether the study used the original or modified Wells criteria (P > 0.2); prevalence of DVT in the cohort (P > 0.2); whether the investigators used a standardized form to collect data (P = 0.15); whether a physician assessed the patient (P > 0.2); use of single ultrasonography as a reference standard, compared with venography or initial ultrasonography plus follow-up or repeated scanning (P > 0.2); use of an independent reference standard (P > 0.2); and performance of clinical assessment that was blinded to the reference standard (P > 0.2).
Meta-analysis was repeated, stratified by the 2 significant covariates. The likelihood ratios for high and low Wells score, respectively, were 9.6 (95% CI, 7.0 to 13.3; P = 0.043 for heterogeneity) and 0.17 (CI, 0.13 to 0.23; P > 0.2) for studies that excluded patients with previous thromboembolism and 3.9 (CI, 3.0 to 5.2; P < 0.001) and 0.28 (CI, 0.24 to 0.33; P > 0.2) for studies that did not appear to exclude these patients. The same variables were 4.0 (CI, 2.5 to 6.5; P < 0.001) and 0.30 (CI, 0.24 to 0.39; P > 0.2) for studies with a mean participant age older than 60 years, 5.3 (CI, 3.9 to 7.2; P = 0.025) and 0.24 (CI, 0.20 to 0.29; P > 0.2) for studies with a mean participant age of 60 years or younger, and 6.8 (CI, 3.4 to 13.7; P < 0.001) and 0.23 (CI, 0.14 to 0.37; P = 0.056) for studies that did not report data on age. Hence, the Wells score seems to perform better in populations that exclude patients with previous thromboembolism and to perform worse in older populations.
Seven cohorts were used to develop or evaluate a variety of other clinical scores (17, 21-25). Kahn (25), Oudega (21), and Constans (17, 22) and their colleagues each used multivariate analyses to develop scores using 4 to 9 items to categorize patients as having high, moderate, or low risk for DVT. Each score stratified patients appropriately in their derivation cohorts, but none has been widely validated outside their initial setting; as a result, meta-analysis was not possible. Kiil and Moller (24) and Wojciechowski (23) used structured clinical assessment based on 5 or 8 items to identify the presence or absence of DVT. Neither reliably identified DVT.
The 8 studies of physicians' empirical judgments used different approaches to stratification: Four categorized patients into groups with low, intermediate, or high risk for DVT (26-29), and 4 dichotomized assessment as low or high risk or dichotomized DVT as present or absent (13, 30-32). Table 3 shows meta-analysis of the 3-category assessments alongside the comparable estimates for the Wells score. Likelihood ratios for high and low empirical estimates are similar to those for the Wells score. These estimates are based on only 4 studies and have relatively wide CIs. Evidence also suggests significant heterogeneity, despite the limited number of studies. Meta-analysis of dichotomized physician judgments produced the following pooled estimates: sensitivity, 86.6% (CI, 80.7% to 91.2%; P = 0.152 for heterogeneity); specificity, 69.3% (CI, 64.4% to 73.9%; P < 0.001); positive likelihood ratio, 6.2 (CI, 1.0 to 40.0; P < 0.001); and negative likelihood ratio, 0.18 (CI, 0.13 to 0.26; P > 0.2). Again, the CIs are wide, and there is evidence of significant heterogeneity despite the small number of studies.
This meta-analysis has shown that individual clinical features, used in isolation, have limited value in diagnosing DVT. A history of DVT, known malignancy, recent immobilization, or recent surgery slightly increases the likelihood of DVT, as does the physical examination finding of difference in calf diameter. Absence of a history of calf swelling or no difference in calf diameter on examination slightly reduces the likelihood of DVT.
Clinical probability estimates, whether structured and based on specific criteria (Wells) or unstructured and based on empirical assessment, appear to provide more useful information. A high Wells score substantially increases the likelihood of DVT and indicates that definitive diagnostic testing is appropriate. A low Wells score substantially decreases the likelihood of DVT and indicates that a simple noninvasive test, such as the d-dimer assay, may be sufficient to rule out DVT. The Wells score has advantages over empirical assessment because it is standardized and reproducible and its estimated performance is based on more studies.
Six studies reported proximal and distal DVT separately for the Wells score (2, 16-20). Meta-analysis of these studies showed that the Wells score accurately categorizes proximal DVT but not distal DVT. This raises some doubt about repeating ultrasonography on the basis of clinical risk. Repeated ultrasonography is intended to detect propagating distal DVT, yet our analysis suggests that patients with distal DVT are more likely to be stratified into the intermediate-risk group than into the high-risk group.
We attempted to identify potential causes of the heterogeneity seen in the results of studies of the Wells score. Meta-regression showed that the Wells score performed better in cohorts that excluded patients with previous thromboembolism and performed worse in cohorts with an older mean age. The former finding may be confounded by the fact that the researchers who developed the Wells score were among the authors of 4 of the 7 studies that excluded patients with previous thromboembolism. We would expect a clinical score to perform better in the setting in which it was developed. Nevertheless, it seems reasonable to conclude that using the Wells score in patients with previous thromboembolism is inappropriate.
Some limitations in our analysis need to be appreciated. Even after identifying 2 significant covariates, we could not explain the substantial heterogeneity in estimates of the diagnostic performance of the Wells score. The most likely causes for this heterogeneity are unreported differences in the study sample or the observers who assigned the Wells scores. Care should therefore be taken in extrapolating our findings between settings and observers. In addition, we did not search for unpublished data. It is unlikely that the poor diagnostic performance of individual clinical features could be attributed to publication bias, but unpublished data might further contribute to the heterogeneity observed in studies of the Wells score.
Further research is required to determine how the Wells score performs when used in different settings and by different observers. Comparison with empirical scoring by a variety of different observers would also be valuable. Recent studies (17, 22) have identified new scores that may have performance similar to that of the Wells score but are simpler to assign. These scores require further evaluation. In the meantime, use of the Wells score appears to be the most valuable element of clinical assessment for the patient with suspected DVT.
Full details of all the statistical analyses and Forest plots of all the analyses are available from the authors.
The model is described in detail elsewhere (83), but a brief account is given below. Categorization using the Wells criteria can be considered a scale with 3 ordered categories and 2 cut-points at the category boundaries. It is assumed that the response for the ith individual in the kth study (Yik) arises from an underlying latent continuous variable, which is discretized at thresholds θ0k (= −∞) < θ1k < θ2k < θ3k (= ∞). Define Dik to indicate the true disease status of the ith patient in the kth study. An ordinal regression type model can then be constructed. Let Θ = (θ, α, β, γ, δ) be a vector containing thresholds from every study (θ), scale (α) and location (β) parameters, and regression scale coefficients (γ). The following probit model is constructed including a study covariate x:
The results from fitting this model to each covariate individually (that is, 14 separate models are fitted in all) are presented in Appendix Table 2.
Appendix Table 2.
The P value for the significance of each regression coefficient relating to the study-level covariates provides an omnibus test of influence of each coefficient across thresholds. In the main paper, pooled likelihood ratios are calculated for each of the 2 recognized Wells thresholds for different values of the study-level covariates that are significant at the 5% level in the above analysis (that is, age and history of DVT).
Note: If desired, a summary ROC curve can be obtained by plotting the pairs
at each threshold θ for each model reported above:
Copyright © 2016 American College of Physicians. All Rights Reserved.
Print ISSN: 0003-4819 | Online ISSN: 1539-3704