Steven M. Asch, MD, MPH; Elizabeth A. McGlynn, PhD; Mary M. Hogan, PhD; Rodney A. Hayward, MD; Paul Shekelle, MD, MPH; Lisa Rubenstein, MD; Joan Keesey, BA; John Adams, PhD; Eve A. Kerr, MD, MPH
Acknowledgments: The authors acknowledge the invaluable contributions of Landon Donsbach, Alison DeCristofaro, Jennifer Hicks Curtis, Liisa Hiatt, Eureka Moline, Jill Baker, Peggy Wallace, Karen Ricci, Anne Griffin, Rena Hasenfeld Garland, and the Veterans Affairs site investigators.
Grant Support: This study was funded by a Veterans Affairs Health Services Research and Development grant. Drs. Asch and Kerr were funded by Veterans Affairs Health Services Research and Development Career Development Awards. The initial development of the indicators was funded by the Centers for Medicare & Medicaid Services and the Agency for Healthcare Research and Quality (grant no. 5U18HS09463-03). The California HealthCare Foundation (grant no. 98-5005) funded development of the chart abstraction tool. The Robert Wood Johnson Foundation (grant no. 0-0192) funded work with the national sample (design, sampling, and conduct).
Potential Financial Conflicts of Interest: None disclosed.
Requests for Single Reprints: Steven M. Asch, MD, MPH, West Los Angeles Veterans Affairs Medical Center, Mail Code 111G, 11301 Wilshire Boulevard, Los Angeles, CA 90073; e-mail, Steven.Asch@med.va.gov.
Current Author Addresses: Dr. Asch: West Los Angeles Veterans Affairs Medical Center, Mail Code 111G, 11301 Wilshire Boulevard, Los Angeles, CA 90073.
Drs. McGlynn and Shekelle: RAND, 1776 Main Street m4s, Santa Monica, CA 90407.
Drs. Hogan, Hayward, and Kerr: Ann Arbor Veterans Affairs Center for Practice Management and Outcomes Research, PO Box 130170, Ann Arbor, MI 48113.
Drs. Rubenstein and Adams and Ms. Keesey: RAND, 1776 Main Street m3s, Santa Monica, CA 90407.
Asch SM, McGlynn EA, Hogan MM, Hayward RA, Shekelle P, Rubenstein L, et al. Comparison of Quality of Care for Patients in the Veterans Health Administration and Patients in a National Sample. Ann Intern Med. 2004;141:938-945. doi: 10.7326/0003-4819-141-12-200412210-00010
Published: Ann Intern Med. 2004;141(12):938-945.
As methods for measuring the quality of medical care have matured, widespread quality problems have become increasingly evident (1, 2). The solution to these problems is much less obvious, however, particularly with regard to large delivery systems. Many observers have suggested that improved information systems, systematic performance monitoring, and coordination of care are necessary to enhance the quality of medical care (3). Although the use of integrated information systems (including electronic medical records) and performance indicators has become more common throughout the U.S. health care system, most providers are not part of a larger integrated delivery system and continue to rely on traditional information systems (4).
An exception is the Veterans Health Administration (VHA). As the largest delivery system in the United States, the VHA has been recognized as a leader in developing a more coordinated system of care. Beginning in the early 1990s, VHA leadership instituted both a sophisticated electronic medical record system and a quality measurement approach that holds regional managers accountable for several processes in preventive care and in the management of common chronic conditions (5, 6). Other changes include a system-wide commitment to quality improvement principles and a partnership between researchers and managers for quality improvement (7).
As Jha and colleagues (8) have shown, since these changes have been implemented, VHA performance has outpaced that of Medicare in the specific areas targeted. Nevertheless, whether this improvement has extended beyond the relatively narrow scope of the performance measures is unknown. Beyond that study, the data comparing VHA care with other systems of care are sparse and mixed. For example, patients hospitalized at VHA hospitals were more likely than Medicare patients to receive angiotensin-converting enzyme inhibitors and thrombolysis after myocardial infarction (9). On the other hand, VHA patients were less likely to receive angiography when indicated and had higher mortality rates after coronary artery bypass grafting than patients in community hospitals (10, 11). Kerr and colleagues found that care for diabetes was better in almost every dimension in the VHA system than in commercial managed care (12). More extensive comparisons, especially of outpatient care, are lacking. To address these issues, a more comprehensive assessment of quality is needed.
Using a broad measure of quality of care that is based on medical record review and was developed outside the VHA, we compared the quality of outpatient and inpatient care between 2 samples: 1) a national sample of patients drawn from 12 communities and 2) VHA patients from 26 facilities in 12 health care systems located in the southwestern and midwestern United States (13). We analyzed performance in the years after the institution of routine performance measurement and the electronic medical record. Using the extensive set of quality indicators included in the measurement system, we compared the overall quality of care delivered in the VHA system and in the United States, as well as the quality of acute, chronic, and preventive care across 26 conditions. In addition, we evaluated whether VHA performance was better in the specific areas targeted by the VHA quality management system.
For this study, we used quality indicators from RAND's Quality Assessment Tools system, which is described in more detail elsewhere (14-17). The indicators included in the Quality Assessment Tools system are process quality measures, are more readily actionable than outcomes measures, require less risk adjustment, and follow the structure of national guidelines (18, 19). After reviewing established national guidelines and the medical literature, we chose a subset of quality indicators from the Quality Assessment Tools system that represented the spectrum of outpatient and inpatient care (that is, screening, diagnosis, treatment, and follow-up) for acute and chronic conditions and preventive care processes representing the leading causes of morbidity, death, and health care use among older male patients. The Appendix Table lists the full indicator set, which was determined by four 9-member, multispecialty expert panels. These panels assessed the validity of the proposed indicators using the RAND/University of California, Los Angeles–modified Delphi method. The experts rated the indicators on a 9-point scale (1 = not valid; 9 = very valid), and we accepted indicators that had a median validity score of 7 or higher. This method of selecting indicators is reliable and has been shown to have content, construct, and predictive validity (20-23). Of the 439 indicators in the Quality Assessment Tools system, we included 348 indicators across 26 conditions in our study and excluded 91 indicators that were unrelated to the target population (for example, those related to prenatal care and cesarean sections). Of the 348 indicators, 21 were indicators of overuse (for example, patients with moderate to severe asthma should not receive β-blocker medications) and 327 were indicators of underuse (for example, patients who have been hospitalized for heart failure should have follow-up contact within 4 weeks of discharge).
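As an illustration only, the acceptance rule described above (median panel rating of 7 or higher on the 9-point scale) can be sketched in Python. The function name and the ratings are hypothetical and are not taken from the study.

```python
from statistics import median


def accept_indicator(ratings, threshold=7):
    """Return True when the panel's median validity rating meets the threshold.

    `ratings` holds one score per panelist on the 1-9 validity scale.
    """
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be on the 1-9 scale")
    return median(ratings) >= threshold


# Hypothetical ratings from one 9-member panel (illustrative values only).
example_ratings = [8, 7, 9, 6, 7, 8, 5, 7, 9]
accepted = accept_indicator(example_ratings)
```

Using the median rather than the mean means that one or two outlying panelists cannot by themselves move an indicator across the threshold.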
Two physicians independently classified each indicator according to the type of care delivered; the function of the indicated care (screening, diagnosis, treatment, and follow-up); and whether the indicator was supported by a randomized, controlled trial, another type of controlled trial, or other evidence. Type of care was classified as acute (for example, in patients presenting with dysuria, presence or absence of fever and flank pain should be elicited), chronic (for example, patients with type 2 diabetes mellitus in whom dietary therapy has failed should receive oral hypoglycemic therapy), or preventive (for example, all patients should be screened for problem drinking). In addition, we further classified the indicators into 3 mutually exclusive categories according to whether they corresponded to the VHA performance indicators that were in use in fiscal year 1999. Twenty-six indicators closely matched the VHA indicators, 152 involved conditions that were targeted by the VHA indicators but were not among the 26 matches, and 170 did not match the VHA measures or conditions. We performed a similar process to produce a list of 15 indicators that matched contemporaneous Health Plan Employer Data and Information Set (HEDIS) performance measures (24). Table 1 shows the conditions targeted by the indicators, and Table 2 gives an example indicator for each of the conditions or types of care for which condition- or type-specific comparisons were possible.
Patients were drawn from 2 ongoing quality-of-care studies: a study of VHA patients and a random sample of adults from 12 communities (13). The VHA patients were drawn from 26 clinical sites in 12 health care systems located in 2 Veterans Integrated Service Networks in the midwestern and southwestern United States. These networks closely match the overall Veterans Affairs system with regard to medical record review and survey-based quality measures (25, 26). We selected patients who had had at least 2 outpatient visits in each of the 2 years between 1 October 1997 and 30 September 1999. A total of 106 576 patients met these criteria. We randomly sampled 689, oversampling for chronic obstructive pulmonary disease (COPD), hypertension, and diabetes, and were able to locate records for 664 patients (a record location rate of 96%). Because of resource constraints, we reviewed a random subset of 621 of these records. Since this sample contained only 20 women and 4 patients younger than 35 years of age, we further restricted the sample to men older than 35 years of age. Thus, we included 596 VHA patients in the analysis. All of these patients had complete medical records.
The methods we used to obtain the national sample have been described elsewhere (13) and are summarized here. As part of a nationwide study, residents of 12 large metropolitan areas (Boston, Massachusetts; Cleveland, Ohio; Greenville, South Carolina; Indianapolis, Indiana; Lansing, Michigan; Little Rock, Arkansas; Miami, Florida; Newark, New Jersey; Orange County, California; Phoenix, Arizona; Seattle, Washington; and Syracuse, New York) were contacted by using random-digit dialing and were asked to complete a telephone survey (27). To ensure comparability with the VHA sample, we included only men older than 35 years of age. Between October 1998 and August 2000, we telephoned 4086 of these participants and asked for permission to obtain copies of their medical records from all providers (both individual and institutional) that they had visited within the past 2 years. We received verbal consent from 3138 participants (77% of those contacted by telephone). We mailed consent forms and received written permission from 2351 participants (75% of those who had given verbal permission). We received at least 1 medical record for 2075 participants (88% of those who had returned consent forms). We excluded participants who had not had at least 2 medical visits in the past 2 years to further ensure comparability with the VHA sample. Thus, our final national sample included 992 persons. The rolling abstraction period (October 1996 to August 2000) substantially overlapped the VHA sampling period. The average overlap was 70%, and all records had at least 1 year of overlap. Seven hundred eight (71%) of the 992 persons in the national sample had complete medical records. On the basis of data from the original telephone survey, we determined that participants in the national sample were more likely to be older, white, and better educated; to have higher income levels; and to have less than excellent health compared with eligible nonparticipants (13).
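The stage-by-stage participation rates quoted above can be recomputed from the reported counts. The short Python sketch below does only that arithmetic; the variable and function names are illustrative, and the cumulative figure is derived here rather than reported in the text.

```python
# Counts reported in the text for each recruitment stage.
stages = [
    ("contacted by telephone", 4086),
    ("gave verbal consent", 3138),
    ("returned written consent", 2351),
    ("had at least 1 record received", 2075),
]


def stage_rates(stages):
    """Percentage retained at each stage relative to the previous stage."""
    return [
        (label, round(100 * count / prev_count))
        for (_, prev_count), (label, count) in zip(stages, stages[1:])
    ]


rates = stage_rates(stages)

# Share of telephoned participants with at least 1 record (derived, not reported).
cumulative = round(100 * stages[-1][1] / stages[0][1])
```

The per-stage values reproduce the 77%, 75%, and 88% given in the text; multiplying them through shows that roughly half of those telephoned contributed at least 1 record.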
We sent photocopies of all of the medical records to 1 of 2 central areas for abstraction. For VHA patients, we abstracted data on all care received between October 1997 and September 1999; for patients in the national sample, we abstracted data on all care received in the 2 years before the date of recruitment. We used computer-assisted abstraction software on a Microsoft Visual Basic 6.0 platform (Microsoft Corp., Redmond, Washington), which allowed us to tailor the manual chart abstraction to the specific record being reviewed and provided interactive data quality checks (consistency, range), calculations (for example, high blood pressure), and classifications (for example, drug class). Twenty trained registered nurse abstractors collected the data. To assess interrater reliability, we reabstracted charts for 4% of the participants selected at random. According to the κ statistic, average reliability in the national sample was substantial to almost perfect (28) at 3 levels: presence of a condition (κ = 0.83), indicator eligibility (κ = 0.76), and indicator scoring (κ = 0.80) (13).
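For readers unfamiliar with the κ statistic, the following Python sketch computes the 2-rater (Cohen) form, which corrects observed agreement for agreement expected by chance. It assumes that each chart item was coded by exactly 2 abstractors; the text does not specify which variant of κ was used, and the example codes are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must code the same non-empty item list")
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes (1 = condition present, 0 = absent) for 10 charts.
original = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
reabstracted = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(original, reabstracted)
```

In this invented example the raters agree on 8 of 10 charts, but because about half of that agreement would be expected by chance, κ is well below 0.8.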
All analyses were conducted by using SAS, version 8.2 (SAS Institute, Cary, North Carolina). The unit of analysis was adherence to a given indicator in a given patient. For each indicator, we determined the criteria that made participants eligible for the process specified in the indicator (yes or no). We then determined whether participants had received the specified process each time an indication was noted in their medical record (yes, no, or proportion). We determined aggregate indicator scores for each summary category (that is, acute, chronic, and preventive care; screening; diagnosis; treatment; and follow-up) by dividing all instances in which participants received recommended care by the total number of instances in which the care should have been received. We constructed the scores as proportions ranging from 0% to 100%, adjusting for clustering of indicators within patients. Because of clustering of the data, we used the bootstrap method to estimate standard errors for all of these scores (29).
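The aggregate score and its bootstrapped standard error can be sketched as follows. This is a simplified Python illustration, not the SAS code used in the study: it assumes that each patient is summarized by 2 counts (instances in which care was indicated and instances in which it was received), and it resamples whole patients so that the clustering of indicators within patients is preserved. All names and data are hypothetical.

```python
import random


def aggregate_score(patients):
    """Pooled adherence (%): care received / care indicated, over all patients."""
    received = sum(p["received"] for p in patients)
    eligible = sum(p["eligible"] for p in patients)
    return 100.0 * received / eligible


def cluster_bootstrap_se(patients, n_boot=1000, seed=0):
    """Standard error of the aggregate score, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(patients)
    scores = []
    for _ in range(n_boot):
        # Draw whole patients so each patient's indicators stay together.
        resample = [patients[rng.randrange(n)] for _ in range(n)]
        scores.append(aggregate_score(resample))
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    return variance ** 0.5


# Hypothetical patients; each has at least one eligible instance.
patients = [
    {"eligible": 10, "received": 7},
    {"eligible": 4, "received": 1},
    {"eligible": 6, "received": 6},
    {"eligible": 5, "received": 2},
]
score = aggregate_score(patients)
se = cluster_bootstrap_se(patients)
```

Resampling individual indicator instances instead of patients would treat correlated observations as independent and would tend to understate the standard error.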
We applied sampling weights to represent the original populations from which the 2 samples were drawn and to adjust for nonresponse. We also used weights to standardize the patients for characteristics common among the VHA population: COPD; hypertension; diabetes; and age categories ranging from 35 to 50 years of age, 51 to 65 years of age, and older than 65 years of age. Sampling weights were applied at the individual level; indicators were implicitly weighted on the basis of prevalence of eligibility. Although we report weighted results because we believe they are most representative, weighting did not affect the direction or significance of any reported results.
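One way to read the weighting described above is that each patient's counts are multiplied by that patient's weight before pooling, so that indicators with more eligible instances contribute more to the score. The Python sketch below illustrates this reading under that assumption; the weights and counts are invented, and the actual weights in the study combined sampling, nonresponse, and standardization factors.

```python
def weighted_score(patients):
    """Weighted pooled adherence (%), with weights applied at the patient level."""
    numerator = sum(p["weight"] * p["received"] for p in patients)
    denominator = sum(p["weight"] * p["eligible"] for p in patients)
    return 100.0 * numerator / denominator


# Hypothetical patients: the first represents twice as many people as the second.
patients = [
    {"weight": 2.0, "eligible": 5, "received": 4},
    {"weight": 1.0, "eligible": 10, "received": 5},
]
weighted = weighted_score(patients)
unweighted = weighted_score([{**p, "weight": 1.0} for p in patients])
```

Setting every weight to 1 recovers the unweighted pooled score, which makes it easy to check whether weighting changes the direction of a comparison.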
We used t-tests or chi-square tests with bootstrapped standard errors to compare the standardized VHA and national samples according to population characteristics; aggregate quality of care; subsets of indicators related to acute, chronic, and preventive care; subsets of indicators related to function of care; subsets of indicators supported by randomized, controlled trials; subsets of indicators similar to those used by the VHA in its performance measurement system; and chronic conditions that affected more than 50 patients from both samples, including COPD, coronary artery disease, depression, diabetes, hyperlipidemia, headache, hypertension, and osteoarthritis. We used logistic regression to compare the rates at which the respective samples received the care specified in the indicators. This allowed us to adjust for factors beyond the standardization, including age as an integer variable, number of chronic and acute conditions, and number of outpatient visits. We calculated adjusted scores after taking into account clustering of indicators at the individual patient level. For the logistic regression models, standard errors and confidence intervals were adjusted for the clustering of indicators within patients by using the sandwich estimator (30).
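For reference, a textbook form of the cluster-robust sandwich variance for a logistic model is given below in LaTeX. The notation is introduced here for illustration and is an assumption about the general approach, not a statement of the exact estimator or small-sample correction applied in the study: $i$ indexes patients, $j$ indexes indicator instances within a patient, $y_{ij}$ is 1 when the indicated care was received, $x_{ij}$ is the covariate vector, and $\hat{p}_{ij}$ is the fitted probability.

```latex
\hat{p}_{ij} = \frac{1}{1 + \exp(-x_{ij}^{\top}\hat{\beta})}, \qquad
A = \sum_{i}\sum_{j} \hat{p}_{ij}\,(1 - \hat{p}_{ij})\, x_{ij} x_{ij}^{\top}, \qquad
B = \sum_{i} \Bigl(\sum_{j} (y_{ij} - \hat{p}_{ij})\, x_{ij}\Bigr)
             \Bigl(\sum_{j} (y_{ij} - \hat{p}_{ij})\, x_{ij}\Bigr)^{\top}

\widehat{\operatorname{Var}}(\hat{\beta}) = A^{-1} B A^{-1}
```

Because the score contributions are summed within each patient before the outer product is taken, correlation among a patient's indicators widens the standard errors instead of being ignored.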
To test the sensitivity of our results to geography and insurance, we also estimated models confining the national sample to the 6 communities nearest the 2 VHA regions and to respondents with insurance. To test the sensitivity of our results to completeness of documentation, we estimated models restricted to patients with complete records and to the subset of indicators with high likelihood (laboratory tests and radiology) and less likelihood (counseling and education) of complete documentation. Since the number of visits could represent an intervening variable between the comparison samples and quality, we also ran models that did not adjust for the number of visits. Finally, to test the sensitivity of our results to the type of indicator set used, we compared the adjusted performance of the VHA and the community on the subset of indicators that matched the widely accepted HEDIS indicator set.
The funding agencies (Veterans Affairs Health Services Research and Development Service, the Robert Wood Johnson Foundation, the Centers for Medicare & Medicaid Services, the Agency for Healthcare Research and Quality, and the California HealthCare Foundation) did not participate in the data collection or analysis or in interpretation of the results. Veterans Affairs officials received advance copies of the manuscript for comment.
Table 3 presents the characteristics of the VHA and national samples, with and without weighting for sampling, nonresponse, and standardization for age categories and the prevalence of COPD, hypertension, and diabetes in the VHA sample. After standardization, there were no statistically significant differences in the age of the participants or the number of chronic conditions, although patients in the national sample had slightly more acute conditions. There were also no significant differences in the rates of chronic conditions between the 2 samples, with the exception that VHA patients had a somewhat higher prevalence of osteoarthritis. Patients from the VHA also had a significantly greater number of outpatient visits per year (9.2 vs. 7.9; P < 0.001).
Table 4 presents the results of our analyses comparing the quality of care between the standardized VHA and national samples, adjusting for age and for the number of chronic conditions, acute conditions, and outpatient visits. Sixteen of the 348 indicators had no eligible patients in either sample, leaving 294 indicators and 596 patients on which to base the VHA scores and 330 indicators and 992 patients on which to base the national scores. Overall, VHA patients were more likely than patients in the national sample to receive the care specified by the indicators (67% vs. 51%; difference, 16 percentage points [CI, 14 to 18 percentage points]). Performance in the VHA outpaced that of the national sample for both chronic care (72% vs. 59%; difference, 13 percentage points [CI, 10 to 17 percentage points]) and preventive care (64% vs. 44%; difference, 20 percentage points [CI, 12 to 28 percentage points]), but not for acute care (53% vs. 55%; difference, −2 percentage points [CI, −9 to 4 percentage points]). In particular, the VHA sample received significantly better care for depression, diabetes, hyperlipidemia, and hypertension. The VHA also performed consistently better across the entire spectrum of care, including screening, diagnosis, treatment, and follow-up. These differences in quality of care held true when we considered only those indicators (n = 72) supported by randomized, controlled trials (57% vs. 45%; difference, 12 percentage points [CI, 3 to 20 percentage points]).
To test the association between performance and performance measurement within the VHA, we restricted the analysis of overall quality to processes and conditions specifically addressed by the VHA performance measurement set. When we restricted the analysis to specific indicators that closely matched the performance measures targeted by the VHA, VHA patients had a substantially greater chance of receiving the indicated care than did patients in the national sample (adjusted scores, 67% vs. 43%; difference, 24 percentage points [CI, 21 to 26 percentage points]). Patients from the VHA were also more likely than national patients to receive care in the conditions or areas specified by the VHA indicator set, even when the processes covered by the indicators were substantially different (70% vs. 58%; difference, 12 percentage points [CI, 10 to 15 percentage points]). The difference between VHA patients and national patients in conditions or areas not covered by the VHA performance measurement system barely reached conventional levels of statistical significance (55% vs. 50%; difference, 5 percentage points [CI, 0 to 10 percentage points]).
Confining the analyses to patients in both samples who had complete records did not change the direction or significance of any reported results. The VHA advantage was largest in indicators most likely to have possible underdocumentation (adjusted performance for counseling and education, 45% vs. 26%; difference, 19 percentage points [CI, 14 to 30 percentage points]), but even in laboratory tests and radiology, an area that would be less sensitive to documentation differences, there was also a substantial difference (67% vs. 52%; difference, 15 percentage points [CI, 11 to 19 percentage points]). Confining the analysis to the 6 nationally sampled metropolitan areas closest to the 2 VHA regions also did not change the direction or significance of any result, nor did excluding uninsured patients from the national sample. Models that did not adjust for the number of visits had the same VHA effects as those that did adjust for number of visits. Patients from the VHA also still received more indicated care (adjusted rates, 60% vs. 39%; difference, 21 percentage points [CI, 16 to 26 percentage points]) when the analyses were confined to the overlap of our indicator set and HEDIS measures, the most commonly used national performance indicator set for managed care.
Using the RAND Quality Assessment Tools broad measure of quality of care, we found that adherence to recommended processes of care in 2 VHA regions typically exceeded that in a comparable national sample in 12 communities. These findings persisted when we adjusted the samples for age, number of acute and chronic conditions, and number of outpatient visits and when we examined only processes supported by randomized, controlled trials. In addition, we found that the differences between the VHA and national sample were greatest in processes subject to the VHA performance measurement system. The “halo effect” of better VHA care extended to measures of processes in the same condition or area that were not specifically measured by the VHA performance system; however, this effect decreased greatly in unrelated areas. Acute care, COPD care, osteoarthritis care, and coronary artery disease care were exceptions to the pattern of better care in the VHA, although our power to distinguish quality differences was limited by the small number of patients with COPD in the national sample (n = 62).
To date, the VHA has not targeted acute care or osteoarthritis care as part of its intensive performance measurement system (6). Coronary artery disease, on the other hand, has been the subject of quality improvement efforts both inside and outside the VHA, including those sponsored by the American Heart Association (31-33). Indeed, many previous comparisons between VHA and national samples outside the VHA performance set have involved patients with coronary artery disease and have yielded mixed results (10). That we found little difference between the care provided to patients with coronary artery disease in the VHA and in a national sample is consistent with other findings and could be the result of comparable quality measurement programs for this condition in the United States and in the VHA. On the other hand, predominantly outpatient-based quality improvement efforts for diabetes have also been implemented in both the VHA system and other institutions, and our analyses showed that the VHA outperformed the national sample for diabetes care. The difference may be due to more effective outpatient VHA quality improvement for diabetes, but further research is needed to investigate the roots of this discrepancy.
Although our study is one of the most comprehensive comparisons between VHA patients and national patients, it has limitations. First, our analysis is based on a comparison of 2 different study samples. Although we used robust statistical techniques to account for any differences between the samples, we could not adjust for the somewhat different geographic distributions or abstraction periods, although there was a great deal of overlap in both areas. Furthermore, in other analyses, we have not observed any large geographic variations in the aggregate indicator scores for the national sample, and our results did not change when we confined the national sample to the 6 communities closest to the 2 Veterans Affairs regions (34). Our study also relied on patient recollection of provider visits in the national sample. It is possible that patients received care from additional providers but did not recall or that we did not receive all available charts. However, we found that confining our analyses to patients with complete records did not change the results, and persons with missing charts were likely to have higher quality scores (13). We lack data on whether patients in the national sample were also receiving care at the VHA, or vice versa. Other studies have found evidence of co-management between VHA and non-VHA providers (35). To the extent that this co-management occurred, it would probably lead to an underestimate of the differences between the 2 groups. An additional limitation of our study is that there were too few men younger than 35 years of age and too few women in our VHA sample to analyze care for these subgroups. For women, limited data from other studies indicate a VHA advantage in breast cancer screening (7). While the Quality Assessment Tools system is quite broad, it cannot represent all of medical care, and there are probably gaps in the indicator set. 
Last, the evidence grading system for Quality Assessment Tools is based on a simple measure of research design. More precise evidence categories might have altered our analysis of the effect of level of evidence on the comparison between the VHA and national samples, but it is difficult to tell whether the differences would be accentuated or diminished.
Several unmeasured patient characteristics could have biased our results. The response rate was lower in the national sample than in the VHA sample, underrepresenting ethnic minorities and the poor and exacerbating the natural difference in prevalence between the VHA and the United States as a whole. Ethnic minorities and people with low incomes generally receive lower-quality care (36, 37), although these disparities have not yet been examined by using the Quality Assessment Tools system. If we had been able to adjust for these variables, the differences in quality of care that we observed might have been even greater. Patients from the VHA also tend to have more severe disease than patients outside the VHA, and it is possible that severity of disease influences care quality (38). However, the process indicators we used are clinically precise, and all eligible patients should have received the indicated care regardless of disease severity. In any case, our findings persisted even when we adjusted for number of conditions.
One of the purported advantages of the electronic medical record (which was universally available in the VHA sites) is more thorough documentation. Indeed, the volume of the VHA medical records we reviewed was larger than that of the national sample; it took almost one and a half times longer to abstract data from the VHA sample, although some of this difference was no doubt due to the higher number of visits and conditions. Some of the observed differences may be due to more thorough documentation for VHA patients rather than more thorough medical care. In constructing the indicator set, expert panelists were instructed to include indicators only where the absence of documentation itself would be evidence of poor care. In addition, 1 VHA study found gaps of only approximately 10% between documentation in the medical record and actual care provision among standardized patients (39, 40). Furthermore, the VHA patients received more care both in indicators that are sensitive to documentation practices (counseling and education) and those that are insensitive (laboratory tests and radiology). Therefore, it seems unlikely that different documentation practices alone could account for all of the differences we observed. Instead, other aspects of the electronic medical record, such as notation templates that structure physician–patient interaction or computerized reminders targeting performance measures, may account for the difference.
The implications of these data are important to our understanding of quality management. The VHA is the largest health care system to have implemented an electronic medical record, routine performance monitoring, and other quality-related system changes, and we found that the VHA had substantially better quality of care than a national sample. Our finding that performance and performance measurement are strongly related suggests that the measurement efforts are indeed contributing to the observed differences. Performance measurement alone seems unlikely to account for all the differences; the VHA scored better even on HEDIS measures widely applied in managed care settings (but not in other settings) outside the VHA. Our study was not designed to determine which other mechanisms might be acting to improve VHA care, but other studies have suggested that they might include computerized reminders, standing orders, improved interprovider communication, facility performance profiling, leveraging of academic affiliations, accountability of regional managers for performance, and a more coordinated delivery system (5, 6, 41, 42). More research is needed to estimate the relative effects of these practices. As more coordinated systems of medical care delivery develop, our data support the use of the types of information and quality management systems available in the VHA.