Drawing Conclusions about Short-Term Variability in Liver Function Test Results

TO THE EDITOR: We read with great interest the recent study by Lazo and colleagues (1) on the short-term variability of various biochemical liver tests and want to share some concerns. First, this study is based on a nonrandom, convenience sample that represents only 9.5% (1864 of 19618) of the original NHANES (National Health and Nutrition Examination Survey) study population, raising questions about the generalizability of the findings, given the strong likelihood of selection bias. Second, the very small percentage of persons from racial or ethnic groups other than white, black, and Hispanic restricts the applicability of the results, particularly for Asians. Third, the cutoff values for normal levels of serum alanine aminotransferase (ALT) that were used in this study (40 IU/L for men and 31 IU/L for women) are probably overestimates, as has been shown in 2 large studies from Italy and Korea (2, 3). These studies recommend using cutoff values of 30 IU/L for men and 19 IU/L for women. On the basis of these recommended cutoff values, a substantial percentage of patients in Lazo and colleagues' study who were classified as having elevated ALT levels at examination 1 (median, 43 IU/L) and who returned to normal at examination 2 (median, 27 IU/L) would probably still have abnormal levels of ALT. These distinctions are not trivial, as evidenced by the increased risk for death in individuals with ALT levels greater than 20 IU/L compared with those with ALT levels less than 20 IU/L (relative risk for patients with ALT levels of 20 to 29 IU/L, 2.9; relative risk for patients with ALT levels of 30 to 39 IU/L, 9.5) (3). The American Association for the Study of Liver Diseases has also called for recalibration of the normal range for ALT level (4). Finally, the Gilbert syndrome, a genetic disease with a homozygosity prevalence of 9% in the Western population, can result in total bilirubin values outside the normal range, varying with fasting status or stress. This may affect the determination of normal versus abnormal values in Lazo and colleagues' study.
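
The arithmetic behind the sampling and cutoff concerns can be sketched in a few lines of Python. This is only an illustration built from the figures quoted in the letter: the function name `is_elevated` and the rule that a value counts as elevated when it is strictly greater than the cutoff are assumptions made for the example, not the definitions used by Lazo and colleagues, and the medians are group summaries rather than individual patients' values.

```python
# Illustrative sketch only. Cutoffs and medians are the figures quoted in the
# letter; the helper name and the "strictly greater than" rule are assumptions.

STUDY_CUTOFFS = {"men": 40, "women": 31}     # IU/L, cutoffs used in the study
PROPOSED_CUTOFFS = {"men": 30, "women": 19}  # IU/L, cutoffs proposed in refs 2 and 3


def is_elevated(alt_iu_per_l, sex, cutoffs):
    """Return True if the ALT value exceeds the cutoff for the given sex."""
    return alt_iu_per_l > cutoffs[sex]


# Share of the original NHANES population included in the study sample.
sample_fraction = 1864 / 19618
print(f"Sample fraction: {sample_fraction:.1%}")

# Classify the two reported median ALT values under both sets of cutoffs.
for label, median_alt in [("examination 1", 43), ("examination 2", 27)]:
    for sex in ("men", "women"):
        study = is_elevated(median_alt, sex, STUDY_CUTOFFS)
        proposed = is_elevated(median_alt, sex, PROPOSED_CUTOFFS)
        print(
            f"{label}, {sex}: ALT {median_alt} IU/L -> "
            f"study cutoffs: {'elevated' if study else 'normal'}, "
            f"proposed cutoffs: {'elevated' if proposed else 'normal'}"
        )
```

Under these assumptions, an ALT level of 27 IU/L is normal for both sexes with the study cutoffs but remains elevated for women with the proposed cutoffs, which is the kind of reclassification the letter describes.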

TO THE EDITOR: I read with interest the article by Lazo and colleagues (1). The authors do not mention the possible effect of statin use in their study. They enrolled middle-aged participants with diabetes and hypertension; such a profile raises awareness of the metabolic syndrome. These volunteers may have been prescribed statins, a widely used group of cholesterol-lowering drugs known to cause increased levels of aminotransferases (2-4). The increases in aminotransferase levels with statin use are asymptomatic, and these levels may return to normal with continuation of statin therapy (4). Approximately one third of adult participants in the study had transient increases in aminotransferase levels (1), which might have been caused by unrecognized statin use rather than intraindividual variability. It is therefore important to specify whether the participants used statins during the study.
TO THE EDITOR: As Lazo and colleagues (1) report, clinicians should be aware of the high intraindividual variability in common liver tests, and practice guidelines should explicitly recommend retesting asymptomatic individuals who have abnormal liver test results. Lazo and colleagues evaluate the association of intraindividual variability with alcohol consumption; hepatitis A, B, or C serologic status; recent infection; body mass index; or sociodemographic characteristics. However, they do not clearly explain the statistical factors affecting intraindividual variability. If liver function tests are repeated as recommended, clinicians will need to know the clinical implications of intraindividual variability in liver function tests. For example, the readers are likely to assume a prognostic evaluation by follow-up. Could the authors comment on the prognosis of a patient with 1 abnormal ALT level measurement who has a positive serologic test result for hepatitis A, B, or C compared with a patient who has a normal ALT level?
This study is based on participants living in the United States, so the results may not be applicable to Asians. We performed similar experiments in 258 asymptomatic adults with hepatitis B living in China. Although the sample was small, the results were interesting. In adults with initially elevated ALT levels, 23% had normal levels at the second examination. The intraindividual variability was statistically significantly associated with heredity, and the short-term variability in ALT levels showed familial prevalence. Thus, we assume that epigenetic inheritance may be the mechanism underlying the intraindividual variability. Could the authors comment on the potential effect of inheritance on intraindividual variability in common liver test results?

Annals of Internal Medicine

TO THE EDITOR: The study by Lazo and colleagues (1) shows that in the context of an epidemiologic survey, such as NHANES, abnormal liver test results in more than one third of participants (levels of aspartate aminotransferase, ALT, alkaline phosphatase, γ-glutamyltransferase, and bilirubin) would be reclassified as normal on retesting 17 days later. On the basis of their findings, Lazo and colleagues recommend that, to avoid unnecessary testing, individuals with abnormal liver test results on a first determination be routinely retested before undergoing further evaluation.
In our opinion, caution should be used in drawing practice recommendations from epidemiologic studies. We are particularly concerned that the proposed strategy could be highly misleading in a clinical setting. Liver test results, particularly levels of aminotransferases and γ-glutamyltransferase, typically fluctuate in patients with chronic liver disease (2). When evaluating a patient, even an asymptomatic one, with abnormal liver biochemistries, clinicians should interpret results according to the clinical context and consider an adequate work-up (2,3). A repeated value in the normal range does not ensure that the initial value was truly erroneous.
In addition, substantial evidence indicates that high aminotransferase values are statistically significantly correlated with increased future mortality, suggesting that these blood tests are valuable indicators of long-term prognosis (4,5). How should one differentiate between a clinically insignificant fluctuation of normality and a predictor of mortality in a single patient? To support their recommendations, Lazo and colleagues should have noted that patients with 2 discordant test results have the same long-term outcome as those with 2 concordant normal test results. Otherwise, normalization cannot be defined as proof of normality.

IN RESPONSE:
We appreciate the readers' interest in our study. As indicated by Drs. Arora and Triadafilopoulos, because our study was based on a nonrandom sample of the NHANES III population, selection bias remains a possibility. However, even if selection bias were present, it would not change the fact that, although sociodemographic and other patient-level characteristics may be associated with the absolute level of liver enzymes, such factors are unlikely to affect variability, which is a biological phenomenon. Indeed, we did not observe differences in estimates of variability by demographic or other patient-level characteristics. We agree that because Asians make up only a very small percentage of the NHANES III study population (<5%), it is not possible to draw firm conclusions regarding this population. Results of analyses using the cutoffs for an ALT level of 19 IU/L or greater for women and 30 IU/L or greater for men were essentially identical (31% returned to normal); however, the prevalence of elevated ALT levels using these cutoffs is higher: 17% versus 6%. We agree that defining the reference range for liver tests is an important area of debate (1-3). Finally, although the Gilbert syndrome is a common cause of elevated bilirubin levels, to our knowledge no evidence suggests that it affects the within-person variability of liver enzyme levels. In addition, our results were unchanged when we excluded adults with recent illness or those who fasted more than 8 hours at both examinations.
As pointed out by Dr. Kittisupamongkol, the use of statins can cause abnormal aminotransferase levels. Because NHANES III was conducted from 1988 to 1994, only 4.8% of participants reported taking lipid-lowering medications. Use was not associated with elevated liver enzyme levels or their within-person variability in our study.
Drs. Hong, Wu, and Fan raise an important question regarding factors that affect intraindividual variability. However, our paper focused on describing the intraindividual variability in liver test results in the U.S. population. Further studies are needed to examine factors that may contribute to variability in liver enzyme levels across individuals.
As Drs. Colli and Prati mention, one should certainly be cautious when drawing practice recommendations from epidemiologic studies. The development of guidelines for the reporting of observational studies, such as STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) (4), was the result of similar concerns and the appreciation that much of our clinical and population-based knowledge has been derived from observational studies. We took advantage of a large, multiracial sample of the U.S. population, with a vast amount of rigorously collected data. This study would have been very difficult to conduct otherwise.
Miller. One of us is a practicing physician and is aware of the extra burden imposed by many performance measurement programs with unclear patient benefits. Evidence suggests, however, that the proliferation, measurement, and dissemination of quality information have a substantial impact on measured areas of quality. Indeed, one measure (β-blocker use after a myocardial infarction) has been retired because performance has approached perfection (1). It is highly unlikely that performance on this measure and others would be so high if a spotlight had not been aimed at them. Although pay-for-performance and other programs have been shown to have a generally small impact over short periods, their cumulative effects over time remain unknown. The hope is that better use of population health management techniques and electronic resources, such as electronic health records and decision support, will improve the capacity of physician organizations to achieve higher-quality care. Although space constraints prohibit us from addressing each of Dr. Miller's questions, we comment on a few key points. First, given recent evidence that the quality of care produced by the U.S. health care system is suboptimal, we believe not only that limited resources should be directed toward improving care but that this investment should be much more substantial (2-4). Second, we disagree that these programs are at the root of the current primary care crisis. In fact, the United Kingdom has instituted a broad pay-for-performance program that includes substantial additional resources directed toward general practitioners, in part to stabilize the primary care workforce. Like Dr. Miller, we hope that onerous utilization management tools and requests used by health plans will diminish with time as the interconnectedness of the health care system is improved. Nonetheless, it is unlikely that these programs will disappear while there is still substantial evidence of overuse and variations in use that cannot be explained by clinical need. Some of these other issues have been discussed in other papers (5). Despite these problems, we believe that increased measurement and transparency are required for improving health systems.