Antipsychotics in Adults With Schizophrenia: Comparative Effectiveness of First-Generation Versus Second-Generation Medications

BACKGROUND
Debate continues about the comparative benefits and harms of first-generation antipsychotics (FGAs) and second-generation antipsychotics (SGAs) in treating schizophrenia.


PURPOSE
To compare the effects of FGAs with those of SGAs in the treatment of adults aged 18 to 64 years with schizophrenia and related psychosis on illness symptoms, diabetes mellitus, mortality, tardive dyskinesia, and a major metabolic syndrome.


DATA SOURCES
English-language studies from 10 electronic databases to March 2012, reference lists of relevant articles, and gray literature.


STUDY SELECTION
Randomized trials for efficacy and cohort studies at least 2 years in duration for adverse events.


DATA EXTRACTION
Two independent reviewers extracted data from 114 studies involving 22 comparisons and graded the strength of evidence for primary outcomes as insufficient, low, moderate, or high using the Grading of Recommendations Assessment, Development and Evaluation approach.


DATA SYNTHESIS
Few differences of clinical importance were found for core illness symptoms; lack of precision in effect estimates precluded firm conclusions for many comparisons. Moderate-strength evidence showed a clinically important benefit of haloperidol over olanzapine for improving positive symptoms, but the benefit was scale-dependent: It was seen when the Scale for the Assessment of Positive Symptoms was used but not when the Positive and Negative Syndrome Scale (PANSS) was used. Moderate-strength evidence showed a clinically important benefit of olanzapine over haloperidol in improving negative symptoms when the PANSS and the Scale for the Assessment of Negative Symptoms were used. Low-strength evidence showed no difference in mortality for chlorpromazine versus clozapine or haloperidol versus aripiprazole, increased incidence of the metabolic syndrome for olanzapine versus haloperidol (risk differences, 2% and 22%), and higher incidence of tardive dyskinesia for chlorpromazine versus clozapine (risk differences, 5% and 9%). Evidence was insufficient to draw conclusions for diabetes mellitus.


LIMITATIONS
All studies had high or unclear risk of bias. Length of study follow-up was often too brief to adequately measure adverse events. Medication comparisons, dosage, and outcome measurement were heterogeneous for head-to-head comparisons. Selective patient populations limit generalizability.


CONCLUSION
Clear benefits of FGAs versus SGAs for treating schizophrenia remain inconclusive because of variation in assessing outcomes and lack of clinically important differences for most comparisons. The strength of evidence on safety for major medical events is low or insufficient.


PRIMARY FUNDING SOURCE
Agency for Healthcare Research and Quality.

The introduction of second-generation antipsychotics (SGAs) for treatment of schizophrenia was an important effort to improve symptom management, reduce extrapyramidal symptoms caused by first-generation antipsychotics (FGAs), and offer patients improved quality of life and functioning. Today, 20 commercial FGAs and SGAs that have been approved by the U.S. Food and Drug Administration (FDA) are available in the United States (Appendix Table 1, available at www.annals.org). Of these, SGAs are more frequently prescribed by physicians. In 2003, three quarters of the 2 million adult patients in the United States who were prescribed an antipsychotic medication were prescribed an SGA, which accounted for 93% of the estimated $2.82 billion spent on these medications in the United States (1).
Recent large-scale trials and meta-analyses have called into question whether SGAs and FGAs provide clinically important differences for patient outcomes (1-3), and the question of which medication is more efficacious has yet to be definitively answered. Part of the uncertainty about medication efficacy relates to the lack of studies focused on long-term management. Such issues as how patient management should be influenced by medication heterogeneity within the 2 classes also add ambiguity for physician decision making (1, 4-6), as do differences between recently published reviews in defining eligible medication comparisons, patients, and clinically important outcomes and evaluating the strength of evidence (1, 7-19).
This comparative effectiveness review summarizes the benefits and harms associated with commercially available, FDA-approved FGAs and SGAs. Broad inclusion criteria were used for comparisons among FGAs and SGAs, patients, and study outcomes to address the diversity of previously published reviews. The review followed a protocol that adhered to standards for systematic reviews (21-23). A full technical report with detailed search strategies, methods, and evidence tables is available from the Agency for Healthcare Research and Quality (21).

Literature Search
We conducted comprehensive searches in MEDLINE (Appendix Table 2, available at www.annals.org), EMBASE, PsycINFO, International Pharmaceutical Abstracts, CINAHL, ProQuest Dissertations and Theses-Full Text, the Cochrane Central Register of Controlled Trials, and Scopus for studies published from 1950 to March 2012. For adverse events, we also searched the U.S. National Library of Medicine's TOXLINE and the MedEffect Canada Adverse Reaction Database.
We hand-searched proceedings from the annual meetings of the American Psychiatric Association (2008-2010) and the International College of Neuropsychopharmacology (2008-2010). We searched clinical trial registries and contacted experts in the field and authors of relevant studies. We retrieved new drug applications for each of the included interventions from the FDA Web site. We reviewed the reference lists of reviews, guidelines, and new drug applications and searched for articles citing relevant studies using Scopus Citation Tracker.

Study Selection
Two reviewers independently screened titles and abstracts. We retrieved the full text of potentially relevant studies. Two reviewers independently reviewed each article using a standardized form with a priori eligibility criteria (Appendix Table 3, available at www.annals.org). We resolved discrepancies through discussion or third-party adjudication. We included studies if they were randomized, controlled trials (RCTs); were nonrandomized, controlled trials (non-RCTs); were cohort studies with a minimum follow-up of 2 years; included adults aged 18 to 64 years with schizophrenia or related psychoses; compared a commercially available FDA-approved FGA with an FDA-approved SGA; and provided data on illness symptoms (Appendix Table 4, available at www.annals.org) or the following adverse events: diabetes mellitus, death, tardive dyskinesia, or a major metabolic syndrome.

Quality Assessment and Rating the Body of Evidence
Two reviewers independently assessed the methodological quality of included studies and resolved disagreements through discussion. We assessed RCTs and non-RCTs using the Cochrane Risk of Bias Tool (22) and cohort studies using the Newcastle-Ottawa Scale (24).
Two reviewers independently evaluated strength of evidence using the Grading of Recommendations Assessment, Development and Evaluation approach of the Evidence-based Practice Center Program and resolved discrepancies through discussion (25). We examined 4 domains: risk of bias, consistency, directness, and precision. Within the grading system, randomized trials always begin with a "high" strength of evidence that can be downgraded on the basis of shortcomings in the body of evidence (for example, overall risk of bias, inconsistency between study results, indirectness of the measured outcomes, and imprecision of the pooled estimate). In contrast, observational studies (for example, cohort studies) begin with a "low" strength of evidence that can be further downgraded (similar to randomized trials) but can also, in rare cases, be upgraded. We assigned an overall grade of "high," "moderate," "low," or "insufficient" strength of evidence. We graded core illness symptoms in the categories of positive symptoms, negative symptoms, general psychopathology, and global ratings or total scores (typically a compilation of positive and negative symptoms or general psychopathology, which included these symptoms plus mood states). We provided a grade for each scale that was reported in the relevant studies. We also graded the adverse events listed in the previous section.
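The starting points and adjustments described above can be illustrated with a minimal sketch. It is not the reviewers' procedure, which relied on judgment across the 4 domains. The function name `grade_strength`, the one-level-per-shortcoming rule, and the treatment of "insufficient" as the lowest rung of the ladder are all simplifying assumptions made for illustration.

```python
# Ordered from weakest to strongest. Treating "insufficient" as the bottom
# rung is an assumption of this sketch, not a rule stated in the review.
LEVELS = ["insufficient", "low", "moderate", "high"]


def grade_strength(design, downgrades=0, upgrades=0):
    """Return a strength-of-evidence label for a body of evidence.

    design     -- "rct" or "observational"
    downgrades -- number of domains with a serious shortcoming (assumed to
                  cost one level each)
    upgrades   -- levels gained; only applied to observational evidence
    """
    if design == "rct":
        start = LEVELS.index("high")   # randomized trials begin at "high"
        upgrades = 0                   # and can only be downgraded
    elif design == "observational":
        start = LEVELS.index("low")    # cohort studies begin at "low"
    else:
        raise ValueError("design must be 'rct' or 'observational'")

    index = start - downgrades + upgrades
    index = max(0, min(index, len(LEVELS) - 1))  # stay on the ladder
    return LEVELS[index]
```

For example, under these assumptions a body of randomized trials with shortcomings in 2 domains (say, risk of bias and precision) would be labeled "low".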

Data Extraction
Two reviewers independently extracted data using standardized forms and resolved discrepancies by referring to the original report. We extracted information on study characteristics, populations, interventions, outcomes, and results. Primary outcomes were improved core symptoms of illness (positive and negative symptoms and general psychopathology) and 4 adverse events specified a priori. Secondary outcomes included functional outcomes; health care system use; response, remission, and relapse rates and medication adherence; health-related quality of life; other patient-oriented outcomes (for example, patient satisfaction); and general and specific measures of other adverse events (for example, extrapyramidal symptoms and weight gain). When studies incorporated multiple relevant treatment groups or multiple follow-up periods, we extracted data from all groups for the longest follow-up period. In cases of multiple reports of the same study, we referenced the primary, or most relevant, study and extracted additional data from companion reports.

Data Analysis
We conducted meta-analyses in RevMan, version 5.01 (The Cochrane Collaboration, Nordic Cochrane Centre, Copenhagen, Denmark), using a random-effects model (26) when studies were sufficiently similar in terms of design, population, interventions, and outcomes. We combined risk ratios for dichotomous outcomes using the DerSimonian and Laird random-effects model and combined continuous outcomes using mean differences with 95% CIs. We quantified statistical heterogeneity using the I² statistic. For trials with multiple study groups, we pooled the data for all relevant groups in the same trial before including the study in any meta-analysis so that the same groups were never represented more than once in any given meta-analysis. Where measures of variance were not reported in the studies, we imputed the variance from the largest reported SD in the given meta-analysis.
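The pooling of risk ratios can be sketched as follows. The analyses in this review were run in RevMan; the code below is only an illustrative implementation of the standard DerSimonian and Laird estimator and the I² statistic. The function and field names are hypothetical, the 95% CI uses a normal approximation (1.96), and studies with a zero-event group are rejected rather than given a continuity correction.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledResult:
    risk_ratio: float
    ci_low: float
    ci_high: float
    tau2: float        # between-study variance on the log scale
    i2_percent: float  # heterogeneity as a percentage


def pool_risk_ratios(studies):
    """Pool (events_a, n_a, events_b, n_b) tuples with a random-effects model."""
    log_rr, var = [], []
    for events_a, n_a, events_b, n_b in studies:
        if events_a == 0 or events_b == 0:
            raise ValueError("zero-event group: log risk ratio is undefined")
        v = 1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b
        if v <= 0:
            raise ValueError("non-positive variance for the log risk ratio")
        log_rr.append(math.log((events_a / n_a) / (events_b / n_b)))
        var.append(v)

    # Fixed-effect (inverse-variance) step, needed to compute Cochran's Q.
    w = [1 / v for v in var]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = len(studies) - 1

    # DerSimonian-Laird moment estimate of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if df > 0 and q > 0 else 0.0

    # Random-effects step: add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return PooledResult(
        risk_ratio=math.exp(pooled),
        ci_low=math.exp(pooled - 1.96 * se),
        ci_high=math.exp(pooled + 1.96 * se),
        tau2=tau2,
        i2_percent=i2,
    )
```

A call such as `pool_risk_ratios([(10, 100, 40, 100), (40, 100, 20, 100)])`, with made-up counts, returns the pooled risk ratio with its CI along with the between-study variance and I² used to judge heterogeneity.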
We conducted subgroup and sensitivity analyses for illness or disorder subtypes, sex, age group (18 to 35 years, 36 to 54 years, and 55 to 64 years), race, comorbid conditions, drug dosage, follow-up period, previous exposure to antipsychotics, treatment of a first episode versus prior episodes, and treatment resistance. Details of these analyses are presented in the appendices to the full technical report. We report subgroup and sensitivity analyses if there was substantial heterogeneity (I² ≥ 50%). For comparisons with at least 10 studies, we assessed publication bias using funnel plots and statistical tests (27-29). For our primary outcome of core symptoms, we considered a difference of 20% to be clinically important (7,30). We calculated absolute differences (that is, risk differences) for adverse events to enhance interpretation of results.
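As an illustration of the absolute differences reported for adverse events, a risk difference for a single study can be computed as below. This is a minimal sketch; the function name is hypothetical, and the Wald-type 95% CI is an assumption of this example rather than a method stated in the review.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Return (risk difference, CI lower bound, CI upper bound).

    The difference is the event proportion in group A minus the event
    proportion in group B. The interval is a simple Wald interval.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    rd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return rd, rd - z * se, rd + z * se
```

With made-up counts of 15 events among 50 patients in one group and 5 among 50 in the other, the risk difference is 20 percentage points.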

Role of the Funding Source
The Agency for Healthcare Research and Quality suggested the initial questions and approved copyright assertion for the manuscript but did not participate in the literature search, data analysis, or interpretation of the results.

RESULTS
A total of 9703 unique study reports were identified; we included 114 primary publications (2, 31-143) (110 RCTs, 2 non-RCTs, and 2 retrospective cohort studies) and 149 companion publications (Figure). The studies were published between 1974 and 2012 and involved 22 drug comparisons. Most studies were multicenter (54%), involved inpatients (48%), and were conducted in North America (42%). The number of participants ranged from 10 to 118 522 (median, 78; interquartile range, 38 to 296). The average participant age ranged from 21 to 50 years (median, 37 years; interquartile range, 32 to 40 years). The length of follow-up (that is, study duration) ranged from less than 1 day to 4 years (median, 8 weeks; interquartile range, 6 to 26 weeks) for RCTs and non-RCTs; the cohort studies were 3 and 22 years in duration. The route of medication administration was primarily oral; intramuscular administration occurred in 10 studies (9%). Sixty-eight percent of studies were supported by the pharmaceutical industry. None of the RCTs and non-RCTs had low risk of bias, 67% had unclear risk of bias, and 33% had high risk of bias. Trials were commonly assessed as having unclear risk of bias because of incomplete reporting of sequence generation, allocation concealment, and blinding methods.
The most common reasons for trials to be assessed as having high risk of bias were lack of blinding and inadequate handling or reporting of outcome data. Methodological quality of the cohort studies was good; both collected data retrospectively.

Core Illness Symptoms
The findings for core illness symptoms are presented in Table 1. Comparisons and outcomes for which strength of evidence was insufficient (for example, evidence from single trials) to draw a conclusion are not displayed; these results for the Positive and Negative Syndrome Scale (PANSS) are displayed in Appendix Table 5 (available at www.annals.org). The following sections describe the results for which there was at least low strength of evidence. Two differences were found in positive symptom alleviation in comparisons of haloperidol with 5 SGAs, as measured by the PANSS and the Scale for the Assessment of Positive Symptoms. Low-strength evidence showed a benefit for risperidone compared with haloperidol on the PANSS; the difference was not considered clinically important, and there was indication of publication bias. Moderate-strength evidence showed a clinically important benefit of haloperidol over olanzapine on the Scale for the Assessment of Positive Symptoms (Appendix Figure 1, available at www.annals.org). The low strength of evidence for all remaining comparisons was driven by lack of precision in effect estimates.
Evidence of benefit for treating negative symptoms with SGAs was stronger. Haloperidol was compared with 6 SGAs by using the PANSS and the Scale for the Assessment of Negative Symptoms. Moderate-strength evidence showed that olanzapine had a clinically important benefit compared with haloperidol for both scales (Appendix Figure 2, available at www.annals.org), with no indication of publication bias. Risperidone also showed moderate-strength evidence of benefit compared with haloperidol on the PANSS, although results were not considered clinically important. There was also no indication of publication bias. Aripiprazole showed moderate-strength evidence of benefit compared with haloperidol, although the difference was not considered clinically important. Strength of evidence for haloperidol versus clozapine, quetiapine, and ziprasidone was low due to lack of precision in effect estimates.
There were few differences between FGAs and SGAs in global rating and total symptom score improvement. Moderate-strength evidence showed that olanzapine had a clinically important benefit compared with haloperidol on the PANSS (Appendix Figure 3, available at www.annals.org), with no indication of publication bias. Olanzapine also showed a difference compared with haloperidol on the Clinical Global Impression-Severity scale, but it was not considered clinically important. Moderate-strength evidence showed a clinically important benefit of risperidone compared with haloperidol on the PANSS (Appendix Figure 4, available at www.annals.org), although there was substantial heterogeneity (I² = 76%). When 1 outlier (significantly favoring haloperidol) was removed, heterogeneity decreased and results remained in favor of risperidone (Appendix Figure 5, available at www.annals.org); there was no indication of publication bias. The outlying study (n = 100) used a relatively small fixed dose of risperidone (2 mg/d), whereas most of the other studies used a range from 1 mg/d to 5 to 20 mg/d. Subgroup analyses by dosage showed less heterogeneity and more benefits for higher doses of risperidone (data in technical report). Moderate-strength evidence showed a benefit for haloperidol compared with quetiapine on the Clinical Global Impression-Severity scale, but the difference was not clinically important. Moderate-strength evidence showed a clinically important benefit for clozapine compared with chlorpromazine based on the total score from the Brief Psychiatric Rating Scale (Appendix Figure 6, available at www.annals.org).
Haloperidol was compared with 4 SGAs, most commonly olanzapine, and results were reported for 8 scales assessing an overall change in general psychopathology. Moderate-strength evidence showed a difference for 1 of  Figure 7, available at www.annals.org).

Patient-Oriented Outcomes and Health Care System Use
Patient-oriented outcomes broadly refer to functional outcomes (for example, sexual dysfunction, employment, and economic independence) and outcomes that are important to patients (for example, health-related quality of life). Results for functional outcomes were available for 9 head-to-head comparisons (Table 2), with no statistically significant differences in any comparisons. In terms of health-related quality of life, aripiprazole compared with perphenazine showed 20% improvement (1 trial) (90), and ziprasidone compared with haloperidol showed benefits on the Quality-of-Life Scale (1 trial) (118). Statistically significant differences were found favoring aripiprazole over haloperidol for caregiver satisfaction (1 trial) (66) and patient satisfaction (1 trial) (66). Results for health care system use were available for 10 head-to-head comparisons, with no statistically significant differences for any comparison (Table 2). Some of the results described in this section and Table 2 are based on single trials and should be interpreted with caution.

Medication-Associated Adverse Events and Safety
For the 4 key adverse events, the strength of evidence was insufficient to draw conclusions for most comparisons (Appendix Table 6, available at www.annals.org). Two trials each provided data on mortality for chlorpromazine versus clozapine (105, 106) and haloperidol versus aripiprazole (Table 3) (34, 136). Absolute differences were small, ranging from 1% to 4% and 0% to 1%, respectively. The length of follow-up (that is, duration) of the trials for the latter comparison was only 24 hours, and the drug was administered via intramuscular injection in both studies. Low-strength evidence showed a higher incidence of the metabolic syndrome for olanzapine than for haloperidol; risk differences were 2% and 22%, respectively, in the 2 relevant studies (88,102). Low-strength evidence showed a higher incidence of tardive dyskinesia for chlorpromazine than for clozapine; risk differences were 5% and 9% at 12 weeks and 9 years, respectively (77,84). Across all studies involving adverse events, the strength of evidence was driven by lack of precision in the estimates of effect because of the small numbers of participants studied and events observed.
Data were also recorded for general measures of adverse events and specific adverse events by physiologic system; extrapyramidal symptoms were the most frequently reported event (detailed data and analyses available in technical report). For general measures of adverse events, statistically significant differences were found in the incidence of adverse events and withdrawals due to adverse events for several comparisons. The comparison usually included haloperidol, and the risk was consistently higher with the FGA.

DISCUSSION
Despite FGAs and SGAs being a mainstay in the treatment of schizophrenia in adults, questions remain about whether and how the various commercially available medications differ in efficacy and safety profiles (1-6). This review provides a comprehensive synthesis of the evidence on the comparative benefits and harms of FDA-approved FGAs and SGAs. We used a broad approach to inclusion criteria for comparisons, patients, and study outcomes to bring together the diversity of previously published reviews and provide a broader perspective on evidence in the field (1, 7-19).
We identified a large number of relevant studies (114 studies and 22 different comparisons), the majority of which were efficacy trials (146). The most frequent comparisons involved haloperidol and risperidone (40 studies) or olanzapine (35 studies); however, the number of studies available for each comparison and outcome was often limited. Overall, we found few differences of clinical importance between the active drugs; however, this does not imply that they are equivalent. The strength of evidence from these studies was generally low or insufficient, with considerable variation in scales and subscales used to measure symptoms. This heterogeneity, coupled with the small number of studies within specific comparisons, suggests that there is insufficient power to explain some of the negative findings and precludes firm conclusions that are needed for front-line clinical decision making.
At this time, evidence supporting the use of SGAs for negative symptoms is stronger than that supporting their use for positive symptoms; olanzapine and risperidone were found to be more efficacious than haloperidol in reducing such symptoms as blunted affect and withdrawal. This effect, however, was not observed for improving overall (global) functioning and general psychopathology. Contrary to recent reviews (7,8), we found no evidence of benefit in improving symptoms with clozapine compared with haloperidol, although moderate-strength evidence showed benefits for clozapine compared with chlorpromazine. Differences in study inclusion criteria between our review and previously published reviews probably account for the different outcomes, with our review including more studies from which to base conclusions. In light of the totality of evidence in this review, the ample low-quality evidence showing no difference between haloperidol and various SGAs in improving symptoms provides an inadequate evidence base to advocate for one medication over another.
The data for adverse events were of low to insufficient strength, suggesting the need for a more focused evaluation of drug safety. Despite our efforts to identify long-term safety data from observational studies, only 2 retrospective cohort studies provided follow-up data at least 2 years in duration. Short-term efficacy trials, which are accepted by the regulatory authorities, may not identify time-dependent adverse events, such as tardive dyskinesia, diabetes mellitus, the metabolic syndrome, or death. Although few studies measured mortality, some evidence suggests that treatment with FGAs or SGAs is no different after immediate use (within 24 hours) or long-term use (>12 months). The strength of evidence for other mortality-related outcomes (such as suicide-related behaviors, which are a risk in this clinical population) (147-149) was insufficient to draw conclusions.
We found low-strength evidence for an increased incidence of the metabolic syndrome with use of olanzapine. In general, most studies showed no difference between FGAs and SGAs in terms of increased risk for the metabolic syndrome or diabetes mellitus; however, the strength of evidence was usually insufficient. Although the methodological and reporting limitations of these studies make conclusions about these outcomes premature (150), several reviews have identified clozapine and olanzapine as contributing to greater weight gain (7, 151-153), but this may not necessarily translate into increased risk for more severe outcomes. Further study of this trajectory is warranted with higher-quality longitudinal studies.
Our results are consistent with those of CATIE (Clinical Antipsychotic Trials of Intervention Effectiveness) (2), a widely cited trial in this field. CATIE was designed to evaluate whether FGAs were inferior to SGAs in efficacy and safety. Findings from CATIE suggested that the FGA perphenazine and various SGAs (olanzapine, quetiapine, risperidone, and ziprasidone) differed more in their adverse effect profiles than in their therapeutic effect profiles. The study, like this review, also showed that effectiveness across medications varied and that the difference was clinically important in some cases.
Our results are also similar to those of a recent systematic review of SGAs versus FGAs, although our review is broader in scope in terms of medications included, patient populations, and outcomes (1). There were several methodological differences between the previous review and this one: The previous review included non-FDA-approved antipsychotics, restricted the analysis to only double-blind trials, included only studies examining optimum SGA dosage and oral route of administration, pooled data across efficacy outcome measures, and pooled different FGAs. The different methodologies may have led to slightly different conclusions about individual SGAs.
One of the unique features of our review is the strength-of-evidence assessments, which provide information on the level of confidence one can place on the results of existing studies. In most cases, the strength of evidence was insufficient or low, highlighting the likelihood that future research may change the estimates of effect and the need for a stronger evidence base to inform clinical practice. Current treatment guidelines from the American Psychiatric Association for patients with schizophrenia provide specific recommendations on medication timing (for example, acute phase or first episode) but broad variables for medication options (154). This approach may reflect the current state of evidence for FGAs and SGAs, and as stronger evidence emerges, it may come to reflect more specific recommendations for prescribing physicians.
There were limitations in the design and quality of the primary studies. Most studies were short-term RCTs, often with an a priori hypothesis that the SGA would be more efficacious (155). Most trials did not sufficiently report methods to prevent selection and performance bias. Few trials reported blinding study investigators and participants; single-blinded and open-label trials in this field have been found to favor SGAs over FGAs (1). Furthermore, the individual studies and, in many cases, the pooled results may not have sufficient power to detect equivalence or noninferiority between drugs.
Most studies in this review were industry-funded (69%), which can increase the chance of proindustry findings (156). Funding was not disclosed for 19% of studies, highlighting the need for transparency in reporting the nature and extent of financial support. The choice of medication comparisons, dosages, and outcomes in the studies included in this review may have been driven by the funder's interests and priorities. Publication and reporting of select comparisons and outcomes are other potential limitations of this body of evidence.
Few studies provided evidence for comparable patient populations. We found notable heterogeneity across studies for disorder subtypes, comorbid drug or alcohol use, treatment resistance, and number of previous episodes, which result in differential response to treatment. Furthermore, many studies were highly selective in patient enrollment, which may increase the likelihood of drug benefit and decrease the likelihood of adverse events. Detailed subgroup analyses are reported elsewhere (21). Characteristics of the research, including drug dosages (for example, lower doses of FGAs in more recent studies) and patient populations (for example, fewer patients already exposed to FGAs or proven treatment resistance to FGAs in recent studies), also changed over time. Finally, differences in medication comparisons and dosage and outcome measurement limited our synthesis, and outcomes that are important for understanding medication adherence and persistence (a common clinical encounter in this patient population), such as sedation and restlessness, were rarely reported.
More longitudinal research is needed on the long-term safety of FGAs versus SGAs. Despite our efforts to identify long-term safety data from observational studies, only 2 retrospective cohort studies were identified. Consensus is needed on the most important comparisons between FGAs and SGAs for future studies. Short- and long-term evaluations with patient subpopulations, including those with medical and neurologic comorbid conditions, are needed. There is a need for studies investigating the influence of dose, age, and other factors, such as comorbid conditions, on serious adverse events, which would help estimate possible risks in specific patient populations. Future studies should also examine functional outcomes that are important to patients, including health-related quality of life, relationships, academic and occupational performance, and legal interactions.
Existing studies on the comparative effectiveness of individual FGAs and SGAs preclude drawing firm conclusions because of sparse data and imprecise effect estimates. There were relatively few differences of clinical importance among 114 studies. The current evidence base is inadequate for clinicians and patients to make informed decisions about treatment. Outcomes potentially important to patients were rarely assessed. Data on long-term safety are lacking and urgently needed.