Comparative Benefits and Harms of Second-Generation Antidepressants for Treating Major Depressive Disorder

BACKGROUND
Second-generation antidepressants dominate the management of major depressive disorder (MDD), but evidence on the comparative benefits and harms of these agents is contradictory.


PURPOSE
To compare the benefits and harms of second-generation antidepressants for treating MDD in adults.


DATA SOURCES
English-language studies from PubMed, Embase, the Cochrane Library, PsycINFO, and International Pharmaceutical Abstracts from 1980 to August 2011 and reference lists of pertinent review articles and gray literature.


STUDY SELECTION
Two independent reviewers identified randomized trials of at least 6 weeks' duration to evaluate efficacy and observational studies with at least 1000 participants to assess harm.


DATA EXTRACTION
Reviewers abstracted data about study design and conduct, participants, and interventions and outcomes and rated study quality. A senior reviewer checked and confirmed extracted data and quality ratings.


DATA SYNTHESIS
Meta-analyses and mixed-treatment comparisons of response to treatment and weighted mean differences were conducted on specific scales to rate depression. On the basis of 234 studies, no clinically relevant differences in efficacy or effectiveness were detected for the treatment of acute, continuation, and maintenance phases of MDD. No differences in efficacy were seen in patients with accompanying symptoms or in subgroups based on age, sex, ethnicity, or comorbid conditions. Individual drugs differed in onset of action, adverse events, and some measures of health-related quality of life.


LIMITATIONS
Most trials were conducted in highly selected populations. Publication bias might affect the estimates of some comparisons. Mixed-treatment comparisons cannot conclusively exclude differences in efficacy. Evidence within subgroups was limited.


CONCLUSION
Current evidence does not warrant recommending a particular second-generation antidepressant on the basis of differences in efficacy. Differences in onset of action and adverse events may be considered when choosing a medication.


PRIMARY FUNDING SOURCE
Agency for Healthcare Research and Quality.

Major depressive disorder (MDD) affects more than 16% of adults at some point during their lifetime (1). The estimated U.S. economic burden of depressive disorders is approximately $83 billion annually (2), and projected workforce productivity losses related to depression are $24 billion annually (3).
Pharmacotherapy is the primary choice for medical management of MDD. As of 2005, approximately 27 million persons in the United States had received antidepressant therapy (4). Second-generation antidepressants now comprise most antidepressant prescriptions. These drugs include selective serotonin reuptake inhibitors (SSRIs), serotonin and norepinephrine reuptake inhibitors, and other drugs with related mechanisms of action that selectively target neurotransmitters (Table 1). In 2009, these drugs accounted for $9.9 billion in U.S. sales and were the fourth top-selling therapeutic class of prescription drugs (5).
Several systematic reviews have assessed the comparative efficacy and safety of second-generation antidepressants (6-14). Two recent comparative effectiveness reviews provide the most comprehensive, albeit contradictory, assessments to date (15, 16). One review, conducted by some of the authors of this article, concluded that efficacy does not differ substantially among second-generation antidepressants (16); conversely, the MANGA (Multiple Meta-Analyses of New Generation Antidepressants) study group reported that escitalopram and sertraline have the best efficacy-acceptability ratio compared with other second-generation antidepressants (15).
This article updates a previous systematic review funded by the Agency for Healthcare Research and Quality (AHRQ) (16) and uses the same statistical approach as the MANGA study group did. We assessed evidence on comparative benefits and harms of second-generation antidepressants for acute, continuation, and maintenance phases of MDD, including variations of effects in patients with accompanying symptoms and among patient subgroups.

METHODS
An open process involving the public (described at www.effectivehealthcare.ahrq.gov/index.cfm/what-is-comparative-effectiveness-research1/what-is-the-research-process), the Scientific Resource Center for the Effective Health Care Program of the AHRQ, and various stakeholder groups produced key questions. We followed a standardized protocol for all review steps (17).

Data Sources and Searches
We searched PubMed, Embase, PsycINFO, the Cochrane Library, and International Pharmaceutical Abstracts from 1980 to August 2011. We used Medical Subject Heading terms as search terms when available or keywords when appropriate. We combined terms for MDD with a list of 13 second-generation antidepressants (bupropion, citalopram, desvenlafaxine, duloxetine, escitalopram, fluoxetine, fluvoxamine, mirtazapine, nefazodone, paroxetine, sertraline, trazodone, and venlafaxine) and their trade names. We limited electronic searches to "adult 19+ years," "human," and "English language." We also performed semiautomated manual searches of reference lists of pertinent review articles and letters to the editor by using Scopus (18).

Context
Multiple second-generation antidepressants with different pharmacologic actions are available for treating major depressive disorder in adults.

Contribution
This comparative effectiveness review of 234 studies found no clinically important differences in treatment response among second-generation antidepressants. Differences among agents did exist in onset of action, dosing regimens, and adverse effects.

Caution
Most studies were efficacy trials conducted in selected populations.

Implication
Possible side effects, convenience of dosing regimens, and costs may best guide the choice of a second-generation antidepressant for treating major depression in adults, because these agents probably have similar efficacy.

Study Selection
Two persons independently reviewed abstracts and full-text articles. Studies reported only in abstract form were excluded. To assess efficacy or effectiveness, we included head-to-head randomized, controlled trials (RCTs) of at least 6 weeks' duration that compared 2 drugs. Because many comparisons lacked head-to-head evidence, we included placebo-controlled trials for indirect comparisons. All outcomes of interest were health-related (for example, response, remission and quality of life).
To specifically assess harms, we examined RCTs as well as data from observational studies with 1000 participants or more and a follow-up of 12 weeks or more. To determine the differences of benefits and harms in subgroups and participants with accompanying symptoms, we reviewed head-to-head and placebo-controlled trials. We included meta-analyses if we believed them to be relevant for a key question and of good or fair methodological quality (19).
We excluded studies that both reviewers agreed did not meet eligibility criteria. Investigators resolved disagreements about inclusion or exclusion by consensus or by involving a third reviewer.

Data Extraction and Quality Assessment
Trained reviewers abstracted data from each study and assigned an initial quality rating by using the Web-based data abstraction form SRSNexus, version 4.0 (Mobius Analytics, Ottawa, Ontario, Canada). A senior reviewer evaluated completeness of data abstraction and confirmed the quality rating.
To assess trial quality (risk for bias), we used predefined criteria based on those developed by the U.S. Preventive Services Task Force (ratings of good, fair, or poor) (20) and the National Health Service Centre for Reviews and Dissemination (21). To assess the quality of observational studies, we used criteria outlined by Deeks and colleagues (22). We rated studies with a high risk for bias in 1 or more categories as "poor" quality and excluded them from the analyses.
To identify effectiveness studies, we used a tool that distinguishes them from efficacy trials on the basis of certain elements of study design (23). To evaluate the comparability of drug doses, we considered a large range of doses within and across studies. Because no reference standard exists for comparing doses among drugs, we had previously created a comparative dose classification system to identify gross inequities in comparisons of drug doses (24). We used this roster, which does not indicate dosing equivalence, to detect inequalities in dosing that could affect comparative efficacy and effectiveness.

Data Synthesis and Analysis
We conducted meta-analyses of head-to-head comparisons if 3 or more studies provided data to calculate either the odds ratio (OR) of achieving response (defined as >50% improvement from baseline) or the weighted mean difference of changes on the Hamilton Rating Scale for Depression (HAM-D) or the Montgomery-Asberg Depression Rating Scale (MADRS).
For each meta-analysis, we tested for heterogeneity by using the Cochran Q test and estimated the extent of heterogeneity by using the I² statistic. If heterogeneity was high (>60%), we explored differences in clinical and methodological characteristics among studies considered for meta-analyses. We assessed publication bias by using funnel plots and Kendall rank correlation.
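The pooling and heterogeneity steps described above can be sketched in a few lines of Python. This is an illustration only: the article does not state which pooling model was used, so the DerSimonian-Laird random-effects estimator below is an assumption, the function names are hypothetical, and the trial counts are made up.

```python
import math


def log_odds_ratio(events_a, total_a, events_b, total_b):
    """Return (log OR, variance) for drug A versus drug B from one trial."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    # Common continuity correction: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_random_effects(studies):
    """Pool trials with DerSimonian-Laird weights (assumed, not from the article)."""
    effects = [log_odds_ratio(*s) for s in studies]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "q": q,
        "i2": i2,
    }


# Made-up (responders, randomized) counts for three hypothetical trials.
example = [(62, 100, 56, 100), (130, 200, 118, 200), (45, 80, 44, 80)]
result = pool_random_effects(example)
print(result)
```

With an I² above the 60% threshold mentioned in the text, the next step would be to inspect clinical and methodological differences among the trials rather than to rely on the pooled estimate.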
Lacking head-to-head evidence for many drug comparisons, we conducted mixed-treatment comparisons of head-to-head and placebo-controlled trials by using Bayesian methods (25,26). Because of clinical heterogeneity, we did not include studies conducted in patients older than 65 years. Our outcome measure of choice was the rate of response on the HAM-D. We recalculated response rates for each study by using the number of all randomly assigned patients as the denominator.
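The recalculation of response rates can be illustrated as follows. This is a minimal sketch with a hypothetical function name and made-up numbers; it only shows how using all randomly assigned patients as the denominator lowers a response rate that a trial reported among completers.

```python
def response_rate_all_randomized(responders, randomized):
    """Response rate with every randomly assigned patient in the denominator."""
    if randomized <= 0 or not 0 <= responders <= randomized:
        raise ValueError("need 0 <= responders <= randomized and randomized > 0")
    return responders / randomized


# Made-up trial: 100 patients randomized, 75 completed, 45 responded.
completer_rate = 45 / 75                              # rate among completers
recalculated = response_rate_all_randomized(45, 100)  # rate among all randomized
print(completer_rate, recalculated)
```

Patients who dropped out are effectively counted as nonresponders, which makes rates comparable across trials with different attrition.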
We gave all drug effect parameters flat normal (0, 1000) priors and gave the between-study SD flat, uniform distributions with a large range. We discarded a burn-in of 20 000 simulations. All results are based on a further sample of 80 000 simulations. We calculated the OR and 95% credible interval (CrI) for all possible comparisons among our drugs of interest.
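For orientation, one standard random-effects formulation of a mixed-treatment comparison on response counts is sketched below. The article cites its methods (25, 26) but does not print the model, so every symbol here is an assumption: $r_{ik}$ and $n_{ik}$ are responders and randomized patients in arm $k$ of study $i$, $b$ is that study's baseline arm, $d_k$ is the effect of drug $k$ relative to a reference treatment, and $U$ stands for the unspecified "large range" of the uniform prior. Whether the 1000 in the stated prior is a variance (as written here) or another parameterization is also not stated.

```latex
\begin{align*}
r_{ik} &\sim \mathrm{Binomial}(n_{ik},\, p_{ik}) \\
\operatorname{logit}(p_{ik}) &= \mu_i + \delta_{i,bk}, \qquad \delta_{i,bb} = 0 \\
\delta_{i,bk} &\sim \mathcal{N}\!\left(d_k - d_b,\ \sigma^2\right) \\
d_k &\sim \mathcal{N}(0,\ 1000), \qquad d_{\mathrm{ref}} = 0 \\
\sigma &\sim \mathrm{Uniform}(0,\ U) \\
\mathrm{OR}_{jk} &= \exp\!\left(d_k - d_j\right)
\end{align*}
```

Under this reading, the reported ORs and 95% credible intervals would be posterior summaries of $\exp(d_k - d_j)$ over the retained simulations.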
All statistical analyses were performed by using StatsDirect Statistical Software, version 2.7.7 (StatsDirect, Cheshire, United Kingdom). We computed Bayesian inferences by using a Markov-chain Monte Carlo simulation with WinBUGS, version 1.4.3 (Medical Research Council Biostatistical Unit, Cambridge, United Kingdom). We evaluated the strength of evidence for major comparisons and outcomes by using a modified Grading of Recommendations Assessment, Development and Evaluation approach (27).

Role of the Funding Source
The AHRQ participated in formulating the key questions and reviewed planned methods and data analyses, as well as interim and final evidence reports. The AHRQ had no role in study selection, quality ratings, or interpretation or synthesis of the evidence.

RESULTS
Our searches identified 3927 citations (Appendix Figure 1, available at www.annals.org). We included 234 studies of good or fair quality, of which 118 were head-to-head RCTs presented in report form at www.effectivehealthcare.ahrq.gov. Pharmaceutical companies financially supported most of the studies (77%), governmental agencies or independent funds supported 7%, and undetermined sources funded 16%. Funnel plots of head-to-head trials did not indicate publication bias.
Overall, comparative efficacy and effectiveness of second-generation antidepressants did not differ substantially for treating patients with MDD. These findings pertain to patients in the acute, continuation, and maintenance phases of this condition; those with accompanying symptom clusters; and subgroups defined by age, sex, ethnicity, or comorbid conditions, although only sparse evidence for these findings exists for subgroups. Overall, 37% of patients with acute-phase MDD who received first-line treatment did not achieve response within 6 to 12 weeks, and 53% did not achieve remission.

Comparative Efficacy for Acute-Phase Treatment of MDD
Ninety-three good- or fair-quality head-to-head trials that included more than 20 000 patients compared the efficacy or effectiveness of the treatment of acute-phase MDD. These studies provided direct evidence for 40 of 78 possible comparisons among these drugs. Direct evidence from head-to-head trials was sufficient to conduct meta-analyses for 6 drug-drug comparisons. In addition, we conducted mixed-treatment comparisons of response rates for all comparisons, incorporating 64 placebo-controlled or head-to-head trials.
Overall, treatment effects were similar among second-generation antidepressants (Table 2). Some analyses yielded statistically significant differences among treatments, but the magnitudes of differences were modest and probably not clinically relevant.
The 2 largest relative differences in response rates were between escitalopram and citalopram and fluoxetine and venlafaxine, but absolute differences were modest. On average, 62% of patients receiving escitalopram and 56% receiving citalopram achieved a response. The pooled difference of the reduction of points on the MADRS scale was 1.52 in favor of escitalopram (CI, 0.59 to 2.45 points), which is approximately one sixth of the average SD of change on the MADRS scale in trials.
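The escitalopram-citalopram figures above can be put on an absolute scale with simple arithmetic. The sketch below uses only the numbers quoted in the text; the number needed to treat and the implied SD are derived illustrations rather than results reported by the article, and the crude odds ratio from average rates is not the pooled estimate.

```python
# Average response rates quoted in the text.
escitalopram_rate = 0.62
citalopram_rate = 0.56

# Absolute difference in response, as a proportion.
absolute_difference = escitalopram_rate - citalopram_rate

# Number needed to treat: patients treated for one additional responder.
number_needed_to_treat = 1 / absolute_difference

# Crude odds ratio from the two average rates (not the pooled OR).
crude_odds_ratio = (escitalopram_rate / (1 - escitalopram_rate)) / (
    citalopram_rate / (1 - citalopram_rate)
)

# The text calls 1.52 MADRS points "approximately one sixth" of the average SD,
# which implies an SD of roughly 6 * 1.52 points.
implied_sd = 6 * 1.52

print(round(absolute_difference * 100), "percentage points")
print(round(number_needed_to_treat), "patients per additional responder")
print(round(crude_odds_ratio, 2), "crude odds ratio")
print(round(implied_sd, 1), "implied SD in MADRS points")
```

Read this way, a relative difference that reaches statistical significance corresponds to about 6 additional responders per 100 patients treated.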
The additional benefit of venlafaxine versus fluoxetine was similarly modest. On average, 65% of patients receiving venlafaxine and 60% receiving fluoxetine achieved a response. Pooled results of reductions of points on the HAM-D showed a non-statistically significant 1.
Seventeen studies (n = 3960) indicated no differences in health-related quality of life (Table 2) (30, 37, 41, 44-47, 49-58).
Seven studies, all funded by the maker of mirtazapine, reported that this agent has a significantly faster onset of action than some comparators (49, 50, 55, 59-62). After 4 weeks of treatment, most response rates among the drugs studied were similar. In 1 trial, mirtazapine and venlafaxine did not differ in speed of action (52).

Achieving Response in Unresponsive or Recurrent Disease
Overall, 37% of patients did not achieve a treatment response during 6 to 12 weeks of treatment with second-generation antidepressants, and 53% did not achieve remission. The STAR*D (Sequenced Treatment Alternatives to Relieve Depression) trial (63) provides the best evidence for assessing alternative medications among patients in whom initial therapy has failed. Approximately 1 in 4 of the 727 participants who switched medications after initial treatment failure became symptom-free; however, no statistically significant difference was seen in patients who switched to sustained-release bupropion, sertraline, or extended-release venlafaxine. In 3 additional head-to-head trials involving patients with treatment-resistant depression, response and remission rates were numerically better with venlafaxine than with comparators (64-67), but differences generally were not statistically significant.

Maintaining Response or Remission After Successful Treatment
In several head-to-head trials (68-75), overall efficacy in maintaining remission did not significantly differ between escitalopram and desvenlafaxine (74), escitalopram and paroxetine (72), fluoxetine and sertraline (68), fluoxetine and venlafaxine (73, 75), fluvoxamine and sertraline (69, 70), and trazodone and venlafaxine (71). One of these studies reported a significantly shorter time to recurrence with fluoxetine than with venlafaxine during 2 years of maintenance treatment (75). In one naturalistic study, rehospitalization rates did not differ between patients continuing therapy with fluoxetine versus venlafaxine (76).

Table 2 (excerpt). Outcome of interest, strength of evidence, and findings

Consistent results from 7 fair-quality trials suggest that mirtazapine has a statistically significantly faster onset of action than citalopram, fluoxetine, paroxetine, and sertraline. Whether this difference favoring mirtazapine can be extrapolated to other second-generation antidepressants is unclear. Most other trials do not indicate a faster onset of action of a particular second-generation antidepressant compared with another.

Maintaining response or remission †
Comparative efficacy. Strength of evidence: Moderate. Findings from 5 efficacy trials and 1 naturalistic study show no statistically significant differences in preventing relapse or recurrence between escitalopram and paroxetine, fluoxetine and sertraline, fluoxetine and venlafaxine, fluvoxamine and sertraline, and trazodone and venlafaxine.

Managing treatment-resistant depression
Comparative efficacy. Strength of evidence: Low. Results from 3 trials support modestly better efficacy for venlafaxine compared with citalopram, fluoxetine, and paroxetine.
Comparative effectiveness. Strength of evidence: Low. Results from 2 effectiveness studies are conflicting. One good-quality trial showed no statistically significant differences in effectiveness among sustained-release bupropion, sertraline, and extended-release venlafaxine. One fair-quality effectiveness trial found venlafaxine to be modestly superior to citalopram, fluoxetine, mirtazapine, paroxetine, and sertraline; however, differences may not be clinically relevant.

Treating depression in patients with accompanying symptom clusters
Anxiety
Comparative efficacy for depression. Strength of evidence: Moderate. Results from 5 fair-quality head-to-head trials suggest that efficacy does not differ substantially for treatment of depression in patients with accompanying anxiety.
Comparative efficacy for anxiety. Strength of evidence: Moderate. Results from 8 fair-quality head-to-head trials and 3 fair-quality placebo-controlled trials suggest that no substantial differences in efficacy exist among second-generation antidepressants for treatment of accompanying anxiety.
Insomnia
Comparative efficacy for depression. Strength of evidence: Insufficient. Evidence from 1 fair-quality head-to-head study is insufficient to draw conclusions about the comparative efficacy for treating depression in patients with coexisting insomnia.
Comparative efficacy for insomnia. Strength of evidence: Low. Evidence from 5 fair-quality head-to-head trials suggests that no substantial differences in efficacy exist among second-generation antidepressants for treatment of accompanying insomnia.

Efficacy or Effectiveness in Treating Depression or Accompanying Symptoms
Clinicians may use symptom clusters that accompany depression (for example, anxiety and insomnia) to guide antidepressant selection. We identified studies addressing 7 symptom clusters: anxiety, insomnia, low energy, pain, psychomotor change (retardation or agitation), melancholia (a subtype of depression that is a severe form of MDD with characteristic somatic symptoms), and somatization (physical symptoms that are manifestations of depression rather than of an underlying physical illness). Table 2 summarizes these findings.

Treatment of Depression in Patients With Accompanying Symptom Clusters
For patients with MDD and accompanying anxiety, 4 head-to-head trials (45, 77-79) suggested that antidepressants have similar antidepressive efficacy. Two of these studies compared SSRIs (fluoxetine, paroxetine, and sertraline) (77, 78), 1 compared sertraline and sustained-release bupropion (79), and 1 compared sertraline and extended-release venlafaxine (45). One study reported a greater decrease in severity of depression and higher response rates with venlafaxine than with fluoxetine (75% vs. 49%) (39).
For other symptom clusters, such as insomnia (35), melancholia (78,80), or psychomotor changes (78), most studies indicated similar treatment effects for depression among compared drugs. Because these studies were small or had conflicting results, the strength of the evidence is low.

Treatment of Accompanying Symptom Clusters in Patients With Depression
Results from 8 head-to-head trials suggested that antidepressant medications do not differ in efficacy for treating anxiety associated with MDD. Among these studies, 4 compared SSRIs (including escitalopram, fluoxetine, sertraline, and paroxetine) (77, 81-83); 3 compared paroxetine and nefazodone (84), citalopram and mirtazapine (50), and sertraline and sustained-release bupropion (79); and 1 compared extended-release venlafaxine and sertraline (45). Only 1 trial (146 participants) reported that patients receiving venlafaxine had statistically significantly greater reductions in Covi Anxiety Scale scores (5.7 vs. 3.9) than those receiving fluoxetine (39).
For insomnia, 2 studies suggested greater improvement in sleep scores with trazodone than with fluoxetine (47) and venlafaxine (71). In 3 other studies, rates of insomnia did not significantly differ in patients receiving escitalopram or fluoxetine (83); fluoxetine, paroxetine, or sertraline (35); or fluoxetine or mirtazapine (55).
For pain, a well-conducted meta-analysis (85) of 3 fair-quality head-to-head trials (86-88) and 1 poor-quality trial (89) (1466 participants) found no substantial difference between duloxetine and paroxetine in the relief of accompanying pain.

Risk for Harms
We analyzed 93 head-to-head studies and 48 additional studies of both experimental and observational design to assess the comparative risk for harm. We distinguished adverse events from serious adverse events on the basis of a U.S. Food and Drug Administration (FDA) classification. A serious adverse event is any medical occurrence that results in death, is life-threatening, requires hospitalization, results in persistent or substantial disability or incapacity, or is a congenital birth defect (90). Table 3 summarizes these findings.

Figure (plot not reproduced). Axis labels: Favors First Drug; Favors Second Drug.
h-h = head-to-head; MA = meta-analysis; MTC = mixed-treatment comparison; SNRI = serotonin and norepinephrine reuptake inhibitor; SSRI = selective serotonin reuptake inhibitor. * The first number indicates the number of trials directly comparing 2 drugs; the second indicates the number of additional studies used to perform MTCs.

Adverse Events and Discontinuation of Therapy
In efficacy trials, an average of 63% of patients experienced at least 1 adverse event during treatment. Diarrhea, dizziness, dry mouth, fatigue, headache, nausea, sexual dysfunction, sweating, tremor, and weight gain were commonly reported. Overall, second-generation antidepressants caused similar adverse events; however, the frequency of specific events differed among some drugs (Appendix Table 1, available at www.annals.org).
Overall discontinuation rates were similar between SSRIs and other second-generation antidepressants (range of means, 15% to 25%). Duloxetine had a 67% (CI, 17% to 139%) and venlafaxine had a 40% (CI, 16% to 73%) higher risk for discontinuation of therapy because of adverse events than SSRIs as a class did. Discontinuation rates due to lack of efficacy were similar between SSRIs and other second-generation antidepressants except for venlafaxine. Venlafaxine had a 34% (CI, 47 to 93) lower risk for discontinuation of therapy because of lack of efficacy than SSRIs did.
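Percentages of this kind are usually derived from relative risks, and the conversion can be sketched as below. This rests on an assumption the article does not state: that a "67% higher risk" corresponds to a relative risk of 1.67 and a "34% lower risk" to 0.66. Under that reading, the interval printed as "47 to 93" for venlafaxine would fit relative-risk limits of 0.47 and 0.93; this is an unverified interpretation, not a correction taken from the source. The function name is hypothetical.

```python
def percent_change_in_risk(relative_risk):
    """Convert a relative risk to a percent change versus the comparator."""
    return (relative_risk - 1) * 100


# Assumed relative risks matching the percentages quoted in the text.
duloxetine_adverse_events = percent_change_in_risk(1.67)   # about +67%
venlafaxine_lack_of_efficacy = percent_change_in_risk(0.66)  # about -34%

# Hypothetical reading of "47 to 93" as relative-risk limits 0.47 and 0.93.
lower_limit = percent_change_in_risk(0.47)  # about -53%
upper_limit = percent_change_in_risk(0.93)  # about -7%

print(round(duloxetine_adverse_events), round(venlafaxine_lack_of_efficacy))
print(round(lower_limit), round(upper_limit))
```

If that reading holds, the venlafaxine estimate would correspond to a risk reduction somewhere between about 7% and 53%.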

Serious Adverse Events
Except for sexual dysfunction, trials and observational studies were too small and study durations were too short to assess the comparative risks for rare but serious adverse events, such as suicidality, seizures, cardiovascular events, the serotonin syndrome, hyponatremia, or hepatotoxicity.
Sexual Dysfunction. Five trials and a pooled analysis (2399 participants) of 2 identical RCTs provided evidence that bupropion causes lower rates of sexual dysfunction than escitalopram (91), fluoxetine (92), paroxetine (93), and sertraline (94-96). Compared with other second-generation antidepressants, paroxetine frequently caused higher rates of sexual dysfunction, particularly ejaculatory dysfunction. These differences, however, did not always reach statistical significance (35, 44, 60, 81, 97-101).
Underreporting of sexual dysfunction in efficacy studies is likely. A fair-quality Spanish prospective, observational study (1022 participants) reported that 59% of patients treated with second-generation antidepressants experienced sexual dysfunction (102).
Suicidality. Although suicide is relatively rare and affects approximately 1 in 8000 psychiatric patients treated with second-generation antidepressants, 1 in 166 patients reported suicidal feelings while receiving treatment with a second-generation antidepressant (103).
Thirteen studies assessed the risk for suicidality (defined as suicidal thinking or behavior) in patients treated with second-generation antidepressants (104-116). Data on the comparative risk for suicidality among second-generation antidepressants were sparse. Results from existing studies did not indicate that any particular drug of interest had an excess risk compared with other second-generation antidepressants (106-109, 113, 116).
Several large observational studies determined that second-generation antidepressants cause a general increase in the risk for suicidality (106, 107, 116). A recent meta-analysis of observational studies in a combined population … (116). These findings are consistent with an FDA data analysis of more than 99 000 participants of 372 trials (103). The FDA identified that the risk of suicidality is increased in children and patients aged 18 to 24 years but not in other adult patients.
Other Serious Adverse Events. Evidence on the comparative risk for rare but severe adverse events, such as seizures, cardiovascular events, hyponatremia, hepatotoxicity, and the serotonin syndrome, is insufficient to draw firm conclusions.

Table 3 (excerpt). Outcome of interest, strength of evidence, and findings

General tolerability
Adverse events profiles. Strength of evidence: High. Adverse events profiles are similar among second-generation antidepressants. Differences exist in the incidence of specific adverse events.
Nausea and vomiting. Strength of evidence: High. Meta-analysis of 15 fair-quality studies indicates that venlafaxine has a higher rate of nausea and vomiting than SSRIs as a class.
Diarrhea. Strength of evidence: Moderate. Evidence from multiple fair-quality studies indicates that sertraline has a higher incidence of diarrhea than bupropion, citalopram, fluoxetine, fluvoxamine, mirtazapine, nefazodone, paroxetine, and venlafaxine.
Weight change. Strength of evidence: Moderate. Seven fair-quality trials indicate that mirtazapine causes greater weight gain than citalopram, fluoxetine, paroxetine, and sertraline.
Somnolence. Strength of evidence: Moderate. Six fair-quality studies provide evidence that trazodone has a higher rate of somnolence than bupropion, fluoxetine, mirtazapine, paroxetine, and venlafaxine.
The discontinuation syndrome. Strength of evidence: Moderate. A good-quality systematic review provides evidence that paroxetine and venlafaxine have the highest rates of the discontinuation syndrome; fluoxetine has the lowest.
Discontinuation rates. Strength of evidence: High. Meta-analyses of efficacy trials indicate that overall discontinuation rates are similar among second-generation antidepressants. Venlafaxine has a higher rate of discontinuation due to adverse events and a lower rate of discontinuation due to lack of efficacy than SSRIs as a class.

Serious adverse events

Treatment of Major Depressive Disorder in Subgroups
No study directly compared efficacy, effectiveness, and harms of second-generation antidepressants between subgroups and the general population for treatment of MDD. However, numerous studies conducted subgroup analyses or used subgroups as the study population (Appendix Table 2, available at www.annals.org).
Multiple head-to-head trials (36, 58, 117-125) indicated that the efficacy of second-generation antidepressants did not differ in participants aged 55 years or older. Efficacy trials usually did not address differences in efficacy or effectiveness between men and women. Two head-to-head RCTs provided limited evidence on adverse sexual effects of these agents; 1 reported a higher risk for sexual dysfunction in men than in women receiving paroxetine (93), and the other reported greater sexual dysfunction in women receiving paroxetine than in those receiving sertraline (44).
No head-to-head trials or other studies directly compared differences in efficacy, effectiveness, and harms among groups identified by race or ethnicity or between patients with depression and comorbid conditions and the general population. One recent RCT reported no differences between citalopram and fluoxetine in participants with type 2 diabetes and MDD (126).

DISCUSSION
In this systematic review of data from 234 studies, direct and indirect comparisons of second-generation antidepressants showed no substantial differences in efficacy for the treatment of MDD. Statistically significant results were small and are unlikely to have clinical relevance. No differences in efficacy were seen in patients with accompanying symptoms or in subgroups based on age, sex, ethnicity, or comorbid conditions.
Although second-generation antidepressants are similar in efficacy, they cannot be considered identical drugs. Differences with respect to onset of action, adverse events, and some measures of health-related quality of life may be clinically relevant and influence the choice of a medication for a specific patient. For example, mirtazapine has a faster onset of action than citalopram, fluoxetine, paroxetine, and sertraline (49, 55, 60-62), whereas bupropion has fewer sexual side effects than escitalopram, fluoxetine, paroxetine, and sertraline (91, 92, 94, 96, 127).
Our findings are consistent with results of most other systematic reviews assessing the comparative efficacy and safety of second-generation antidepressants (8-14). Our conclusions contradict some findings of the 2009 MANGA study, which indicated that escitalopram and sertraline have the best efficacy-acceptability ratio compared with that of other agents (15). The MANGA study, however, has been criticized for methodological shortcomings (128-132). Specifically, the authors included studies with a high risk for bias and open-label designs, assumed that a response on the HAM-D equals a response on MADRS or the Clinical Global Inventory, excluded placebo-controlled trials in their network meta-analysis, and overstated the importance of statistically significant findings without considering clinical relevance. In particular, the assumption that responses on different scales are comparable is not evidence-based (133) and thus might introduce substantial bias in a mixed-treatment comparison model.
For the current update of our review, we used the same statistical methods as the authors of the MANGA study, although we retained more rigid systematic review methods. We specifically excluded studies with high risk for bias or open-label designs and limited mixed-treatment comparisons to ORs of response on a single diagnostic scale (HAM-D). Furthermore, whenever possible, we used meta-analyses of head-to-head trials to determine the relative efficacy.
Our study has several limitations. Most important, we primarily derived our conclusions from efficacy trials with highly selected populations. For example, for data on acute-phase MDD, we found only 3 effectiveness studies (37,120,134) out of 93 head-to-head RCTs. Two of these effectiveness studies were conducted in Europe, and their applicability to the U.S. health care system might be limited. Although findings from effectiveness studies are generally consistent with those from efficacy trials, the evidence is limited to a few comparisons.
Indirect comparisons have methodological limitations, most prominently the assumption that prognostic factors for a specific outcome (for example, response to treatment) are similar across study populations in the network meta-analyses. Nevertheless, they are a valuable additional analytic tool when available head-to-head evidence is insufficient.
Publication bias is a concern for all systematic reviews and has been empirically proven to be problematic for placebo-controlled trials of second-generation antidepressants (135,136). Selective availability of studies with positive results can seriously bias conclusions, particularly when a pharmaceutical company compares 2 of its own drugs (as in the case of citalopram and escitalopram). The small number of studies for individual comparisons limits the validity of statistical methods to explore publication bias, such as funnel plots.
How should the practicing clinician use these findings, namely that pharmacologic differences among second-generation antidepressants do not translate into substantial clinical differences, although tolerability may differ? Given the difficulty in predicting what medication will be both efficacious for and tolerated by an individual patient, familiarity with a broad spectrum of antidepressants is prudent. Existing evidence of efficacy, however, does not warrant choosing a particular second-generation antidepressant as first-line therapy for acute-phase MDD or as a subsequent treatment in patients who do not respond to therapy or achieve remission. Because of differences in adverse events and dosing regimens, engaging in informed decision making can help physicians to take patient preferences into consideration.

Disclaimer: This manuscript and the work it is derived from were commissioned by the Agency for Healthcare Research and Quality (AHRQ), through a contract to the xyz Evidence-based Practice Center (contract 290200710056l#2). While AHRQ has approved the assertion of copyright by the authors, as noted in the attached letter from the AHRQ Contracting Officer, the government retains rights to the use of the manuscript according to the contract and the Federal Acquisition Regulations (FAR). In order to facilitate and meet the need of public access to research works and findings funded by the government, AHRQ will publish the full report from which this manuscript is derived. The original report from which this manuscript was derived has undergone rigorous peer and public review through the Effective Health Care Program.

Figure (plot not reproduced). Axis labels: Favors First Drug; Favors Second Drug.
MTC = mixed-treatment comparison; SNRI = serotonin and norepinephrine reuptake inhibitor. * The first number indicates the number of trials directly comparing 2 drugs; the second indicates the number of additional studies used to perform MTCs.