Rongwei Fu, PhD; Shelley Selph, MD; Marian McDonagh, PharmD; Kimberly Peterson, MS; Arpita Tiwari, MHS; Roger Chou, MD; Mark Helfand, MD, MS
Note: Annals peer review materials (original and revised manuscripts and communications, including peer reviewer, editorial, statistical, and author comments) are available at www.annals.org (see the Supplement).
Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the YODA Project or Medtronic.
Acknowledgment: The authors thank Robin Paynter, MLIS, for conducting literature searches and Howard Balshem, MS; Susan Carson, MPH; Elaine Graham, MLS; Allison Lowe, BA; Edwin Reid, MS, MAT; Katie Reitel, MPH, MSW; Sujata Thakurta, MA; Ngoc Wasson, MPH; and Leah Williams, BS, for their contributions to this article.
Financial Support: By a research subcontract to Oregon Health & Science University under a sponsored research agreement between Yale University and Medtronic.
Potential Conflicts of Interest: Disclosures can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M12-2731.
Reproducible Research Statement: Study protocol: Available at www.crd.york.ac.uk/Prospero. Statistical code: SAS codes for meta-analysis are in the Appendix and are also available from Dr. Fu (e-mail, email@example.com). Data set: Accessible through the Yale University Open Data Access (YODA) Project at http://medicine.yale.edu/core/projects/yodap/index.aspx.
Requests for Single Reprints: Rongwei Fu, PhD, Oregon Health & Science University, 3181 SW Sam Jackson Park Road, Mail Code CSB669, Portland, OR 97239.
Current Author Addresses: Dr. Fu: Oregon Health & Science University, 3181 SW Sam Jackson Park Road, Mail Code CSB669, Portland, OR 97239.
Drs. Selph, McDonagh, Chou, and Helfand and Ms. Peterson: Oregon Health & Science University, 3181 SW Sam Jackson Park Road, Mail Code BICC, Portland, OR 97239.
Ms. Tiwari: Oregon State University, College of Public Health and Human Sciences, 401 Waldo Hall, Corvallis, OR 97331.
Author Contributions: Conception and design: R. Fu, S. Selph, M. McDonagh, A. Tiwari, R. Chou, M. Helfand.
Analysis and interpretation of the data: R. Fu, S. Selph, M. McDonagh, K. Peterson, A. Tiwari, R. Chou, M. Helfand.
Drafting of the article: R. Fu, S. Selph, R. Chou, M. Helfand.
Critical revision of the article for important intellectual content: R. Fu, S. Selph, M. McDonagh, A. Tiwari, R. Chou, M. Helfand.
Final approval of the article: R. Fu, S. Selph, M. McDonagh, K. Peterson, R. Chou, M. Helfand.
Statistical expertise: R. Fu, A. Tiwari, R. Chou.
Obtaining of funding: R. Fu, M. Helfand.
Administrative, technical, or logistic support: R. Chou, M. Helfand.
Collection and assembly of data: R. Fu, S. Selph, K. Peterson, A. Tiwari, M. Helfand.
Fu R., Selph S., McDonagh M., Peterson K., Tiwari A., Chou R., Helfand M.; Effectiveness and Harms of Recombinant Human Bone Morphogenetic Protein-2 in Spine Fusion: A Systematic Review and Meta-analysis. Ann Intern Med. 2013;158:890-902. doi: 10.7326/0003-4819-158-12-201306180-00006
Published: Ann Intern Med. 2013;158(12):890-902.
Appendix: SAS Code
Background: Recombinant human bone morphogenetic protein-2 (rhBMP-2) is used as a bone graft substitute in spinal fusion, which unites (fuses) bones in the spine. The accuracy and completeness of journal publications of industry-sponsored trials on the effectiveness and harms of rhBMP-2 have been called into question.
Purpose: To independently assess the effectiveness and harms of rhBMP-2 in spinal fusion and reporting bias in industry-sponsored journal publications.
Data Sources: Individual-patient data (IPD) from 17 industry-sponsored studies; related internal documents; and searches of MEDLINE (1996 to August 2012), other databases, and reference lists.
Study Selection: Randomized, controlled trials (RCTs) and cohort studies of rhBMP-2 versus any control and uncontrolled studies of harms.
Data Extraction: Effectiveness outcomes in IPD were recalculated using consistent definitions. Study characteristics and results were abstracted by 1 investigator and confirmed by another. Two investigators independently assessed quality using predefined criteria.
Data Synthesis: Thirteen RCTs and 31 cohort studies were included. For lumbar spine fusion, rhBMP-2 and iliac crest bone graft were similar in overall success, fusion, and other effectiveness measures and in risk for any adverse event, although adverse event rates were high across interventions (77% to 93% at 24 months from surgery). For anterior lumbar interbody fusion, rhBMP-2 was associated with nonsignificantly increased risk for retrograde ejaculation and urogenital problems. For anterior cervical spine fusion, rhBMP-2 was associated with increased risk for wound complications and dysphagia. At 24 months, the cancer risk was increased with rhBMP-2 (risk ratio, 3.45 [95% CI, 1.98 to 6.00]), but event rates were low and cancer cases were heterogeneous. Early journal publications misrepresented the effectiveness and harms through selective reporting, duplicate publication, and underreporting.
Limitations: Outcome assessment was not blinded, and ascertainment of harms in trials was poor. No trials were truly independent of industry sponsorship.
Conclusion: In spinal fusion, rhBMP-2 has no proven clinical advantage over bone graft and may be associated with important harms, making it difficult to identify clear indications for rhBMP-2. Earlier disclosure of all relevant data would have better informed clinicians and the public than the initial published trial reports did.
Primary Funding Source: Yale University and Medtronic.
The most common surgery for chronic low back pain with lumbar disc degenerative conditions is vertebral fusion (1) to restrict spinal motion and remove the presumed cause of pain. An interbody fusion, which involves removal of a degenerated intervertebral disc and fusion of the adjacent vertebral bodies, can be performed via an anterior (anterior lumbar interbody fusion [ALIF]), posterior (posterior lumbar interbody fusion [PLIF]), or transforaminal (transforaminal lumbar interbody fusion [TLIF]) approach. Posterolateral lumbar fusion (PLF) involves adjacent transverse processes.
Spinal fusions usually use graft material from the patient's iliac crest to promote fusion. In 2002, the U.S. Food and Drug Administration (FDA) approved recombinant human bone morphogenetic protein-2 (rhBMP-2), a genetically engineered protein with bone growth–stimulating properties, as a bone graft substitute in conjunction with a device implant (LT-CAGE) for single-level ALIF (2). In December 2003, the FDA approved the use of rhBMP-2 with another implant (Inter Fix) for similar indications (3). In clinical practice, rhBMP-2 has primarily been used “off-label” in PLF and TLIF (4).
Before 2009, trials sponsored by Medtronic (Minneapolis, Minnesota), the sole manufacturer of rhBMP-2 devices, reported beneficial effects with no or few adverse events (5–8). Subsequent observational studies reported serious complications associated with rhBMP-2 in cervical spine fusion (9–12), and FDA documents summarizing Medtronic-sponsored trials seemed to indicate substantially more adverse events than reported in journal publications (13). In 2008, the FDA issued a public health notification of life-threatening complications (swelling of the neck and throat resulting in compression of the airway and other structures) associated with off-label use of rhBMP-2 in cervical spine fusion (14).
Selective reporting or underreporting of outcomes in journal publications may have misrepresented the balance of benefits and harms of rhBMP-2 (13, 15). Our study aimed to estimate effectiveness and harms of rhBMP-2 in spinal fusion in a systematic review by using individual-patient data (IPD) when available (aim 1) and to assess reporting biases in published articles of industry-sponsored studies (aim 2).
We registered a short version of the review protocol at the PROSPERO registry of systematic reviews (16) on 23 February 2012 and deposited the full protocol with the Yale University Open Data Access (YODA) Project. Detailed methods and additional analyses are available elsewhere (17).
We used 4 sources of data: Medtronic IPD, related protocols, and data dictionaries (source 1); Medtronic internal reports (source 2); documents from the FDA Web site (source 3); and a broad-based literature search (source 4) (Appendix Table 1) to identify additional studies on rhBMP-2 and publications related to Medtronic-sponsored studies. For aim 1, we used data from sources 1, 2, and 4, and for aim 2, we compared the journal publications with other sources.
Appendix Table 1. Search Strategies
For sources 1 and 2, the YODA Project provided de-identified patient-level data, protocols, data dictionaries, and Medtronic internal reports for all 17 Medtronic-funded studies of rhBMP-2 in spinal fusion completed or terminated by December 2011. The internal reports included summaries of study data and brief adverse event case histories. We also received 1229 MedWatch adverse event reports.
For sources 3 and 4, we searched MEDLINE (1996 to August 2012), EMBASE, the Cochrane Library (third quarter 2012), Scopus, ClinicalTrials.gov, and the FDA Web site and manually searched reference lists.
For aim 1, two reviewers independently assessed each article for eligibility. For effectiveness and harms, we included controlled clinical trials and cohort studies of rhBMP-2 in spinal fusion. For harms, we also included uncontrolled intervention series. We excluded studies that combined results of rhBMP-2 with those of other bone morphogenetic proteins unless we could determine that rhBMP-2 was predominantly used. For aim 2, we identified publications in peer-reviewed journals that reported results from 1 or more Medtronic trials.
One investigator abstracted patient and study characteristics and results, and a second reviewed the abstracted data for accuracy. For Medtronic-funded studies, quality assessment was based on information from trial protocols and internal reports. Two investigators independently rated the quality of each study as good, fair, or poor using criteria adapted from the Cochrane Back Review Group (18) and the U.S. Preventive Services Task Force (19). Discrepancies were resolved through consensus.
We used the study protocols and ClinicalTrials.gov entries to determine prespecified primary outcomes. In 9 studies, the primary effectiveness measure was “overall success” (at 24 months); fusion was the primary end point in the remainder. Other effectiveness outcomes included pain, disability, neurologic status, function, and return to work. Studies differed slightly in definitions of effectiveness outcomes. To standardize effectiveness measures, we applied consistent definitions (Appendix Table 2) and recoded and recalculated effectiveness outcomes using IPD.
Appendix Table 2. Outcome Variable Definitions/Criteria From Medtronic Protocols Compared With Those in Published Studies and IPD Analysis for Comparative Effectiveness and Harms
We obtained data on adverse events directly from IPD except for urine retention, wound infection, wound dehiscence, and possible lumbar radiculitis, which we identified by reviewing internal report case histories. We applied 4 alternative definitions (Appendix Table 2) for lumbar radiculitis.
Overall success and fusion were determined using multiple criteria; all had to be satisfied for a case to be classified as a success (Appendix Table 2). In the primary analysis, patients meeting some criteria but missing data for others were classified as failures, and patients without data for any criteria were excluded. We also performed 2 sensitivity analyses. In one, patients with missing data for some or all criteria were excluded; in the other, such patients were included as failures. For other binary effectiveness outcomes, patients with missing data were excluded in the primary analysis but included as failures in the sensitivity analysis. For adverse events, we included all patients because we sought to analyze cumulative adverse events from the time of surgery.
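The missing-data rules for composite outcomes can be sketched as a small classifier. This is an illustrative sketch only, not the authors' SAS code: the function name `classify_outcome`, the mode labels, and the use of `None` to mark a missing criterion are assumptions introduced here.

```python
from typing import Optional, Sequence


def classify_outcome(criteria: Sequence[Optional[bool]],
                     mode: str = "primary") -> Optional[bool]:
    """Classify one patient on a composite outcome (hypothetical helper).

    Each entry of `criteria` is True (met), False (not met), or None (missing).
    Returns True (success), False (failure), or None (excluded from analysis).
    """
    if mode not in ("primary", "exclude_missing", "missing_as_failure"):
        raise ValueError(f"unknown mode: {mode}")

    n_missing = sum(c is None for c in criteria)
    if n_missing == 0:
        # Complete data: success only if every criterion is satisfied.
        return all(criteria)
    if mode == "exclude_missing":
        # Sensitivity analysis 1: any missing criterion excludes the patient.
        return None
    if mode == "missing_as_failure":
        # Sensitivity analysis 2: any missing criterion counts as a failure.
        return False
    # Primary analysis: partial data count as failure; no data at all is excluded.
    return None if n_missing == len(criteria) else False


print(classify_outcome([True, None, True]))                     # partial data
print(classify_outcome([None, None], mode="missing_as_failure"))  # no data
```

The three modes differ only in how they treat patients with at least one missing criterion, which is why the primary and sensitivity analyses can share one denominator rule for patients with complete data.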
We stratified analyses by spinal area (lumbar or cervical) and surgical approach (for example, ALIF or PLF) for all outcomes except cancer and death, for which we combined all surgical approaches because these rare outcomes were not necessarily affected by surgical technique. Only the ALIF and PLF trials provided sufficient data for meta-analyses, which were based on IPD from Medtronic-sponsored trials. We identified 1 additional trial without corresponding IPD (20) and qualitatively compared its results with IPD results.
For effectiveness end points, we calculated outcomes at the time points typically evaluated in the trials: 6 weeks and 3, 6, 12, and 24 months after surgery. For harms, we aggregated data into 2 periods: first, operative and up to 4 weeks postoperative, and second, up to 24 months postoperative. Data beyond 24 months were sparse (Appendix Table 3) and are reported elsewhere (17), except that we also report results through 48 months for cancer and death.
Appendix Table 3. Included Medtronic Studies of rhBMP-2
We used mixed-effects models to combine IPD. For continuous outcomes, we used a linear mixed-effects model to obtain a combined mean difference between rhBMP-2 and control groups after adjusting for baseline values and individual study effects (21). We assumed random treatment effects and heterogeneous residual variance across included studies. For common binary outcomes, we used a generalized linear mixed-effects model assuming random treatment effects and binomial distribution with log link to obtain a combined risk ratio (RR). For rare binary outcomes, we used a generalized linear fixed-effects model assuming binomial distribution with log link. We fitted a separate model for each time point. When the generalized linear model with log link could not produce a combined estimate because of ill-fitting data, we provided combined estimates from a 2-step approach described elsewhere (17).
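For readers unfamiliar with pooling risk ratios, the following sketch shows one generic two-step calculation: a log risk ratio is computed per study and then combined with inverse-variance weights under a fixed-effect assumption. It is not the one-step mixed-effects model described above, and it is not necessarily the 2-step approach the authors describe elsewhere (17). The function names and the example counts are hypothetical.

```python
import math


def log_rr(events_trt, n_trt, events_ctl, n_ctl):
    """Step 1: log risk ratio and its large-sample variance for one study."""
    estimate = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    variance = 1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    return estimate, variance


def pool_fixed_effect(studies, z=1.96):
    """Step 2: inverse-variance weighted average of the log risk ratios."""
    total_weight = 0.0
    weighted_sum = 0.0
    for counts in studies:
        estimate, variance = log_rr(*counts)
        weight = 1.0 / variance
        total_weight += weight
        weighted_sum += weight * estimate
    pooled = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical counts (events_trt, n_trt, events_ctl, n_ctl); not trial data.
example_studies = [(12, 100, 8, 100), (20, 150, 15, 150)]
rr, ci_low, ci_high = pool_fixed_effect(example_studies)
print(f"pooled RR = {rr:.2f} (95% CI, {ci_low:.2f} to {ci_high:.2f})")
```

This sketch requires at least one event in each group of every study, because the logarithm and the variance formula are undefined for zero counts.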
We assessed statistical heterogeneity using the estimated between-study variance from the mixed-effects model (21). We evaluated baseline age, sex, smoking status, diabetes status, previous back surgery, and employment status as potential sources of heterogeneity. We also performed sensitivity analyses by excluding poor-quality studies and studies that used a lower rhBMP-2 concentration and by excluding graft site–related adverse events in analyses of harms. For cancer, we performed sensitivity analyses by excluding events not reportable to the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) Program (skin cancer with low propensity to metastasize). Results of sensitivity analyses were generally similar and, except for cancer and lumbar radiculitis, are not reported separately. Meta-analyses of IPD for continuous, common, and rare binary outcomes were performed using PROC MIXED, PROC NLMIXED, and PROC GENMOD, respectively, in SAS, version 9.2 (SAS Institute, Cary, North Carolina) (sample SAS codes are provided in the Appendix).
We rated the strength of evidence by outcome on the basis of the aggregate risk of bias, consistency, directness, and precision of the evidence (22).
We assessed publication and outcome reporting biases and quality of reporting (23) by comparing journal publications with corresponding study protocols, reports, and data dictionaries provided by Medtronic. We used a previously published protocol to classify publications as primary or secondary and to categorize potential sources of reporting bias (24, 25).
The YODA Project proposed the aims for the review and served as the intermediary for data and information requests to Medtronic. Medtronic provided comments on our draft report (26). Neither the YODA Project nor Medtronic influenced the conduct of our analyses or the content of this article.
We included 13 randomized, controlled trials (RCTs), 12 of which were sponsored by Medtronic (n = 1879) and 1 by Norton Healthcare (n = 102) (20) (Appendix Figure 1). All RCTs compared rhBMP-2 with iliac crest bone graft (ICBG) except for study 10, which compared artificial disc replacement with fusion using rhBMP-2 (see Appendix Table 3 for study identification numbers). The trials applied similar eligibility criteria and enrolled similar populations within each surgical approach. We excluded 1 small (n = 3) Medtronic trial. Eight studies enrolled fewer than 100 patients (sample sizes ranged from 14 to 85). At 24 months, follow-up rates were greater than 90% in both groups in 9 of the 12 trials.
Appendix Figure 1. Summary of evidence search and selection.
ALIF = anterior lumbar interbody fusion; PLF = posterolateral lumbar fusion; PLIF = posterior lumbar interbody fusion; RCT = randomized, controlled trial.
* One trial is active (Actifuse ABX Versus INFUSE in Posterolateral Instrumented Lumbar Fusion [PLIF] With Interbody Fusion [ClinicalTrials.gov: NCT01013389]); the other is completed, but results have not been found (Spine Fusion Instrumented With BMP-2 vs Uninstrumented With Infuse BMP-2 Alone [ClinicalTrials.gov: NCT00405600]).
† Includes 1 Medtronic RCT with 3 patients.
‡ Documents provided by Medtronic for unpublished studies.
Although we saw some baseline differences between patients receiving ICBG and those receiving rhBMP-2, we did not detect a pattern favoring the latter. The main sources of bias were lack of blinding of surgeons, patients, and outcome assessors (except for radiologic end points). The quality of ascertainment varied across outcomes. Effectiveness outcomes (for example, pain, function, and fusion) were generally ascertained with well-designed questionnaires or scales. For harms, the studies used broad classifications for many adverse events, and events were generally not actively elicited by means of specific symptom questionnaires or objective tests. For example, for retrograde ejaculation, it was unclear how the outcome was defined or whether investigators asked about specific symptoms. No trial defined radiculitis, and adverse events consistent with possible radiculitis were variously classified as back and leg pain, neurologic events, or spinal events. Cancer was not a prespecified end point and was only captured by voluntary reporting (26). Local effects, such as inflammation, heterotopic bone formation, or osteolysis, were seldom reported.
We also identified 31 cohort studies, 47 intervention series, and 34 case series or reports (17). Four intervention series were prospective Medtronic studies (studies 3, 11, 15, and 16 [Appendix Table 3]). Most others were retrospective and small and provided little information on patient characteristics. Most of the cohort studies reported baseline differences between groups or did not report baseline characteristics, had unclear blinding of outcome assessors, and did not adjust for potential confounding.
Five Medtronic-sponsored trials with IPD (studies 1, 2, 4, 5, and 9, which were of fair quality overall) evaluated rhBMP-2 versus ICBG in ALIF. Studies 1, 2, and 9 used rhBMP-2 with the approved LT-CAGE or Inter Fix devices. Studies 4 and 5 used rhBMP-2 with an off-label bone dowel. According to Medtronic, study 5 was terminated early for business reasons, with less than half of the projected sample (n = 180) enrolled (27).
The 5 RCTs (n = 465) provided moderate-strength evidence of no consistent differences between rhBMP-2 and ICBG in overall success, fusion rates, or other effectiveness measures from 6 weeks through 24 months after surgery (Table 1). One exception was that the Short Form-36 Physical Component Summary score was 3 points higher in the rhBMP-2 group at 3, 6, 12, and 24 months. At 24 months, fusion rates ranged from 60% to 100%, and the average overall success rate was 61% for the rhBMP-2 group and 53% for the ICBG group.
Table 1. Effectiveness End Points for ALIF and PLF With rhBMP-2 Versus ICBG
Adverse events were common. Through 4 weeks after surgery, 38% of rhBMP-2 recipients and 45% of ICBG recipients had experienced at least 1 adverse event; by 24 months, about 80% in each group had (Table 2 and Appendix Figure 2). Meta-analysis showed no significant differences between groups for any specific adverse event, including lumbar radiculitis, although estimates were frequently imprecise, precluding strong conclusions (Table 2). For retrograde ejaculation, subsidence (defined as sinking or settling of the device into bone), and urogenital problems, risk estimates favored ICBG but the differences were not statistically significant and CIs were wide.
Table 2. Adverse Events in Trials of ALIF and PLF With rhBMP-2 Versus ICBG
Appendix Figure 2. Cumulative proportion of patients with ≥1 AE for ALIF (top) and PLF (bottom).
AE = adverse event; ALIF = anterior lumbar interbody fusion; ICBG = iliac crest bone graft; PLF = posterolateral lumbar fusion; rhBMP-2 = recombinant human bone morphogenetic protein-2; SAE = serious adverse event.
* We found no significant difference between the rhBMP-2 and ICBG groups at any time point for either outcome or surgical approach.
Two small single-center cohort studies found higher rates of subsidence and similar or lower fusion rates in patients who received rhBMP-2 (28, 29). Two single-center retrospective cohort studies evaluated retrograde ejaculation in ALIF. The first found a higher rate in rhBMP-2 recipients (5 of 69) than in the control group (1 of 174) (30). In the other, rates were similar for patients who received rhBMP-2 (4 of 54) and those who had an artificial disc implant without rhBMP-2 (4 of 41) (31).
Four Medtronic-sponsored randomized trials with IPD (studies 8, 12, 13, and 14, which were of fair quality overall) and 1 other trial without IPD (20) evaluated rhBMP-2 for PLF. Studies 12, 13, and 14 used a higher dose and concentration of rhBMP-2 than those used in ALIF trials. The non-Medtronic trial did not report dosage.
Meta-analysis based on IPD (n = 722) provided moderate-strength evidence of no consistent difference between rhBMP-2 and ICBG in effectiveness outcomes through 24 months (Table 1). The fusion rate at 24 months ranged from 70% to 90% in the ICBG group and 86% to 100% in the rhBMP-2 group; the rate of overall success ranged from 40% to 60% in both groups. The additional trial (20) also found no difference in fusion rates at 24 months (86% for rhBMP-2 vs. 71% for ICBG; RR, 1.12 [95% CI, 0.98 to 1.29]).
As with ALIF, we found no significant difference between the rhBMP-2 and ICBG groups in adverse events (Table 2 and Appendix Figure 2), but estimates were frequently imprecise, precluding strong conclusions. The only exception was that the rhBMP-2 group had increased risk for back and leg pain through 4 weeks, although heterogeneous events (for example, radiculopathy, Baker cyst, arthritic knee pain, or ankle pain) were included and may be unrelated to fusion surgery.
Results from cohort studies (32–39) and intervention series (40–46) seemed consistent with the randomized trials, although few studies (32, 37, 38, 42) reported specific adverse events.
We were not able to reach conclusions on effectiveness or harms of rhBMP-2 for other lumbar fusion procedures. Except for 1 small Medtronic-sponsored, fair-quality trial of PLIF (study 6; n = 67), only low-quality observational studies were available (17).
In a small Medtronic trial of anterior cervical spine fusion (study 7; n = 33), rhBMP-2 and ICBG did not differ in effectiveness end points. Three cohort studies also found no clear differences in effectiveness (11, 12, 29).
In study 7, rhBMP-2 was associated with a greater risk for adverse events than ICBG at 24 months (45 adverse events in 18 patients vs. 13 adverse events in 15 patients; rate ratio, 2.88 [CI, 1.30 to 6.41]). A large fair-quality cohort study (n = 27 067) found that rhBMP-2 was associated with increased risk for complications (odds ratio, 1.43 [CI, 1.12 to 1.70]), dysphagia or dysphonia (odds ratio, 1.63 [CI, 1.30 to 2.05]), and wound complications (odds ratio, 1.67 [CI, 1.10 to 2.53]) (9). Smaller cohort studies (n = 346 total) were consistent with these results (10–12). Intervention series that defined dysphagia differently reported that 5% to 60% of patients developed the condition (47–51).
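The rate ratio reported for study 7 can be checked from the event and patient counts given above. The sketch below assumes that the rate is adverse events per patient with equal follow-up in both groups. The function name `rate_ratio` is introduced here for illustration. The reported CI is not reproduced because the variance model behind it is not stated in this section.

```python
def rate_ratio(events_a, patients_a, events_b, patients_b):
    """Ratio of events per patient in group A to events per patient in group B."""
    return (events_a / patients_a) / (events_b / patients_b)


# Study 7 at 24 months: 45 adverse events in 18 rhBMP-2 patients
# versus 13 adverse events in 15 ICBG patients.
study7 = rate_ratio(45, 18, 13, 15)
print(round(study7, 2))  # 2.88
```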
In posterior cervical spine fusion, there were no controlled trials of rhBMP-2 and 1 cohort study showed no difference in rates of major complications (9).
Five Medtronic-sponsored trials with IPD (studies 2, 4, 5, 10, and 14) reported at least 1 cancer case through 24 months and were included in our meta-analysis (see Appendix Table 4 for detailed information about cancer). Compared with the control groups, rhBMP-2 was associated with an increased risk for cancer (RR, 3.45 [CI, 1.98 to 6.00]; absolute difference, 1.9 percentage points [CI, 0.5 to 3.2 percentage points]), with a number needed to harm of 53 (CI, 31 to 200) (Figure). Data were insufficient to determine the effect of rhBMP-2 dose on estimates of cancer risk. Although 10 of 17 cancer cases with rhBMP-2 occurred in the largest high-dose trial (study 14; n = 239), another high-dose study (study 13; n = 98) reported no cancer cases with rhBMP-2. At 48 months, the increased risk was no longer statistically significant (4 studies; RR, 1.82 [CI, 0.84 to 3.95]).
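The number needed to harm follows from the absolute risk difference as its reciprocal. The sketch below applies that standard relationship to the reported difference and its confidence limits. The function name is introduced here, and rounding to the nearest whole patient is an assumption about how the published figures were rounded.

```python
def number_needed_to_harm(absolute_risk_difference):
    """Reciprocal of the absolute risk difference, given as a proportion."""
    if absolute_risk_difference <= 0:
        raise ValueError("risk difference must be positive for an NNH")
    return 1.0 / absolute_risk_difference


# Reported absolute difference: 1.9 percentage points (CI, 0.5 to 3.2).
nnh_point = number_needed_to_harm(0.019)
# The upper limit of the risk difference gives the lower limit of the NNH,
# and the lower limit of the risk difference gives the upper limit.
nnh_lower = number_needed_to_harm(0.032)
nnh_upper = number_needed_to_harm(0.005)

print(round(nnh_point), round(nnh_lower), round(nnh_upper))  # 53 31 200
```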
Appendix Table 4. Cancer Occurrence at 24 and 48 mo in Trials
Figure. Comparison of cancer risk between rhBMP-2 and control.
The forest plot shows the comparison of cancer risk in 5 studies at 24 mo and 4 studies at 48 mo. BCP = biphasic calcium phosphate; rhBMP-2 = recombinant human bone morphogenetic protein-2.
* We obtained the combined risk ratio by using a generalized linear fixed-effects model with binomial distribution and log link without correction for zero events. For zero events, we estimated the risk ratio from each study by adding a continuity correction of 0.5 for illustrative purposes.
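A continuity correction of the kind mentioned in this footnote can be sketched as follows. The sketch assumes the conventional form, in which 0.5 is added to every cell of the 2 × 2 table when either group has zero events, so each group total grows by 1. The footnote does not specify this detail, and the counts used are hypothetical.

```python
def risk_ratio_with_correction(events_trt, n_trt, events_ctl, n_ctl,
                               correction=0.5):
    """Risk ratio for one study, corrected only when a group has zero events."""
    if events_trt == 0 or events_ctl == 0:
        # Add the correction to events and non-events in both groups,
        # which increases each group total by twice the correction.
        events_trt += correction
        events_ctl += correction
        n_trt += 2 * correction
        n_ctl += 2 * correction
    return (events_trt / n_trt) / (events_ctl / n_ctl)


# Hypothetical study: 3 of 100 events with treatment, 0 of 100 with control.
print(risk_ratio_with_correction(3, 100, 0, 100))
# Hypothetical study with events in both groups: no correction is applied.
print(risk_ratio_with_correction(4, 100, 2, 100))
```

Without the correction, the first study would have an undefined risk ratio, which is why such a value is usable for plotting but is kept out of the combined estimate in the footnote above.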
Excluding non-SEER cancer cases resulted in estimates similar to those that included them (RR through 24 months, 2.92 [CI, 1.75 to 4.87]; RR through 48 months, 1.92 [CI, 0.86 to 4.32]). One cohort study of 125 patients (24 who received rhBMP-2 and 101 who received ICBG) reported a statistically nonsignificant increased risk for cancer (RR, 2.10 [CI, 0.69 to 6.41]) (52). Overall, the strength of evidence was low due to sparse data.
Risk for death did not differ between the rhBMP-2 and control groups through 24 months (studies 2, 4, 6 to 10, 13, and 14; RR, 0.67 [CI, 0.28 to 1.63]) or through 48 months (studies 4, 10, 13, and 14; RR, 0.65 [CI, 0.33 to 1.30]), but the event rates were low and RRs were imprecise.
In 2002, the FDA approved rhBMP-2 with the LT-CAGE device in ALIF on the basis of 3 premarketing studies (studies 1 to 3) (2). The primary publications of the pivotal trials did not report the primary end point, which was overall success at 24 months (rates ranged from 50% to 60% for both groups) (Table 3, studies 2 and 3) (6, 53).
By 2004, at least 12 articles and reviews reporting results from these studies had been published in major orthopedic journals (6, 53–62). In contrast with reports to the FDA, many of these articles presented the results of the pivotal trials as showing better fusion rates for rhBMP-2 than for ICBG. For example, the primary publication for study 2 reiterated higher fusion rates in the rhBMP-2 group (94.5% vs. 88.7%) in the abstract and results and conclusion sections and downplayed the fact that the difference was not statistically significant (6). Another publication reported results for 1 site in study 3 (22 of the 137 patients), stating a 100% rate of fusion and “improvement in back pain, leg pain, and function,” which did not represent the overall results for the study (Table 3, study 3) (57). Seven other Medtronic-supported articles that referred to study 3 cited this article instead of the overall results (5, 8, 53, 56, 60, 63, 64).
In 2003, Burkus and colleagues published a post hoc “integrated analysis” that promoted the idea that rhBMP-2 would have superior outcomes compared with ICBG with sufficient sample size (53). The authors combined the rhBMP-2 groups from studies 2 and 3 and compared them with a control group that combined the ICBG group of study 2 (n = 136) with an older, unrelated, unpublished series of patients (n = 266) who had laparoscopic surgery with the LT-CAGE device (53). According to an internal Medtronic report, surgeons in the unrelated study were probably less skilled with the new laparoscopic cage technique, as evidenced by longer operative times, higher blood loss, and longer hospital stays (65). The authors did not mention this concern and concluded that rhBMP-2 “had statistically superior outcomes” for these outcomes and for fusion rates. In 2004, Burkus and colleagues stated in another article, “the outcomes represent typical results from a wide variety of surgeons with different degrees of experience” (60).
Articles by Medtronic-associated investigators underreported adverse events in the rhBMP-2 and ICBG groups (Table 3, rows 1 to 3). As noted previously (13), these articles reported “no adverse events due to rhBMP-2” (56) and “no unanticipated device-related adverse events” (6). In the control group, the articles emphasized “donor site hip pain,” which was assessed only in the control patients and only on the side of the iliac crest operation. Figure 1 of the primary publication for study 2 represented the hip pain scores in the rhBMP-2 group as zeroes even though hip pain was not measured in that group (6). Adverse events were well-reported in a 2011 publication in which on-label rhBMP-2 was the control (Table 3, study 10). The study reported that 7% of rhBMP-2 recipients had a serious adverse event that was “possibly device-related” (66).
Two Medtronic studies of rhBMP-2 used bone dowels, an off-label lumbar application (Table 3, studies 4 and 5). In 2002, Burkus and colleagues reported that 24 of 24 patients (100%) receiving rhBMP-2 achieved fusion at 24 months compared with 13 of 19 in the control group (68%) (Table 3, study 4) (7). The larger pivotal bone dowel trial (study 5) was terminated early. Study 5 was published only in an article that combined the pilot and pivotal trials and represented them as “a two-part, prospective, randomized, multicenter study” with “two sequential phases.” It reported that “fusion rates were significantly better in the study group (P < 0.001)” without mentioning early termination (56), as did 2 additional articles by the same author (63, 67). In our analysis, fusion rates for study 5 were 91% for rhBMP-2 versus 95% for ICBG (Table 3, study 5).
Table 3. Comparison of IPD Analysis With Published Data in Medtronic-Sponsored Studies of rhBMP-2
In December 1999 (before FDA approval of rhBMP-2 for use in ALIF), Medtronic suspended enrollment in study 6, a randomized trial of rhBMP-2 in PLIF, because of ectopic bone formation in some patients (27), potentially leading to radiculopathy from nerve root impingement. Medtronic followed the 67 enrolled patients for 24 months. In March 2002, Medtronic requested FDA permission to terminate the study. The same year, Medtronic sponsored a supplement in the journal Spine in which review articles were published along with conclusions from an “international panel of experts” that included outside experts, investigators associated with Medtronic, and Medtronic employees. Two articles in the supplement discussed the concern about ectopic bone formation in study 6. Although it noted that large randomized trials were needed to establish the safety of rhBMP-2 in off-label procedures, the supplement argued that ectopic bone formation and complications it might cause were due to poor technique (59, 61). No data from study 6 were presented to support this argument. The international panel stated, “when used properly, BMPs currently appear to be extremely safe for spine fusion” (68).
After study 6 was terminated, an article published in 2004 (Table 3, study 6) (8) reported data on ectopic bone formation (24 of 34 patients in the rhBMP-2 group vs. 4 of 33 in the ICBG group; P < 0.001) for the first time. Despite the small sample, the authors emphasized the lack of association between ectopic bone formation and leg pain and gave an incomplete account of the reasons for study termination (13, 69).
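The ectopic bone formation counts above can be checked against the reported significance level with an exact test. The test used in the publication is not stated here, so the two-sided Fisher exact test below is an assumption, and the function name is introduced for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Study 6: ectopic bone in 24 of 34 rhBMP-2 patients and 4 of 33 ICBG patients.
p_value = fisher_exact_two_sided(24, 34 - 24, 4, 33 - 4)
print(p_value < 0.001)  # True
```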
There were 6 studies of rhBMP-2 in PLF, 3 of which (studies 8, 12, and 14) were published (70–72). Our IPD analysis of study 14 (72) showed that rhBMP-2 and ICBG did not differ in rates of overall success (56% vs. 56%) and fusion (90% vs. 90%). In contrast, the journal publication and FDA summary reported that use of rhBMP-2 resulted in a higher fusion rate (96% vs. 89%; P = 0.014) (72, 73). This difference may be due to our classification of patients with partial data as failures, although why this would differentially affect the rhBMP-2 group is not clear. Adverse events were underreported in 2 publications (Table 3, studies 8 and 12) (70, 71) but well reported in the largest trial (Table 3, study 14) (72).
In spinal fusion, rhBMP-2 and ICBG seem to be similarly effective when used in ALIF and PLF, although the current evidence does not allow definitive conclusions about effectiveness in other surgical approaches. The Short Form-36 Physical Component Summary scores were slightly better with rhBMP-2 than with ICBG in ALIF patients through 24 months, but the difference was only 2 to 3 points on a 100-point scale and thus did not meet typical criteria for a clinically meaningful difference (74).
The use of rhBMP-2 in anterior cervical spine fusion was associated with statistically significant increases in overall adverse events, wound complications, and dysphagia or dysphonia. For lumbar fusion—both on-label and off-label—adverse events were common with rhBMP-2 and ICBG. Although our review raises concerns about a possible increased risk for retrograde ejaculation, urine retention, subsidence, and ectopic bone formation with rhBMP-2, the data on these harms were sparse and the quality of ascertainment was often poor. Our analysis underscores that more definitive evidence about harms was needed before rhBMP-2 became widely used.
We found that rhBMP-2 was associated with an increased risk for cancer through 24 months regardless of whether non–SEER-reportable cases were considered. This finding should be interpreted with caution because cancer cases were heterogeneous and, according to Medtronic, underreported (26). Seven Medtronic-sponsored trials (n = 429 total) with no cancer cases in either group were not included in the meta-analysis but were not expected to affect the results (17). Animal studies do not suggest that rhBMP-2 is carcinogenic (61), but bone morphogenetic proteins are expressed by and promote the growth of some types of cancer (75–77). The development of cancer within 2 to 4 years also argues for a pro-oncogenic mechanism.
For both on-label and off-label indications, journal publications selected analyses and results that favored rhBMP-2 over ICBG. In their review, Carragee and colleagues demonstrated underreporting of adverse events in publications of 5 trials (3 on-label and 2 off-label) for which the FDA had made summary results public (13). Our study shows that adverse events were underreported for more on- and off-label uses, with results not previously available to the public. Journal practices for sponsored supplements, trial registration, and conflict of interest disclosure may have contributed to publication of an incomplete and sometimes misleading evidence base (78–80).
Meta-analysis of IPD offers several advantages over traditional, study-level meta-analysis (81). Compared with other reviews (15, 82), ours had a more complete and standardized evaluation of outcomes, with data from unpublished studies and data that were unreported or incompletely reported by published studies, thus reducing potential effects of publication and reporting bias. In addition, we could recalculate and recategorize outcome measures by using consistent definitions, adjust for potential baseline imbalances, and perform sensitivity analyses to handle missing data.
Nevertheless, IPD meta-analysis requires substantially more time and resources than traditional study-level meta-analysis, and availability of IPD cannot compensate for flawed data collection or sparse data. Even with IPD on 1879 patients from 12 trials, 1 additional trial, and many observational studies, the evidence base remains relatively small within each surgical approach. We found no published trials truly independent of the manufacturer. In addition, there has been no prospective, well-designed, adequately powered study that specifically aimed to assess important harms by using adequate ascertainment methods.
Information to adequately evaluate the effects of dose on effectiveness and risk for harms was also insufficient. Eleven Medtronic studies (studies 1 to 11) used rhBMP-2 at a concentration of 1.5 mg/mL, with total doses ranging from 0.6 to 16.8 mg (27). Higher and unapproved concentrations of rhBMP-2 (2.0 to 3.0 mg/mL) were used in 5 of the 6 PLF studies, with total doses ranging from 15.0 to 63.0 mg (27). Determining the effects of rhBMP-2 dosage was not possible because of differences in surgical approach, rhBMP-2 carrier, and fusion hardware.
Although we had unusual access to protocols and documents submitted by the manufacturer to the FDA, other information, such as operative notes and internal correspondence, might have helped assess the extent of design and reporting bias. Internal correspondence is essential to evaluating selective analysis reporting, ghostwriting, time-lag bias, and misrepresentation of facts (25). Finally, we were not able to evaluate the integrity of adverse event adjudication.
In conclusion, we found substantial evidence of reporting bias and no evidence that rhBMP-2 is more effective than ICBG in spinal fusion, with some evidence of an association with important harms. More research is needed to provide more reliable estimates of risk for cancer and other adverse events and to identify patient populations in which use of rhBMP-2 may be beneficial, such as cases where use of bone graft alone is associated with a high risk for pseudarthrosis. On the basis of the currently available evidence, it is difficult to identify clear indications for rhBMP-2 in spinal fusion.
/* Sample SAS code for linear mixed-effects model for combining continuous outcomes.
   Here the Oswestry disability score (OSSCORE) is the outcome. */
proc mixed data = ALIF_combined covtest;
  class period study treat;
  model OSSCORE = OSSCORE_b study treat / solution CL DDFM=RESIDUAL;
  /* OSSCORE_b is the baseline score of OSSCORE;
     variable study is the identification variable for each study;
     variable treat identifies rhBMP-2 vs. ICBG group. */
  random treat / subject = study;  /* Specify random treatment effect */
  repeated / group = study;        /* Specify heterogeneous residual variance for included studies */
  where period = 7;                /* A separate model was fit for each follow-up time */
run;
/* Sample SAS code for generalized linear mixed-effects model for combining common binary outcomes */
ods output ParameterEstimates = ParameterEstimates;
proc nlmixed data = ATLEASTONEAE;
  parms beta0 = -0.3 beta1 = -0.2 sigma = 0.4;
  if treat = 0 then eta = beta0;
  if treat = 1 then eta = beta0 + beta1 + u;
  expeta = exp(eta);  /* This corresponds to a log link */
  model Xevents ~ binomial(Nevents, expeta);
  random u ~ normal(0, sigma * sigma) subject = study;
  where period = "Four Weeks";  /* A separate model was fit for each follow-up time */
run;

/* Calculate 95% confidence interval based on normal approximation.
   The DATA step wrapper and the data set name RR_estimates are assumed here;
   the original listing shows only the three assignments.
   The beta1 row holds the log risk ratio for treatment. */
data RR_estimates;
  set ParameterEstimates;
  RR = exp(Estimate);
  RR_low_normal = exp(Estimate - 1.96 * StandardError);
  RR_upp_normal = exp(Estimate + 1.96 * StandardError);
run;
/* Sample SAS code for generalized linear fixed-effects model for combining rare binary outcomes */
proc genmod data = cancer;
  model Xevents/Nevents = study treat / dist = binomial link = log scale = deviance;
  /* A log link was used to produce a risk ratio;
     the scale= option is used to correct over- or underdispersion if necessary */
  estimate "RR for rhBMP-2 vs. ICBG" treat 1;
  where period = "24mon";
run;
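The normal-approximation interval computed in the second sample program can also be restated outside SAS. The following Python sketch mirrors the three assignments (RR, RR_low_normal, RR_upp_normal). The function name and the input values are hypothetical placeholders chosen for illustration, not estimates from this analysis.

```python
from math import exp


def rr_with_normal_ci(estimate, standard_error, z=1.96):
    """Back-transform a log risk ratio and its Wald interval to the RR scale."""
    rr = exp(estimate)
    rr_low = exp(estimate - z * standard_error)
    rr_upp = exp(estimate + z * standard_error)
    return rr, rr_low, rr_upp


# Hypothetical inputs chosen only to exercise the function.
rr, rr_low, rr_upp = rr_with_normal_ci(estimate=0.0, standard_error=0.1)
print(rr, rr_low, rr_upp)
```

Because the interval is symmetric on the log scale, the lower and upper limits are equidistant from the point estimate in ratio terms rather than in absolute terms.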
Copyright © 2016 American College of Physicians. All Rights Reserved.
Print ISSN: 0003-4819 | Online ISSN: 1539-3704