Research and Reporting Methods

Engaging Patients and Stakeholders in Research Proposal Review: The Patient-Centered Outcomes Research Institute

Rachael L. Fleurence, PhD; Laura P. Forsythe, PhD, MPH; Michael Lauer, MD; Jason Rotter, MHS; John P.A. Ioannidis, DSc, MD; Anne Beal, MD, MPH; Lori Frank, PhD; and Joseph V. Selby, MD, MPH
Article and Author Information

From Patient-Centered Outcomes Research Institute, Washington, DC; National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland; and Stanford University School of Medicine, Stanford, California.

Acknowledgment: PCORI thanks the reviewers who participated in the merit-review process, follow-up surveys, and focus groups for their contributions.

Disclosures: Disclosures can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M13-2412.

Requests for Single Reprints: Rachael L. Fleurence, PhD, Patient-Centered Outcomes Research Institute, 1828 L Street, Suite 900, Washington, DC 20036; e-mail, rfleurence@pcori.org.

Current Author Addresses: Drs. Fleurence, Forsythe, Frank, and Selby and Mr. Rotter: Patient-Centered Outcomes Research Institute, 1828 L Street, Suite 900, Washington, DC 20036.

Dr. Lauer: National Heart, Lung, and Blood Institute Office of the Director, National Heart, Lung and Blood Institute, Building 31, Room 5A52, 31 Center Drive, MSC 2486, Bethesda, MD 20892.

Dr. Ioannidis: Stanford University School of Medicine, 291 Campus Drive, Room LK3C02, Li Ka Shing Building, 3rd Floor, Stanford, CA 94305-5101.

Dr. Beal: Sanofi-Aventis, 55 Corporate Drive, Bridgewater, NJ 08807.

Author Contributions: Conception and design: R.L. Fleurence, L.P. Forsythe, M. Lauer, J.P.A. Ioannidis, L. Frank.

Analysis and interpretation of the data: R.L. Fleurence, L.P. Forsythe, M. Lauer, J. Rotter, J.P.A. Ioannidis, L. Frank.

Drafting of the article: R.L. Fleurence, L.P. Forsythe, J. Rotter, J.P.A. Ioannidis, A. Beal, L. Frank, J.V. Selby.

Critical revision of the article for important intellectual content: R.L. Fleurence, L.P. Forsythe, J. Rotter, J.P.A. Ioannidis.

Final approval of the article: R.L. Fleurence, L.P. Forsythe, M. Lauer, J. Rotter, J.P.A. Ioannidis, A. Beal, L. Frank.

Statistical expertise: M. Lauer, J. Rotter, J.P.A. Ioannidis, A. Beal.

Administrative, technical, or logistic support: R.L. Fleurence.

Collection and assembly of data: R.L. Fleurence, L. Frank.


Ann Intern Med. 2014;161(2):122-130. doi:10.7326/M13-2412

The inaugural round of merit review for the Patient-Centered Outcomes Research Institute (PCORI) in November 2012 included patients and other stakeholders as well as scientists. This article examines relationships among the scores of the 3 reviewer types, changes in scoring after in-person discussion, and the effect of including patient and stakeholder reviewers on the review process. In the first phase, 363 scientists scored 480 applications. In the second phase, 59 scientists, 21 patients, and 31 stakeholders provided a “prediscussion” score for each application and a final “postdiscussion” score after an in-person meeting. Bland–Altman plots were used to characterize levels of agreement among and within reviewer types before and after discussion. Before discussion, there was little agreement between the average scores given by the lead scientific reviewers and those given by the patient and stakeholder reviewers. After discussion, the 4 lead reviewers showed mild convergence in their scores, and the 21-member panels reached much stronger agreement. Of the 25 awards with the best (and lowest) scores after phase 2, only 13 had ranked in the top 25 after the phase 1 review by scientists. Five percent of the 480 proposals submitted were funded. The authors conclude that patient and stakeholder reviewers brought different perspectives to the review process but that in-person discussion led to closer agreement among reviewer types. It is not yet known whether these conclusions are generalizable to future rounds of peer review. Future work would benefit from additional data collection for evaluation purposes and from long-term evaluation of the effect on the funded research.


In the Patient Protection and Affordable Care Act of 2010, PCORI was authorized “to assist patients, clinicians, purchasers, and policymakers in making informed health decisions through research and evidence synthesis” (1). Central to PCORI's strategy is the engagement of patients, caregivers, and other health care stakeholders in key aspects of the research enterprise (2). One critical opportunity for engaging patients and stakeholders is in the research application review process. PCORI posted its first broad funding announcements for comparative effectiveness research on 22 May 2012 (www.pcori.org/funding-opportunities/funding-announcements/closed-opportunities). Awards were for a maximum of $1.5 million in direct costs over 3 years. The first portfolio of projects was awarded on 18 December 2012 (www.pcori.org/pfaawards). Between May and November 2012, PCORI established and conducted a peer-review process that involved scientists; patients; and other stakeholders, such as clinicians, policymakers, and funders.

PCORI is the first major U.S. funding agency to systematically require the inclusion of reviewers who are not scientifically trained in reviewing funding applications. Inclusion of such reviewers in this complex process is rare, and little guidance exists in the scientific literature (3). Agencies such as the National Institutes of Health, the Agency for Healthcare Research and Quality, and the U.S. Department of Defense have some experience with including nonscientist reviewers, and some benefits have been suggested (4–7). However, little evidence from these efforts is available to determine whether selection of research projects is altered when patients and stakeholders are included (8). Evidence suggests that peer review using only scientists is biased against novelty (9) and may lead to selection of applications aligned with the interests of the reviewers (10). It is speculated (but not proven) that participation of nonscientifically trained reviewers or scientists from very different fields may help correct these problems and may also improve the relevance of research to stakeholders who would implement study findings. In the context of health-related research, these end users include patients, caregivers, clinicians, and clinical policymakers. In this review, patients could either represent personal patient or caregiver perspectives or represent patients in their professional capacity (foundation or advocacy employees). They were not required to have or represent the condition discussed in a particular application. PCORI recognizes that there may be differences in these perspectives but sees value in both. Although scientists and stakeholders may also bring a patient perspective from their personal lives, reviewers who self-identified as scientific reviewers were categorized as such for the purposes of this review. This article explores the merit-review process of PCORI for its inaugural round of funding and investigates the contributions of scientist, patient, and stakeholder reviewers.

PCORI set up a 2-phase review process (Figure 1) because it received nearly 1300 letters of intent, and a 1-phase review did not seem feasible for that volume of applications. (PCORI received only 480 complete applications, but planning was based on the number of letters of intent received.) In phase 1, each application was reviewed by 3 scientific reviewers who submitted reviews online. There was no discussion among reviewers in this phase. PCORI had considered inviting patients and stakeholders to review in phase 1 (either alone or in addition to the scientific reviewers). However, there was not sufficient time for recruitment and training, and PCORI did not yet have experience with patients and stakeholders reviewing applications online and without discussion. Therefore, phase 1 was conducted with scientific reviewers alone.

Figure 1. Inaugural peer-review process.

PCORI = Patient-Centered Outcomes Research Institute.

A total of 363 scientific reviewers were selected through an open call on the PCORI Web site between August and September 2012. An automated search (“Reviewer Finder”) was also used to identify potential researchers matched for research expertise, and they were contacted by e-mail to gauge their interest. Potential scientific reviewers were evaluated by PCORI staff using criteria, such as training; previous review experience; research experience, including receipt of grants; and professional engagement as measured by membership in key professional societies, publications, and presentations at scientific meetings. The scientists participating in phase 1 were not the same as those in phase 2. Reviewers were required to recuse themselves from reviewing an application in the cases of actual or perceived financial, professional, or personal associations with the applicant or the applicant's institution.

These reviewers were trained on PCORI's merit-review process and the 8 merit-review criteria through a series of Webinars (Appendix Table 1). These criteria differ substantially from those of most scientific reviews because, in addition to the scientific rigor of the study, they include patient-centeredness, the engagement of patients and stakeholders in the conduct of the research, and the likelihood that the research could alter patient or clinician practices. Scientific reviewers in phase 1 used these review criteria to score applications and also provided an overall score for their assigned proposals. They were further asked to pay particular attention to the scientific rigor of the proposal when writing their overall assessment. Proposals with an average score among the 3 scientific reviewers that ranked in the top one third of all applications in phase 1 moved forward to phase 2 (n = 152).
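
To make the phase 1 cut concrete, the following minimal sketch shows one way the selection arithmetic could be reproduced: average the 3 scientific reviewers' overall scores per application and advance the top one third (lowest averages). This is an illustration in Python with invented data and column names, not PCORI's actual tooling (the article's analyses used Stata).

```python
import pandas as pd

# Hypothetical phase 1 scores: one row per (application, reviewer) pair,
# on the original 1 (exceptional) to 9 (poor) scale. Values are invented.
phase1 = pd.DataFrame({
    "app_id":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "reviewer": ["S1", "S2", "S3"] * 3,
    "score":    [2, 3, 2, 6, 7, 5, 4, 4, 5],
})

# Average the 3 scientific reviewers' overall scores for each application.
avg = phase1.groupby("app_id")["score"].mean()

# Advance the top one third of applications (lowest average scores).
n_advance = max(1, round(len(avg) / 3))
advancing = avg.nsmallest(n_advance).index.tolist()
print("Advancing to phase 2:", advancing)  # -> [1]
```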

Appendix Table 1. PCORI's 8 Merit-Review Criteria Used in November 2012

Applications in phase 2 were first scored by 4 lead reviewers: 2 scientists, 1 patient, and 1 other stakeholder. We refer to these as “prediscussion” scores; the applications were then given a final score by each member of a 21-person panel during a face-to-face meeting (referred to as the “postdiscussion” score). Applications for patient and stakeholder reviewers were solicited in a 2-month open-application process (411 applications were received). Written statements from applicants were scored by PCORI staff using a 4-point scale to assess the applicant's motivation, relevant experience, and understanding of PCORI's mission. Nearly one half of the selected patient and stakeholder reviewers self-identified as patients, patient advocates, patient family members, or unpaid caregivers (42%) (Appendix Table 2). Patients and stakeholders completed several mandatory training sessions provided by PCORI (Webinars and a 1-day face-to-face meeting).

Appendix Table 2. Characteristics of Patient and Stakeholder Reviewers

Overall, 59 scientists and 52 patients and stakeholders participated in phase 2. Each of the 4 lead reviewers was tasked with providing an overall score for their applications before the in-person meeting based on the phase 1 reviews. Lead reviewers had access to the critiques and scores provided by phase 1 reviewers. Because some patient and stakeholder reviewers had little scientific training, they were invited to base their overall score on 3 of the 8 merit criteria: innovation and potential for improvement (criterion 2), patient-centeredness (criterion 4), and patient and stakeholder engagement on the research team (criterion 7). After the initial phase 2 scoring, these reviewers met in person as part of panels composed of 21 reviewers (the 4 from the initial phase plus additional scientist, stakeholder, and patient reviewers identified as described earlier) and led by a chairperson on 18 November 2012. Applications that scored in the top two thirds based on the average of the 4 lead reviewers’ scores in phase 2 before the meeting were discussed in the larger 21-person panels (98 applications). At the meeting, verbal input was provided by the lead scientific reviewers, stakeholder reviewer, and patient reviewer. Lead reviewer scores were made available to all reviewers during the discussion. After discussion, each proposal received a final overall score from the 21 scientific, patient, and stakeholder reviewers on the panels, including revised scores from the 4 lead reviewers. All reviewers in both phases were required to complete a conflicts-of-interest disclosure statement on any financial relationships with health care entities and were required to recuse themselves from reviewing applications or participating in discussions or scoring in the case of actual or perceived financial, professional, or personal associations with an applicant or an applicant's institution.

A total of 98 conflicts were noted on 69 of 480 applications. Scoring from all phases was done on a scale from 1 (exceptional) to 9 (poor), with numerically higher scores indicating weaker proposals. Final scores used in the analyses were multiplied by 10 for simplification purposes and to avoid using decimals. The 25 applications with the best (and lowest) scores, based on average postdiscussion scores of all participating reviewers from phase 2 (up to 21 scores), were approved for funding by PCORI's Board of Governors.
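
As a minimal sketch of the scoring arithmetic described above (assuming only what the text states: a 1-to-9 scale rescaled by a factor of 10, and funding of the 25 best average postdiscussion scores), the selection could be expressed as follows. The data and names are illustrative, not PCORI's.

```python
import pandas as pd

# Hypothetical postdiscussion scores: one row per (application, panelist),
# on the original 1 (exceptional) to 9 (poor) scale. Values are invented.
post = pd.DataFrame({
    "app_id": [1, 1, 2, 2, 3, 3],
    "score":  [2, 3, 5, 4, 7, 8],
})

# Rescale by 10, as in the article, to avoid decimals
# (10 = exceptional, 90 = poor).
post["score10"] = post["score"] * 10

# Average each panel's postdiscussion scores and take the 25 best (lowest).
final = post.groupby("app_id")["score10"].mean().sort_values()
funded = final.head(25)
print(funded)
```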

Agreement Among and Within Scientific, Patient, and Stakeholder Reviewer Scores Before and After Discussion

We conducted exploratory analyses of the level of agreement between scientist scores and patient and stakeholder scores before and after the in-person panel discussions. For visual comparison, scatterplots of stakeholder-versus-scientist and patient-versus-scientist scores were developed for both prediscussion and postdiscussion scores by application. We present 2 sets of data: those from the 4 lead reviewers only and those resulting from the postdiscussion scores of all 21 reviewers. Scatterplots were also developed to examine changes from prediscussion to postdiscussion scores by type of reviewer (patient, stakeholder, or scientist).
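
A scatterplot of this kind is straightforward to reproduce. The sketch below uses simulated scores and Python's matplotlib (the article's figures were produced in Stata); the jitter, added as small random noise, mirrors the spacing of overlapping points described in the next paragraph.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Simulated per-application average scores (10 = exceptional, 90 = poor).
scientist = rng.uniform(10, 90, size=98)
stakeholder = np.clip(scientist + rng.normal(0, 15, size=98), 10, 90)

# Jitter overlapping points slightly, as with Stata's "jitter" option.
jx = scientist + rng.normal(0, 0.8, size=98)
jy = stakeholder + rng.normal(0, 0.8, size=98)

plt.scatter(jx, jy, alpha=0.6)
plt.xlabel("Average scientist score")
plt.ylabel("Average stakeholder score")
plt.title("Stakeholder vs. scientist scores (simulated)")
plt.show()
```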

We further explored the level of agreement between scientist scores and patient and stakeholder scores before and after discussion using Bland–Altman plots. This approach assesses the degree of agreement between 2 continuous measures and overcomes some of the limitations of correlation coefficients (11). A Bland–Altman plot provides a graphical comparison of the difference between 2 measures (such as scientist and stakeholder scores) on the y-axis against the average of the 2 measures on the x-axis. When the measures are in perfect agreement (that is, no difference), all points lie along a horizontal line at Y = 0. Dotted lines denote a 2-SD range above and below the mean difference, called the 95% limits of agreement, within which we would expect 95% of differences in scores to lie (12). Bland–Altman plots are helpful in assessing the magnitude of differences across a range of mean scores and in showing whether the difference between measures is related to their magnitude. Analyses were conducted and figures produced using the Stata statistical software package (StataCorp, College Station, Texas) (13). For visual simplicity, the “jitter” function (available in the Stata software package) was used to space points on figures where data lie on top of each other (14). Loess lines, which are locally weighted regression lines that smooth the data, were drawn to highlight possible relationships in both scatterplots and Bland–Altman plots. To examine the degree of change from prediscussion to postdiscussion scores within each reviewer type, we calculated the proportion of applications for which a lead reviewer changed his or her score by at least 10 points and by at least 20 points.
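
The Bland–Altman construction and the score-change tabulation are both simple to compute. The sketch below, in Python on simulated data (the article used Stata), plots differences against means, draws the mean difference and the 2-SD limits of agreement, overlays a loess line, and tabulates the proportion of scores that moved by at least 10 or 20 points.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)

# Simulated per-application average scores on the rescaled 10-90 scale.
scientist = rng.uniform(10, 90, size=98)
patient = np.clip(scientist + rng.normal(0, 12, size=98), 10, 90)

# Bland-Altman quantities: difference of the 2 measures against their mean.
mean_score = (scientist + patient) / 2
diff = scientist - patient
bias = diff.mean()                                  # mean difference
sd = diff.std(ddof=1)
loa_low, loa_high = bias - 2 * sd, bias + 2 * sd    # ~95% limits of agreement

plt.scatter(mean_score, diff, alpha=0.6)
plt.axhline(0, color="black")
for y in (bias, loa_low, loa_high):
    plt.axhline(y, linestyle="--")

# Loess (locally weighted regression) of differences on means shows whether
# disagreement varies across the score range.
smoothed = lowess(diff, mean_score, frac=0.6)
plt.plot(smoothed[:, 0], smoothed[:, 1], color="red")
plt.xlabel("Mean of scientist and patient scores")
plt.ylabel("Scientist minus patient score")
plt.show()

# Proportion of scores changed by at least 10 or at least 20 points from
# before to after discussion (simulated prediscussion/postdiscussion pairs).
pre = scientist
post = np.clip(pre + rng.normal(0, 10, size=98), 10, 90)
change = np.abs(post - pre)
print((change >= 10).mean(), (change >= 20).mean())
```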

Disposition of Applications

We graphically described the ranks of applications after phase 1 and compared them with the rankings of final scores from phase 2 to determine whether the 2-phase review funded different applications than a single phase of scientific review would have. In addition, we reviewed the factors reported by reviewers during the in-panel discussion to better understand changes in scores for the projects that moved into and out of the top-25 rankings between phases 1 and 2. We used summary statements developed by PCORI-contracted Merit Review Officers who attended the review.
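
This rank comparison reduces to a set intersection between the phase 1 and phase 2 top-25 lists. A minimal Python sketch with invented scores (application IDs and values are hypothetical):

```python
import pandas as pd

# Hypothetical average scores per application after each phase (lower = better).
scores = pd.DataFrame(
    {"A": (20, 18), "B": (25, 40), "C": (47, 22), "D": (30, 28), "E": (55, 60)},
    index=["phase1", "phase2"],
).T

top_n = 2  # the article used the top 25; 2 keeps this toy example readable
top1 = set(scores["phase1"].nsmallest(top_n).index)
top2 = set(scores["phase2"].nsmallest(top_n).index)

print("Funded after phase 2:", sorted(top2))           # ['A', 'C']
print("Also in phase 1 top ranks:", sorted(top1 & top2))  # stayed in
print("Moved up during phase 2:", sorted(top2 - top1))    # moved in
```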

Surveys and focus groups were conducted to obtain feedback from the reviewers on all phases of the peer-review process. Scientific, patient, and stakeholder reviewers completed Web-based surveys after phase 2 (response rates, 39% for scientific reviewers and 42% for patients and stakeholders). Separate focus groups for scientific reviewers and for patient and stakeholder reviewers were also conducted. A random sample of scientific reviewers was invited to participate, and all patient and stakeholder reviewers were invited. In total, 21 scientific reviewers from phase 1 and 6 scientific reviewers and 26 patient and stakeholder reviewers from phase 2 participated. Themes from the open-ended survey comments and the focus groups were narratively identified by 2 of the authors.

Approval for this part of the study was obtained from the MaGil Institutional Review Board (Rockville, Maryland).

Agreement Among Scientific, Patient, and Stakeholder Reviewer Scores Before and After Discussion

Before discussion, no meaningful relation emerged between average scores given by stakeholders and scientists or by patients and scientists (Figure 2, A [top panels]). Among the 4 lead reviewers after discussion, there was mild association between scores in each comparison. In Figure 2 (A, bottom panels), Loess smoother regression lines show a slight positive linear trend after discussion for both scientists versus stakeholders and scientists versus patients. Likewise, Bland–Altman plots in Figure 3 (A) show mild increases in agreement after discussion between scientist and patient reviewers and between scientist and stakeholder reviewers, as evidenced by Loess lines closer to an average difference of Y = 0. The 95% limits of agreement remain mostly unchanged (approximately ±30 points) across all 4 comparisons. In all 4 panels, greater disagreement in the negative direction at higher (worse) scores indicates that patients and stakeholders tended to score more critically than scientists. This trend is more pronounced for the prediscussion scores and is stronger when comparing scientists with patients than when comparing scientists with stakeholders, although the Loess lines may be strongly influenced by a few relatively large differences between scientists and patients or stakeholders among applications scoring poorly (high) on average.

Figure 2. Scatterplots for application-level analysis.

A score of 10 is exceptional; 90 is poor. A circle indicates a funded application; nonfunded applications are marked by a green X. The regression lines are Loess smoothing lines relating average scientist scores to average stakeholder or patient scores. Points have been shifted using the “jitter” function for visual simplicity. Each point represents an individual application (n = 98). A. The relationship between stakeholder (left) and patient (right) scores compared with scientist scores before (top panels) and after (bottom panels) discussion among the 4 lead reviewers only. B. The relationship between stakeholder (left) and patient (right) scores compared with scientist scores after discussion among all reviewers.

Figure 3. Bland–Altman plots for application-level analysis.

A score of 10 is exceptional; 90 is poor. A circle indicates a funded application; nonfunded applications are marked by a green X. The regression lines are Loess smoothing lines relating average scientist scores to average stakeholder or patient scores. Points have been shifted using the “jitter” function for visual simplicity. Points above the horizontal line at Y = 0 indicate that scientists scored better than patients or stakeholders for a particular application and vice versa. If there was complete agreement in scores between types of reviewers, points would cluster along the horizontal line at Y = 0. The upper and lower dotted lines are at the 95% upper and lower limits of agreement. Each point represents an individual application (n = 98). A. The level of agreement between stakeholder (left) and patient (right) reviewer scores and scientific scores before (top panels) and after (bottom panels) discussion among the 4 lead reviewers only. B. The level of agreement between stakeholder (left) and patient (right) scores and scientific scores after discussion among all reviewers.

When scores were averaged among all 21 reviewers after discussion, we saw strong levels of agreement. In Figure 2 (B), scatterplots display a positive association for both scientists versus stakeholders and scientists versus patients. In Figure 3 (B), Bland–Altman plots show a tight clustering of the difference in scores around the Y = 0 line for both comparisons, with relatively narrow 95% limits of agreement (−14 to 14 for scientists vs. patients and −11 to 10 for scientists vs. stakeholders). Here, mildly increased disagreement in the positive direction at higher (worse) average scores indicates that scientists' average scores were slightly more critical than those of both patients and stakeholders at this end of the scale.

Agreement Between Prediscussion and Postdiscussion Scores Within Scientific, Patient, and Stakeholder Reviewer Groups

Consistent positive linear relationships between prediscussion and postdiscussion scores are seen in scatterplots for all 3 reviewer types (Figure 4, A). However, there was greater variability in score changes for patient reviewers than for either stakeholders or scientists (Figure 4, B), as reflected in the wider 95% limits of agreement in the Bland–Altman plots (−21 to 15 for scientists, −22 to 16 for stakeholders, and −27 to 20 for patients). Within all 3 reviewer types, we see a slight negative average difference in the score range of 30 to 50 (Loess line below Y = 0), indicating a small increase (worsening) in scores from before to after discussion. We also see closer agreement at very low (strong) and very high (weak) scores, indicating little movement for the strongest and weakest applications.

Figure 4. Scatterplots and Bland–Altman plots for reviewer-level analysis.

A score of 10 is exceptional; a score of 90 is poor. The regression lines are Loess smoothing lines relating scientist (left), stakeholder (middle), and patient (right) prediscussion scores to postdiscussion scores. Each point represents an individual application. A. Scatterplots represent the relationship within scientist (left), stakeholder (middle), and patient (right) reviewer groups before and after discussion. Points have been shifted using the “jitter” function for visual simplicity. B. Bland–Altman plots show the level of agreement within scientist (left), stakeholder (middle), and patient (right) reviewer groups before and after discussion. Points above the horizontal line at Y = 0 indicate prediscussion scores worse than postdiscussion scores for a particular reviewer and vice versa. If there was complete agreement in scores from before to after discussion, points would cluster along the horizontal line at Y = 0. The upper and lower dotted lines are at the 95% upper and lower limits of agreement.

Overall, primary reviewers changed their scores from prediscussion to postdiscussion by at least 10 points in either direction 42% of the time and by at least 20 points 14% of the time (Appendix Table 3). Among reviewer types, patients were more likely than scientists or stakeholders to increase or decrease their scores. For 35% of applications, the difference between scientist and stakeholder scores decreased from before to after discussion (24% increased). For 41% of applications, the difference between scientist and patient scores decreased from before to after discussion (23% increased) (data not shown). This further confirms the mild trend toward convergence in scores after discussion.

Appendix Table 3. Change in Reviewer Scores From Before Discussion to After Discussion

Disposition of Applications

Five percent (n = 25) of the 480 proposals submitted were funded, representing those applications with the best (lowest) average overall scores after discussion in phase 2. Figure 5 shows that, of the 25 funded awards, only 13 ranked in the top 25 after the phase 1 review by scientists (median score, 23 [range, 10 to 23]), whereas 8 ranked between 26 and 50 (all 8 received a score of 27) and 4 ranked between 51 and 152 (median score, 35 [range, 30 to 37]). The overall median score in phase 1 for all applications was 47, suggesting that the 12 applications that “moved up” into the final top 25 were judged to be relatively strong proposals even by phase 1 reviewers.

Figure 5. Disposition of projects.

Phase 2 data are final rankings only (phase 2A is not shown). *Sixteen applications tied for the 11th best score after phase 1.

Factors cited frequently and prominently in summary statements of applications that had moved either into or out of the top 25 between phases included the strength of patient and stakeholder engagement, relevance to patients of the proposed study outcomes, appropriateness of the study design, and potential effect of the study (defined as whether the question was important and the research would be likely to change clinical practice). This was largely expected because these factors relate closely to major PCORI review criteria.

Reviewer Feedback on the Merit Review

Themes that emerged from the surveys and focus groups included scientists' appreciation of the perspectives offered by patients and stakeholders and recognition of a collegial and respectful process. However, challenges were also reported, including scientists' concern about nonscientists' level of technical expertise and a sense that some nonscientists were considered less authoritative than scientists. Difficulties in understanding the unique PCORI review criteria (such as patient-centeredness) were also reported. Respondents offered several suggestions for improvement, including ideas for breaking down the hierarchy among reviewers: alternating the order of oral presentation by reviewer type, adding a stakeholder or patient co-chair, and reducing the use of language that implies distinction (such as “scientific” and “nonscientific”). Many patient reviewers requested more interaction with scientific reviewers before the in-person review panel (for example, in reviewer training or via e-mail). Many reviewers suggested that PCORI use only 1 phase of review incorporating scientists, patients, and stakeholders.

This article adds to the limited literature on the effects of including patients and stakeholders alongside scientists in the peer review of research applications.

Before face-to-face discussions, scores varied substantially among scientists, patients, and stakeholders, indicating potential differences in how these 3 reviewer groups applied PCORI criteria and assessed applications. After discussion, scores among primary reviewers showed mild convergence, and scores of the panel as a whole showed much greater agreement, suggesting that consensus was built during the face-to-face meeting. Of the 3 types of reviewers, patients tended to change their individual scores from before to after discussion slightly more than scientists and stakeholders did.

Of the 25 awards funded, only 13 ranked in the top 25 after the phase 1 review by scientists alone. Thus, if awards had been restricted to the initial scientific review, a different set of applications would have been funded. Although these changes in rankings could be due to the inclusion of patient and stakeholder reviewers, other factors in phase 2 may have affected the final rankings. The scientific reviewers in phase 2 were not the same as those in phase 1. The phase 2 reviewers met for in-person discussions of the applications, whereas phase 1 reviewers conducted their reviews without meetings. All phase 2 reviewers, including scientists, were encouraged to focus especially on patient-centeredness, the quality of plans for engaging patients and stakeholders, and the likelihood that the project findings could affect practice. Thus, we cannot attribute the changes in rankings solely to the inclusion of patient and stakeholder reviewers. The extent to which inclusion of patient and stakeholder reviewers resulted in the selection of applications that are more patient-centered, as well as the effect of including these reviewers on the methodological rigor of the proposals selected for funding, remains unclear. Further research, including possible experimental studies, is needed to assess the relative merits, difficulties, and outcomes of alternative review processes and to fully understand the effect of a multistakeholder merit-review process.

Regardless of whether patient and stakeholder reviewers change rankings, it is not yet possible to ascertain whether the inclusion of their input benefited the funded portfolio. Future metrics for evaluating the review process should include both quantitative and qualitative outcomes and should aim to capture the scientific effect of the funded research as well as its effect on health, advancement along the translational continuum, improved dissemination, and the reproducibility and validity of the research. In response to feedback and insights from reviewers after this inaugural review cycle, PCORI has made changes to the peer-review process. The 2-phase approach has been replaced by a 1-phase review to simplify the process. Patient and stakeholder reviewers will have the opportunity (but will not be required) to review on all criteria instead of selected ones.

Several limitations are important to mention. Our results were based on a relatively small group of patient, stakeholder, and scientific reviewers, which limits our ability to generalize findings to other settings or to future PCORI reviews. Most reviewers bring several perspectives to the process (such as patient, caregiver, and clinician), and this heterogeneity makes it difficult to meaningfully distinguish between “patients” and “other stakeholders.” For example, some patient and stakeholder reviewers also identified as researchers. More research into reviewer characteristics and how they contribute to the quality of reviews will be useful, including the effect of reviewer conflicts of interest. PCORI also lacked data on final scores for each of the individual review criteria (only overall scores were required from reviewers in phase 2), limiting our ability to determine which criteria changed most during the discussion process. Finally, this merit-review process was multifaceted, with 2 stages, 2 different groups of scientific reviewers, and different training for scientists and stakeholders, making it difficult to attribute causality in explaining the changes in scores.

In terms of assessing review processes, PCORI's Methodology Committee has called for new studies, ideally with experimental designs, that assess different methods for engaging patients with diverse views and preferences and funneling their input into the peer-review process (3). A qualitative analysis of recordings of actual panel interactions may also provide further insights into this complex process. Future work would benefit from more metrics for evaluation (including evaluating whether certain criteria are more important to certain reviewers than others, which PCORI will be able to measure in future cycles), from incorporating other perspectives (for example, the panel chair and research applicants), and from long-term evaluation of the effect of funded research. In the absence of long-term outcomes for the consequences of any review choices and for the long-term effect of specific research projects, it is difficult to claim that there is a gold standard that can verify which research proposals are really the best.

PCORI will need to build on both the successes and challenges of incorporating patients and stakeholders in scientific peer review to reach its goal of funding a patient-centered research portfolio. The distinction among patients, stakeholders, and scientists may perpetuate the sense that researchers rank somewhat higher in the hierarchy of reviewers. The future of patient-centered merit review may reside in considering all reviewers as part of a heterogeneous group who bring different expertise and insights to bear on the quality of applications. Future applicants will succeed if they can make a clear case for the importance and value of the research to patients and stakeholders as well as scientists.

References

1. Patient Protection and Affordable Care Act, Pub. L. No. 111-148, 124 Stat. 119 (2010):318-9.
2. Washington AE, Lipstein SH. The Patient-Centered Outcomes Research Institute—promoting better information, decisions, and health. N Engl J Med. 2011;365:e31.
3. Patient-Centered Outcomes Research Institute Methodology Committee. The PCORI Methodology Report. 2013. Accessed at www.pcori.org/assets/2013/11/PCORI-Methodology-Report.pdf on 20 January 2014.
4. Andejeski Y, Breslau ES, Hart E, Lythcott N, Alexander L, Rich I, et al; U.S. Army Medical Research and Materiel Command Fiscal Year 1995 Breast Cancer Research Program Integration Panel. Benefits and drawbacks of including consumer reviewers in the scientific merit review of breast cancer research. J Womens Health Gend Based Med. 2002;11:119-36.
5. Rich IM, Andejeski Y, Alciati MH, Crawford Bisceglio I, Breslau ES, McCall L, et al. Perspective from the Department of Defense Breast Cancer Research Program. Breast Dis. 1998;10:33-45.
6. National Institutes of Health. 2007–2008 Peer Review Self-Study. Final Draft. 2008. Accessed at http://enhancing-peer-review.nih.gov/meetings/NIHPeerReviewReportFINALDRAFT.pdf on 14 June 2013.
7. Andejeski Y, Bisceglio IT, Dickersin K, Johnson JE, Robinson SI, Smith HS, et al. Quantitative impact of including consumers in the scientific review of breast cancer research proposals. J Womens Health Gend Based Med. 2002;11:379-88.
8. Kotchen TA, Spellecy R. Peer Review: A Research Priority. 2012. Accessed at www.pcori.org/assets/Peer-Review-A-Research-Priority.pdf on 14 June 2013.
9. Boudreau KJ, Guinan EC, Lakhani KR, Riedl C. The Novelty Paradox & Bias for Normal Science: Evidence from Randomized Medical Grant Proposal Evaluations. HBS Working Paper no. 13-053. 2012. Accessed at http://hbswk.hbs.edu/item/7173.html on 20 June 2013.
10. Nicholson JM, Ioannidis JP. Research grants: conform and be funded. Nature. 2012;492:34-6.
11. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307-10.
12. Bland JM, Altman DG. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol. 2003;22:85-93.
13. StataCorp. Stata Statistical Software: Release 12. College Station, TX: StataCorp; 2011.
14. StataCorp. Stata 12 Base Reference Manual. College Station, TX: Stata Press; 2011.
 
