David M. Kent, MD, MS; David van Klaveren, PhD; Jessica K. Paulus, ScD; Ralph D'Agostino, PhD; Steve Goodman, MD, MHS, PhD; Rodney Hayward, MD; John P.A. Ioannidis, MD, DSc; Bray Patrick-Lake, MFS; Sally Morton, PhD; Michael Pencina, PhD; Gowri Raman, MBBS, MS; Joseph S. Ross, MD, MHS; Harry P. Selker, MD, MSPH; Ravi Varadhan, PhD; Andrew Vickers, PhD; John B. Wong, MD; Ewout W. Steyerberg, PhD
From Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts (D.M.K., J.K.P., J.B.W.); Erasmus Medical Center, Rotterdam, the Netherlands, and Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts (D.V.); Boston University, Boston, Massachusetts (R.D.); Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California (S.G., J.P.I.); University of Michigan, Ann Arbor, Michigan (R.H.); Duke Clinical Research Institute, Duke University, Durham, North Carolina (B.P., M.P.); Virginia Polytechnic Institute and State University, Blacksburg, Virginia (S.M.); Center for Clinical Evidence Synthesis, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts (G.R.); Schools of Medicine and Public Health, Yale University, New Haven, Connecticut (J.S.R.); Center for Cardiovascular Health Services Research, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, and Tufts Clinical and Translational Science Institute, Boston, Massachusetts (H.P.S.); Center on Aging and Health, Johns Hopkins University, Baltimore, Maryland (R.V.); Memorial Sloan Kettering Cancer Center, New York, New York (A.V.); Tufts Medical Center, Boston, Massachusetts; and Leiden University Medical Center, Leiden, the Netherlands (E.W.S.).
Disclaimer: The views, statements, and opinions presented in this work are solely the responsibility of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors, or its Methodology Committee.
Acknowledgment: The authors thank Mark Adkins, Teddy Balan, and Dan Sjoberg for excellent technical support in analyses included in figures and the appendix tables that support the PATH Statement (26). They also thank the Annals of Internal Medicine editors and reviewers, whose thoughtful feedback greatly improved this work. They thank Jennifer Lutz and Christine Lundquist for assistance with copyediting and creating exhibits.
Financial Support: Development of the PATH Statement was supported through contract SA.Tufts.PARC.OSCO.2018.01.25 from the PCORI Predictive Analytics Resource Center. This work was also informed by a 2018 conference (“Evidence and the Individual Patient: Understanding Heterogeneous Treatment Effects for Patient-Centered Care”) convened by the National Academy of Medicine and funded through a PCORI Eugene Washington Engagement Award (1900-TMC).
Disclosures: Dr. Kent reports grants from PCORI during the conduct of the study. Dr. Goodman reports personal fees from PCORI outside the submitted work. Dr. Pencina reports grants from PCORI (Tufts Subaward) during the conduct of the study; grants from Sanofi/Regeneron, Amgen, and Bristol-Myers Squibb outside the submitted work; and personal fees from Boehringer Ingelheim and Merck outside the submitted work. Dr. Ross reports personal fees from PCORI during the conduct of the study and grants from the U.S. Food and Drug Administration, Medtronic, Johnson & Johnson, the Centers for Medicare & Medicaid Services, Blue Cross Blue Shield Association, the Agency for Healthcare Research and Quality, the National Institutes of Health (National Heart, Lung, and Blood Institute), and Laura and John Arnold Foundation outside the submitted work. Dr. Varadhan reports personal fees from Tufts University during the conduct of the study. Dr. Vickers reports grants from the National Institutes of Health during the conduct of the study. Dr. Wong reports grants from PCORI during the conduct of the study. Dr. Steyerberg reports royalties from Springer for his book Clinical Prediction Models. Authors not named here have disclosed no conflicts of interest. Disclosures can also be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M18-3668.
Corresponding Author: David M. Kent, MD, MS, Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, 800 Washington Street, Box 63, Boston, MA 02111; e-mail, email@example.com.
Current Author Addresses: Drs. Kent, Paulus, Raman, and Selker: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, 800 Washington Street, Box 63, Boston, MA 02111.
Dr. van Klaveren: Erasmus University Medical Center, Doctor Molewaterplein 40, 3015 GD Rotterdam, the Netherlands.
Dr. D'Agostino: Boston University Mathematics and Statistics Department, 111 Cummington Street, Boston, MA 02215.
Dr. Goodman: Stanford University School of Medicine, 150 Governor's Lane, Room T265, Stanford, CA 94305.
Dr. Hayward: VA Ann Arbor Health Services Research and Development, 2800 Plymouth Road, Building 14, G100-36, Ann Arbor, MI 48109.
Dr. Ioannidis: Stanford Prevention Research Center, 1265 Welch Road, Stanford, CA 94305.
Ms. Patrick-Lake: Evidation Health, 167 2nd Avenue, San Mateo, CA 94401.
Dr. Morton: Virginia Tech, North End Center Suite 4300, 300 Turner Street NW, Blacksburg, VA 24061.
Dr. Pencina: Duke Clinical Research Institute, 200 Trent Street, Durham, NC 27710.
Dr. Ross: Yale University School of Medicine, PO Box 208093, New Haven, CT 06520.
Dr. Varadhan: Johns Hopkins University, Division of Biostatistics and Bioinformatics, 550 North Broadway, Suite 1103-A, Baltimore, MD 21205.
Dr. Vickers: Memorial Sloan Kettering Cancer Center, 485 Lexington Avenue, 2nd Floor, New York, NY 10017.
Dr. Wong: Tufts Medical Center, 800 Washington Street #302, Boston, MA 02111.
Dr. Steyerberg: Erasmus University Medical Center, PO Box 2040, 3055 PC Rotterdam, the Netherlands.
Author Contributions: Conception and design: D.M. Kent, J.K. Paulus, R. Hayward, J.P.A. Ioannidis, B. Patrick-Lake, J.S. Ross, A. Vickers, J.B. Wong, E.W. Steyerberg.
Analysis and interpretation of the data: D.M. Kent, J.K. Paulus, R. D'Agostino, R. Hayward, J.P.A. Ioannidis, R. Varadhan, J.B. Wong, E.W. Steyerberg.
Drafting of the article: D.M. Kent, J.K. Paulus, R. D'Agostino, S. Goodman, J.P.A. Ioannidis, A. Vickers, J.B. Wong.
Critical revision of the article for important intellectual content: D.M. Kent, D. van Klaveren, J.K. Paulus, R. D'Agostino, S. Goodman, R. Hayward, J.P.A. Ioannidis, S. Morton, M. Pencina, G. Raman, J.S. Ross, H.P. Selker, R. Varadhan, A. Vickers, J.B. Wong, E.W. Steyerberg.
Final approval of the article: D.M. Kent, D. van Klaveren, J.K. Paulus, R. D'Agostino, S. Goodman, R. Hayward, J.P.A. Ioannidis, B. Patrick-Lake, S. Morton, M. Pencina, G. Raman, J.S. Ross, H.P. Selker, R. Varadhan, A. Vickers, J.B. Wong, E.W. Steyerberg.
Provision of study materials or patients: D.M. Kent, J.B. Wong.
Statistical expertise: D.M. Kent, D. van Klaveren, R. D'Agostino, R. Hayward, J.P.A. Ioannidis, S. Morton, R. Varadhan, A. Vickers, J.B. Wong, E.W. Steyerberg.
Obtaining of funding: D.M. Kent, J.K. Paulus, J.B. Wong.
Administrative, technical, or logistic support: D.M. Kent, J.K. Paulus, G. Raman, H.P. Selker, J.B. Wong.
Collection and assembly of data: D.M. Kent, J.K. Paulus, G. Raman, J.B. Wong.
The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement was developed to promote the conduct of, and provide guidance for, predictive analyses of heterogeneity of treatment effects (HTE) in clinical trials. The goal of predictive HTE analysis is to provide patient-centered estimates of outcome risk with versus without the intervention, taking into account all relevant patient attributes simultaneously, to support more personalized clinical decision making than can be made on the basis of only an overall average treatment effect. The authors distinguished 2 categories of predictive HTE approaches (a “risk-modeling” and an “effect-modeling” approach) and developed 4 sets of guidance statements: criteria to determine when risk-modeling approaches are likely to identify clinically meaningful HTE, methodological aspects of risk-modeling methods, considerations for translation to clinical practice, and considerations and caveats in the use of effect-modeling approaches. They discuss limitations of these methods and enumerate research priorities for advancing methods designed to generate more personalized evidence. This explanation and elaboration document describes the intent and rationale of each recommendation and discusses related analytic considerations, caveats, and reservations.
The scale dependence of HTE.
All 3 scenarios are drawn from hypothetical trials with the same overall results (outcome rate, 8.8% in the control group [open circles] vs. 6.6% in the treatment group [closed circles]) and depict outcomes in low-risk groups (75% of patients, Q1-3) and high-risk groups (25% of patients, Q4) (where control event rates are 5% and 20%, respectively). Plots in the left, middle, and right column display outcome risks, relative effects, and absolute effects, respectively. In the first row, effect heterogeneity is absent on the relative scale but present on the absolute scale. In the second row, effect heterogeneity is present on the relative scale but absent on the absolute scale. In the third row, effect heterogeneity is present on both the relative and the absolute scale. The statistical significance of HTE is typically tested on the relative scale (middle column) because regression analyses are often performed on this scale. Provided sufficient statistical power, analyses 2 and 3 would show statistically significant HTE. However, regardless of the scale of the analysis, the clinical importance of HTE should generally be evaluated on the absolute scale. When absolute effects span a decisionally important threshold, which depends on the treatment burden (e.g., harms and costs), HTE is said to be clinically important. In this example, for illustrative purposes we have arbitrarily set a decisionally relevant threshold at a 1–percentage point reduction in outcome risk. Here, although HTE is present on the absolute scale in both analyses 1 and 3, clinically important heterogeneity is present only in the third analysis, where the treatment that is beneficial on average may not be worth the treatment burden for many (indeed, most) patients.
Of note, the presence of statistically significant interaction (on the relative scale) does not imply clinically important HTE, and the absence of statistically significant interaction does not imply the absence of clinically important HTE. It is also important to note that testing heterogeneity on the relative scale does not test a specific causal hypothesis regarding effect modification (regardless of the subgrouping variable) but merely tests the hypothesis that relative effects are the same in one group vs. another. Establishing causal interaction effects is not necessary to improve the targeting of therapy. We also note that this diagram makes the simplifying assumption of uniform treatment burdens across all levels of risk. In practice, adverse events may vary across risk groups, and the threshold is also sensitive to patient values and preferences. HTE = heterogeneity of treatment effects; Q1 = first risk quarter (lowest); Q2 = second risk quarter; Q3 = third risk quarter; Q4 = fourth risk quarter (highest).
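The scale dependence described in this caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the subgroup shares, control event rates, and the 1-percentage-point threshold come from the caption, but the treated-arm risks are derived under two assumptions (a constant relative risk of 0.75 for the first scenario, and a constant absolute risk reduction matched to the same overall result for the second). The third scenario is not reproduced because its subgroup values are not stated in the text. All function and variable names are hypothetical.

```python
# Subgroup shares and control event rates (CER) come from the caption.
# Treated-arm risks are derived under stated assumptions, not read off the figure.
GROUPS = {
    "low risk (Q1-3)": {"share": 0.75, "cer": 0.05},
    "high risk (Q4)": {"share": 0.25, "cer": 0.20},
}
THRESHOLD = 0.01  # illustrative 1-percentage-point decision threshold


def overall_rate(rates):
    """Population event rate as the share-weighted mean of subgroup rates."""
    return sum(GROUPS[name]["share"] * rate for name, rate in rates.items())


def summarize(treated_rates):
    """Relative risk (RR) and absolute risk reduction (ARR) in each subgroup."""
    out = {}
    for name, group in GROUPS.items():
        cer, ter = group["cer"], treated_rates[name]
        out[name] = {"rr": ter / cer, "arr": cer - ter}
    return out


def spans_threshold(summary, threshold=THRESHOLD):
    """True when subgroup ARRs fall on both sides of the decision threshold."""
    arrs = [row["arr"] for row in summary.values()]
    return min(arrs) < threshold < max(arrs)


control = {name: group["cer"] for name, group in GROUPS.items()}
overall_cer = overall_rate(control)

# Scenario 1 (assumed): the same relative risk of 0.75 in both subgroups.
constant_rr = 0.75
scenario_1 = {name: cer * constant_rr for name, cer in control.items()}

# Scenario 2 (assumed): the same ARR in both subgroups, sized so that the
# overall treated-arm rate matches scenario 1.
constant_arr = overall_cer * (1 - constant_rr)
scenario_2 = {name: cer - constant_arr for name, cer in control.items()}

for label, treated in (("constant RR", scenario_1), ("constant ARR", scenario_2)):
    summary = summarize(treated)
    print(label, f"overall treated rate = {overall_rate(treated):.4f}")
    for name, row in summary.items():
        print(f"  {name}: RR = {row['rr']:.2f}, ARR = {row['arr']:.4f}")
    print("  ARR spans threshold:", spans_threshold(summary))
```

Under these assumptions both scenarios give the same overall treated-arm rate, yet one shows heterogeneity only on the absolute scale and the other only on the relative scale. In neither do the subgroup effects straddle the illustrative threshold.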
Effects of lifestyle modification and metformin vs. usual care in patients with prediabetes at different risks for diabetes.
This figure presents HTE analysis of the DPP (Diabetes Prevention Program) trial as a function of baseline risk (32). Event rates (top), hazard ratios (middle), and absolute effects (bottom) are shown. Both lifestyle modification (left) and metformin (right) are compared with usual care as a function of baseline risk. For lifestyle modification, a consistent 58% reduction in the hazard of developing diabetes over 3 y was found across all levels of risk. This consistent relative effect yields HTE on the absolute scale of potential clinical importance. In contrast, the effects of metformin are heterogeneous on both the hazard ratio scale and the absolute scale. Penalized splines were used to model the relationship between the linear predictor of risk and the time-to-event outcome. Vertical lines denote 95% CIs, and P values are based on the null hypothesis of no effect modification tested using the linear predictor of risk in a Cox model. In the hazard ratio graphs, the dashed lines show the average effects in the trial and the horizontal lines at 1.0 refer to the null effect on this scale. The horizontal lines at 0 in the absolute risk reduction graphs refer to the null effect on this scale. Prediction of incident diabetes with an external model derived from the Framingham cohort yielded a similar pattern (33). HTE = heterogeneity of treatment effect; Q1 = first risk quarter (lowest); Q2 = second risk quarter; Q3 = third risk quarter; Q4 = fourth risk quarter (highest).
Value of a risk-modeling approach when the average treatment effect in a trial (treatment A) is near a decision threshold.
This figure depicts the anticipated influence of a risk-modeling approach in 2 trials testing different treatments in the same population, one (treatment A) with a slightly favorable benefit–harm tradeoff and the other (treatment B) with an extremely favorable benefit–harm tradeoff. Under both conditions, the control event rate is 25% and the MCSD (i.e., the absolute benefit that would justify the experimental therapy) is 3 percentage points. (For simplicity, we show a single MCSD, with gray shading corresponding to portions of the population that should not be treated, but this value varies according to individual patient values and preferences.) A risk-modeling approach would be of substantially greater value for the trial of treatment A, with the slightly favorable tradeoff (RRR, 0.15; absolute risk difference, 3.75% [just above the MCSD]), than for the trial of treatment B, with the extremely favorable tradeoff (RRR, 0.50; risk difference, 12.5% [substantially above the MCSD]). The distributions show the anticipated risk differences that emerge with a constant RRR when the same moderately predictive risk prediction model (i.e., with a c-statistic of about 0.70) is applied to the population. In the slightly favorable treatment condition (A), harms outweigh benefits in almost half of the trial population (43%) despite overall results showing benefit on average. In the extremely favorable treatment condition (B), treatment remains worthwhile in almost the entire population (97%). Thus, applying the risk-modeling approach is very valuable in the low-benefit condition because it reclassifies many patients as treatment-unfavorable who would otherwise have been treated on the basis of the overall result. MCSD = minimal clinically significant difference; RRR = relative risk reduction.
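The reasoning in this caption can be made concrete with a short calculation. Under a constant RRR, a patient's absolute benefit is the baseline risk multiplied by the RRR, so the MCSD translates into a minimum baseline risk at which treatment becomes worthwhile. The sketch below assumes exactly that relationship; the list of baseline risks is a made-up illustration (chosen only so that it averages 25%) and is not the distribution shown in the figure, so the fractions it yields are not expected to match the 43% and 97% quoted above.

```python
def min_risk_to_treat(mcsd, rrr):
    """Baseline risk at which absolute benefit (risk * RRR) equals the MCSD."""
    return mcsd / rrr


def fraction_below_mcsd(baseline_risks, mcsd, rrr):
    """Share of patients whose expected absolute benefit falls short of the MCSD."""
    shortfalls = [risk * rrr < mcsd for risk in baseline_risks]
    return sum(shortfalls) / len(shortfalls)


MCSD = 0.03  # 3 percentage points, from the caption

# Hypothetical baseline risks with a mean of 25% (illustration only).
risks = [0.05, 0.10, 0.15, 0.25, 0.30, 0.40, 0.50]

for name, rrr in (("treatment A", 0.15), ("treatment B", 0.50)):
    cutoff = min_risk_to_treat(MCSD, rrr)
    share = fraction_below_mcsd(risks, MCSD, rrr)
    print(f"{name}: treat if baseline risk > {cutoff:.2f}; "
          f"{share:.0%} of this sample falls below the MCSD")
```

The cutoff for treatment A (a baseline risk of 20%) sits close to the population mean, so many patients fall below it; the cutoff for treatment B (6%) sits far in the lower tail.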
Schematized (left) and actual (right) risk-based heterogeneous treatment effects.
A. Schematic results in a trial for a hypothetical intervention that lowers the odds of an outcome by 25% (odds ratio, 0.75) but has an absolute treatment-related harm of 1%. Outcome risks (top), observed odds ratios (middle), and risk differences (bottom) are shown. Overall trial results are dependent on the average risk for the enrolled trial population. When the average risk is about 7% (as in this example), a well-powered study would detect a positive overall treatment benefit (shown by the horizontal dashed line in the middle and bottom panels). However, a prediction model with a c-statistic of 0.75 generates the risk distribution in the top panel of the figure. A treatment-by-risk interaction emerges (middle). Regardless of whether this interaction is statistically significant, examination of treatment effects on the absolute risk difference scale (bottom) shows harm in the low-risk group and very substantial benefit in the high-risk group, both of which are obscured by the overall summary results. Conventional “1-variable-at-a-time” subgroup analyses are typically inadequate to disaggregate patients into groups that are sufficiently heterogeneous for risk, so benefit–harm tradeoffs can misleadingly seem to be consistent across the trial population. Although this figure shows idealized relationships between risk and treatment effects, these relationships will be sensitive to how risk is described (i.e., what variables are in the risk model). Baseline risk has a logit-normal distribution, with μ = −3 and σ = 1 (the log odds are normally distributed). Adapted from reference 3.
B. Stratified results of RITA-3 (Randomized Intervention Trial of unstable Angina 3) (64). The RITA-3 trial (n = 1810) tested early intervention vs. conservative management of non–ST-segment elevation acute coronary syndrome. Results for the outcome of death or nonfatal myocardial infarction at 5 y are shown, stratified into equal-sized risk quarters using an internally derived risk model; the highest-risk quarter is substratified into halves (groups 4a and 4b). Event rates with 95% CIs (top), odds ratios (middle), and risk differences (bottom) are shown. The risk model comprises the following easily obtainable clinical characteristics: age, sex, diabetes, prior myocardial infarction, smoking status, heart rate, ST-segment depression, angina severity, left bundle branch block, and treatment strategy. As in the schematic diagram to the left, the average treatment effect seen in the summary results (horizontal dashed line in middle and bottom panels) closely reflects the effect in patients in risk group 3, whereas half of patients (risk groups 1 and 2) receive no treatment benefit from early intervention. Absolute benefit (bottom) in the primary outcome was very pronounced in the eighth of patients at highest risk (risk group 4b). A statistically significant risk-by-treatment interaction can be seen when results are expressed in the odds ratio scale (middle) (the interaction P value is from a likelihood ratio test for adding an interaction between the linear predictor of risk and treatment assignment). Such a pattern can emerge if early intervention is associated with some procedure-related risks that are evenly distributed over all risk groups, eroding benefit in low-risk but not high-risk patients, as illustrated schematically in the left panel. Q1 = first risk quarter (lowest); Q2 = second risk quarter; Q3 = third risk quarter; Q4 = fourth risk quarter (highest).
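The schematic in panel A rests on simple arithmetic: a constant odds ratio combined with a constant absolute harm produces a net effect that changes sign as baseline risk rises. The following sketch works through that arithmetic using the odds ratio (0.75) and harm (1%) stated in the caption. It assumes the harm adds directly to the treated-arm outcome risk, which is one plausible reading of the schematic rather than a confirmed detail of how the figure was generated; the function names and the bisection search are illustrative choices.

```python
def treated_risk(control_risk, odds_ratio=0.75, harm=0.01):
    """Outcome risk under treatment: apply the odds ratio, then add a fixed harm.

    Assumes the treatment-related harm is additive on the risk scale.
    """
    odds = control_risk / (1.0 - control_risk)
    treated_odds = odds * odds_ratio
    return treated_odds / (1.0 + treated_odds) + harm


def net_benefit(control_risk, odds_ratio=0.75, harm=0.01):
    """Absolute risk difference (control minus treated); negative means net harm."""
    return control_risk - treated_risk(control_risk, odds_ratio, harm)


def break_even_risk(lo=0.001, hi=0.5, tol=1e-9):
    """Baseline risk where net benefit is zero, found by bisection.

    Net benefit is negative at `lo` and positive at `hi` for the default inputs.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_benefit(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for risk in (0.02, 0.05, 0.10, 0.20):
    print(f"baseline risk {risk:.2f}: net benefit {net_benefit(risk):+.4f}")
print(f"break-even baseline risk: {break_even_risk():.4f}")
```

Under these inputs the break-even baseline risk falls a little above 4%, so patients below that level would be expected to experience net harm even though the treatment is beneficial on average.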
Risk heterogeneity increases with higher discrimination, and EQRR increases with increasing c-statistic, especially at low outcome rates.
The curves depict the relationship between the c-statistic and EQRR—that is, the risk in the highest quartile compared with the risk in the lowest quartile—for different outcome rates across 32 trials (46). Unsurprisingly, the degree of risk heterogeneity (as represented by the EQRR) is strongly related to the discriminatory power of the prediction model. The relationship is strongest when overall outcome rates are low. The c-statistic and EQRR both reflect how well the risk factors predict the outcome in a given population. For reference, in a trial with an outcome rate of 15%, a predictive model with a c-statistic of 0.80 is anticipated to yield an outcome rate that is 13-fold higher in the highest risk quartile than in the lowest risk quartile. When the outcome rate is lower (5%), this ratio is expected to be >20-fold for a model with similar discrimination. Patient groups with such different outcome risks are unlikely to have similar benefit–harm tradeoffs for most therapies, even though they may be included in the same trial. EQRR = extreme-quartile risk ratio.
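As a concrete illustration of the definition used here, the EQRR can be computed from a set of predicted risks by comparing the mean risk in the top quarter with the mean risk in the bottom quarter. The sketch below assumes that reading of the definition (mean predicted risk per extreme quarter); an analysis of trial data might instead use observed outcome rates within each quarter. The function name and the sample values are hypothetical.

```python
from statistics import mean


def extreme_quartile_risk_ratio(risks):
    """Ratio of mean risk in the highest quarter to mean risk in the lowest quarter."""
    ordered = sorted(risks)
    size = len(ordered) // 4
    if size == 0:
        raise ValueError("need at least 4 risk values to form quarters")
    lowest = mean(ordered[:size])
    highest = mean(ordered[-size:])
    return highest / lowest


# Hypothetical predicted risks for eight patients (illustration only).
predicted = [0.01, 0.02, 0.03, 0.04, 0.10, 0.20, 0.30, 0.40]
print(f"EQRR = {extreme_quartile_risk_ratio(predicted):.1f}")
```

A wider spread of predicted risks, as produced by a more discriminating model, pushes the top-quarter mean up and the bottom-quarter mean down, which raises the ratio.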
Table 1. Mathematical Dependence of Treatment Effect on CER
Table 2. A Meta-research Agenda for Predictive Approaches to HTE
Table 3. Hypothetical Example Presentation of the Effects of Model-Based Decision Making
Table 4. Methodological Literature on the Conduct of Regression-Modeling Approaches to HTE Analysis
Evaluating model performance: a comparison of conventional outcome risk calibration in control and treatment groups vs. benefit calibration.
Box plots show predicted and observed event rates, grouped by quarters of predicted risk, in the control and treatment groups of a hypothetical randomized controlled trial (500 simulations) (top). These rates seem to demonstrate appropriate model calibration. However, examining the same data for predicted and observed benefit (differences in event rates) by quarters of predicted benefit (bottom) reveals very poor model calibration at the extreme quarters. This poor calibration occurs because miscalibration for the risk difference includes error from both control and treatment groups and because the scale of risk difference is much smaller than that of outcome risk. These data were generated from a simulation of a prediction model that included 12 treatment effect interactions, 6 of which represented true interactions. The boxes represent, in line with the Tukey definition, the 25% quantile to the 75% quantile (with the median shown). The lower and upper whiskers include the most extreme observations within the range of 1.5 times the interquartile range, from the 25% and 75% quantiles, respectively.
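To show what a benefit calibration check involves, the sketch below groups patients by quarters of predicted benefit and compares the mean predicted benefit in each group with the observed benefit (control event rate minus treated event rate). It is a minimal illustration under assumed inputs, not the simulation behind this figure: the record layout, the function name, and the toy data are all hypothetical.

```python
from statistics import mean


def benefit_calibration(records, n_groups=4):
    """Compare predicted and observed benefit within groups of predicted benefit.

    `records` is a list of (predicted_benefit, treated, event) tuples, where
    `treated` is a bool and `event` is 0 or 1. Returns one
    (mean_predicted_benefit, observed_benefit) pair per group.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    size = len(ordered) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(ordered)
        group = ordered[start:end]
        control = [event for _, treated, event in group if not treated]
        treated_arm = [event for _, treated, event in group if treated]
        if not control or not treated_arm:
            raise ValueError("each group needs patients in both arms")
        predicted = mean(pred for pred, _, _ in group)
        observed = mean(control) - mean(treated_arm)
        rows.append((predicted, observed))
    return rows


# Toy data (illustration only): one control and one treated patient per quarter.
records = [
    (0.00, False, 1), (0.00, True, 1),
    (0.02, False, 1), (0.02, True, 0),
    (0.04, False, 0), (0.04, True, 0),
    (0.06, False, 1), (0.06, True, 0),
]
for predicted, observed in benefit_calibration(records):
    print(f"predicted benefit {predicted:.2f}, observed benefit {observed:+.2f}")
```

Because the observed benefit in each group is a difference of two estimated event rates, it carries sampling error from both arms, which is one reason benefit calibration can look poor even when risk calibration in each arm looks acceptable.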
Kent DM, van Klaveren D, Paulus JK, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement: Explanation and Elaboration. Ann Intern Med. 2020;172:W1–W25. [Epub ahead of print 12 November 2019]. doi: https://doi.org/10.7326/M18-3668
Published: Ann Intern Med. 2020;172(1):W1-W25.
Published at www.annals.org on 12 November 2019
Research and Reporting Methods.
Copyright © 2020 American College of Physicians. All Rights Reserved.
Print ISSN: 0003-4819 | Online ISSN: 1539-3704