## Abstract

The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement was developed to promote the conduct of, and provide guidance for, predictive analyses of heterogeneity of treatment effects (HTE) in clinical trials. The goal of predictive HTE analysis is to provide patient-centered estimates of outcome risk with versus without the intervention, taking into account all relevant patient attributes simultaneously, to support more personalized clinical decision making than is possible on the basis of an overall average treatment effect alone. The authors distinguished 2 categories of predictive HTE approaches (a “risk-modeling” and an “effect-modeling” approach) and developed 4 sets of guidance statements: criteria to determine when risk-modeling approaches are likely to identify clinically meaningful HTE, methodological aspects of risk-modeling methods, considerations for translation to clinical practice, and considerations and caveats in the use of effect-modeling approaches. They also discussed limitations of these methods and enumerated research priorities for advancing methods designed to generate more personalized evidence. This explanation and elaboration document describes the intent and rationale of each recommendation and discusses related analytic considerations, caveats, and reservations.

*future* patients who do and do not benefit from a treatment to optimize decision making for individual patients (29). By accounting for multiple variables simultaneously, predictive HTE analysis is foundational to the concept of personalization in evidence-based medicine (4).

## Distinct Approaches to PATH

*difference* in expected outcome risks under 2 alternative treatments, conditional on important clinical variables. A fuller introduction to risk and effect modeling is presented in prior literature (4).

## Clarification of Terms and PATH Statement Scope

## HTE Analysis for Causal Interaction Versus for Prediction and Decision Making

*causal inferences* depend on interpretation of model *inputs* (that is, model covariates). The PATH guidance does *not* address causal interpretations of HTE. These analyses are important for identifying biomarkers that might biologically interact with therapy. Many methodologists believe that interaction on a multiplicative (relative) scale is stronger evidence in support of a causal interaction than interaction on an absolute scale (although this is by no means a universal view) (34–40). Nevertheless, we note that treatment-by-covariate interactions (on any scale) are generally descriptive measures of association (when the covariate is not randomly assigned, as in a factorial trial) because an interacting covariate may be acting as a proxy for many measured and unmeasured variables. To attribute a change in the treatment effect to the covariate, we would need to control for all relevant differences in these other variables (that is, observed and unobserved confounders) across levels of the subgrouping factor. In any event, demonstrating causal interaction is not necessary for “predictive” HTE analyses that seek to target therapies to those who most benefit.

*inferences for clinical decision making* depend on interpretation of model *outputs*. Because of this, such analyses have been called “predictive” HTE analyses (4, 8). The PATH guidance is limited to predictive approaches to HTE. The goal of predictive HTE analysis is to develop models that can be used to predict which of 2 or more treatments will be better for a particular individual, taking into account multiple relevant variables (4, 8). Clinically important HTE occurs when variation in the risk difference across patient subgroups spans a decisionally important threshold, which depends on treatment burden (including treatment-related harms and costs). It is generally assessed on the absolute scale, regardless of the scale of the analysis. Figure 1 illustrates the scale dependence of effect heterogeneity. We also note that controlling for confounding factors (that is, factors that differ between levels of the subgrouping variable) is not necessary for prediction (35, 41).
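The scale dependence described above can be made concrete with a small calculation. The sketch below assumes a treatment with a constant odds ratio (the value 0.75 and the three control-arm risks are hypothetical, chosen only for illustration) and shows that the absolute risk reduction still differs across baseline risk even though the relative effect does not.

```python
def treated_risk(control_risk: float, odds_ratio: float) -> float:
    """Risk under treatment implied by a control risk and a constant odds ratio."""
    control_odds = control_risk / (1.0 - control_risk)
    treated_odds = control_odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_reduction(control_risk: float, odds_ratio: float) -> float:
    """Risk difference (control minus treated) on the absolute scale."""
    return control_risk - treated_risk(control_risk, odds_ratio)


ODDS_RATIO = 0.75  # hypothetical constant relative effect

for control_risk in (0.02, 0.10, 0.30):  # hypothetical baseline risks
    arr = absolute_risk_reduction(control_risk, ODDS_RATIO)
    print(f"control risk {control_risk:.2f}: absolute risk reduction {arr:.4f}")
```

Whether the variation in risk difference is clinically important still depends on where the decision threshold lies, which this calculation does not address.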

## PATH Statement Criteria for When Risk Modeling Is Likely to Be of Value

### Included Criteria

*1. When an overall treatment effect is well established. (Subgroup results [including risk-based subgroup results] from overall null trials should be interpreted cautiously.)*

*2. When the benefits and harms or burdens of a given intervention are finely balanced (that is, of similar magnitude on average), increasing the sensitivity of the treatment decision to risk prediction.*

*net* benefit is near 0), any additional prognostic or predictive information is likely to be especially useful for determining the better therapy for a particular patient.

*left*) using plausible assumptions may be important in motivating research and should generally be included in research proposals and protocols.

The CHADS₂ score (and its variants) is used to target anticoagulation to patients with nonvalvular atrial fibrillation, but patients with higher CHADS₂ scores are also known to be at higher risk for anticoagulation-related hemorrhage (65). Given the potential (positive or negative) correlation between benefits and harms, we recommend that harms be reported in each risk stratum to support stratum-specific evaluation of benefit–harm tradeoffs (recommendation 9 in Figure 3 of the PATH Statement [26]).

*3. When treatments are associated with a nontrivial amount of serious harm or burden, increasing the importance of careful patient selection.*

*qualitative* interaction, meaning that some patients benefit while others are harmed. By definition, qualitative interactions do not arise where treatments are innocuous. In the presence of a small amount of treatment-related harm, the harm may be quantitatively negligible among high-risk patients but sufficient to erode much (or all) of the benefit in low-risk patients (Figure 4). The importance of risk modeling for HTE in treatments with treatment-related harm has been shown in simulation studies (49, 66) and observed empirically for carotid endarterectomy (51), stroke prevention in nonvalvular atrial fibrillation (67, 68), and medical or mechanical reperfusion in ST-segment elevation myocardial infarction (64, 69, 70). Treatment-related harm may be reflected in the primary outcome or ascertained as a separate outcome (such as acute kidney injury, major hemorrhage [54, 71], or serious bone fractures) (55, 72). Risk modeling may also be appropriate for particularly burdensome interventions (for example, major lifestyle commitments [32, 73] and treatment-related costs) (74, 75).
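A minimal arithmetic sketch of how a small, fixed harm can produce a qualitative interaction follows. It assumes a constant relative risk reduction and a harm that is the same at every level of baseline risk; both values, and the helper names `net_benefit` and `break_even_risk`, are hypothetical and not taken from any trial.

```python
def net_benefit(baseline_risk: float, relative_risk_reduction: float, harm_risk: float) -> float:
    """Absolute benefit (baseline risk x relative reduction) minus absolute harm."""
    return baseline_risk * relative_risk_reduction - harm_risk


def break_even_risk(relative_risk_reduction: float, harm_risk: float) -> float:
    """Baseline risk at which benefit exactly offsets harm."""
    return harm_risk / relative_risk_reduction


RRR = 0.25   # hypothetical constant relative risk reduction
HARM = 0.01  # hypothetical treatment-related harm, assumed equal at every risk level

for baseline_risk in (0.02, 0.04, 0.20):
    nb = net_benefit(baseline_risk, RRR, HARM)
    print(f"baseline risk {baseline_risk:.2f}: net benefit {nb:+.4f}")

print(f"break-even baseline risk: {break_even_risk(RRR, HARM):.2f}")
```

Under these assumptions, patients below the break-even risk are harmed on net and patients above it benefit, even though the relative effect on the primary outcome is identical for everyone.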

*4. When several large, well-conducted RCTs of contemporary interventions are available and appropriate for pooling in individual patient meta-analysis.*

*5. When substantial, identifiable heterogeneity of risk in the trial population is anticipated.*

*6. When there is strong preliminary evidence that a prediction model is clinically useful for treatment selection, or when models are in current use for treatment selection.*

CHADS₂ score (67, 68) (and its variations [84]), the atherosclerotic cardiovascular disease score (47), and chest pain tools (85, 86) may be considered a marker of the risk sensitivity of these decisions. Similarly, the widespread use of certain diagnostic prediction models in emergency departments to rule out rare but serious conditions (such as cervical spine fracture [87], intracranial hemorrhage [88], and pulmonary embolism [89]) in low-risk patients to reduce the harms and burdens of further diagnostic testing is a marker of the risk sensitivity of this class of decisions. Such consensually established, implicitly revealed, risk-sensitive decisions remain relatively uncommon. Moreover, randomized data are relatively scarce, and risks may change meaningfully over time. Hence, opportunities to reexamine the risk-specific benefits (or validate predictions of benefit) in new trial data are highly valuable.

*7. When the clinical variables in the proposed models are routinely available in clinical care.*

### Explication of Excluded Criteria

*When the outcome rate is lower.*

*right*). These skewed distributions follow from the logistic regression scale (log odds) and Cox regression scale (log hazard [48]). This makes the average risk (and treatment benefit) misleading even for typical patients enrolled in the trial (48, 92). Nevertheless, the expert panelists disagreed about whether a low outcome rate was a useful criterion to identify worthwhile target trials for risk modeling. Outcome rate is estimated from empirical data with unavoidable uncertainty and unknown generalizability in other populations, which may have higher outcome rates.
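The skewness argument can be checked numerically. The sketch below assumes that the linear predictor (log odds of the outcome) is normally distributed across patients, with a hypothetical mean of -3.0 and standard deviation of 1.2, and compares the mean and median of the resulting risks.

```python
import math
import random
import statistics


def expit(x: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_risks(n=10_000, mean_logit=-3.0, sd_logit=1.2, seed=0):
    """Draw risks whose log odds are normally distributed (illustrative values only)."""
    rng = random.Random(seed)
    return [expit(rng.gauss(mean_logit, sd_logit)) for _ in range(n)]


risks = simulate_risks()
mean_risk = statistics.mean(risks)
median_risk = statistics.median(risks)
# Share of simulated patients whose risk is below the trial-average risk
share_below_mean = sum(r < mean_risk for r in risks) / len(risks)

print(f"mean risk {mean_risk:.3f}, median risk {median_risk:.3f}")
print(f"share of patients below the mean risk: {share_below_mean:.2f}")
```

When outcomes are uncommon, a symmetric distribution on the log odds scale becomes right-skewed on the probability scale, so most simulated patients sit below the average risk.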

*When the 2 treatments are clinically very different (for example, medicine vs. surgery).*

## Justification of Guidance on Risk-Modeling Strategies to Identify HTE

### General

*1. Reporting RCT results stratified by a risk model is encouraged when overall trial results are positive to better understand the distribution of effects across the trial population.*

*2. Predictive approaches to HTE require close integration of clinical and statistical reasoning and expertise.*

### Identify or Develop a Model

*3. When available, apply a high-quality, externally developed, compatible risk model to stratify trial results.*

*4. When a high-quality, externally developed model is unavailable, consider developing a model using the entire trial population to stratify trial results; avoid modeling on the control group only.*

*5. When developing new risk models or updating externally developed risk models, specify the analytic data plan before examining trial data and follow guidance for best practices for prediction model development.*

### Apply the Model, and Report Results

*6. Report metrics for model performance for outcome risk prediction on the RCT, including measures of discrimination and calibration (when appropriate).*

*7. Report distribution of predicted risk (or the risk score) in each group of the trial and in the overall study population.*

*8. Report outcome rates and both relative and absolute risk reduction across risk strata.*

*9. When there are important treatment-related harms, these harms should be reported in each risk stratum to support stratum-specific evaluation of benefit–harm tradeoffs.*

CHA₂DS₂-VASc (84), and ABCD2 (115), may be useful for trial risk stratification but do not yield predictions for calibration.

*relative* effect is constant (Figure 4 [*right*] is an example). In addition, it permits evaluation of the assumption of a constant relative effect across risk strata (see recommendation 10 in Figure 3 of the PATH Statement [26] and explanation in discussion under recommendation 10 here). Alternatively, treatment effects can be presented by continuous risk, as seen in Figure 2, rather than by quantiles (which are sample-dependent). As discussed earlier, examining variation in relative treatment effects may be particularly important when even a small amount of treatment-related harm exists (3, 66). In time-to-event analysis, treatment effects should be analyzed and reported by cumulative incidence curves. Relative treatment effect estimates can be summarized by hazard ratios over a clinically meaningful time horizon (or several such horizons). Absolute treatment effect estimates can be summarized by cumulative incidences at a clinically meaningful time point (or several such points). In reporting risk-stratified results, authors provide readers with the information needed to easily determine the amount of variation in risk difference or number needed to treat and relative effects. These stratum-specific results can provide a rough guide for clinical interpretation, which can be further refined for clinical implementation by continuous modeling (recommendation 3 in Figure 4 of the PATH Statement [26]).

Patients with higher CHADS₂ scores (indicating higher stroke risk and greater potential benefit from anticoagulation) also have substantially higher risk for bleeding (72). Patients with higher risk for stroke recurrence, according to a recurrence risk score, may benefit more from pioglitazone but also have higher risk for pioglitazone-related bone fracture (65). Because of the potential correlation between these 2 risk dimensions (that is, between risk for the primary outcome and risk for treatment-related harm), event rates for these harms should be presented at a level of disaggregation that is congruent with that of the primary outcome so that readers can determine benefit–harm tradeoffs within risk strata (recommendation 9 in Figure 3 of the PATH Statement [26]).
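As a rough illustration of the reporting described in recommendations 7 to 9, the sketch below stratifies a simulated trial by quartile of predicted risk and tabulates outcome rates, relative and absolute effects, number needed to treat, and harm rates in each stratum. The trial is entirely hypothetical: the risk distribution, the constant relative risk of 0.75, the harm model, and the function names are assumptions made for this example, and confidence intervals are omitted for brevity.

```python
import random


def simulate_trial(n=4000, seed=1):
    """Simulate a hypothetical two-arm trial; every parameter here is illustrative."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        risk = rng.betavariate(1.5, 10)                     # right-skewed baseline risk
        treated = rng.random() < 0.5                        # 1:1 randomization
        p_outcome = risk * (0.75 if treated else 1.0)       # assumed constant relative risk
        p_harm = (0.01 + 0.05 * risk) if treated else 0.0   # assumed harm rising with risk
        patients.append({
            "risk": risk,
            "treated": treated,
            "outcome": rng.random() < p_outcome,
            "harm": rng.random() < p_harm,
        })
    return patients


def mean_of(patients, key):
    """Mean of a numeric or boolean field; NaN for an empty group."""
    return sum(p[key] for p in patients) / len(patients) if patients else float("nan")


def stratified_report(patients, n_strata=4):
    """Outcome and harm rates by arm within quantiles of predicted risk."""
    ordered = sorted(patients, key=lambda p: p["risk"])
    size = len(ordered) // n_strata
    rows = []
    for k in range(n_strata):
        stop = (k + 1) * size if k < n_strata - 1 else len(ordered)
        stratum = ordered[k * size:stop]
        control = [p for p in stratum if not p["treated"]]
        treated = [p for p in stratum if p["treated"]]
        r0, r1 = mean_of(control, "outcome"), mean_of(treated, "outcome")
        ard = r0 - r1
        rows.append({
            "stratum": k + 1,
            "n": len(stratum),
            "mean_risk": mean_of(stratum, "risk"),
            "control_rate": r0,
            "treated_rate": r1,
            "relative_risk": r1 / r0 if r0 > 0 else float("nan"),
            "abs_risk_diff": ard,
            "nnt": 1.0 / ard if ard > 0 else None,  # undefined when no benefit is observed
            "harm_rate_control": mean_of(control, "harm"),
            "harm_rate_treated": mean_of(treated, "harm"),
        })
    return rows


for row in stratified_report(simulate_trial()):
    nnt = f"{row['nnt']:.0f}" if row["nnt"] is not None else "n/a"
    print(f"Q{row['stratum']}: n={row['n']}, control={row['control_rate']:.3f}, "
          f"treated={row['treated_rate']:.3f}, RR={row['relative_risk']:.2f}, "
          f"ARD={row['abs_risk_diff']:+.3f}, NNT={nnt}, "
          f"harm(treated)={row['harm_rate_treated']:.3f}")
```

Presenting harms at the same level of disaggregation as the primary outcome, as in the last column, is what allows a reader to weigh benefit against harm within each stratum.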

*10. To test the consistency of the relative treatment effect across prognostic risk, a continuous measure of risk (for example, the logit of risk) may be used in an interaction term with treatment group indicator.*

*middle right*) (116). A visual (nonparametric) exploration of how the relative effect varies across values of outcome risk may ensure the appropriateness of linear effect modification. Testing for a nonlinear interaction between risk and treatment (for example, using the logit of risk in a quadratic term, or with another flexible nonlinear shape [117, 118]) may also be useful. However, such an interaction test may be poorly powered to detect deviations from linearity, particularly when only a single trial with a limited number of events is the substrate for modeling. Moreover, once the existence of an overall treatment effect is established, determining the risk-specific treatment effect should be considered an estimation problem (rather than a hypothesis-testing problem). Flexibly modeling the treatment effect across risk strata, or simply reporting the effects across subgroups defined by quantiles (such as quartiles), provides useful information regardless of the *P* value of the interaction terms testing effect modification on the relative scale. Standard errors across levels of risk can be estimated through a proportional interactions model (119, 120). Most important, the presence or absence of a statistically significant treatment interaction term (on the relative scale) should not be conflated with the presence or absence of clinically important HTE (on the absolute scale) (see discussion of recommendation 1 under Justification of Caveats and Considerations Before Moving to Clinical Practice, below).
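One way to write the interaction described in recommendation 10 is sketched below for a binary outcome. The notation is an assumption made for illustration and does not come from the PATH Statement: $\mathrm{lp}_i$ denotes the logit of the predicted outcome risk for patient $i$, and $t_i$ is the treatment group indicator.

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_1\,\mathrm{lp}_i + \beta_2\,t_i + \beta_3\,(t_i \times \mathrm{lp}_i),
\qquad
\mathrm{OR}(\mathrm{lp}) = \exp\!\left(\beta_2 + \beta_3\,\mathrm{lp}\right)
```

In this formulation a constant relative effect corresponds to $\beta_3 = 0$, and the nonlinear variant mentioned above would add a term such as $t_i \times \mathrm{lp}_i^2$. Even when $\beta_3 = 0$, the implied absolute risk difference still varies with $\mathrm{lp}_i$.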

## Justification of Caveats and Considerations Before Moving to Clinical Practice

*1. Clinical interpretation of HTE should stress differences in the absolute treatment effects across risk groups: The statistical significance of effect modification on the relative scale should not be conflated with the clinical significance of absolute treatment effect estimates.*

*2. External validation and calibration of risk prediction is important for translation of risk-specific treatment effects into clinical practice.*

*internally valid* estimates of treatment effects within trial risk strata, implementation of an *externally valid* prognostic model is necessary for translation into practice (107). Finding clinically important HTE across risk strata within a trial with an endogenous model provides an important impetus for developing and implementing an externally valid prognostic model. Of note, external validity is a general concern for RCT results and their subgroup analyses and is not confined to results subgrouped using prediction models (122).

*3. Clinical implementation may be supported by translating multivariable risk-based subgroup analysis into models yielding continuous treatment effect predictions to avoid artifactual discontinuities in estimation at the quantile boundary of an outcome risk group.*


## Treatment Effect Modeling to Identify HTE

### Considerations Regarding the Inclusion of Rigorously Selected Effect Modifiers

*1. When highly credible relative effect modifiers have been identified, they should be incorporated into prediction models using multiplicative treatment-by-covariate interaction terms.*

*A. Credibility should be evaluated using rigorous multidimensional criteria and should not rely solely on statistical criteria (such as P value thresholds).*

*P* value) is a measure of the statistical strength of the interaction effect in the data being analyzed. The PATH group endorses the rigorous and multidimensional approach recommended in ICEMAN to identify highly credible interaction terms. Examples of highly credible effect modifiers include symptom onset to treatment time for thrombolytic therapy for acute myocardial infarction or acute ischemic stroke (126, 127), gender as a modifier of the effect of thiazolidinediones on fracture risk (128–130), and urinary protein excretion as a modifier of the effect of angiotensin-converting enzyme inhibition on the progression of chronic kidney disease (77, 131).

Because *P* values (or other statistical criteria) in general are influential in how subgroup analyses are interpreted (and are included in ICEMAN criteria) and because interaction effects are poorly estimated in trials of conventional size (even when these are pooled), treatment interaction terms selected for inclusion are likely to overestimate the true interaction effects (that is, from overfitting) (4, 30, 134). Therefore, even when only highly credible interaction terms are included, we recommend model-building procedures that take into account model complexity (that is, approaches using regularization or penalization) whenever interactions are included (recommendation 3 in Figure 5 of the PATH Statement [26]).
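The overestimation that results from selecting interaction terms on statistical significance can be illustrated with a small simulation. The sketch assumes a true interaction coefficient of 0.2 (on the log odds ratio scale) estimated with a standard error of 0.15 in each of many hypothetical trials; these values and the function name are illustrative assumptions only.

```python
import random
import statistics


def simulate_selected_estimates(true_effect=0.2, std_error=0.15,
                                n_trials=20_000, z_crit=1.96, seed=0):
    """Draw noisy interaction estimates and keep those that reach |z| > z_crit."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_trials)]
    selected = [b for b in estimates if abs(b / std_error) > z_crit]
    return estimates, selected


estimates, selected = simulate_selected_estimates()

print(f"true interaction effect:            0.200")
print(f"mean of all estimates:              {statistics.mean(estimates):.3f}")
print(f"mean of 'significant' estimates:    {statistics.mean(selected):.3f}")
print(f"share of trials reaching the cutoff: {len(selected) / len(estimates):.2f}")
```

In this setup the unselected estimates are unbiased, but the subset that passes the significance filter is exaggerated on average, which is one motivation for shrinking interaction coefficients through penalization.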

### Caveats and Considerations for Data-Driven Effect Modeling

*2. Avoid 1-variable-at-a-time null hypothesis testing or stepwise selection (such as backward selection or forward selection) strategies to select single-variable relative effect modifiers.*

*3. Avoid the use of regression methods that do not take into account model complexity when estimating coefficients (for example, “conventional” unpenalized maximum-likelihood regression) when 1 or more treatment-by-covariate interaction terms are included in a treatment effect model.*

*4. Avoid evaluating models that predict treatment benefit using only conventional metrics for outcome prediction (for example, based on discrimination and calibration of outcome risk prediction).*

## Special Considerations for Evaluating Models That Predict Benefit

*x* of 100 patients with a predicted risk of *x*% actually have the outcome?”) and discrimination (“What is the probability that patients with the outcome have a higher predicted risk than those without the outcome?”). Evaluating a prediction model intended to predict treatment effect using these usual metrics related to outcome risk prediction (such as the c-statistic) does not provide information on how well the model performs for predicting benefit and informing treatment decisions. Efforts to develop measures to assess model accuracy for predicting benefit (in particular, evaluating measures of discrimination for benefit) are hampered by the fundamental problem of causal inference for the individual. That is, individual patient treatment effects are inherently unobservable because only 1 of the possible outcomes is observed for each patient (the actual outcome they experienced under the treatment to which they were randomly assigned and not the counterfactual outcome under the alternative) (162).
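For reference, the conventional outcome-risk metrics mentioned above can be computed as in the following minimal sketch. The function names and the four-patient example are hypothetical; the c-statistic is computed by direct pairwise comparison, and calibration is summarized only "in the large" (mean predicted risk versus observed outcome rate).

```python
def c_statistic(predicted_risks, outcomes):
    """Probability that a patient with the outcome has a higher predicted risk
    than a patient without it (ties count as one half)."""
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_in_the_large(predicted_risks, outcomes):
    """Return (mean predicted risk, observed outcome rate)."""
    mean_predicted = sum(predicted_risks) / len(predicted_risks)
    observed_rate = sum(outcomes) / len(outcomes)
    return mean_predicted, observed_rate


# Hypothetical predictions and observed outcomes for four patients
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]

print("c-statistic:", c_statistic(predicted, observed))
print("mean predicted vs observed:", calibration_in_the_large(predicted, observed))
```

Both functions compare predictions with observed outcomes, which is possible for outcome risk. No analogous patient-level comparison exists for predicted benefit, because each patient's counterfactual outcome is unobserved.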

*n*-of-1 trials [27, 169]), all of these methods evaluate evidence personalization *indirectly* by evaluating whether a particular prediction–decision strategy optimizes benefits for a population (58), which occurs when treatments are optimized for each individual.

## Limitations of the PATH Statement

*n*-of-1 (or multiperson *n*-of-1) trials, which some consider the only means of estimating “person-level” treatment effects. We anticipate that observational studies will play an increasingly important role in studying both treatment effects and HTE, but the PATH Statement does not address HTE in observational studies (except to stress that methods of debiasing treatment comparisons to support HTE are a research priority [Table 2]). Although each of the approaches we describe is consistent with the broad goal of evidence personalization, the methods are sufficiently distinct to be beyond the scope of this statement. Notwithstanding the limitations, we emphasize that the PATH Statement applies to the comparison of treatments as well as the comparison of treatment versus no treatment.

*if* a patient adheres to treatment, estimated with a per protocol or adherence-adjusted analysis [180–184]) is often considered the most appropriate estimand for shared decision making in the individual patient. However, as with observational studies, estimating the direct treatment effect can be done only with methods based on unverifiable assumptions; misspecifying a model predicting nonadherence (or using an instrumental variable approach when causes of nonadherence or dropout are complex) can lead to biased estimates of treatment effects. An intention-to-treat analysis is generally believed to yield an unbiased estimate of the treatment policy, although this may be less appropriate for shared decision making. More research is needed regarding optimal ways to combine predictive HTE approaches with approaches that estimate direct treatment or adherence-adjusted effects.

## Discussion

## References

- Kravitz RL, Duan N, Braslow J. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q. 2004;82:661-87. [PMID: 15595946]
- Kent DM, Paulus JK, van Klaveren D, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) statement. Ann Intern Med. 12 November 2019. [Epub ahead of print]
- Greenland S, Rothman KJ, Lash TL. Concepts of interaction. In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
- VanderWeele TJ, Knol MJ. A tutorial on interaction. Epidemiol Methods. 2014;3:33-72.
- VanderWeele TJ, Robins JM. Empirical and counterfactual conditions for sufficient cause interactions. Biometrika. 2008;95:49-61.
- Harrell F, Lazzeroni L. EHRs and RCTs: outcome prediction vs. optimal treatment selection. 2017. Accessed at www.fharrell.com/post/ehrs-rcts on 1 May 2019.
- Harrell F. Viewpoints on heterogeneity of treatment effect and precision medicine. 2018. Accessed at www.fharrell.com/post/hteview on 1 May 2019.
- Deeks JJ, Altman DG. Effect measures for meta-analysis of trials with binary outcomes. In: Egger M, Davey Smith G, Altman DG, eds. Systematic Reviews in Health Care: Meta-Analysis in Context. 3rd ed. London: BMJ Publishing Group; 2003.
- Costa F, van Klaveren D, James S, et al; PRECISE-DAPT Study Investigators. Derivation and validation of the predicting bleeding complications in patients undergoing stent implantation and subsequent dual antiplatelet therapy (PRECISE-DAPT) score: a pooled analysis of individual-patient datasets from clinical trials. Lancet. 2017;389:1025-34.
- Viscoli CM, Kent DM, Conwit R, et al; IRIS Trial Investigators. Scoring system to optimize pioglitazone therapy after stroke based on fracture risk. Stroke. 2018:STROKEAHA118022745.
- Thune JJ, Hoefsten DE, Lindholm MG; Danish Multicenter Randomized Study on Fibrinolytic Therapy Versus Acute Coronary Angioplasty in Acute Myocardial Infarction (DANAMI)-2 Investigators. Simple risk stratification at admission to identify patients with reduced mortality from primary angioplasty. Circulation. 2005;112:2017-21.
- Kernan WN, Viscoli CM, Dearborn JL, et al; Insulin Resistance Intervention After Stroke (IRIS) Trial Investigators. Targeting pioglitazone hydrochloride therapy after stroke or transient ischemic attack according to pretreatment risk for stroke or myocardial infarction. JAMA Neurol. 2017;74:1319-27.
- Selker HP, Beshansky JR, Griffith JL, et al. Use of the acute cardiac ischemia time-insensitive predictive instrument (ACI-TIPI) to assist with triage of patients with chest pain or other symptoms suggestive of acute cardiac ischemia. A multicenter, controlled clinical trial. Ann Intern Med. 1998;129:845-55.
- Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer; 2009.
- Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. New York: Springer; 2015.
- Abadie A, Chingos MM, West MR. Endogenous stratification in randomized experiments. NBER Working Paper no. w19742. Cambridge, MA: National Bureau of Economic Research; 2013. Accessed at http://ssrn.com/abstract=2370198 on 1 May 2019.
- Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer; 2001.
- Royston P, Sauerbrei W. Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. Chichester, United Kingdom: J Wiley; 2008.
- Schandelmaier S. Evaluating the Credibility of Effect Modification Claims in Randomized Controlled Trials and Meta-analyses. Hamilton, Ontario, Canada: McMaster Univ; 2019.
- Emberson J, Lees KR, Lyden P, et al; Stroke Thrombolysis Trialists' Collaborative Group. Effect of treatment delay, age, and stroke severity on the effects of intravenous thrombolysis with alteplase for acute ischaemic stroke: a meta-analysis of individual patient data from randomised trials. Lancet. 2014;384:1929-35.
- Bazelier MT, de Vries F, Vestergaard P, et al. Risk of fracture with thiazolidinediones: an individual patient data meta-analysis. Front Endocrinol (Lausanne). 2013;4:11.
- Paulus JK, Raman G, Rekkas A, et al. White paper, appendix 1: methods and results of evidence review committee search. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement. Washington, DC: Patient-Centered Outcomes Research Institute; 2018.
- Abadie A, Chingos MM, West MR. Endogenous stratification in randomized experiments. Rev Econ Stat. 2018;100:567-80.
- Simon R. Sensitivity, specificity, PPV, and NPV for predictive biomarkers. J Natl Cancer Inst. 2015;107.
- Janes H, Pepe MS, McShane LM, et al. The fundamental difficulty with evaluating the accuracy of biomarkers for guiding treatment. J Natl Cancer Inst. 2015;107.
- Fine JP, Pencina M. On the quantitative assessment of predictive biomarkers [Editorial]. J Natl Cancer Inst. 2015;107.
- Selker HP, Beshansky JR, Griffith JL; TPI Trial Investigators. Use of the electrocardiograph-based thrombolytic predictive instrument to assist thrombolytic and reperfusion therapy for acute myocardial infarction. A multicenter, randomized, controlled, clinical effectiveness trial. Ann Intern Med. 2002;137:87-95.
- National Research Council. The Prevention and Treatment of Missing Data in Clinical Trials. Washington, DC: National Academies Pr; 2010.
- International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. Addendum to ICH E9(R1): Statistical Principles for Clinical Trials. Estimands and Sensitivity Analysis in Clinical Trials. 2017.
- Ratitch B, Bell J, Mallinckrodt C, et al. Choosing estimands in clinical trials: putting the ICH E9(R1) into practice. Ther Innov Regul Sci. 2019:2168479019838827.
- Mallinckrodt CH, Bell J, Liu G, et al. Aligning estimators with estimands in clinical trials: putting the ICH E9(R1) guidelines into practice. Ther Innov Regul Sci. 2019:2168479019836979.
- Hernán MA, Hernández-Díaz S, Robins JM. Randomized trials analyzed as observational studies. Ann Intern Med. 2013;159:560-2.
