Information for Authors - General Statistical Guidance

Return to the main Information for Authors page.

 

1. Presentation

Percentages

Report percentages to one decimal place (i.e., xx.x%) when sample size is ≥200.

To avoid implying a level of precision that is not present with small samples, do not use decimal places (i.e., xx%, not xx.x%) when sample size is <200.
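
As a minimal illustration, the two rules above can be encoded in a short formatting helper (a sketch in Python; the function name is ours):

```python
def format_percent(numerator: int, n: int) -> str:
    """Format a percentage per the sample-size rule above."""
    pct = 100 * numerator / n
    return f"{pct:.1f}%" if n >= 200 else f"{pct:.0f}%"

print(format_percent(53, 412))  # '12.9%' -- one decimal place, n >= 200
print(format_percent(17, 96))   # '18%'   -- no decimals, n < 200
```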

Standard deviations

Use “mean (SD)” rather than “mean ± SD” notation. The ± symbol is ambiguous and can represent standard deviation or standard error.

Standard errors

Report confidence intervals, rather than standard errors, when possible.

P values

For P values between 0.001 and 0.20, please report the value to the nearest thousandth. For P values greater than 0.20, please report the value to the nearest hundredth. For P values less than 0.001, report as “P<0.001.”
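
These rounding rules are mechanical and can be captured in a few lines (a sketch in Python; the function name is ours):

```python
def format_p(p: float) -> str:
    """Format a P value per the rounding rules above."""
    if p < 0.001:
        return "P<0.001"
    if p <= 0.20:
        return f"P={p:.3f}"  # nearest thousandth for 0.001-0.20
    return f"P={p:.2f}"      # nearest hundredth above 0.20

print(format_p(0.0004), format_p(0.0372), format_p(0.46))
# P<0.001 P=0.037 P=0.46
```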

“Trend”

Only use the word trend when describing a test for trend or dose-response.

Avoid the term trend when referring to P values near but not below 0.05. In such instances, simply report a difference and the confidence interval of the difference (if appropriate) with or without the P value.

Descriptive tables

In tables that simply describe the characteristics of 2 or more groups (e.g., Table 1 of a clinical trial):

  • Report averages with standard deviations, not standard errors, when data are normally distributed.

  • Report median (minimum, maximum) or median (25th, 75th percentiles [interquartile range, or IQR]) when data are not normally distributed; a computational sketch follows this list.

  • Avoid reporting P values as there can be imbalance when P values are not significant (because of small sample size) and balance when P values are significant (because of large sample size).
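
A minimal sketch of the two descriptive summaries above, in Python with numpy (the helper name and rounding are ours):

```python
import numpy as np

def summarize(x, normal: bool) -> str:
    """Mean (SD) for normal data; median (25th, 75th percentiles) otherwise."""
    x = np.asarray(x, dtype=float)
    if normal:
        return f"{x.mean():.1f} ({x.std(ddof=1):.1f})"  # mean (SD), sample SD
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"{med:.1f} ({q1:.1f}, {q3:.1f})"            # median (IQR)
```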

Figures

When developing informative graphics, follow these simple rules of thumb:

  • Avoid pie charts and 3-dimensional graphics.

  • Avoid simple bar plots that do not present measures of variability.

  • For meta-analysis forest plots, provide the raw data (numerators and denominators) in the margins.

  • For survival plots, provide the numbers of people at risk by group and time below the horizontal axis.

Reproducibility

Describe the statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results (ICMJE Recommendations).

Statistical software and code

Specify in the statistical analysis section the statistical software—version, manufacturer, and the specific functions, procedures, or programs—used for analyses.
For Bayesian methods, provide full code, including starting values and priors, within an appendix.
When statistical code is provided within an appendix, it should be well-annotated for comprehension by interested readers. (Localio AR, Goodman SN, Meibohm A, et al. Statistical code to support the scientific story. Ann Intern Med. 2018. doi:10.7326/M17-3431)

Technical appendix

Provide in an appendix more detailed methods and results (e.g., sensitivity analyses) that cannot be described within the main body of the paper.


2. Multivariable Analysis

Screening covariates

Approaches that select factors for inclusion in a multivariable model only if the factors are “statistically significant” in “bivariate screening” are not optimal. A factor can be a confounder even if it is not statistically significant by itself because it changes the effect of the exposure of interest when it is included in the model, or because it is a confounder only when included with other covariates.
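
The point can be made concrete with a change-in-estimate check: a covariate that is far from "significant" on its own can still shift the exposure effect when it enters the model. A simulated sketch in Python with statsmodels (all variable names and effect sizes are invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)                                   # candidate covariate
x = rng.binomial(1, 1 / (1 + np.exp(-z)))                # exposure related to z
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x + z))))    # outcome depends on x and z
df = pd.DataFrame({"y": y, "x": x, "z": z})

crude = smf.logit("y ~ x", data=df).fit(disp=0)
adjusted = smf.logit("y ~ x + z", data=df).fit(disp=0)

# change-in-estimate: compare the exposure log odds ratio with and without z
change = 100 * abs(adjusted.params["x"] - crude.params["x"]) / abs(crude.params["x"])
print(f"exposure log-OR changes by {change:.0f}% when z is included")
```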

Useful resource:

  • Sun GW, Shook TL, Kay GL. Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J Clin Epidemiol. 1996;49:907-16. PMID: 8699212

Model building

Authors should avoid stepwise methods of model building, except for the narrow application of hypothesis generation for subsequent studies. Stepwise methods include forward, backward, or combined procedures for the inclusion and exclusion of variables in a statistical model based on predetermined P value criteria. Better strategies than P value driven approaches for selecting variables are those that use external clinical judgment. Authors might use a bootstrap procedure to determine which variables, under repeated sampling, would end up in the model using stepwise variable selection procedures. Regardless, authors should tell readers how model fit was assessed, which interactions were explored and how and why they were explored, and the results of those assessments.
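
A sketch of the bootstrap idea, assuming a linear model and a simple P value-driven forward selection (shown only to study its instability; names and thresholds are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X: pd.DataFrame, y, alpha: float = 0.05) -> list:
    """P value-driven forward selection (for illustration only)."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
                 for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

def bootstrap_selection(X, y, n_boot: int = 100, seed: int = 0) -> dict:
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    counts = {c: 0 for c in X.columns}
    y = np.asarray(y)
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))
        for c in forward_select(X.iloc[idx], y[idx]):
            counts[c] += 1
    return {c: k / n_boot for c, k in counts.items()}

# demo: one real predictor (x0) among five noise variables
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(150, 6)), columns=[f"x{i}" for i in range(6)])
y = 0.5 * X["x0"] + rng.normal(size=150)
print(bootstrap_selection(X, y))
```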

Useful resources:

  • Collett D, Stepniewska K. Some practical issues in binary data analysis. Statist Med. 1999;18:2209-21. PMID: 10474134

  • Mickey RM, Greenland S. The impact of confounder selection criteria on effect estimation. Am J Epidemiol. 1989;129:125-37. PMID: 2910056

  • Steyerberg EW, Eijkemans MJC, Harrell FE, Jr., Habbema JDF. Prognostic modeling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Statist Med. 2000;19:1059-79. PMID: 10790680

  • Steyerberg EW, Eijkemans MJC, Habbema DF. Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis. J Clin Epidemiol. 1999;52:935-42. PMID: 10513756

  • Altman D, Andersen PK. Bootstrap investigation of the stability of a Cox regression model. Statist Med. 1989;8:771-83. PMID: 2672226

  • Mick R, Ratain MJ. Bootstrap validation of pharmacodynamic models defined via stepwise linear regression. Clin Pharmacol Ther. 1994;56:217-22. PMID: 8062499

  • Harrell FE, Jr, et al. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statist Med. 1996;15:361-87. PMID: 8668867

Tables reporting multivariable analyses

Authors sometimes present tables that compare one by one an outcome with multiple individual factors followed by a multivariable analysis that adjusts for confounding. If confounding is present, as is often the case, the one-way comparisons are simply intermediate steps that offer little useful information for the reader. In general, omit presenting these intermediate steps in the manuscript and do not focus on them in the Results or Discussion.

3. Measurement Error

If several risk factors for disease are considered in a logistic regression model and some of these risk factors are measured with error, the point and interval estimates of relative risk corresponding to any of these factors may be biased either toward or away from the null value; the direction of bias is never certain. In addition to potentially biased estimates, confidence intervals of correctly adjusted estimates will be wider, sometimes substantially, than naïve confidence intervals. Authors are encouraged to consult the references below for strategies to address this problem.

Useful resources:

  • Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol. 1990;132:734-45. PMID: 2403114

  • Carroll R. Measurement Error in epidemiologic studies. In Encyclopedia of Biostatistics. New York: John Wiley & Sons; 1998. ISBN: 0471975761

4. Measures of Effect and Risk

Clinically meaningful estimates

Authors should report results for meaningful metrics rather than reporting raw results. For example, rather than reporting the log odds ratio from a logistic regression, authors should transform coefficients into the appropriate measure of effect size: odds ratio, relative risk, or risk difference. Estimates, such as an odds ratio or relative risk, should not be reported for a 1-unit change in the factor of interest if a 1-unit change lacks clinical meaning (e.g., 1 year of age, 1 mm Hg of blood pressure, or any other continuous or interval measurement with small units). All estimates should reflect a clinically meaningful change, along with 95% confidence bounds.
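
Rescaling is a one-line transformation of the fitted coefficient, as in this sketch (the coefficient and standard error are hypothetical):

```python
import numpy as np

beta, se = 0.021, 0.006  # hypothetical log odds ratio per 1 mm Hg, and its SE

# odds ratio per clinically meaningful 10 mm Hg change; CI bounds rescale the same way
or_10 = np.exp(10 * beta)
lo, hi = np.exp(10 * (beta - 1.96 * se)), np.exp(10 * (beta + 1.96 * se))
print(f"OR per 10 mm Hg: {or_10:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```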

Between-group differences

For comparisons of interventions (e.g., trials), focus on between-group differences, with 95% confidence intervals of the differences, and not on within-group differences. State the results using absolute numbers (numerator/denominator) when feasible. When discussing effects, refer to the confidence intervals rather than P values, and point out for readers whether the confidence intervals exclude the possibility of significant clinical benefit or harm.

Odds ratios and predicted probabilities

Authors often report odds ratios for multivariable results when the odds ratio is difficult to interpret or not meaningful. First, the odds ratio might overstate the effect size when the reference risk is high. For example, if the reference risk is 25% (odds = 0.33) and the odds ratio is 3.0, the relative risk is only 2.0. Statements such as “3-fold increased risk” or “3 times the risk” are incorrect. Second, readers want an easily understood measure of the level of risk (and the confidence intervals) for different groups of patients as defined by treatment, exposure, and covariates. Consider providing a table of predicted probabilities for each of the factors of interest, and confidence intervals of those predicted probabilities. Moreover, a multiway table that cross-classifies predicted probabilities by the most important factor and then adjusts for the remaining factors will often be more meaningful than a table of adjusted odds ratios. Standard commercial software can produce predicted probabilities and confidence bounds.
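
For readers who want the arithmetic behind the example above, a standard conversion gives the relative risk implied by an odds ratio when the reference-group risk is p0; a one-line check in Python:

```python
def or_to_rr(odds_ratio: float, p0: float) -> float:
    """Relative risk implied by an odds ratio, given reference-group risk p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

print(or_to_rr(3.0, 0.25))  # 2.0, matching the example in the text
```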

Useful resource:

  • Altman DG, Deeks JJ, Sackett DL. Odds ratios should be avoided when events are common. BMJ. 1998;317:1318. PMID: 9804732

Hazard Ratios and Standardized Cumulative Incidence

Authors often report results from analysis of survival or time-to-event data using hazard ratios estimated from Cox proportional hazards models. Hazard ratios are notoriously difficult to interpret clinically, may be sensitive to the length of follow-up, and rely on model assumptions, such as proportional hazards. In addition, presenting estimates of effect in both absolute and relative terms increases the likelihood that results will be correctly interpreted. For all of these reasons, we recommend that authors present cumulative incidence curves (inverted Kaplan-Meier plots), along with tabular summaries of absolute differences in cumulative incidence at meaningful times, with 95% confidence bounds, when reporting results from survival analyses. When such an analysis requires covariate adjustment, authors can estimate and present covariate-standardized (weighted) cumulative incidence curves with differences in adjusted cumulative incidence at meaningful times.
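
A minimal sketch of the recommended presentation, assuming the Python lifelines package and no competing risks (all variable names are hypothetical):

```python
from lifelines import KaplanMeierFitter

def cuminc_at(durations, events, t: float) -> float:
    """Cumulative incidence 1 - S(t) at time t from a Kaplan-Meier fit."""
    kmf = KaplanMeierFitter().fit(durations, event_observed=events)
    return 1 - kmf.predict(t)

# hypothetical usage: absolute difference in cumulative incidence at 12 months
# risk_diff_12 = cuminc_at(t_treat, e_treat, 12) - cuminc_at(t_control, e_control, 12)
```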

Useful resources:

  • Hernan MA. The hazards of hazard ratios. Epidemiology. 2010;21:13-5. PMID: 20010207

  • Uno H, Wittes J, Fu H, et al. Alternatives to hazard ratios for comparing the efficacy or safety of therapies in noninferiority studies. Ann Intern Med. 2015;163:127-134. doi:10.7326/M14-1741

  • Therneau T, Crowson CS, Atkinson EJ. Adjusted Survival Curves. http://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf

  • Cole SR, Hernan MA. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed. 2004;75:45-49. PMID: 15158046

  • Zhang X, Zhang MJ. SAS macros for estimation of direct adjusted cumulative incidence curves under proportional subdistribution hazards models. Comput Methods Programs Biomed. 2011;101(1):87-93. doi:10.1016/j.cmpb.2010.07.005

  • Storer BE, Gooley TA, Jones MP. Adjusted estimates for time-to-event endpoints. Lifetime Data Anal. 2008;14(4):484-495. doi:10.1007/s10985-008-9098-9.

5. Missing Data

Missing variables

Always report the frequency of missing variables and how the analysis handled missing data. Consider adding a column to tables or a row under figures that makes clear the amount of missing data. Avoid using a simple indicator or dummy variable to represent a missing value. Replacing missing predictors with dummy variables or missing indicators generally leads to biased estimates.

Useful resources:

  • Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. PMID: 19564179

  • Vach W, Blettner M. Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods or correcting for missing values of confounding variables. Am J Epidemiol. 1991;134:895-907. PMID: 1670320

  • Vach W, Blettner M. Missing data in epidemiologic studies. In Encyclopedia of Biostatistics. New York: John Wiley & Sons; 1998:2641-54. ISBN: 0471975761

  • Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol. 1995;142:1255-64. PMID: 7503045

  • Allison PD. Missing Data. Thousand Oaks, California: Sage Publications; 2002. ISBN: 0761916725

Missing Outcomes

Always report the frequency of missing outcomes and follow-up data, reasons and any patterns for the missing data, and how you handled missing data in the analyses. Do not use a last-observation-carried-forward (LOCF) approach to address incomplete follow-up, even if the original protocol prespecified that approach for handling missing data. LOCF approaches understate variability and result in bias, and the direction of the bias is not predictable. Although the method of addressing missing data may have little effect on findings when the proportion of missing data is small (e.g., <5%), authors should avoid using outdated or biased methods to address incomplete follow-up. Appropriate methods for handling missing data include imputation, pattern-mixture (mixed) models, and selection models. Application of these methods requires consideration of the patterns and potential mechanisms behind the missing data.
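
As one illustration of a modern approach, chained-equations multiple imputation can be sketched with scikit-learn (df is a hypothetical data frame with missing values; the number of imputations is arbitrary):

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def complete(df: pd.DataFrame, seed: int) -> pd.DataFrame:
    """One stochastic completion of df; repeat with new seeds for multiple imputation."""
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

# analyze each completed data set, then pool estimates and their variances
# with Rubin's rules:
# completed_sets = [complete(df, seed=s) for s in range(20)]
```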

Useful resources:

  • Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. New York: John Wiley & Sons; 2011: chapters 17 and 18. ISBN: 0470380277

  • Molenberghs G, Kenward MG. Missing Data in Clinical Studies. London: John Wiley & Sons; 2007. ISBN: 0470849811

  • Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. New York: Springer; 2005: chapters 26-32. ISBN: 0387251448

  • National Research Council. The Prevention and Treatment of Missing Data in Clinical Trials. Panel on Handling Missing Data in Clinical Trials. Committee on National Statistics, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press; 2010. ISBN: 0309158145 www.nap.edu/catalog/12955.html

  • Liao J, Stack CB. Annals Understanding Clinical Research: implications of missing data due to dropout. Ann Intern Med. 2017;166(8):596-598. doi:10.7326/M17-0195

6. Longitudinal Analyses

Consider using longitudinal analyses if outcome data were collected at more than 1 time point. Some methodological and reporting options follow. With an appropriate model for longitudinal analysis, you can report differences within groups over time, differences between groups, and between-group differences in within-group change over time (usually the key contrast of interest). You can also control for any confounding that might emerge, such as a difference in a variable (e.g., body weight) among those who remained in the study until completion.

Longitudinal analysis options include a population-averaged analysis (e.g., generalized estimating equations [GEEs]) that estimates the time-by-treatment interaction and adjusts variance for the repeated measures within individuals over time. Another option is a mixed-effects model, with random effects for patient, in which the estimate of interest is again the time-by-treatment interaction.

In choosing a model, consider whether any missing data are missing at random (i.e., "ignorable" missing data) or missing dependent on the observed data (i.e., informative missing data). In GEE analyses, missing data are assumed to be missing completely at random, independent of both observed and unobserved data. In random-coefficient (mixed-effects) analyses, missing data are assumed to be missing at random, dependent on observed data but not on unobserved data.
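
A sketch of both modeling options in Python with statsmodels, on simulated long-format data (all variable names and effect sizes are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# simulated long-format data: 100 patients x 3 visits, with a patient-level intercept
rng = np.random.default_rng(0)
n, visits = 100, 3
ids = np.repeat(np.arange(n), visits)
time = np.tile(np.arange(visits), n)
treat = np.repeat(rng.integers(0, 2, n), visits)
y = (1.0 * time + 0.5 * time * treat
     + np.repeat(rng.normal(0, 1, n), visits) + rng.normal(0, 1, n * visits))
df = pd.DataFrame({"id": ids, "time": time, "treat": treat, "y": y})

# population-averaged (GEE) model; time:treat is the between-group
# difference in within-group change over time
gee = smf.gee("y ~ time * treat", groups="id", data=df,
              cov_struct=sm.cov_struct.Exchangeable()).fit()

# mixed-effects alternative with a random intercept per patient
mixed = smf.mixedlm("y ~ time * treat", df, groups=df["id"]).fit()
print(gee.params["time:treat"], mixed.params["time:treat"])
```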

Useful resources:

  • Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. New York: John Wiley & Sons; 2011. ISBN: 0470380277

  • Singer JD, Willett JB. Applied Longitudinal Data Analysis. New York: Oxford University Press; 2003. ISBN: 0195152964

  • Twisk JWR. Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. New York: Cambridge University Press; 2003. ISBN: 0521819768

7. Sensitivity Analysis and Unmeasured Confounding

Analyses of observational data that attempt to assess causality between an exposure and an outcome are generally subject to confounding due to unmeasured or omitted covariates. In these settings, authors should carry out formal sensitivity analysis to assess how strong an unmeasured confounder would need to be to explain away an observed association. One relatively easy-to-use method that does not require strong assumptions is the E-value, proposed by VanderWeele and Ding (2017) in an article that contains many references, includes a technical appendix, and points to available software. After conducting a formal sensitivity or bias analysis, authors can discuss the likelihood that the reported results are due to residual confounding.
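
The E-value itself is simple to compute for an estimate on the risk-ratio scale; a sketch using the formula from the VanderWeele and Ding article cited below:

```python
import math

def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale."""
    rr = max(rr, 1 / rr)  # work on the side of the null where RR > 1
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(2.0), 2))  # 3.41: an unmeasured confounder would need
# associations of RR >= 3.41 with both exposure and outcome to explain away RR = 2
```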

Useful resources:

  • VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167:268-74. doi:10.7326/M16-2607

  • Ding P, VanderWeele TJ. Sensitivity analysis without assumptions. Epidemiology. 2016;27:368-77. PMID: 26841057

  • Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54:948-63. PMID: 9750244

8. Meta-analysis

Issues to Consider Before Pooling

Consider all sources of clinical variation across studies (e.g., study populations, interventions or comparators, outcome definition, and timing) when making decisions about how and when to pool quantitatively.

Note that potential sources of methodological or clinical heterogeneity (e.g., risk of bias, intensity of an intervention; particular study population) identified a priori provide the strongest and most meaningful basis for explaining study heterogeneity. State whether you identified potential sources of heterogeneity prior to initiating your review and analyses, and describe whether and how you carried out subgroup or sensitivity analyses or meta-regression to explore that heterogeneity. Make clear which such analyses were prespecified and which were not.

Consider the number of studies and their relative size when deciding whether or not to pool. It is true that you can compute a pooled treatment effect when you have at least 2 studies; but you need to ask yourself, "Does it make clinical or methodological sense to do so?"

When the included studies naturally fall into subgroups based on patient populations or clinical features, it is often more informative to stratify the primary analysis based on the subgroups.

Statistical tests of heterogeneity or visual inspection of the variation via a forest plot are insufficient guides to pooling when there are fewer than 10 studies.

When the studies provide inconsistent estimates that vary widely, an overall treatment effect may not represent the actual treatment effect. In this case, a narrative presentation and critique can be clinically more informative. Such discussion should consider characteristics of these studies and patient populations that might account for the observed differences in effect sizes.

When a small number of studies vary greatly in size, it may be more clinically informative to consider the larger study or studies separately from the smaller ones. For example, suppose you have 3 relatively small homogeneous studies and 1 fairly large trial based on the same population of patients. If the estimates from the smaller trials are consistent with the estimate observed in the larger trial, pooling is appropriate. If the estimates from the smaller trials are quite different from the larger trial, a pooled estimate may not provide a good summary of the evidence. In this case, a more detailed description and analysis of the information from the larger trial alone can be more clinically informative.

Choosing an Appropriate Pooling Method

Use pooling methods that are appropriate for the data. For example, when it is appropriate to pool studies whose estimates vary widely, be aware of the literature on the inadequate performance of the DerSimonian-Laird method for estimating confidence bounds and P values when the number of studies is small or when there are substantive differences among study estimates. In such situations, use one of the more robust alternative random-effects estimators, such as the profile likelihood method, the Sidik-Jonkman estimator with the Hartung-Knapp small-sample adjustment, or hierarchical Bayesian models, all of which provide a better accounting of uncertainty.
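
For concreteness, here is a sketch of a random-effects pool with the Hartung-Knapp variance adjustment, written in Python (it uses the simpler DerSimonian-Laird tau-squared where the Sidik-Jonkman estimator would be substituted; this is an illustration, not a validated implementation):

```python
import numpy as np
from scipy import stats

def random_effects_hk(y, v):
    """Random-effects pooled estimate with the Hartung-Knapp adjustment.
    y: per-study effects (e.g., log odds ratios); v: within-study variances."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    k, w = len(y), 1 / np.asarray(v, float)
    yw = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - yw) ** 2)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))  # DL tau^2
    ws = 1 / (v + tau2)
    mu = np.sum(ws * y) / np.sum(ws)
    # Hartung-Knapp: rescaled variance with a t distribution on k-1 df
    var_hk = np.sum(ws * (y - mu) ** 2) / ((k - 1) * np.sum(ws))
    half = stats.t.ppf(0.975, k - 1) * np.sqrt(var_hk)
    return mu, (mu - half, mu + half)
```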

Pooling Studies With Low Event Rates

When summarizing studies with 0 or very low event rates, avoid methods such as the Peto method or the Mantel-Haenszel method with the standard 0.5 continuity correction. These methods underestimate variance and result in confidence intervals that are too narrow. Either the exact Mantel-Haenszel method without continuity correction or the treatment-arm continuity correction provides reasonably robust and accurate estimates when there are zero events in one of the treatment groups.

Useful resources:

  • Bradburn MJ, Deeks JJ, Berlin JA, Localio AR. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Stat Med. 2007;26:53-77. PMID: 16596572

  • Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med. 2004;23:1351-75. PMID: 15116347

  • Mulrow CD, Cornell JE, Localio AR. Rosiglitazone: a thunderstorm from scarce and fragile data. Ann Intern Med. 2007;147:585-587.

  • Cai T, Parast L, Ryan L. Meta-analysis for rare events. Stat Med. 2010;29(20):2078-89. PMID: 20623822

  • Rucker G, Schwarzer G, Carpenter J, Olkin I. Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. Stat Med. 2009;28(5):721-38. PMID: 19072749

Documenting Your Methods

Identify the specific statistical model used to pool the data, evaluate statistical heterogeneity, and construct subgroup analyses. Specify the software platform (SAS, Stata, R, etc.) as well as the actual program and options specified for each of the analyses. For example, specify the exact random effects method used to pool treatment effects or compute your meta-regression. If you use meta-regression, specify how the independent variables are coded for the model. Simple binary variables and mean values are easily understood and modeled. Note that proportions are naturally nonlinear and are best represented and modeled on either the logit or arcsine scale.
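
A sketch of the two transforms for a proportion x/n, with their delta-method variances (the helper names are ours):

```python
import numpy as np

def logit_prop(x: int, n: int):
    """Logit of x/n and its approximate variance, 1/x + 1/(n - x)."""
    p = x / n
    return np.log(p / (1 - p)), 1 / x + 1 / (n - x)

def arcsine_prop(x: int, n: int):
    """Arcsine-square-root of x/n; its variance, 1/(4n), does not depend on p."""
    return np.arcsin(np.sqrt(x / n)), 1 / (4 * n)
```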

For more complex analyses, such as network meta-analyses and hierarchical Bayesian models, provide a detailed technical appendix that includes software code annotated for reader comprehension.

Additional Guidance

Avoid using funnel plots and regression tests for small-study effects when there are too few studies (fewer than 10) to assess small-study effects adequately.

Avoid using outdated or overly simplistic methods for risk of bias assessments and summary quality scores (e.g., the Jadad scale). Consider using the Cochrane Risk of Bias tool for clinical trials. The ROBINS-I and the Newcastle-Ottawa Scale are good tools for assessing risk of bias in observational studies, and QUADAS-2 for diagnostic test studies.

Provide numerator and denominator data for the individual trials in forest plots.

9. Statistical Significance and P Values

Avoid interpreting results based upon statistical significance alone, and follow the principles of proper use and interpretation of the P value from the American Statistical Association (ASA's Statement on Statistical Significance and P-Values). Consider the clinical importance of observed differences and the width of 95% confidence intervals when interpreting results. In situations where results are consistent with "no difference," be sure to differentiate results that are indeterminate (still consistent with clinically meaningful benefits) from those that are negative (ruling out clinically meaningful benefits).
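
The indeterminate-versus-negative distinction becomes mechanical once a minimal clinically important difference (MCID) is specified; a sketch (the labels and threshold logic are ours):

```python
def interpret_null_result(ci_low: float, ci_high: float, mcid: float) -> str:
    """Classify a between-group difference whose 95% CI includes 0, given an MCID."""
    if not (ci_low <= 0 <= ci_high):
        return "CI excludes 0: a difference was observed"
    if -mcid < ci_low and ci_high < mcid:
        return "negative: the CI rules out clinically meaningful effects"
    return "indeterminate: the CI is still consistent with meaningful effects"

print(interpret_null_result(-0.8, 4.2, 3.0))  # indeterminate
```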

10. Figures and Tables

The following references give useful information about the design and format of informative tables and figures:

  • Tufte ER. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press; 1983. ISBN: 0961392142

  • Wainer H. How to display data badly. The American Statistician. 1984;38:137-47.

  • Wainer H. Visual Revelations: Graphical Tales of Fate and Deception From Napoleon Bonaparte to Ross Perot. New Jersey: Lawrence Erlbaum Associates; 1997. ISBN: 038794902X

  • Pocock SJ, Clayton TC, Altman DG. Survival plots of time-to-event outcomes in clinical trials: good practice and pitfalls. Lancet. 2002;359:1686-89. PMID: 12020548

Return to the main Information for Authors page.
