Information for Authors


Statistical Guidelines


Presentation
Issue Notes

Percentages   

 

Report percentages to one decimal place (i.e., xx.x%) when sample size is >=200.

To avoid the appearance of a level of precision that is not present with small samples, do not use decimal places (i.e., xx%, not xx.x%) when sample size is <200.
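
The rule above can be sketched as a small formatting helper. This is only an illustration, assuming that "sample size" means the denominator of the proportion being reported; the function name is hypothetical.

```python
def format_percentage(numerator: int, denominator: int) -> str:
    """Format a proportion as a percentage using the sample-size rule.

    Assumption: the denominator is treated as the sample size.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    pct = 100.0 * numerator / denominator
    # One decimal place when the sample is 200 or more; none otherwise.
    decimals = 1 if denominator >= 200 else 0
    return f"{pct:.{decimals}f}%"


print(format_percentage(37, 150))  # 25%
print(format_percentage(74, 300))  # 24.7%
```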

Standard deviations

 

Use “mean (SD)” rather than “mean ± SD” notation. The ± symbol is ambiguous and can represent standard deviation or standard error.

Standard errors

 

Report confidence intervals, rather than standard errors, when possible.
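
As a rough sketch of the conversion, a standard error can be turned into an approximate confidence interval as follows. This assumes a normal approximation (estimate ± z × SE); with small samples a t-based interval would usually be more appropriate. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, standard_error: float,
                        level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval from an estimate and its standard error.

    Assumption: the sampling distribution is approximately normal.
    """
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(10.0, 2.0)
print(f"10.0 (95% CI, {low:.2f} to {high:.2f})")  # 10.0 (95% CI, 6.08 to 13.92)
```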

P values

 

For P values between 0.001 and 0.20, please report the value to the nearest thousandth. For P values greater than 0.20, please report the value to the nearest hundredth. For P values less than 0.001, report as “P<0.001.”
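
The three ranges above could be applied with a helper like the one below. It is a minimal sketch: the "P=" prefix and the handling of a value of exactly 0.20 (treated here as belonging to the thousandths range) are assumptions, and the function name is hypothetical.

```python
def format_p_value(p: float) -> str:
    """Format a P value according to the three reporting ranges."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "P<0.001"
    if p <= 0.20:
        # Nearest thousandth for values from 0.001 through 0.20.
        return f"P={p:.3f}"
    # Nearest hundredth for values above 0.20.
    return f"P={p:.2f}"


print(format_p_value(0.0004))  # P<0.001
print(format_p_value(0.0437))  # P=0.044
print(format_p_value(0.3468))  # P=0.35
```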

“Trend”

 

Use the word trend when describing a test for trend or dose-response.

Avoid the term trend when referring to P values near but not below 0.05. In such instances, simply report a difference and the confidence interval of the difference (if appropriate) with or without the P value.

Statistical software

 

Specify in the statistical analysis section the statistical software—version, manufacturer, manufacturer’s location, and the specific functions, procedures, or programs—used for analyses.

Cox models

When reporting the findings from Cox proportional hazards models:
  • Do not describe hazard ratios as relative risks.
  • Do report how the assumption of proportional hazards was tested, and what the test showed.

Descriptive tables

 

In tables that simply describe characteristics of 2 or more groups (e.g., Table 1 of a clinical trial):
  • Report averages with standard deviations, not standard errors, when data are normally distributed.
  • Report median (minimum, maximum) or median (25th, 75th percentile [interquartile range, or IQR]) when data are not normally distributed.
  • Avoid reporting P values, because there can be imbalance when P values are not significant (because of small sample size) and balance when P values are significant (because of large sample size).
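
The two summary formats in the list above might be produced as follows. This is an illustrative sketch only: the decision about whether a variable is normally distributed is left to the caller, the one-decimal formatting is an assumption, and the percentiles use the "inclusive" method of Python's statistics.quantiles, which may differ slightly from other software.

```python
from statistics import mean, quantiles, stdev


def summarize(values: list[float], normally_distributed: bool) -> str:
    """Return "mean (SD)" or "median (25th, 75th percentile)" for one variable."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    if normally_distributed:
        return f"{mean(values):.1f} ({stdev(values):.1f})"
    # quantiles(n=4) returns the 25th, 50th, and 75th percentiles.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median:.1f} ({q1:.1f}, {q3:.1f})"


data = [1, 2, 3, 4, 5]  # hypothetical values
print(summarize(data, normally_distributed=True))   # 3.0 (1.6)
print(summarize(data, normally_distributed=False))  # 3.0 (2.0, 4.0)
```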

Tables reporting multivariable analyses

 

Authors sometimes present tables that compare one by one an outcome with multiple individual factors followed by a multivariable analysis that adjusts for confounding. If confounding is present, as is often the case, the one-way comparisons are simply intermediate steps that offer little useful information for the reader. In general, omit presenting these intermediate steps in the manuscript and do not focus on them in the Results or Discussion.

Tables and figures (general)

 

The following references give useful information about the design and format of informative tables and figures:

Tufte ER. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press; 1983, p 178. ISBN: 0961392142

Wainer H. How to display data badly. The American Statistician. 1984;38:137-147.

Wainer H. Visual Revelations: graphical tales of fate and deception from Napoleon Bonaparte to Ross Perot. New Jersey: Lawrence Erlbaum Associates, Inc.;1997. ISBN: 038794902X

Pocock SJ, Clayton TC, Altman DG. Survival plots of time-to-event outcomes in clinical trials: good practice and pitfalls. Lancet 2002; 359:1686-89. PMID: 12020548

Also, follow a few simple rules of thumb:

  1. Avoid pie charts.
  2. Avoid simple bar plots that do not present measures of variability.
  3. Provide raw data (numerators and denominators) in the margins of meta-analysis forest plots.
  4. Depict numbers of people at risk at different times in survival plots (see Pocock et al. above).

Multivariable Analysis

Screening covariates

Approaches that select factors for inclusion in a multivariable model only if the factors are “statistically significant” in “bivariate screening” are not optimal. A factor can be a confounder even if it is not statistically significant by itself because it changes the effect of the exposure of interest when it is included in the model, or because it is a confounder only when included with other covariates.

Reference

Sun GW, Shook TL, Kay GL. Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J Clin Epidemiol. 1996;49:907-16. PMID: 8699212

Model building

Authors should avoid stepwise methods of model building, except for the narrow application of hypothesis generation for subsequent studies. Stepwise methods include forward, backward, or combined procedures for the inclusion and exclusion of variables in a statistical model based on predetermined P value criteria. Better strategies than P value driven approaches for selecting variables are those that use external clinical judgment. Authors might use a bootstrap procedure to determine which variables, under repeated sampling, would end up in the model using stepwise variable selection procedures. Regardless, authors should tell readers how model fit was assessed, how and which interactions were explored, and the results of those assessments.

References

Collett D, Stepniewska K. Some practical issues in binary data analysis. Statist Med. 1999;18:2209-21. PMID: 10474134

Mickey RM, Greenland S. The impact of confounder selection criteria on effect estimation. Am J Epidemiol. 1989;129:125-37. PMID: 2910056

Steyerberg EW, Eijkemans MJC, Harrell FE, Jr., Habbema JDF. Prognostic modeling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Statist Med. 2000;19:1059-1079. PMID: 10790680

Steyerberg EW, Eijkemans MJC, Habbema DF. Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis. J Clin Epidemiol. 1999;52:935-42. PMID: 10513756

Altman D, Andersen PK. Bootstrap investigation of the stability of a Cox regression model. Statist Med. 1989;8:771-83. PMID: 2672226

Mick R, Ratain MJ. Bootstrap validation of pharmacodynamic models defined via stepwise linear regression. Clin Pharmacol Ther. 1994;56:217-22. PMID: 8062499

Harrell FE, Jr, et al. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statist Med. 1996;15:361-87. PMID: 8668867

Measurement Error

If several risk factors for disease are considered in a logistic regression model and some of these risk factors are measured with error, the point and interval estimates of relative risk corresponding to any of these factors may be biased either toward or away from the null value; the direction of bias is never certain. In addition to potentially biased estimates, confidence intervals of correctly adjusted estimates will be wider, sometimes substantially, than naïve confidence intervals. Authors are encouraged to consult the references below for strategies to address this problem.

References

Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol. 1990;132:734-45. PMID: 2403114

Carroll R. Measurement Error in Epidemiologic Studies. In Encyclopedia of Biostatistics. New York: John Wiley & Sons; 1998. ISBN: 0471975761.

Measures of Effect and Risk

Clinically meaningful estimates

Authors should report results for meaningful metrics rather than reporting raw results. For example, rather than reporting the log odds ratio from a logistic regression, authors should transform coefficients into the appropriate measure of effect size (odds ratio, relative risk, or risk difference). Do not give readers an estimate, such as an odds ratio or relative risk, for a 1-unit change in the factor of interest when a 1-unit change lacks clinical meaning (age, mm Hg of blood pressure, or any other continuous or interval measurement with small units). All estimates should reflect a clinically meaningful change, along with 95% confidence bounds.
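
As a sketch of the rescaling described above, a logistic regression coefficient expressed per 1 unit can be converted to an odds ratio for a larger, clinically meaningful change by multiplying the coefficient and its standard error by that change before exponentiating. The coefficient, standard error, and the 10 mm Hg change below are hypothetical values chosen for illustration, and the interval uses a normal approximation.

```python
import math


def odds_ratio_per_change(beta: float, se: float, change: float,
                          z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and approximate 95% CI for a `change`-unit increase.

    `beta` and `se` are the log-odds coefficient and its standard error
    per 1 unit of the predictor (hypothetical inputs).
    """
    log_or = beta * change
    half_width = z * se * abs(change)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))


# Hypothetical: log odds rise by 0.02 (SE 0.005) per 1 mm Hg of blood pressure.
or_10, low, high = odds_ratio_per_change(beta=0.02, se=0.005, change=10)
print(f"OR per 10 mm Hg: {or_10:.2f} (95% CI, {low:.2f} to {high:.2f})")
# OR per 10 mm Hg: 1.22 (95% CI, 1.11 to 1.35)
```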

Between-group differences

For comparisons of interventions (e.g., trials), focus on between-group differences, with 95% confidence intervals of the differences, and not on within-group differences. State the results using absolute numbers (numerator/denominator) when feasible. When discussing effects, refer to the confidence intervals rather than P values and point out for readers if the confidence intervals exclude the possibility of significant clinical benefit or harm.

Odds ratios and predicted probabilities

Authors often report odds ratios for multivariable results when the odds ratio is difficult to interpret or not meaningful. First, the odds ratio might overstate the effect size when the reference risk is high. For example, if the reference risk is 25% (odds = 0.33) and the odds ratio is 3.0, the relative risk is only 2.0. Statements such as “3-fold increased risk” or “3 times the risk” are incorrect. Second, readers want an easily understood measure of the level of risk (and the confidence intervals) for different groups of patients as defined by treatment, exposure, and covariates. Consider providing a table of predicted probabilities for each of the factors of interest, and confidence intervals of those predicted probabilities. Moreover, a multiway table that cross classifies predicted probabilities by the most important factor and then adjusts for the remaining factors will often be more meaningful than a table of adjusted odds ratios. Standard commercial software can produce predicted probabilities and confidence bounds.
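
The arithmetic in the example above can be checked with a short calculation: convert the reference risk to odds, apply the odds ratio, convert back to a risk, and divide by the reference risk. The function name is hypothetical, and the second call uses an illustrative low reference risk that does not come from the guideline.

```python
def relative_risk_from_odds_ratio(reference_risk: float, odds_ratio: float) -> float:
    """Relative risk implied by an odds ratio at a given reference risk."""
    if not 0.0 < reference_risk < 1.0:
        raise ValueError("reference_risk must be between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    reference_odds = reference_risk / (1.0 - reference_risk)
    exposed_odds = reference_odds * odds_ratio
    exposed_risk = exposed_odds / (1.0 + exposed_odds)
    return exposed_risk / reference_risk


# Reference risk 25% and odds ratio 3.0, as in the text.
print(round(relative_risk_from_odds_ratio(0.25, 3.0), 2))  # 2.0
# With a rare outcome (1%), the same odds ratio is close to the relative risk.
print(round(relative_risk_from_odds_ratio(0.01, 3.0), 2))  # 2.94
```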

Reference

Altman DG, Deeks JJ, Sackett DL. Odds ratios should be avoided when events are common. BMJ. 1998;317:1318. PMID: 9804732

Missing Data

Missing variables

Always report the frequency of missing variables and how the analysis handled missing data. Consider adding a column to tables or a row under figures that makes clear the amount of missing data. Avoid using a simple indicator or dummy variable to represent a missing value. Replacing missing predictors with dummy variables or missing indicators generally leads to biased estimates.
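
A tally of missing values per variable, suitable for an extra table column, might look like the sketch below. It assumes the data are held as a list of dictionaries in which a missing value is either None or an absent key; the function and variable names are hypothetical.

```python
def missing_summary(records: list[dict], variables: list[str]) -> dict:
    """Count missing values per variable.

    Returns {variable: (n_missing, n_total, percent_missing)}.
    Assumption: a value is missing if the key is absent or its value is None.
    """
    total = len(records)
    if total == 0:
        raise ValueError("records must not be empty")
    summary = {}
    for name in variables:
        n_missing = sum(1 for record in records if record.get(name) is None)
        summary[name] = (n_missing, total, 100.0 * n_missing / total)
    return summary


# Hypothetical data set with four participants.
records = [
    {"age": 61, "bmi": None},
    {"age": None, "bmi": None},
    {"age": 47, "bmi": 27.4},
    {"age": 55},
]
for name, (n_missing, total, pct) in missing_summary(records, ["age", "bmi"]).items():
    print(f"{name}: {n_missing}/{total} missing ({pct:.0f}%)")
```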

References

Sterne JAC, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Wood AM, Carpenter JR. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. PMCID: PMC2714692

Vach W, Blettner M. Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values of confounding variables. Am J Epidemiol. 1991;134:895-907. PMID: 1670320

Vach W, Blettner M. Missing data in epidemiologic studies. In Encyclopedia of Biostatistics. New York: John Wiley & Sons; 1998:2641-2654. ISBN: 0471975761

Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol. 1995;142:1255-64. PMID: 7503045

Allison PD. Missing Data. Thousand Oaks, California: Sage Publications, Inc., 2002. ISBN: 0761916725

Missing Outcomes

Always report the frequency of missing outcomes and follow-up data; reasons and any patterns for the missing data; and how you handled missing data in the analyses. Do not use a last observation carried forward (LOCF) approach to address incomplete follow-up, even if the original protocol prespecified that approach for handling missing data. LOCF approaches understate variability and result in bias. The direction of the bias is not predictable. Although the method of addressing missing data may have little effect on findings when the proportion of missing data is small (e.g., <5%), authors should avoid using outdated or biased methods to address incomplete follow-up. Appropriate methods for handling missing data include imputation, pattern-mixture (mixed) models, and selection models. Application of these methods requires consideration of the patterns and potential mechanisms behind the missing data.

References

Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. New York: John Wiley & Sons; 2011:chapters 17 and 18. ISBN: 0470380277

Molenberghs G, Kenward MG. Missing Data in Clinical Studies. London: John Wiley & Sons; 2007. ISBN: 0470849811

Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. New York: Springer; 2005:chapters 26-32. ISBN: 0387251448

National Research Council. The Prevention and Treatment of Missing Data in Clinical Trials. Panel on Handling Missing Data in Clinical Trials. Committee on National Statistics, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press; 2010. ISBN: 0309158145. www.nap.edu/catalog/12955.html

Longitudinal Analyses

Consider using longitudinal analyses if outcome data were collected at more than 1 time point. With an appropriate model for longitudinal analysis, you can report differences within groups over time, differences between groups, and differences across groups of their within-group changes over time (usually the key contrast of interest). You can control for any confounding that might emerge, such as a difference in a variable (e.g., body weight) among those who remained in the study until completion. Longitudinal analysis options include a population-averaged analysis (generalized estimating equations [GEEs], for example) that estimates the time-by-treatment interaction and adjusts variance for the repeated measures within individuals over time. Another option is a mixed effects model, with random effects for patient, and the estimate of interest being the time-by-treatment interaction.

In choosing a model, consider whether any missing data are missing at random (i.e., “ignorable” missing data) or missing dependent on unobserved data (i.e., informative missing data). In GEE analyses, missing data are assumed to be missing completely at random, independent of both observed and unobserved data. In random coefficient analysis, missing data are assumed to be missing at random, dependent on observed data but not on unobserved data.

References

Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. New York: John Wiley & Sons; 2011. ISBN: 0470380277

Singer JD, Willett JB. Applied Longitudinal Data Analysis. New York: Oxford University Press; 2003. ISBN: 0195152964

Twisk JWR. Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. New York: Cambridge University Press; 2003. ISBN: 0521819768
