Jan P. Vandenbroucke, MD; Erik von Elm, MD; Douglas G. Altman, DSc; Peter C. Gøtzsche, MD; Cynthia D. Mulrow, MD; Stuart J. Pocock, PhD; Charles Poole, ScD; James J. Schlesselman, PhD; Matthias Egger, MD; for the STROBE initiative
Note: The following individuals have contributed to the content and elaboration of the STROBE Statement: Douglas G. Altman, Maria Blettner, Paolo Boffetta, Hermann Brenner, Geneviève Chêne, Cyrus Cooper, George Davey-Smith, Erik von Elm, Matthias Egger, France Gagnon, Peter C. Gøtzsche, Philip Greenland, Sander Greenland, Claire Infante-Rivard, John Ioannidis, Astrid James, Giselle Jones, Bruno Ledergerber, Julian Little, Margaret May, David Moher, Hooman Momen, Alfredo Morabia, Hal Morgenstern, Cynthia D. Mulrow, Fred Paccaud, Stuart J. Pocock, Charles Poole, Martin Röösli, Dietrich Rothenbacher, Kenneth Rothman, Caroline Sabin, Willi Sauerbrei, Lale Say, James J. Schlesselman, Jonathan Sterne, Holly Sydall, Jan P. Vandenbroucke, Ian White, Susan Wieland, Hywel Williams, and Guang Yong Zou.
Acknowledgments: The authors thank Gerd Antes, Kay Dickersin, Shah Ebrahim, Richard Lilford, and Drummond Rennie for supporting the STROBE Initiative. They also thank the following institutions that have hosted working meetings of the coordinating group: Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; Department of Social Medicine, University of Bristol, Bristol, United Kingdom; London School of Hygiene & Tropical Medicine, London, United Kingdom; Nordic Cochrane Centre, Copenhagen, Denmark; and Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom. Finally, they thank the 4 anonymous reviewers who provided helpful comments on a previous draft of this paper.
Grant Support: The workshop was funded by the European Science Foundation. Additional funding was received from the Medical Research Council Health Services Research Collaboration and the National Health Services Research & Development Methodology Programme. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Potential Financial Conflicts of Interest: None disclosed.
Requests for Single Reprints: Matthias Egger, MD, Institute of Social and Preventive Medicine, Finkenhubelweg 11, CH-3012 Bern, Switzerland; e-mail, firstname.lastname@example.org.
Current Author Addresses: Dr. Vandenbroucke: Department of Clinical Epidemiology, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, the Netherlands.
Drs. von Elm and Egger: University of Bern, Institute of Social and Preventive Medicine, Finkenhubelweg 11, CH-3012 Bern, Switzerland.
Dr. Altman: Centre for Statistics in Medicine, Wolfson College Annexe, Linton Road, Oxford OX2 6UD, United Kingdom.
Dr. Gøtzsche: The Nordic Cochrane Centre, Rigshospitalet, Department 7112, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark.
Dr. Mulrow: American College of Physicians, 190 N. Independence Mall West, Philadelphia, PA 19106-1572.
Dr. Pocock: Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom.
Dr. Poole: Department of Epidemiology, University of North Carolina School of Public Health, Pittsboro Road, Chapel Hill, NC 27599-7435.
Dr. Schlesselman: Biostatistics Facility, University of Pittsburgh Cancer Institute, Sterling Plaza, Suite 325, 201 North Craig Street, Pittsburgh, PA 15213.
Much medical research is observational. The reporting of observational studies is often of insufficient quality. Poor reporting hampers the assessment of the strengths and weaknesses of a study and the generalizability of its results. Taking into account empirical evidence and theoretical considerations, a group of methodologists, researchers, and editors developed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) recommendations to improve the quality of reporting of observational studies.
The STROBE Statement consists of a checklist of 22 items, which relate to the title, abstract, introduction, methods, results, and discussion sections of articles. Eighteen items are common to cohort studies, case–control studies, and cross-sectional studies, and 4 are specific to each of the 3 study designs. The STROBE Statement provides guidance to authors about how to improve the reporting of observational studies and facilitates critical appraisal and interpretation of studies by reviewers, journal editors, and readers.
This explanatory and elaboration document is intended to enhance the use, understanding, and dissemination of the STROBE Statement. The meaning and rationale for each checklist item are presented. For each item, 1 or several published examples and, where possible, references to relevant empirical studies and methodological literature are provided. Examples of useful flow diagrams are also included. The STROBE Statement, this document, and the associated Web site (www.strobe-statement.org) should be helpful resources to improve reporting of observational research.
Appendix Table. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Checklist of Items That Should Be Addressed in Reports of Observational Studies
Main Study Designs Covered by STROBE
“Leukaemia incidence among workers in the shoe and boot manufacturing industry: a case–control study” (18).
“Background: The expected survival of HIV-infected patients is of major public health interest.
Objective: To estimate survival time and age-specific mortality rates of an HIV-infected population compared with that of the general population.
Design: Population-based cohort study.
Setting: All HIV-infected persons receiving care in Denmark from 1995 to 2005.
Patients: Each member of the nationwide Danish HIV Cohort Study was matched with as many as 99 persons from the general population according to sex, date of birth, and municipality of residence.
Measurements: The authors computed Kaplan–Meier life tables with age as the time scale to estimate survival from age 25 years. Patients with HIV infection and corresponding persons from the general population were observed from the date of the patient's HIV diagnosis until death, emigration, or 1 May 2005.
Results: 3990 HIV-infected patients and 379 872 persons from the general population were included in the study, yielding 22 744 (median, 5.8 y/person) and 2 689 287 (median, 8.4 y/person) person-years of observation. Three percent of participants were lost to follow-up. From age 25 years, the median survival was 19.9 years (95% CI, 18.5 to 21.3) among patients with HIV infection and 51.1 years (CI, 50.9 to 51.5) among the general population. For HIV-infected patients, survival increased to 32.5 years (CI, 29.4 to 34.7) during the 2000 to 2005 period. In the subgroup that excluded persons with known hepatitis C coinfection (16%), median survival was 38.9 years (CI, 35.4 to 40.1) during this same period. The relative mortality rates for patients with HIV infection compared with those for the general population decreased with increasing age, whereas the excess mortality rate increased with increasing age.
Limitations: The observed mortality rates are assumed to apply beyond the current maximum observation time of 10 years.
Conclusions: The estimated median survival is more than 35 years for a young person diagnosed with HIV infection in the late highly active antiretroviral therapy era. However, an ongoing effort is still needed to further reduce mortality rates for these persons compared with the general population” (21).
“Concerns about the rising prevalence of obesity in children and adolescents have focused on the well-documented associations between childhood obesity and increased cardiovascular risk and mortality in adulthood. Childhood obesity has considerable social and psychological consequences within childhood and adolescence, yet little is known about social, socioeconomic, and psychological consequences in adult life. A recent systematic review found no longitudinal studies on the outcomes of childhood obesity other than physical health outcomes and only two longitudinal studies of the socioeconomic effects of obesity in adolescence. Gortmaker et al. found that US women who had been obese in late adolescence in 1981 were less likely to be married and had lower incomes seven years later than women who had not been overweight, while men who had been overweight were less likely to be married. Sargent et al. found that UK women, but not men, who had been obese at 16 years in 1974 earned 7.4% less than their nonobese peers at age 23.…We used longitudinal data from the 1970 British birth cohort to examine the adult socioeconomic, educational, social, and psychological outcomes of childhood obesity” (26).
“Our primary objectives were to 1) determine the prevalence of domestic violence among female patients presenting to four community-based, primary care, adult medicine practices that serve patients of diverse socioeconomic background and 2) identify demographic and clinical differences between currently abused patients and patients not currently being abused” (27).
“We used a case-crossover design, a variation of a case–control design that is appropriate when a brief exposure (driver's phone use) causes a transient rise in the risk of a rare outcome (a crash). We compared a driver's use of a mobile phone at the estimated time of a crash with the same driver's use during another suitable time period. Because drivers are their own controls, the design controls for characteristics of the driver that may affect the risk of a crash but do not change over a short period of time. As it is important that risks during control periods and crash trips are similar, we compared phone activity during the hazard interval (time immediately before the crash) with phone activity during control intervals (equivalent times during which participants were driving but did not crash) in the previous week” (28).
“The Pasitos Cohort Study recruited pregnant women from Women, Infant, and Child clinics in Socorro and San Elizario, El Paso County, Texas and maternal-child clinics of the Mexican Social Security Institute in Ciudad Juarez, Mexico from April 1998 to October 2000. At baseline, prior to the birth of the enrolled cohort children, staff interviewed mothers regarding the household environment. In this ongoing cohort study, we target follow-up exams at 6-month intervals beginning at age 6 months” (36).
“Participants in the Iowa Women's Health Study were a random sample of all women ages 55 to 69 years derived from the state of Iowa automobile driver's license list in 1985, which represented approximately 94% of Iowa women in that age group.…Follow-up questionnaires were mailed in October 1987 and August 1989 to assess vital status and address changes.…Incident cancers, except for nonmelanoma skin cancers, were ascertained by the State Health Registry of Iowa…. The Iowa Women's Health Study cohort was matched to the registry with combinations of first, last, and maiden names, zip code, birth date, and social security number” (38).
“Cutaneous melanoma cases diagnosed in 1999 and 2000 were ascertained through the Iowa Cancer Registry…. Controls, also identified through the Iowa Cancer Registry, were colorectal cancer patients diagnosed during the same time. Colorectal cancer controls were selected because they are common and have a relatively long survival, and because arsenic exposure has not been conclusively linked to the incidence of colorectal cancer” (39).
“We retrospectively identified patients with a principal diagnosis of myocardial infarction (code 410) according to the International Classification of Diseases, 9th Revision, Clinical Modification, from codes designating discharge diagnoses, excluding the codes with a fifth digit of 2, which designates a subsequent episode of care…A random sample of the entire Medicare cohort with myocardial infarction from February 1994 to July 1995 was selected…To be eligible, patients had to present to the hospital after at least 30 minutes but less than 12 hours of chest pain and had to have ST-segment elevation of at least 1 mm on 2 contiguous leads on the initial electrocardiogram” (40).
“For each patient who initially received a statin, we used propensity-based matching to identify 1 control who did not receive a statin according to the following protocol. First, propensity scores were calculated for each patient in the entire cohort on the basis of an extensive list of factors potentially related to the use of statins or the risk of sepsis. Second, each statin user was matched to a smaller pool of nonstatin users by sex, age (plus or minus 1 year), and index date (plus or minus 3 months). Third, we selected the control with the closest propensity score (within 0.2 SD) to each statin user in a 1:1 fashion and discarded the remaining controls” (46).
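The three-step protocol quoted above can be sketched in code. This is a minimal illustration, not the authors' implementation: the `Patient` record, the day-count encoding of the index date, the 91-day stand-in for 3 months, the greedy matching order, and all example values are assumptions made here. Propensity scores are taken as already computed.

```python
from dataclasses import dataclass
from statistics import stdev
from typing import Dict, List, Optional


@dataclass
class Patient:
    pid: str
    sex: str
    age: float        # years
    index_day: int    # index date as a day count (hypothetical encoding)
    score: float      # precomputed propensity score


def match_one_to_one(users: List[Patient], nonusers: List[Patient],
                     caliper: float, max_age_gap: float = 1.0,
                     max_day_gap: int = 91) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching without replacement (an assumed strategy)."""
    available = {p.pid: p for p in nonusers}
    matches: Dict[str, Optional[str]] = {}
    for u in users:
        # Restrict to nonusers of the same sex, similar age and index date,
        # whose propensity score lies within the caliper.
        pool = [c for c in available.values()
                if c.sex == u.sex
                and abs(c.age - u.age) <= max_age_gap
                and abs(c.index_day - u.index_day) <= max_day_gap
                and abs(c.score - u.score) <= caliper]
        if not pool:
            matches[u.pid] = None  # no acceptable control for this user
            continue
        # Keep the control with the closest score; discard it from the pool.
        best = min(pool, key=lambda c: abs(c.score - u.score))
        matches[u.pid] = best.pid
        del available[best.pid]
    return matches


# Hypothetical data, for illustration only.
users = [Patient("u1", "F", 60.0, 100, 0.50),
         Patient("u2", "M", 70.0, 200, 0.30)]
nonusers = [Patient("c1", "F", 60.5, 120, 0.52),
            Patient("c2", "F", 61.0, 90, 0.70),
            Patient("c3", "M", 75.0, 200, 0.30),
            Patient("c4", "M", 69.5, 150, 0.28)]

# Caliper of 0.2 standard deviations of the propensity score.
caliper = 0.2 * stdev(p.score for p in users + nonusers)
matches = match_one_to_one(users, nonusers, caliper)
print(matches)
```

Greedy matching depends on the order in which statin users are processed; the quoted text does not say how ties or ordering were handled, so a report using this approach would need to state that choice.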
“We aimed to select 5 controls for every case from among individuals in the study population who had no diagnosis of autism or other pervasive developmental disorders (PDD) recorded in their general practice record and who were alive and registered with a participating practice on the date of the PDD diagnosis in the case. Controls were individually matched to cases by year of birth (up to 1 year older or younger), sex, and general practice. For each of 300 cases, 5 controls could be identified who met all the matching criteria. For the remaining 994, 1 or more controls was excluded…” (47).
Matching in Case–Control Studies
“Only major congenital malformations were included in the analyses. Minor anomalies were excluded according to the exclusion list of European Registration of Congenital Anomalies (EUROCAT). If a child had more than 1 major congenital malformation of 1 organ system, those malformations were treated as 1 outcome in the analyses by organ system…In the statistical analyses, factors considered potential confounders were maternal age at delivery and number of previous parities. Factors considered potential effect modifiers were maternal age at reimbursement for antiepileptic medication and maternal age at delivery” (55).
“Total caffeine intake was calculated primarily using U.S. Department of Agriculture food composition sources. In these calculations, it was assumed that the content of caffeine was 137 mg per cup of coffee, 47 mg per cup of tea, 46 mg per can or bottle of cola beverage, and 7 mg per serving of chocolate candy. This method of measuring (caffeine) intake was shown to be valid in both the NHS I cohort and a similar cohort study of male health professionals…Self-reported diagnosis of hypertension was found to be reliable in the NHS I cohort” (60).
“Samples pertaining to matched cases and controls were always analyzed together in the same batch and laboratory personnel were unable to distinguish among cases and controls” (61).
“In most case–control studies of suicide, the control group comprises living individuals, but we decided to have a control group of people who had died of other causes…. With a control group of deceased individuals, the sources of information used to assess risk factors are informants who have recently experienced the death of a family member or close associate—and are therefore more comparable to the sources of information in the suicide group than if living controls were used” (64).
“Detection bias could influence the association between Type 2 diabetes mellitus (T2DM) and primary open-angle glaucoma (POAG) if women with T2DM were under closer ophthalmic surveillance than women without this condition. We compared the mean number of eye examinations reported by women with and without diabetes. We also recalculated the relative risk for POAG with additional control for covariates associated with more careful ocular surveillance (a self-report of cataract, macular degeneration, number of eye examinations, and number of physical examinations)” (65).
“The number of cases in the area during the study period determined the sample size” (73).
“A survey of postnatal depression in the region had documented a prevalence of 19.8%. Assuming depression in mothers with normal-weight children to be 20% and an odds ratio of 3 for depression in mothers with a malnourished child, we needed 72 case–control sets (1 case to 1 control) with an 80% power and 5% significance” (74).
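A calculation of this kind can be sketched with the usual normal-approximation formula for comparing two proportions. The quoted passage does not say which formula or software the authors used, so the function below (unmatched design, no continuity correction, two-sided test) is an assumption, and its result need not equal the 72 sets reported.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p0: float, odds_ratio: float) -> float:
    """Exposure prevalence among cases implied by p0 and the odds ratio."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def cases_needed(p0: float, odds_ratio: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Cases required with 1 control per case (assumed uncorrected formula)."""
    p1 = exposure_in_cases(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Inputs taken from the quoted example: 20% prevalence, odds ratio of 3.
n_cases = cases_needed(p0=0.20, odds_ratio=3.0)
print(n_cases)
```

Different conventions (a continuity correction, or a formula for matched pairs) give somewhat different numbers, which is one reason a report should name the method behind its sample size.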
“Patients with a Glasgow Coma Scale less than 8 are considered to be seriously injured. A GCS of 9 or more indicates less serious brain injury. We examined the association of GCS in these two categories with the occurrence of death within 12 months from injury” (80).
“The adjusted relative risk was calculated using the Mantel–Haenszel technique, when evaluating if confounding by age or gender was present in the groups compared. The 95% confidence interval (CI) was computed around the adjusted relative risk, using the variance according to Greenland and Robins and Robins et al.” (93).
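The Mantel–Haenszel point estimate mentioned in this example can be sketched as follows. The stratum counts are hypothetical and chosen only to show confounding; the Greenland–Robins variance used for the confidence interval is not implemented here.

```python
from typing import List, Tuple

# One stratum: (exposed cases, exposed total, unexposed cases, unexposed total)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_rr(strata: List[Stratum]) -> float:
    """Mantel-Haenszel summary risk ratio across strata."""
    num = 0.0
    den = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den


def crude_rr(strata: List[Stratum]) -> float:
    """Risk ratio after collapsing all strata into one table."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)


# Hypothetical age strata: the risk ratio is 2 within each stratum, but the
# exposed group is older, so the crude estimate is inflated.
strata = [(2, 100, 4, 400),     # younger
          (80, 400, 10, 100)]   # older

adjusted = mantel_haenszel_rr(strata)
crude = crude_rr(strata)
print(round(crude, 2), round(adjusted, 2))
```

Comparing the crude and adjusted estimates is the informal check for confounding that the quoted passage describes.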
“Sex differences in susceptibility to the 3 lifestyle-related risk factors studied were explored by testing for biologic interaction according to Rothman: a new composite variable with 4 categories (a−b−, a−b+, a+b−, and a+b+) was redefined for sex and a dichotomous exposure of interest, where a− and b− denote absence of exposure. RR was calculated for each category after adjustment for age. An interaction effect is defined as departure from additivity of absolute effects, and excess RR caused by interaction (RERI) was calculated:
RERI = RR(a+b+) − RR(a−b+) − RR(a+b−) + 1
where RR(a+b+) denotes RR among those exposed to both factors where RR(a−b−) is used as reference category (RR = 1.0). Ninety-five percent CIs were calculated as proposed by Hosmer and Lemeshow. RERI of 0 means no interaction” (103).
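As a worked illustration, the excess RR caused by interaction is commonly computed as the RR for joint exposure, minus the RR for each single exposure, plus 1, with the doubly unexposed group as reference. The sketch below applies that definition; the parameter names and the RR values are hypothetical.

```python
def reri(rr_both: float, rr_a_only: float, rr_b_only: float) -> float:
    """Excess RR caused by interaction, relative to the doubly unexposed.

    rr_both   -> RR(a+b+)
    rr_a_only -> RR(a+b-)
    rr_b_only -> RR(a-b+)
    """
    return rr_both - rr_a_only - rr_b_only + 1.0


# Hypothetical RRs: joint effect larger than the sum of separate effects.
positive = reri(rr_both=4.0, rr_a_only=2.0, rr_b_only=1.5)

# Hypothetical RRs that are exactly additive: no interaction on this scale.
additive = reri(rr_both=2.5, rr_a_only=2.0, rr_b_only=1.5)

print(positive, additive)
```

A value above 0 indicates a joint effect greater than additive, and a value below 0 a joint effect smaller than additive.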
Interaction (Effect Modification): The Analysis of Joint Effects
“Our missing data analysis procedures used missing at random (MAR) assumptions. We used the MICE (multivariate imputation by chained equations) method of multiple multivariate imputation in STATA. We independently analysed 10 copies of the data, each with missing values suitably imputed, in the multivariate logistic regression analyses. We averaged estimates of the variables to give a single mean estimate and adjusted standard errors according to Rubin's rules” (106).
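The pooling step described here, Rubin's rules, can be sketched in a few lines. The sketch assumes each imputed data set has already been analysed and has yielded one estimate with its standard error; the numbers are hypothetical and the imputation itself is not shown.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float],
               std_errors: List[float]) -> Tuple[float, float]:
    """Combine per-imputation estimates into one estimate and standard error."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log odds ratios from 3 imputed data sets.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, round(pooled_se, 3))
```

The between-imputation term is what makes the pooled standard error larger than the standard error from any single completed data set, reflecting the uncertainty about the missing values.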
Missing Data: Problems and Possible Solutions
“In treatment programmes with active follow-up, those lost to follow-up and those followed up at 1 year had similar baseline CD4 cell counts (median 115 cells per µL and 123 cells per µL), whereas patients lost to follow-up in programmes with no active follow-up procedures had considerably lower CD4 cell counts than those followed up (median 64 cells per µL and 123 cells per µL).…Treatment programmes with passive follow-up were excluded from subsequent analyses” (116).
“We used McNemar's test, paired t test, and conditional logistic regression analysis to compare dementia patients with their matched controls for cardiovascular risk factors, the occurrence of spontaneous cerebral emboli, carotid disease, and venous to arterial circulation shunt” (117).
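For a binary risk factor in 1:1 matched pairs, McNemar's test uses only the discordant pairs. The following sketch shows the uncorrected statistic and its P value from the chi-square distribution with 1 degree of freedom; the pair counts are hypothetical and no continuity correction is applied.

```python
import math


def mcnemar_chi2(b: int, c: int) -> float:
    """Uncorrected McNemar statistic.

    b: pairs where only the case is exposed
    c: pairs where only the control is exposed
    """
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical discordant pairs.
statistic = mcnemar_chi2(b=15, c=5)
p_value = chi2_1df_pvalue(statistic)
print(statistic, round(p_value, 4))
```

Pairs in which both members, or neither member, are exposed do not enter the statistic, which is why an unmatched analysis of matched data can give a different answer.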
“The standard errors (SE) were calculated using the Taylor expansion method to estimate the sampling errors of estimators based on the complex sample design.…The overall design effect for diastolic blood pressure was found to be 1.9 for men and 1.8 for women, and for systolic blood pressure, it was 1.9 for men and 2.0 for women” (118).
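A design effect is conventionally the ratio of the variance under the actual complex design to the variance under simple random sampling of the same size. The sketch below shows what the reported values imply for standard errors and effective sample size; the sample size of 1000 is hypothetical.

```python
import math


def se_inflation(design_effect: float) -> float:
    """Factor by which the simple-random-sampling standard error grows."""
    return math.sqrt(design_effect)


def effective_sample_size(n: int, design_effect: float) -> float:
    """Size of a simple random sample with equivalent precision."""
    return n / design_effect


# Design effects quoted in the example; n = 1000 is a hypothetical sample.
for label, deff in [("diastolic, men", 1.9), ("diastolic, women", 1.8),
                    ("systolic, men", 1.9), ("systolic, women", 2.0)]:
    print(label, round(se_inflation(deff), 2),
          round(effective_sample_size(1000, deff)))
```

A design effect near 2 therefore means the survey carries roughly the information of a simple random sample half its size.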
“Because we had a relatively higher proportion of ‘missing’ dead patients with insufficient data (38/148 = 25.7%) as compared to live patients (15/437 = 3.4%)…, it is possible that this might have biased the results. We have, therefore, carried out a sensitivity analysis. We have assumed that the proportion of women using oral contraceptives in the study group applies to the whole (19.1% for dead, and 11.4% for live patients), and then applied two extreme scenarios: either all the exposed missing patients used second-generation pills or they all used third-generation pills” (120).
“Of the 105 freestanding bars and taverns sampled, 13 establishments were no longer in business and 9 were located in restaurants, leaving 83 eligible businesses. In 22 cases, the owner could not be reached by telephone despite 6 or more attempts. The owners of 36 bars declined study participation.…The 25 participating bars and taverns employed 124 bartenders, with 67 bartenders working at least 1 weekly daytime shift. Fifty-four of the daytime bartenders (81%) completed baseline interviews and spirometry; 53 of these subjects (98%) completed follow-up” (129).
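The counts in this example can be checked step by step, which is also how a flow diagram would be assembled. All numbers below come from the quoted passage; only the variable names are introduced here.

```python
# Bars and taverns
sampled = 105
out_of_business = 13
located_in_restaurants = 9
eligible = sampled - out_of_business - located_in_restaurants

owner_unreachable = 22
owner_declined = 36
participating = eligible - owner_unreachable - owner_declined

# Bartenders working at least 1 weekly daytime shift
daytime_bartenders = 67
completed_baseline = 54
completed_follow_up = 53

baseline_pct = round(100 * completed_baseline / daytime_bartenders)
follow_up_pct = round(100 * completed_follow_up / completed_baseline)

print(eligible, participating, baseline_pct, follow_up_pct)
```

The arithmetic reproduces the 83 eligible businesses, 25 participating bars, and the 81% and 98% completion figures given in the text.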
“The main reasons for nonparticipation were the participant was too ill or had died before interview (cases 30%, controls < 1%), nonresponse (cases 2%, controls 21%), refusal (cases 10%, controls 29%), and other reasons (refusal by consultant or general practitioner, non-English speaking, mental impairment) (cases 7%, controls 5%)” (140).
Figure. Example of a flow diagram (from reference 141).
Table 1. Characteristics of the Study Base at Enrollment, Castellana G (Italy), 1985–1986
Table 2. Symptom End Points Used in Survival Analysis
“During the 4366 person-years of follow-up (median 5.4, maximum 8.3 years), 265 subjects were diagnosed as having dementia, including 202 with Alzheimer's disease” (149).
Table 3. Rates of HIV-1 Seroconversion by Selected Sociodemographic Variables, 1990–1993
Table 4. Exposure among Liver Cirrhosis Cases and Controls
Table 5. Prevalence of Current Asthma and Diagnosed Hay Fever, by Average Alternaria alternata Antigen Level in the Household
“We initially considered the following variables as potential confounders by Mantel–Haenszel stratified analysis:…The variables we included in the final logistic regression models were those…that produced a 10% change in the odds ratio after the Mantel–Haenszel adjustment” (155).
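The change-in-estimate rule described in this example can be sketched as a simple comparison of crude and adjusted odds ratios. The quoted text does not say whether the 10% change was measured relative to the crude or the adjusted estimate; the sketch assumes the crude estimate as the reference, and the odds ratios are hypothetical.

```python
def relative_change(crude_or: float, adjusted_or: float) -> float:
    """Absolute change in the odds ratio, as a fraction of the crude value."""
    return abs(adjusted_or - crude_or) / crude_or


def keep_as_confounder(crude_or: float, adjusted_or: float,
                       threshold: float = 0.10) -> bool:
    """True if adjustment shifts the odds ratio by at least the threshold."""
    return relative_change(crude_or, adjusted_or) >= threshold


# Hypothetical estimates for two candidate variables.
large_shift = keep_as_confounder(crude_or=2.0, adjusted_or=1.7)
small_shift = keep_as_confounder(crude_or=2.0, adjusted_or=1.95)
print(large_shift, small_shift)
```

Because the result depends on the reference estimate and the threshold, both choices belong in the methods section of a report that uses this rule.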
Table 6. Relative Rates of Rehospitalization, by Treatment in Patients in Community Care after First Hospitalization Due to Schizophrenia and Schizoaffective Disorder
Table 7. Polychlorinated Biphenyls in Cord Serum
“10 years' use of HRT [hormone replacement therapy] is estimated to result in five (95% CI 3–7) additional breast cancers per 1000 users of oestrogen-only preparations and 19 (15–23) additional cancers per 1000 users of oestrogen–progestagen combinations” (163).
Measures of Association, Effect, and Impact
Table 8. Analysis of Oral Contraceptive Use, Presence of Factor V Leiden Allele, and Risk for Venous Thromboembolism
Table 9. Sensitivity of the Rate Ratio for Cardiovascular Outcome to an Unmeasured Confounder
“We hypothesized that ethnic minority status would be associated with higher levels of cardiovascular disease (CVD) risk factors, but that the associations would be explained substantially by socioeconomic status (SES). Our hypothesis was not confirmed. After adjustment for age and SES, highly significant differences in body mass index, blood pressure, diabetes, and physical inactivity remained between white women and both black and Mexican-American women. In addition, we found large differences in CVD risk factors by SES, a finding that illustrates the high-risk status of both ethnic minority women as well as white women with low SES” (199).
“Since the prevalence of counseling increases with increasing levels of obesity, our estimates may overestimate the true prevalence. Telephone surveys also may overestimate the true prevalence of counseling. Although persons without telephones have similar levels of overweight as persons with telephones, persons without telephones tend to be less educated, a factor associated with lower levels of counseling in our study. Also of concern is the potential bias caused by those who refused to participate as well as those who refused to respond to questions about weight. Furthermore, because data were collected cross-sectionally, we cannot infer that counseling preceded a patient's attempt to lose weight” (200).
“Any explanation for an association between death from myocardial infarction and use of second generation oral contraceptives must be conjectural. There is no published evidence to suggest a direct biologic mechanism, and there are no other epidemiologic studies with relevant results.…The increase in absolute risk is very small and probably applies predominantly to smokers. Due to the lack of corroborative evidence, and because the analysis is based on relatively small numbers, more evidence on the subject is needed. We would not recommend any change in prescribing practice on the strength of these results” (120).
“How applicable are our estimates to other HIV-1-infected patients? This is an important question because the accuracy of prognostic models tends to be lower when applied to data other than those used to develop them. We addressed this issue by penalizing model complexity, and by choosing models that generalized best to cohorts omitted from the estimation procedure. Our database included patients from many countries from Europe and North America who were treated in different settings. The range of patients was broad: men and women from teenagers to elderly people were included, and the major exposure categories were well represented. The severity of immunodeficiency at baseline ranged from not measurable to very severe, and viral load from undetectable to extremely high” (215).
Vandenbroucke JP, von Elm E, Altman DG, et al, for the STROBE initiative. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration. Ann Intern Med. 2007;147:W-163–W-194. doi: https://doi.org/10.7326/0003-4819-147-8-200710160-00010-w1
Published: Ann Intern Med. 2007;147(8):W-163-W-194.
Copyright © 2020 American College of Physicians. All Rights Reserved.
Print ISSN: 0003-4819 | Online ISSN: 1539-3704