Academia and the Profession

Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration

Jan P. Vandenbroucke, MD; Erik von Elm, MD; Douglas G. Altman, DSc; Peter C. Gøtzsche, MD; Cynthia D. Mulrow, MD; Stuart J. Pocock, PhD; Charles Poole, ScD; James J. Schlesselman, PhD; Matthias Egger, MD, STROBE initiative

From Leiden University Medical Center, Leiden, the Netherlands; Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; University Medical Centre, Freiburg, Germany; Cancer Research UK/NHS Centre for Statistics in Medicine, Oxford, United Kingdom; Nordic Cochrane Centre, Rigshospitalet, Copenhagen, Denmark; University of Texas Health Science Center, San Antonio, Texas; London School of Hygiene and Tropical Medicine, London, United Kingdom; University of North Carolina School of Public Health, Chapel Hill, North Carolina; University of Pittsburgh Graduate School of Public Health and University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; and University of Bristol, Bristol, United Kingdom.

Note: The following individuals have contributed to the content and elaboration of the STROBE Statement: Douglas G. Altman, Maria Blettner, Paolo Boffetta, Hermann Brenner, Geneviève Chêne, Cyrus Cooper, George Davey-Smith, Erik von Elm, Matthias Egger, France Gagnon, Peter C. Gøtzsche, Philip Greenland, Sander Greenland, Claire Infante-Rivard, John Ioannidis, Astrid James, Giselle Jones, Bruno Ledergerber, Julian Little, Margaret May, David Moher, Hooman Momen, Alfredo Morabia, Hal Morgenstern, Cynthia D. Mulrow, Fred Paccaud, Stuart J. Pocock, Charles Poole, Martin Röösli, Dietrich Rothenbacher, Kenneth Rothman, Caroline Sabin, Willi Sauerbrei, Lale Say, James J. Schlesselman, Jonathan Sterne, Holly Sydall, Jan P. Vandenbroucke, Ian White, Susan Wieland, Hywel Williams, and Guang Yong Zou.

Acknowledgments: The authors thank Gerd Antes, Kay Dickersin, Shah Ebrahim, Richard Lilford, and Drummond Rennie for supporting the STROBE Initiative. They also thank the following institutions that have hosted working meetings of the coordinating group: Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; Department of Social Medicine, University of Bristol, Bristol, United Kingdom; London School of Hygiene & Tropical Medicine, London, United Kingdom; Nordic Cochrane Centre, Copenhagen, Denmark; and Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom. Finally, they thank the 4 anonymous reviewers who provided helpful comments on a previous draft of this paper.

Grant Support: The workshop was funded by the European Science Foundation. Additional funding was received from the Medical Research Council Health Services Research Collaboration and the National Health Services Research & Development Methodology Programme. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: Matthias Egger, MD, Institute of Social and Preventive Medicine, Finkenhubelweg 11, CH-3012 Bern, Switzerland; e-mail, strobe@ispm.unibe.ch.

Current Author Addresses: Dr. Vandenbroucke: Department of Clinical Epidemiology, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, the Netherlands.

Drs. von Elm and Egger: University of Bern, Institute of Social and Preventive Medicine, Finkenhubelweg 11, CH-3012 Bern, Switzerland.

Dr. Altman: Centre for Statistics in Medicine, Wolfson College Annexe, Linton Road, Oxford OX2 6UD, United Kingdom.

Dr. Gøtzsche: The Nordic Cochrane Centre, Rigshospitalet, Department 7112, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark.

Dr. Mulrow: American College of Physicians, 190 N. Independence Mall West, Philadelphia, PA 19106-1572.

Dr. Pocock: Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom.

Dr. Poole: Department of Epidemiology, University of North Carolina School of Public Health, Pittsboro Road, Chapel Hill, NC 27599-7435.

Dr. Schlesselman: Biostatistics Facility, University of Pittsburgh Cancer Institute, Sterling Plaza, Suite 325, 201 North Craig Street, Pittsburgh, PA 15213.

Ann Intern Med. 2007;147(8):W-163-W-194. doi:10.7326/0003-4819-147-8-200710160-00010-w1

Much medical research is observational. The reporting of observational studies is often of insufficient quality. Poor reporting hampers the assessment of the strengths and weaknesses of a study and the generalizability of its results. Taking into account empirical evidence and theoretical considerations, a group of methodologists, researchers, and editors developed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) recommendations to improve the quality of reporting of observational studies.

The STROBE Statement consists of a checklist of 22 items, which relate to the title, abstract, introduction, methods, results, and discussion sections of articles. Eighteen items are common to cohort studies, case–control studies, and cross-sectional studies, and 4 are specific to each of the 3 study designs. The STROBE Statement provides guidance to authors about how to improve the reporting of observational studies and facilitates critical appraisal and interpretation of studies by reviewers, journal editors, and readers.

This explanatory and elaboration document is intended to enhance the use, understanding, and dissemination of the STROBE Statement. The meaning and rationale for each checklist item are presented. For each item, 1 or several published examples and, where possible, references to relevant empirical studies and methodological literature are provided. Examples of useful flow diagrams are also included. The STROBE Statement, this document, and the associated Web site (www.strobe-statement.org) should be helpful resources to improve reporting of observational research.

Editor's Note: In order to encourage dissemination of the STROBE Statement, this article is being published simultaneously in Annals of Internal Medicine, Epidemiology, and PLoS Medicine. It is freely accessible on the Annals of Internal Medicine Web site (www.annals.org) and will also be published on the Web sites of Epidemiology and PLoS Medicine. The authors jointly hold the copyright of this article. For details on further use, see the STROBE Web site (www.strobe-statement.org).

Rational health care practices require knowledge about the etiology and pathogenesis, diagnosis, prognosis, and treatment of diseases. Randomized trials provide valuable evidence about treatments and other interventions. However, much of clinical or public health knowledge comes from observational research (1). About 9 of 10 research papers published in clinical specialty journals describe observational research (2, 3).

Reporting of observational research is often not detailed and clear enough to assess the strengths and weaknesses of the investigation (4, 5). To improve the reporting of observational research, we developed a checklist of items that should be addressed: the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement (Appendix Table). Items relate to the title, abstract, introduction, methods, results, and discussion sections of articles. The STROBE Statement has recently been published in several journals (6). Our aim is to ensure clear presentation of what was planned, done, and found in an observational study. We stress that the recommendations are not prescriptions for setting up or conducting studies, nor do they dictate methodology or mandate a uniform presentation.

Appendix Table. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Checklist of Items That Should Be Addressed in Reports of Observational Studies

STROBE provides general reporting recommendations for descriptive observational studies and studies that investigate associations between exposures and health outcomes. STROBE addresses the 3 main types of observational studies: cohort, case–control, and cross-sectional studies. Authors use diverse terminology to describe these study designs. For instance, “follow-up study” and “longitudinal study” are used as synonyms for “cohort study,” and “prevalence study” as a synonym for “cross-sectional study.” We chose the present terminology because it is in common use. Unfortunately, terminology is often used incorrectly (7) or imprecisely (8). In Box 1, we describe the hallmarks of the 3 study designs.

Observational studies serve a wide range of purposes, from reporting a first hint of a potential cause of a disease to verifying the magnitude of previously reported associations. Ideas for studies may arise from clinical observations or from biological insight. Ideas may also arise from informal looks at data that lead to further explorations. Like a clinician who has seen thousands of patients and notes 1 that strikes her attention, the researcher may note something special in the data. Adjusting for multiple looks at the data may not be possible or desirable (9), but further studies to confirm or refute initial observations are often needed (10). Existing data may be used to examine new ideas about potential causal factors, and may be sufficient for rejection or confirmation. In other instances, studies follow that are specifically designed to overcome potential problems with previous reports. The latter studies will gather new data and will be planned for that purpose, in contrast to analyses of existing data. This leads to diverse viewpoints, for example, on the merits of looking at subgroups or the importance of a predetermined sample size. STROBE tries to accommodate these diverse uses of observational research—from discovery to refutation or confirmation. Where necessary, we will indicate in what circumstances specific recommendations apply.

This paper is linked to the shorter STROBE paper that introduced the items of the checklist in several journals (6), and forms an integral part of the STROBE Statement. Our intention is to explain how to report research well, not how research should be done. We offer a detailed explanation for each checklist item. Each explanation is preceded by an example of what we consider transparent reporting. This does not mean that the study from which the example was taken was uniformly well reported or well done; nor does it mean that its findings were reliable, in the sense that they were later confirmed by others: It only means that this particular item was well reported in that study. In addition to explanations and examples, we included boxes with supplementary information. These are intended for readers who want to refresh their memories about some theoretical points or be quickly informed about technical background details. A full understanding of these points may require studying the textbooks or methodological papers that are cited.

STROBE recommendations do not specifically address topics such as genetic linkage studies, infectious disease modeling, or case reports and case series (11, 12). Because many of the key elements in STROBE apply to these designs, authors who report such studies may nevertheless find our recommendations useful. For authors of observational studies that specifically address diagnostic tests, tumor markers, and genetic associations, STARD (13), REMARK (14), and STREGA (15) recommendations may be particularly useful.

We now discuss and explain the 22 items in the STROBE checklist (Appendix Table) and give published examples for each item. Some examples have been edited by removing citations or spelling out abbreviations. Eighteen items apply to all 3 study designs, whereas 4 are design-specific. Starred items (for example, item 8*) indicate that the information should be given separately for cases and controls in case–control studies, or exposed and unexposed groups in cohort and cross-sectional studies. We advise authors to address all items somewhere in their paper, but we do not prescribe a precise location or order. For instance, we discuss the reporting of results under a number of separate items, while recognizing that authors might address several items within a single section of text or in a table.

Title and Abstract

1(a) Indicate the study's design with a commonly used term in the title or the abstract.


“Leukaemia incidence among workers in the shoe and boot manufacturing industry: a case–control study” (18).


Readers should be able to easily identify the design that was used from the title or abstract. An explicit, commonly used term for the study design also helps ensure correct indexing of articles in electronic databases (19, 20).

1(b) Provide in the abstract an informative and balanced summary of what was done and what was found.


“Background: The expected survival of HIV-infected patients is of major public health interest.

Objective: To estimate survival time and age-specific mortality rates of an HIV-infected population compared with that of the general population.

Design: Population-based cohort study.

Setting: All HIV-infected persons receiving care in Denmark from 1995 to 2005.

Patients: Each member of the nationwide Danish HIV Cohort Study was matched with as many as 99 persons from the general population according to sex, date of birth, and municipality of residence.

Measurements: The authors computed Kaplan–Meier life tables with age as the time scale to estimate survival from age 25 years. Patients with HIV infection and corresponding persons from the general population were observed from the date of the patient's HIV diagnosis until death, emigration, or 1 May 2005.

Results: 3990 HIV-infected patients and 379 872 persons from the general population were included in the study, yielding 22 744 (median, 5.8 y/person) and 2 689 287 (median, 8.4 y/person) person-years of observation. Three percent of participants were lost to follow-up. From age 25 years, the median survival was 19.9 years (95% CI, 18.5 to 21.3) among patients with HIV infection and 51.1 years (CI, 50.9 to 51.5) among the general population. For HIV-infected patients, survival increased to 32.5 years (CI, 29.4 to 34.7) during the 2000 to 2005 period. In the subgroup that excluded persons with known hepatitis C coinfection (16%), median survival was 38.9 years (CI, 35.4 to 40.1) during this same period. The relative mortality rates for patients with HIV infection compared with those for the general population decreased with increasing age, whereas the excess mortality rate increased with increasing age.

Limitations: The observed mortality rates are assumed to apply beyond the current maximum observation time of 10 years.

Conclusions: The estimated median survival is more than 35 years for a young person diagnosed with HIV infection in the late highly active antiretroviral therapy era. However, an ongoing effort is still needed to further reduce mortality rates for these persons compared with the general population” (21).


The abstract provides key information that enables readers to understand a study and decide whether to read the article. Typical components include a statement of the research question, a short description of methods and results, and a conclusion (22). Abstracts should summarize key details of studies and should only present information that is provided in the article. We advise presenting key results in a numerical form that includes numbers of participants, estimates of associations, and appropriate measures of variability and uncertainty (for example, odds ratios with confidence intervals). We regard it as insufficient to state only that an exposure is or is not significantly associated with an outcome.
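As a minimal illustrative sketch (not part of STROBE itself), the kind of numerical summary recommended above, an odds ratio with a Wald 95% confidence interval from a 2×2 table, can be computed as follows. The counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi

# Hypothetical counts: 10 exposed and 20 unexposed cases,
# 5 exposed and 40 unexposed controls
or_, lo, hi = odds_ratio_ci(10, 20, 5, 40)
print(f"OR = {or_:.2f} (95% CI, {lo:.2f} to {hi:.2f})")
```

Reporting the estimate together with its confidence interval, as in the printed string above, conveys both the strength of the association and the uncertainty around it, which a bare statement of statistical significance does not.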

A series of headings pertaining to the background, design, conduct, and analysis of a study may help readers acquire the essential information rapidly (23). Many journals require such structured abstracts, which tend to be of higher quality and more readily informative than unstructured summaries (24, 25).


Introduction

The Introduction section should describe why the study was done and what questions and hypotheses it addresses. It should allow others to understand the study's context and judge its potential contribution to current knowledge.

2 Background/rationale: Explain the scientific background and rationale for the investigation being reported.


“Concerns about the rising prevalence of obesity in children and adolescents have focused on the well-documented associations between childhood obesity and increased cardiovascular risk and mortality in adulthood. Childhood obesity has considerable social and psychological consequences within childhood and adolescence, yet little is known about social, socioeconomic, and psychological consequences in adult life. A recent systematic review found no longitudinal studies on the outcomes of childhood obesity other than physical health outcomes and only two longitudinal studies of the socioeconomic effects of obesity in adolescence. Gortmaker et al. found that US women who had been obese in late adolescence in 1981 were less likely to be married and had lower incomes seven years later than women who had not been overweight, while men who had been overweight were less likely to be married. Sargent et al. found that UK women, but not men, who had been obese at 16 years in 1974 earned 7.4% less than their nonobese peers at age 23.…We used longitudinal data from the 1970 British birth cohort to examine the adult socioeconomic, educational, social, and psychological outcomes of childhood obesity” (26).


The scientific background of the study provides important context for readers. It sets the stage for the study and describes its focus. It gives an overview of what is known on a topic and what gaps in current knowledge are addressed by the study. Background material should note recent pertinent studies and any systematic reviews of pertinent studies.

3 Objectives: State specific objectives, including any prespecified hypotheses.


“Our primary objectives were to 1) determine the prevalence of domestic violence among female patients presenting to four community-based, primary care, adult medicine practices that serve patients of diverse socioeconomic background and 2) identify demographic and clinical differences between currently abused patients and patients not currently being abused” (27).


Objectives are the detailed aims of the study. Well-crafted objectives specify populations, exposures and outcomes, and parameters that will be estimated. They may be formulated as specific hypotheses or as questions that the study was designed to address. In some situations, objectives may be less specific, for example, in early discovery phases. Regardless, the report should clearly reflect the investigators' intentions. For example, if important subgroups or additional analyses were not the original aim of the study but arose during data analysis, they should be described accordingly (see items 4, 17, and 20).


Methods

The Methods section should describe what was planned and what was done in sufficient detail to allow others to understand the essential aspects of the study, to judge whether the methods were adequate to provide reliable and valid answers, and to assess whether any deviations from the original plan were reasonable.

4 Study design: Present key elements of study design early in the paper.


“We used a case-crossover design, a variation of a case–control design that is appropriate when a brief exposure (driver's phone use) causes a transient rise in the risk of a rare outcome (a crash). We compared a driver's use of a mobile phone at the estimated time of a crash with the same driver's use during another suitable time period. Because drivers are their own controls, the design controls for characteristics of the driver that may affect the risk of a crash but do not change over a short period of time. As it is important that risks during control periods and crash trips are similar, we compared phone activity during the hazard interval (time immediately before the crash) with phone activity during control intervals (equivalent times during which participants were driving but did not crash) in the previous week” (28).


We advise presenting key elements of study design early in the methods section (or at the end of the introduction) so that readers can understand the basics of the study. For example, authors should indicate that the study was a cohort study, which followed people over a particular time period, and describe the group of persons that comprised the cohort and their exposure status. Similarly, if the investigation used a case–control design, the cases and controls and their source population should be described. If the study was a cross-sectional survey, the population and the point in time at which the cross-section was taken should be mentioned. When a study is a variant of the 3 main study types, there is an additional need for clarity. For instance, for a case-crossover study, 1 of the variants of the case–control design, a succinct description of the principles was given in the example above (28).

We recommend that authors refrain from simply calling a study “prospective” or “retrospective,” because these terms are ill defined (29). One usage sees cohort and prospective as synonymous and reserves the word retrospective for case–control studies (30). A second usage distinguishes prospective and retrospective cohort studies according to the timing of data collection relative to when the idea for the study was developed (31). A third usage distinguishes prospective and retrospective case–control studies depending on whether the data about the exposure of interest existed when cases were selected (32). Some advise against using these terms (33) or suggest adopting the alternatives “concurrent” and “historical” for describing cohort studies (34). In STROBE, we do not use the words prospective and retrospective or alternatives, such as concurrent and historical. We recommend that, whenever authors use these words, they define what they mean. Most importantly, we recommend that authors describe exactly how and when data collection took place.

The first part of the methods section might also be the place to mention whether the report is 1 of several from a study. If a new report is in line with the original aims of the study, this is usually indicated by referring to an earlier publication and by briefly restating the salient features of the study. However, the aims of a study may also evolve over time. Researchers often use data for purposes for which they were not originally intended, including, for example, official vital statistics that were collected primarily for administrative purposes, items in questionnaires that originally were only included for completeness, or blood samples that were collected for another purpose. For example, the Physicians' Health Study, a randomized controlled trial of aspirin and beta carotene, was later used to demonstrate that a point mutation in the factor V gene was associated with an increased risk of venous thrombosis, but not of myocardial infarction or stroke (35). The secondary use of existing data is a creative part of observational research and does not necessarily make results less credible or less important. However, briefly restating the original aims might help readers understand the context of the research and possible limitations in the data.

5 Setting: Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection.


“The Pasitos Cohort Study recruited pregnant women from Women, Infant, and Child clinics in Socorro and San Elizario, El Paso County, Texas and maternal-child clinics of the Mexican Social Security Institute in Ciudad Juarez, Mexico from April 1998 to October 2000. At baseline, prior to the birth of the enrolled cohort children, staff interviewed mothers regarding the household environment. In this ongoing cohort study, we target follow-up exams at 6-month intervals beginning at age 6 months” (36).


Readers need information on setting and locations to assess the context and generalizability of a study's results. Exposures, such as environmental factors and therapies, can change over time. Also, study methods may evolve over time. Knowing when a study took place and over what period participants were recruited and followed up places the study in historical context and is important for the interpretation of results.

Information about setting includes recruitment sites or sources (for example, electoral roll, outpatient clinic, cancer registry, or tertiary care center). Information about location may refer to the countries, towns, hospitals, or practices where the investigation took place. We advise stating dates rather than only describing the length of time periods. There may be different sets of dates for exposure, disease occurrence, recruitment, beginning and end of follow-up, and data collection. Of note, nearly 80% of 132 reports in oncology journals that used survival analysis included the starting and ending dates for accrual of patients, but only 24% also reported the date on which follow-up ended (37).

6 Participants:

6(a) Cohort study: Give the eligibility criteria, and the sources and methods of selection of participants. Describe methods of follow-up.


“Participants in the Iowa Women's Health Study were a random sample of all women ages 55 to 69 years derived from the state of Iowa automobile driver's license list in 1985, which represented approximately 94% of Iowa women in that age group.…Follow-up questionnaires were mailed in October 1987 and August 1989 to assess vital status and address changes.…Incident cancers, except for nonmelanoma skin cancers, were ascertained by the State Health Registry of Iowa…. The Iowa Women's Health Study cohort was matched to the registry with combinations of first, last, and maiden names, zip code, birth date, and social security number” (38).

6(a) Case–control study: Give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls.


“Cutaneous melanoma cases diagnosed in 1999 and 2000 were ascertained through the Iowa Cancer Registry…. Controls, also identified through the Iowa Cancer Registry, were colorectal cancer patients diagnosed during the same time. Colorectal cancer controls were selected because they are common and have a relatively long survival, and because arsenic exposure has not been conclusively linked to the incidence of colorectal cancer” (39).

6(a) Cross-sectional study: Give the eligibility criteria, and the sources and methods of selection of participants.


“We retrospectively identified patients with a principal diagnosis of myocardial infarction (code 410) according to the International Classification of Diseases, 9th Revision, Clinical Modification, from codes designating discharge diagnoses, excluding the codes with a fifth digit of 2, which designates a subsequent episode of care…A random sample of the entire Medicare cohort with myocardial infarction from February 1994 to July 1995 was selected…To be eligible, patients had to present to the hospital after at least 30 minutes but less than 12 hours of chest pain and had to have ST-segment elevation of at least 1 mm on 2 contiguous leads on the initial electrocardiogram” (40).


Detailed descriptions of the study participants help readers understand the applicability of the results. Investigators usually restrict a study population by defining clinical, demographic, and other characteristics of eligible participants. Typical eligibility criteria relate to age, gender, diagnosis, and comorbid conditions. Despite their importance, eligibility criteria often are not reported adequately. In a survey of observational stroke research, 17 of 49 reports (35%) did not specify eligibility criteria (5).

Eligibility criteria may be presented as inclusion and exclusion criteria, although this distinction is not always necessary or useful. Regardless, we advise authors to report all eligibility criteria and also to describe the group from which the study population was selected (for example, the general population of a region or country) and the method of recruitment (for example, referral or self-selection through advertisements).

Knowing details about follow-up procedures, including whether procedures minimized nonresponse and loss to follow-up and whether the procedures were similar for all participants, informs judgments about the validity of results. For example, in a study that used IgM antibodies to detect acute infections, readers needed to know the interval between blood tests for IgM antibodies so that they could judge whether some infections likely were missed because the interval between blood tests was too long (41). In other studies where follow-up procedures differed between exposed and unexposed groups, readers might recognize substantial bias due to unequal ascertainment of events or differences in nonresponse or loss to follow-up (42). Accordingly, we advise that researchers describe the methods used for following participants and whether those methods were the same for all participants, and that they describe the completeness of ascertainment of variables (see also item 14).

In case–control studies, the choice of cases and controls is crucial to interpreting the results, and the method of their selection has major implications for study validity. In general, controls should reflect the population from which the cases arose. Various methods are used to sample controls, all with advantages and disadvantages. For cases that arise from a general population, population roster sampling, random-digit dialing, or neighborhood or friend controls are used. Neighborhood or friend controls may present intrinsic matching on exposure (17). Controls with other diseases may have advantages over population-based controls, in particular for hospital-based cases, because they better reflect the catchment population of a hospital and have greater comparability of recall and ease of recruitment. However, they can present problems if the exposure of interest affects the risk of developing or being hospitalized for the control condition(s) (43, 44). To remedy this problem, often a mixture of the best defensible control diseases is used (45).

6(b) Cohort study: For matched studies, give matching criteria and number of exposed and unexposed.


“For each patient who initially received a statin, we used propensity-based matching to identify 1 control who did not receive a statin according to the following protocol. First, propensity scores were calculated for each patient in the entire cohort on the basis of an extensive list of factors potentially related to the use of statins or the risk of sepsis. Second, each statin user was matched to a smaller pool of nonstatin users by sex, age (plus or minus 1 year), and index date (plus or minus 3 months). Third, we selected the control with the closest propensity score (within 0.2 SD) to each statin user in a 1:1 fashion and discarded the remaining controls” (46).

6(b) Case–control study: For matched studies, give matching criteria and the number of controls per case.


“We aimed to select 5 controls for every case from among individuals in the study population who had no diagnosis of autism or other pervasive developmental disorders (PDD) recorded in their general practice record and who were alive and registered with a participating practice on the date of the PDD diagnosis in the case. Controls were individually matched to cases by year of birth (up to 1 year older or younger), sex, and general practice. For each of 300 cases, 5 controls could be identified who met all the matching criteria. For the remaining 994, 1 or more controls was excluded…” (47).


Matching is much more common in case–control studies, but occasionally, investigators use matching in cohort studies to make groups comparable at the start of follow-up. Matching in cohort studies makes groups directly comparable for potential confounders and presents fewer intricacies than with case–control studies. For example, it is not necessary to take the matching into account for the estimation of the relative risk (48). Because matching in cohort studies may increase statistical precision, investigators might allow for the matching in their analyses and thus obtain narrower confidence intervals.

In case–control studies, matching is done to increase a study's efficiency by ensuring similarity in the distribution of variables between cases and controls, in particular the distribution of potential confounding variables (48, 49). Because matching can be done in various ways, with 1 or more controls per case, the rationale for the choice of matching variables and the details of the method used should be described. Commonly used forms of matching are frequency matching (also called group matching) and individual matching. In frequency matching, investigators choose controls so that the distribution of matching variables becomes identical or similar to that of cases. Individual matching involves matching 1 or several controls to each case. Although intuitively appealing and sometimes useful, matching in case–control studies has a number of disadvantages, is not always appropriate, and needs to be taken into account in the analysis (see Box 2).
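As an illustration of frequency (group) matching, the hypothetical sketch below samples controls stratum by stratum so that the distribution of the matching variable mirrors that of the cases; the function name, data layout, and control-to-case ratio are invented for this example.

```python
import random

def frequency_match(cases, candidates, key, ratio=1, seed=0):
    """Frequency (group) matching sketch: for each stratum of the
    matching variable (defined by 'key'), sample 'ratio' controls per
    case from the candidate pool, so the stratum distribution of
    controls mirrors that of cases."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    # Count cases per stratum of the matching variable
    case_counts = {}
    for c in cases:
        case_counts[key(c)] = case_counts.get(key(c), 0) + 1
    controls = []
    for stratum, n in case_counts.items():
        pool = [x for x in candidates if key(x) == stratum]
        # Sample without replacement; take fewer if the pool is small
        controls.extend(rng.sample(pool, min(ratio * n, len(pool))))
    return controls
```

In individual matching, by contrast, each case would be linked to its own specific control(s), and the matched sets would be retained in the analysis.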

Even apparently simple matching procedures may be poorly reported. For example, authors may state that controls were matched to cases “within 5 years,” or using “5-year age bands.” Does this mean that, if a case was 54 years old, the respective control needed to be in the 5-year age band 50 to 54, or aged 49 to 59, which is within 5 years of age 54? If a wide (for example, 10-year) age band is chosen, there is a danger of residual confounding by age (see Box 4), for example, because controls may then be younger than cases on average.
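The ambiguity can be made concrete. The two hypothetical predicates below implement the two readings of the quoted phrases; neither function is from any cited study.

```python
def in_same_5yr_band(case_age, control_age):
    """Reading 1, '5-year age bands': fixed bands ..., 50-54, 55-59, ...;
    a 54-year-old case accepts only controls aged 50 to 54."""
    return case_age // 5 == control_age // 5

def within_5_years(case_age, control_age):
    """Reading 2, 'within 5 years': a 54-year-old case accepts
    controls aged 49 to 59."""
    return abs(case_age - control_age) <= 5
```

The two rules admit very different control groups for the same case, which is why the exact matching rule should be reported.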

7 Variables: Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable.


“Only major congenital malformations were included in the analyses. Minor anomalies were excluded according to the exclusion list of European Registration of Congenital Anomalies (EUROCAT). If a child had more than 1 major congenital malformation of 1 organ system, those malformations were treated as 1 outcome in the analyses by organ system…In the statistical analyses, factors considered potential confounders were maternal age at delivery and number of previous parities. Factors considered potential effect modifiers were maternal age at reimbursement for antiepileptic medication and maternal age at delivery” (55).


Authors should define all variables considered for and included in the analysis, including outcomes, exposures, predictors, potential confounders, and potential effect modifiers. Disease outcomes require adequately detailed description of the diagnostic criteria. This applies to criteria for cases in a case–control study, disease events during follow-up in a cohort study, and prevalent disease in a cross-sectional study. Clear definitions and steps taken to adhere to them are particularly important for any disease condition of primary interest in the study.

For some studies, “determinant” or “predictor” may be appropriate terms for exposure variables and outcomes may be called “end points.” In multivariable models, authors sometimes use “dependent variable” for an outcome and “independent variable” or “explanatory variable” for exposure and confounding variables. The latter is not precise, as it does not distinguish exposures from confounders.

If many variables have been measured and included in exploratory analyses in an early discovery phase, consider providing a list with details on each variable in an appendix, additional table, or separate publication. Of note, the International Journal of Epidemiology recently launched a new section with “cohort profiles” that includes detailed information on what was measured at different points in time in particular studies (56, 57). Finally, we advise that authors declare all “candidate variables” considered for statistical analysis, rather than selectively reporting only those included in the final models (see item 16a) (58, 59).

8 Data sources/measurement: For each variable of interest, give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group.

Example 1

“Total caffeine intake was calculated primarily using U.S. Department of Agriculture food composition sources. In these calculations, it was assumed that the content of caffeine was 137 mg per cup of coffee, 47 mg per cup of tea, 46 mg per can or bottle of cola beverage, and 7 mg per serving of chocolate candy. This method of measuring (caffeine) intake was shown to be valid in both the NHS I cohort and a similar cohort study of male health professionals…Self-reported diagnosis of hypertension was found to be reliable in the NHS I cohort” (60).
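The intake calculation in Example 1 amounts to a weighted sum of servings. A minimal sketch using the per-serving values quoted above; the dictionary keys and the input format are invented for illustration.

```python
# Caffeine content per serving (mg), as quoted in Example 1
CAFFEINE_MG = {
    "coffee_cup": 137,
    "tea_cup": 47,
    "cola_can": 46,
    "chocolate_serving": 7,
}

def total_caffeine_mg(servings):
    """Total daily caffeine intake (mg) from a mapping of
    item -> number of servings per day (hypothetical input format)."""
    return sum(CAFFEINE_MG[item] * n for item, n in servings.items())
```

For example, 2 cups of coffee and 1 cup of tea would give 2 × 137 + 47 = 321 mg.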

Example 2

“Samples pertaining to matched cases and controls were always analyzed together in the same batch and laboratory personnel were unable to distinguish among cases and controls” (61).


The way in which exposures, confounders, and outcomes were measured affects the reliability and validity of a study. Measurement error and misclassification of exposures or outcomes can make it more difficult to detect cause–effect relationships, or may produce spurious relationships. Error in measurement of potential confounders can increase the risk of residual confounding (62, 63). It is helpful, therefore, if authors report the findings of any studies of the validity or reliability of assessments or measurements, including details of the reference standard that was used. Rather than simply citing validation studies (as in the first example), we advise that authors give the estimated validity or reliability, which can then be used for measurement error adjustment or sensitivity analyses (see items 12e and 17).

In addition, it is important to know if groups being compared differed with respect to the way in which the data were collected. This may be important for laboratory examinations (as in the second example) and other situations. For instance, if an interviewer first questions all the cases and then the controls, or vice versa, bias is possible because of the learning curve; solutions such as randomizing the order of interviewing may avoid this problem. Information bias may also arise if the compared groups are not given the same diagnostic tests or if 1 group receives more tests of the same kind than another (see item 9).

9 Bias: Describe any efforts to address potential sources of bias.

Example 1

“In most case–control studies of suicide, the control group comprises living individuals, but we decided to have a control group of people who had died of other causes…. With a control group of deceased individuals, the sources of information used to assess risk factors are informants who have recently experienced the death of a family member or close associate—and are therefore more comparable to the sources of information in the suicide group than if living controls were used” (64).

Example 2

“Detection bias could influence the association between Type 2 diabetes mellitus (T2DM) and primary open-angle glaucoma (POAG) if women with T2DM were under closer ophthalmic surveillance than women without this condition. We compared the mean number of eye examinations reported by women with and without diabetes. We also recalculated the relative risk for POAG with additional control for covariates associated with more careful ocular surveillance (a self-report of cataract, macular degeneration, number of eye examinations, and number of physical examinations)” (65).


Biased studies produce results that differ systematically from the truth (see Box 3). It is important for a reader to know what measures were taken during the conduct of a study to reduce the potential for bias. Ideally, investigators carefully consider potential sources of bias when they plan their study. At the stage of reporting, we recommend that authors always assess the likelihood of relevant biases. Specifically, the direction and magnitude of bias should be discussed and, if possible, estimated. For instance, in case–control studies, information bias can occur, but may be reduced by selecting an appropriate control group, as in the first example (64). Differences in the medical surveillance of participants were a problem in the second example (65). Consequently, the authors provide more detail about the additional data they collected to tackle this problem. When investigators have set up quality control programs for data collection to counter a possible “drift” in measurements of variables in longitudinal studies, or to keep variability at a minimum when multiple observers are used, these should be described.

Unfortunately, authors often do not address important biases when reporting their results. Among 43 case–control and cohort studies published from 1990 to 1994 that investigated the risk of second cancers in patients with a history of cancer, medical surveillance bias was mentioned in only 5 articles (66). A survey of reports of mental health research published during 1998 in 3 psychiatric journals found that only 13% of 392 articles mentioned response bias (67). A survey of cohort studies in stroke research found that 14 of 49 (28%) articles published from 1999 to 2003 addressed potential selection bias in the recruitment of study participants and 35 (71%) mentioned the possibility that any type of bias may have affected results (5).

10 Study size: Explain how the study size was arrived at.

Example 1

“The number of cases in the area during the study period determined the sample size” (73).

Example 2

“A survey of postnatal depression in the region had documented a prevalence of 19.8%. Assuming depression in mothers with normal-weight children to be 20% and an odds ratio of 3 for depression in mothers with a malnourished child, we needed 72 case–control sets (1 case to 1 control) with an 80% power and 5% significance” (74).
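For orientation, the standard normal-approximation sample size formula for an unmatched two-group comparison can be sketched as below. Note that the quoted study used 1:1 matched sets, for which a paired (McNemar-type) formula is appropriate, so this unmatched sketch is not expected to reproduce the 72 sets exactly; the function name is invented for illustration.

```python
from math import sqrt, ceil
from statistics import NormalDist

def n_per_group(p0, odds_ratio, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing exposure
    proportions between UNMATCHED cases and controls (normal
    approximation, two-sided alpha). p0 = exposure prevalence among
    controls (here, depression, 0.20); the exposure probability among
    cases, p1, is derived from the assumed odds ratio."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # e.g. 0.84 for 80% power
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    pbar = (p0 + p1) / 2
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(num / (p1 - p0) ** 2)
```

With p0 = 0.20 and an odds ratio of 3, this unmatched formula gives about 64 per group, in the same neighbourhood as, but not identical to, the 72 matched sets reported.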


A study should be large enough to obtain a point estimate with a sufficiently narrow confidence interval to meaningfully answer a research question. Large samples are needed to distinguish a small association from no association. Small studies often provide valuable information, but wide confidence intervals may indicate that they contribute less to current knowledge in comparison with studies providing estimates with narrower confidence intervals. Also, small studies that show “interesting” or “statistically significant” associations are published more frequently than small studies that do not have “significant” findings. While these studies may provide an early signal in the context of discovery, readers should be informed of their potential weaknesses.

The importance of sample size determination in observational studies depends on the context. If an analysis is performed on data that were already available for other purposes, the main question is whether the analysis of the data will produce results with sufficient statistical precision to contribute substantially to the literature, and sample size considerations will be informal. Formal, a priori calculation of sample size may be useful when planning a new study (75, 76). Such calculations are associated with more uncertainty than is implied by the single number that is generally produced. For example, estimates of the rate of the event of interest or other assumptions central to calculations are commonly imprecise, if not guesswork (77). The precision obtained in the final analysis often cannot be determined beforehand, because it will be reduced by the inclusion of confounding variables in multivariable analyses (78), by limits on the precision with which key variables can be measured (79), and by the exclusion of some individuals.

Few epidemiologic studies explain or report deliberations about sample size (45). We encourage investigators to report pertinent formal sample size calculations if they were done. In other situations, they should indicate the considerations that determined the study size (for example, a fixed available sample, as in the first example above). If the observational study was stopped early when statistical significance was achieved, readers should be told. Do not bother readers with post hoc justifications for study size or retrospective power calculations (77). From the point of view of the reader, confidence intervals indicate the statistical precision that was ultimately obtained. It should be realized that confidence intervals reflect statistical uncertainty only, and not all uncertainty that may be present in a study (see item 20).
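Because confidence intervals convey the precision ultimately obtained, it may help to see how the width of an interval follows directly from the data. The sketch below computes a Wald confidence interval for an odds ratio from a 2 × 2 table; the cell counts in the usage note are invented, and this is one of several interval methods, not a prescribed one.

```python
from math import log, exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Point estimate and Wald confidence interval for the odds ratio
    from a 2x2 table: a, b = exposed, unexposed cases;
    c, d = exposed, unexposed controls. z defaults to the 95% level."""
    or_hat = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's formula)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_hat) - z * se_log)
    upper = exp(log(or_hat) + z * se_log)
    return or_hat, lower, upper
```

With invented counts of 10/90 exposed/unexposed cases and 5/95 exposed/unexposed controls, the interval spans roughly an order of magnitude, a direct expression of the low precision of a small study.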

11 Quantitative variables: Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen, and why.


“Patients with a Glasgow Coma Scale less than 8 are considered to be seriously injured. A GCS of 9 or more indicates less serious brain injury. We examined the association of GCS in these two categories with the occurrence of death within 12 months from injury” (80).
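The dichotomisation in this example can be written as a simple rule. One point needs an assumption: the quoted cut-points ("less than 8" and "9 or more") leave a score of exactly 8 unassigned, and the sketch below groups it with the serious category, in line with the conventional severe-injury threshold of GCS ≤ 8; the function name is invented.

```python
def gcs_category(gcs):
    """Dichotomisation from the example: GCS below 8 = seriously
    injured; 9 or more = less serious brain injury. A score of
    exactly 8 is not covered by the quoted cut-points; as an
    assumption it is grouped here with 'serious' (GCS <= 8 is the
    conventional severe-injury threshold)."""
    return "serious" if gcs <= 8 else "less serious"
```

Such gaps between category boundaries are exactly the kind of detail that reporting on the handling of quantitative variables should resolve.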