Ideas and Opinions

Stopping at Nothing? Some Dilemmas of Data Monitoring in Clinical Trials

Steven N. Goodman, MD, MHS, PhD

From Johns Hopkins University Schools of Medicine and Public Health, Baltimore, Maryland.


Disclaimer: The views and content herein are solely the responsibility of the author.

Acknowledgments: The author thanks Drs. Donald Berry, Thomas Louis, and Joel Greenhouse for their helpful comments on earlier versions of the manuscript.

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: Steven N. Goodman, MD, MHS, PhD, Johns Hopkins University Schools of Medicine and Public Health, 550 North Broadway, Suite 1103, Baltimore, MD 21205; e-mail, sgoodman@jhmi.edu.


Ann Intern Med. 2007;146(12):882-887. doi:10.7326/0003-4819-146-12-200706190-00010

This commentary reviews the argument that clinical trials with data monitoring committees that use statistical stopping guidelines should generally not be stopped early for large observed efficacy differences because efficacy estimates may be exaggerated and there is minimal information on treatment harms. Overall, the average of estimates from trials that use these boundaries differs minimally from the true value. Estimates from a given trial that seem implausibly high can be moderated by using Bayesian methods. Data monitoring committees are not ethically required to precisely estimate a large efficacy difference if that difference differs convincingly from zero, and the requirement to detect harms and balance efficacy against harm depends on whether the nature of the harm is known or unknown before the trial.

The ethics of randomized, controlled trials (RCTs) are complex (1, 2). Researchers must justify that the benefits of their research to society are commensurate with the risks or burdens placed on research participants. They must provide this justification before trial onset and, with interim data monitoring, recalibrate it during the trial. Society grants clinical researchers a special privilege to conduct experiments; society can revoke that privilege at any hint that the rights or interests of research participants are not being fully valued, even when the suspension of research activities causes demonstrable harm (3). The societal dispensation to do clinical experimentation is a fragile one, protected by a cocoon of oversight but ultimately based on trust that patient interests will not be unduly sacrificed on the altar of societal benefit.

Data monitoring committees (DMCs) play a central role in ensuring that individual and societal interests are balanced correctly as a trial progresses (4). Accumulating data can alter the risk–benefit balance in any particular trial, requiring that the “best laid plans” in the trial design be modified in light of that new information. Data monitoring brings clinical investigators face to face with a central dilemma of clinical trials, sitting at the intersection of ethics, statistics, and epistemology: When have we learned enough? This is an extraordinarily difficult question, as scientists will differ in their assessment of both how much we have learned and how much we need to learn. There is no clear ethical guidance on the matter; a utilitarian perspective will put more weight on the fate of future patients, whereas ethical theories that place more value on obligations and individual dignity will favor the interests of patients in the trial (5).

In this issue, Mueller and colleagues (6) enter this debate squarely in the utilitarian camp, arguing forcefully that the primary purpose of a trial is to get an accurate assessment of the risks and benefits associated with a given treatment. This is a desirable aim, but it is not the goal enshrined in the traditional hypothesis–test framework of study design; rather, that goal is to decide which treatment is more efficacious, with statistical control over how often false-positive and false-negative conclusions are made (7). As we shall see in the ensuing discussion, the goals of error control and accurate estimation can sometimes be in direct conflict.

Much of Mueller and colleagues' argument rests on claims that trials that stop early for efficacy produce efficacy estimates that are biased, that is, on average higher than the true effect, and they go so far as to declare such results as scientifically invalid. Using bias to judge a stopping rule is akin to moving the goal posts; most stopping rules are not designed to optimize estimation or eliminate bias. However, because accurate assessment of efficacy (and risk) is a worthy scientific goal, it is of interest to see how a trial that exactly follows the dictates of a statistical stopping rule would do on that score.

Mueller and colleagues focus on bias in the effect measure itself, whereas evaluation of estimation bias usually incorporates uncertainty by calculating how often the CI around the estimate includes the true value (8). But because inordinate emphasis is often placed on the observed effect estimate without consideration of the full range of the CI, bias in the effect estimate is of some interest. Claims about bias should be based on all estimates that arise from trials that use stopping guidelines, not just trials that are stopped early. If one considers all such outcomes, bias from trials that use conventional stopping guidelines is small (9-11); such trials do not greatly overstate the effects that they aim to measure. It is therefore reasonable to accept an estimate from such a trial as a valid estimate of effect.

It is also true that the estimates of effect from trials that have stopped early for efficacy tend to be higher than the true value (10-12). How do we reconcile this with the previous claim of minimal bias? The same phenomenon is seen in trials with fixed sample sizes, which cannot be stopped on the basis of an observed efficacy difference and are indubitably unbiased. The key insight is that a trial that has been stopped early for efficacy is by definition statistically significant, usually highly so. If one takes just the significant results in one direction from any set of trial results, their average will necessarily be higher than the average of the whole set. The higher the significance, the larger that difference, and the smaller the sample size of the significant trials, the larger still.

Consider the simple example of a trial with a fixed sample size, designed to detect (with 80% power) a 10% reduction in mortality (from 50% to 40%), and assume that the true difference is indeed 10%. The observed risk difference from such a trial is an unbiased estimate of the true difference, but the average of results with P ≤ 0.05 is 11.2%, a 12% proportional overstatement. Looking at smaller P values typically used for early stopping, the average of results with P ≤ 0.01 is 12.4%, and of those with P ≤ 0.001, it is 14%, a 40% overstatement of effect. But we only know whether any given significant estimate is an overestimate when we already know the underlying truth. When we don't, we must fall back on the properties of the estimation procedure; if its properties (which could include unbiasedness) are good, we accept the estimate as valid.
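To make this selection effect concrete, the following simulation sketch (not part of the original analysis) repeatedly runs the fixed-sample trial described above and then averages only the significant results. The per-arm sample size of roughly 385 is an assumption, back-calculated from the stated design (80% power, two-sided alpha of 0.05, 50% vs. 40% mortality).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed design: ~385 participants per arm (back-calculated, not from the article).
n_per_arm, p_control, p_treated = 385, 0.50, 0.40
n_trials = 200_000

deaths_c = rng.binomial(n_per_arm, p_control, n_trials)
deaths_t = rng.binomial(n_per_arm, p_treated, n_trials)
rd = (deaths_c - deaths_t) / n_per_arm            # observed risk difference (benefit)

# Normal-approximation two-sided P value for each simulated trial
se = np.sqrt((deaths_c * (n_per_arm - deaths_c) +
              deaths_t * (n_per_arm - deaths_t)) / n_per_arm**3)
p = 2 * norm.sf(np.abs(rd / se))

print(f"all trials:  mean RD = {rd.mean():.3f}")  # ~0.100: the estimator is unbiased
for thresh in (0.05, 0.01, 0.001):
    sel = (p <= thresh) & (rd > 0)                # significant results favoring treatment
    # Conditional means land near the 11.2%, 12.4%, and 14% quoted in the text.
    print(f"P <= {thresh:<5}: mean RD = {rd[sel].mean():.3f}")
```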

If we looked at small fixed-size trials that are underpowered for the true 10% difference, the overstatement would be even more dramatic. Thus, an empirical study of highly significant, unbiased RCTs would be expected to find a “bias” similar to that reported in a study of trials stopped early for efficacy (13), and the exact degree of overstatement would depend on the sizes of the trials studied. This is of concern only if we are not told of the trials with nonsignificant results (producing publication bias) or if more weight is put on significant findings than nonsignificant ones, which is an important sociologic problem but not a scientific one.

It is instructive to examine the full distribution of outcomes that can occur in a trial that follows a stopping rule. We will use the same trial design described earlier, but with 90% power, and we will apply the often-used O'Brien–Fleming stopping boundary (14) on a 1-sided basis, with 3 interim looks (4 overall, including the last one). This design requires very low P values to stop early but allows a value close to 0.05 at the final look. The stopping boundary, with the associated P value thresholds, is as follows:

Look 1: 23% difference (P ≤ 4 × 10⁻⁵)

Look 2: 12% difference (P ≤ 0.004)

Look 3: 8% difference (P ≤ 0.018)

Look 4: 6% difference (P ≤ 0.041)

The Figure shows the distribution of effect sizes, based on simulations of 100 000 trials, observed in this trial (“stopping”) compared with a fixed-sample-size trial (“no stopping”) under 3 different scenarios: a true difference of 0%, 10%, or 20%. As expected, in the “no stopping” situation, the results are symmetrically distributed around the true value. In the trials monitored with the O'Brien–Fleming rule, the distribution of observed effects has a strange shape because of the possibility of stopping at the various interim looks. It is more spread out because the early stops produce smaller sample sizes and more variable estimates. The medians (and means) of each distribution are slightly higher but still very close to the true values (Figure). The most extreme results come from the first interim look with the smallest sample size. Those estimates have the widest CIs, and if the full CI were used in interpretation, less weight would be placed on the point estimate. These large estimates occur most frequently when the true difference is large, in which case the argument for stopping a trial becomes compelling even if we are less sure exactly how big the benefit is.
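A rough simulation along the following lines reproduces the pattern in the Figure. The per-arm sample sizes at each look (130, 260, 390, 520; 1040 total) are assumptions reconstructed from the Figure caption, and only efficacy stopping at the quoted risk-difference boundaries is modeled; the published simulations may have used a more detailed design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed design: 4 equally spaced looks, efficacy-only stopping (no futility rule).
looks      = [130, 260, 390, 520]        # cumulative n per arm at looks 1-4
boundaries = [0.23, 0.12, 0.08, 0.06]    # stopping thresholds quoted in the text

def one_trial(p_control, p_treated):
    """Return (observed risk difference, per-arm n at stopping) for one trial."""
    ec = et = prev = 0
    for n, b in zip(looks, boundaries):
        ec += rng.binomial(n - prev, p_control)      # new control-arm deaths this stage
        et += rng.binomial(n - prev, p_treated)      # new treatment-arm deaths this stage
        prev = n
        rd = (ec - et) / n
        if rd >= b or n == looks[-1]:
            return rd, n

for true_diff in (0.0, 0.10, 0.20):
    sims = [one_trial(0.50, 0.50 - true_diff) for _ in range(20_000)]
    rds = np.array([rd for rd, _ in sims])
    ns  = np.array([n for _, n in sims])
    print(f"true diff {true_diff:.2f}: median estimate {np.median(rds):.3f}, "
          f"mean estimate {rds.mean():.3f}, mean per-arm n {ns.mean():.0f}")
```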

Figure.
Distribution of observed effects in trials with and without stopping rules.

The trials were designed to have 90% power to detect a 10% mortality benefit (for example, 50% vs. 40%). Each panel corresponds to a different underlying true difference: no difference (top), 10% difference (middle), and 20% difference (bottom). The distribution of results is shown for trials of 2 designs: 1 using a 4-look O'Brien–Fleming stopping rule (“stopping”) and 1 using a fixed sample size (“no stopping”). Median effect size and 2.5% and 97.5% percentiles of each estimate are reported in parentheses. The mean sample size is reported for the “stopping” trial only: n = 1040 for the fixed sample size design.


The curves in the Figure assume a known value for the true effect, showing that the average estimates from trials that use this stopping rule are quite near the truth. The estimate from a trial following a stopping guideline is therefore a pretty good guess if we do not know anything about the true value. But in RCTs, we almost always know something external to the results that may help us judge whether a large observed effect is an overestimate. External evidence includes other research on the treatment in question, findings of related RCTs, evidence supporting the proposed mechanism of effect, studies of other therapies for the same condition, and the design and execution of the trial itself (15). These can be used to construct a distribution of effect sizes that might a priori be considered plausible in a given trial.

Mueller and colleagues implicitly acknowledge the importance of external evidence when they describe the most concerning trials that were stopped early as those with findings that are “implausible” and that require “astute clinicians” to make an appropriate interpretation. For a result to be implausible or surprising, there must be prior evidence that led to a different expectation from what was observed. Conventional approaches to statistical inference do not formally incorporate prior evidence. Bayesian methods do, and they can clarify the issues posed by Mueller and colleagues.

Suppose that, in the previous example, investigators found a 30% improvement in the mortality rate at the first interim look. This exceeds the 23% boundary set for the first look and might be regarded as surprising and perhaps implausible. A Bayesian approach encodes the a priori plausible range of results in the form of a prior probability distribution. Suppose further that prior evidence indicated that the 10% difference used in the sample size calculation was the expected effect, with the 95% plausible range extending from 7% harm to 25% benefit. A Bayesian estimate that combined this prior distribution with the observed data would yield an estimate of a 23% difference (95% credible interval, 14% to 31%), substantially less than and almost excluding the observed effect of 30%. This moderated estimate could be reported as the investigators' best guess of the true difference, much lower than the effect observed but still different from zero. There are also non-Bayesian approaches to adjusting the point estimate, but they are complex, are dependent on the stopping rule, and are not clearly related to subject-specific knowledge (11, 16).
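A minimal sketch of this kind of calculation, using a normal-normal conjugate approximation, is shown below. The prior standard deviation (0.08, so that ±1.96 SD roughly spans the stated plausible range), the first-look sample size (130 per arm), and the event rates used for the standard error are assumptions chosen to approximate the numbers quoted above, not values taken from the article.

```python
import numpy as np
from scipy.stats import norm

# Assumed prior: centered on the expected 10% benefit, SD ~0.08.
prior_mean, prior_sd = 0.10, 0.08

obs_diff  = 0.30                                   # benefit observed at the first look
n_per_arm = 130                                    # assumed first-look sample size per arm
obs_se    = np.sqrt(0.50 * 0.50 / n_per_arm + 0.20 * 0.80 / n_per_arm)   # ~0.056

w_prior, w_data = 1 / prior_sd**2, 1 / obs_se**2   # precision weights
post_mean = (w_prior * prior_mean + w_data * obs_diff) / (w_prior + w_data)
post_sd   = np.sqrt(1 / (w_prior + w_data))

lo, hi = norm.interval(0.95, loc=post_mean, scale=post_sd)
print(f"posterior estimate {post_mean:.2f} (95% credible interval {lo:.2f} to {hi:.2f})")
# Prints roughly 0.23 (0.14 to 0.32), close to the 23% (14% to 31%) quoted above.
```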

The Bayesian approach formalizes the notion that surprising effects are probably overestimations and provides a more tempered estimate of the true difference (17-20). If the main criticism against estimates from trials stopped very early is that the point estimate is exaggerated, that problem can be ameliorated without extending the trial by using Bayesian methods, which formally incorporate the evidence-based skepticism that Mueller and colleagues suggest be used informally. In smaller trials, interim boundary points will necessarily represent large effects, but because they are based on relatively few participants, the Bayesian adjustment will be considerable.

The best correction of an implausible observed effect is achieved by combining a trial result with those of other similar trials. Bayesian adjustments are necessary only if other trials don't exist or further trials cannot be justified without the adjustment. A Bayesian correction with a prior distribution based on previous RCTs is mathematically equivalent to a standard meta-analysis (21).
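The equivalence can be illustrated with a toy calculation using hypothetical trial results: pooling all trials by inverse-variance weighting gives the same answer as treating the earlier trials as a prior and updating it with the new one.

```python
import numpy as np

# Hypothetical risk differences and standard errors: two earlier trials + the new one.
ests = np.array([0.08, 0.12, 0.30])
ses  = np.array([0.05, 0.06, 0.056])
w    = 1 / ses**2                        # inverse-variance weights

pooled = (w * ests).sum() / w.sum()                        # fixed-effect meta-analysis of all three

prior_mean = (w[:2] * ests[:2]).sum() / w[:2].sum()        # prior = earlier trials pooled
prior_w    = w[:2].sum()
posterior  = (prior_w * prior_mean + w[2] * ests[2]) / (prior_w + w[2])

print(np.isclose(pooled, posterior))     # True: the two routes give the same estimate
```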

The primary statistical goal of standard methods of clinical trial design is to decide which treatment is better with regard to the trial's primary end point. One goal of data monitoring is to expedite that decision if the data indicate a large difference, and the purpose of the stopping guideline is to do so in a way that does not increase the number of erroneous decisions. If the observed benefit is large, we are willing to estimate it imprecisely (that is, based on fewer patients) if we are confident that the true effect differs from zero. This is where decision and estimation goals can conflict. As one statistician has declared (22):

If reliable estimates are required for each treatment then it seems inevitable that a substantial number of patients must receive the inferior treatment … Then it must be recognized that the risks undertaken by volunteers in the experiment are mainly associated with estimation, rather than the need to discover which of the treatments is superior.

When treatments differ substantially in efficacy, monitored trials can dramatically reduce the number of deaths incurred during the study and speed dissemination of the result (23). The Figure shows us the relevant numbers for the example already cited. The potential reduction in sample size is 25% with a 10% true difference and about 55% when the true treatment difference is 20%. It is very difficult to predict how treatment of patients outside the trial will be affected by a DMC decision. Stopping early (with a larger effect) or late (with a larger sample) might have more impact or no impact at all. A utilitarian perspective can justify either decision.

Mueller and colleagues' position can be recast in a decision-making framework. These authors would like trials to ascertain not merely when relative efficacy is established, but also when relative efficacy is shown to exceed harm. This could be accomplished by defining an end point that combines harm and benefit, testing efficacy and safety separately, or defining a nonzero efficacy threshold that would offset a given degree of harm (24-26). This requires that trial designs be able to measure harm, and it could increase sample size requirements by changing the null hypothesis from a zero effect to a nonzero degree of benefit needed to exceed the harm. If we observe a 20% benefit and want to be sure that this is statistically distinguishable not from zero but from a 10% threshold benefit, the required sample size increases substantially with or without stopping rules.
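A back-of-the-envelope calculation, using a standard normal-approximation sample size formula with illustrative numbers (50% vs. 30% mortality, 90% power; not figures from the article), shows how sharply the requirement grows when the null hypothesis moves from zero to a 10% threshold benefit.

```python
from math import ceil
from scipy.stats import norm

# Normal approximation with a common variance under null and alternative; illustrative only.
def n_per_arm(p_control, p_treated, null_diff, alpha=0.05, power=0.90):
    z   = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    true_diff = p_control - p_treated
    return ceil(z**2 * var / (true_diff - null_diff)**2)

# True benefit of 20% (50% vs. 30% mortality):
print(n_per_arm(0.50, 0.30, null_diff=0.00))   # distinguish from zero          -> ~121 per arm
print(n_per_arm(0.50, 0.30, null_diff=0.10))   # distinguish from a 10% floor   -> ~484 per arm (about 4x)
```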

The implications of Mueller and colleagues' proposal to explicitly include treatment harms in data monitoring considerations depend critically on whether the nature of the harm is known or unknown before the study and whether that harm is likely to appear during the trial. If the harm is unknown and has not appeared by the time efficacy is established, the DMC should not be expected to support continued administration of the less effective therapy on the grounds that its benefit might be offset by a harm that could be discovered if the trial were continued longer. Society would probably not tolerate that. Unsuspected late or rare adverse effects are often better ascertained through observational studies, continued follow-up of trial participants, postmarketing or outcomes research, or meta-analyses (27, 28). The capture and reporting of adverse event data in RCTs in general need to be improved so that risk signals can reliably emerge from meta-analyses (29-31).

When the nature of the harm is known before the study but its frequency is uncertain, as in anticoagulation and stroke or estrogen therapy and breast cancer, the approach to data monitoring can include the harm (24-26, 32, 33) and the plan can be prospectively discussed with institutional review boards, investigators, DMC members, and the patients themselves. In the Women's Health Initiative, the possibility of asynchronous harm and benefit was built into the stopping guidelines, and extensive preliminary work was done with the DMC to elicit their reactions to different possible observed patterns (34, 35).

In conclusion, Mueller and colleagues cast their argument in stark terms, such as “scientifically invalid,” “biased,” and “unethical.” A more nuanced view is that RCTs and DMCs must balance many competing and worthwhile medical, statistical, ethical, and social goals, which is why the literature in this area is so rich and why DMC deliberations that have been described are so difficult (35-37, 39). The DMCs typically weigh all of the concerns articulated by Mueller and colleagues, and more (36, 40, 41). As with juries, review panels, and other groups empowered to make difficult decisions (42), the outcomes of such deliberations can always be second-guessed, but algorithmic solutions rarely improve the process. However, as implied by Mueller and colleagues' comments, not all DMCs have a sophisticated understanding of methodological issues and not all function optimally. The number of potential DMC participants with training and experience in the DMC process is relatively small, and efforts to expand that pool are badly needed (43).

This discussion should be viewed as part of a broader debate about the acceptable speed of medical progress. This pace is a conscious social choice that implicitly balances the interests of individuals against those of the broader population (4). If society perceives individual interests to have been excessively compromised for the collective good, investigators risk a social response that can seriously harm both the scientific enterprise and, paradoxically, that collective good (44). How heavy the hand should be on each side of the individual-versus-collective ethical scale feels personal to DMC members, but it is ultimately a societal choice to be determined through public discussion. Mueller and colleagues' perspective contributes to that discussion and will stimulate yet more conversation among scientists and the public on this critical issue.

References

Hellman S, Hellman DS.  Of mice but not men. Problems of the randomized clinical trial. N Engl J Med. 1991; 324:1585-9. PubMed
 
Passamani E.  Clinical trials—are they ethical? N Engl J Med. 1991; 324:1589-92. PubMed
 
Greenberg DS.  Johns Hopkins research returns to normal. Lancet. 2001; 358:393. PubMed
 
Slutsky AS, Lavery JV.  Data safety and monitoring boards. N Engl J Med. 2004; 350:1143-7. PubMed
 
Palmer CR, Rosenberger WF.  Ethics and practice: alternative designs for phase III randomized clinical trials. Control Clin Trials. 1999; 20:172-86. PubMed
 
Mueller PS, Montori VM, Bassler D, Koenig BA, Guyatt GH.  Ethical issues in stopping randomized trials early because of apparent benefit. Ann Intern Med. 2007; 146:878-81.
 
Goodman SN.  Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999; 130:995-1004. PubMed
 
Jennison C, Turnbull BW.  Repeated confidence intervals for group sequential clinical trials. Control Clin Trials. 1984; 5:33-45. PubMed
 
Chang MN, Wieand HS, Chang VT.  The bias of the sample proportion following a group sequential phase II clinical trial. Stat Med. 1989; 8:563-70. PubMed
 
Pocock SJ, Hughes MD.  Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials. 1989; 10:209S-221S. PubMed
 
Fan XF, DeMets DL, Lan KK.  Conditional bias of point estimates following a group sequential test. J Biopharm Stat. 2004; 14:505-30. PubMed
 
Jennison C, Turnbull BW.  Statistical approaches to interim monitoring of medical trials: a review and commentary. Stat Sci. 1999; 5:299-317.
 
Montori VM, Devereaux PJ, Adhikari NK, Burns KE, Eggert CH, Briel M. et al.  Randomized trials stopped early for benefit: a systematic review. JAMA. 2005; 294:2203-9. PubMed
 
O'Brien PC, Fleming TR.  A multiple testing procedure for clinical trials. Biometrics. 1979; 35:549-56. PubMed
 
Wheatley K, Clayton D.  Be skeptical about unexpected large apparent treatment effects: the case of an MRC AML12 randomization. Control Clin Trials. 2003; 24:66-70. PubMed
 
Emerson SS, Fleming TR.  Parameter estimation following sequential hypothesis testing. Biometrika. 1990; 77:875-92.
 
Spiegelhalter DJ, Abrams KR, Myles JP.  Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester, UK: Wiley; 2004.
 
Spiegelhalter DJ, Freedman LS, Parmar MK.  Bayesian approaches to randomized trials. J R Stat Soc [Ser A]. 1994; 157:357-87.
 
Carlin CP, Louis TA.  Bayes and Empirical Bayes Methods for Data Analysis. London: Chapman & Hall; 1996.
 
Berry DA.  A case for Bayesianism in clinical trials. Stat Med. 1993;12:1377-93; discussion 1395-404. [PMID: 8248653]
 
Greenland S.  Bayesian perspectives for epidemiological research: I. Foundations and basic methods. Int J Epidemiol. 2006; 35:765-75. PubMed
 
Bather JA.  On the allocation of treatment in sequential medical trials. Int Stat Rev. 1985; 53:1-13.
 
Ellenberg SS, Fleming TR, DeMets DL.  Data Monitoring Committees in Clinical Trials: A Practical Perspective. Chichester, UK: Wiley; 2003.
 
Ashby D, Tan SB.  Where's the utility in Bayesian data-monitoring of clinical trials? Clin Trials. 2005;2:197-205; discussion 205-8. [PMID: 16279143]
 
Zhao Y, Grambsch PM, Neaton JD.  A decision rule for sequential monitoring of clinical trials with a primary and supportive outcome. Clin Trials. 2007; 4:140-53. PubMed
 
Gong J, Pinheiro JC, DeMets DL.  Estimating significance level and power comparisons for testing multiple endpoints in clinical trials. Control Clin Trials. 2000; 21:313-29. PubMed
 
Sedrakyan A, Atkins D, Treasure T.  The risk of aprotinin: a conflict of evidence. Lancet. 2006; 367:1376-7. PubMed
 
Enas GG, Goldstein DJ.  Defining, monitoring and combining safety information in clinical trials. Stat Med. 1995;14:1099-111; discussion 1113-6. [PMID: 7569503]
 
Ioannidis JP, Lau J.  Completeness of safety reporting in randomized trials: an evaluation of 7 medical areas. JAMA. 2001; 285:437-43. PubMed
 
Ioannidis JP, Evans SJ, Gotzsche PC, O'Neill RT, Altman DG, Schulz K, CONSORT Group.  Better reporting of harms in randomized trials: an extension of the CONSORT statement. Ann Intern Med. 2004; 141:781-8. PubMed
 
Ioannidis JP, Mulrow CD, Goodman SN.  Adverse events: the more you search, the more you find [Editorial]. Ann Intern Med. 2006; 144:298-300. PubMed
 
Jennison C, Turnbull BW.  Group sequential tests for bivariate response: interim analyses of clinical trials with both efficacy and safety endpoints. Biometrics. 1993; 49:741-52. PubMed
 
Tang DI, Geller NL, Pocock SJ.  On the design and analysis of randomized clinical trials with multiple endpoints. Biometrics. 1993; 49:23-30. PubMed
 
Freedman L, Anderson G, Kipnis V, Prentice R, Wang CY, Rossouw J. et al.  Approaches to monitoring the results of long-term disease prevention trials: examples from the Women's Health Initiative. Control Clin Trials. 1996; 17:509-25. PubMed
 
Wittes J, Barrett-Connor E, Braunwald E, Chesney M, Cohen HJ, Demets DL, et al.  Monitoring the randomized trials of the Women's Health Initiative: the experience of the data and safety monitoring board. Clin Trials. 2007. [Forthcoming].
 
DeMets DL, Furberg CD, Friedman LM.  Data Monitoring in Clinical Trials: A Case Studies Approach. New York: Springer; 2005.
 
Wittes J.  Behind closed doors: the data monitoring board in randomized clinical trials. Stat Med. 1993; 12:419-24. PubMed
 
Coronary Drug Project Research Group.  Practical aspects of decision making in clinical trials: the Coronary Drug Project as a case study. Control Clin Trials. 1981; 1:363-76. PubMed
 
Pocock SJ.  Current controversies in data monitoring for clinical trials. Clin Trials. 2006; 3:513-21. PubMed
 
Meier P.  Statistics and medical experimentation. Biometrics. 1975; 31:511-29. PubMed
 
Fleming TR, DeMets DL.  Monitoring of clinical trials: issues and recommendations. Control Clin Trials. 1993; 14:183-97. PubMed
 
Walker AE, McLeer SK, DAMOCLES Group.  Small group processes relevant to data monitoring committees in controlled clinical trials: an overview of reviews. Clin Trials. 2004; 1:282-96. PubMed
 
Grant AM, Altman DG, Babiker AB, Campbell MK, Clemens FJ, Darbyshire JH, et al. DAMOCLES study group.  Issues in data monitoring and interim analysis of trials. Health Technol Assess. 2005; 9:1-238, iii-iv. PubMed
 
Jonas H.  Philosophical reflections on experimenting with human subjects. In: Freund PA, ed. Experimentation with Human Subjects. New York: George Braziller; 1969:304-15.
 
