Bayesian statistics and clinical trial conclusions: Why the OPTIMSE study should be considered positive

Statistical approaches to randomised controlled trial analysis

The statistical approach used in the design and analysis of the vast majority of clinical studies is often referred to as classical or frequentist. Conclusions are made on the results of hypothesis tests with generation of p-values and confidence intervals, and require that the correct conclusion be drawn with a high probability among a notional set of repetitions of the trial.

Bayesian inference is an alternative, which treats conclusions probabilistically and provides a different framework for thinking about trial design and conclusions. There are many differences between the two, but for this discussion there are two obvious distinctions with the Bayesian approach. The first is that prior knowledge can be accounted for to a greater or lesser extent, something life scientists sometimes have difficulty reconciling. Secondly, the conclusions of a Bayesian analysis often focus on the decision that requires to be made, e.g. should this new treatment be used or not.

There are pros and cons to both sides, nicely discussed here, but I would argue that the results of frequentist analyses are too often accepted with insufficient criticism. Here’s a good example.

OPTIMSE: Optimisation of Cardiovascular Management to Improve Surgical Outcome

Optimising the amount of blood being pumped out of the heart during surgery may improve patient outcomes. By specifically measuring cardiac output in the operating theatre and using it to guide intravenous fluid administration and the use of drugs acting on the circulation, the amount of oxygen that is delivered to tissues can be increased.

It sounds like common sense that this would be a good thing, but drugs can have negative effects, as can giving too much intravenous fluid. There are also costs involved, is the effort worth it? Small trials have suggested that cardiac output-guided therapy may have benefits, but the conclusion of a large Cochrane review was that the results remain uncertain.

A well designed and run multi-centre randomised controlled trial was performed to try and determine if this intervention was of benefit (OPTIMSE: Optimisation of Cardiovascular Management to Improve Surgical Outcome).

Patients were randomised to a cardiac output–guided hemodynamic therapy algorithm for intravenous fluid and a drug to increase heart muscle contraction (the inotrope, dopexamine) during and 6 hours following surgery (intervention group) or to usual care (control group).

The primary outcome measure was the relative risk (RR) of a composite of 30-day moderate or major complications and mortality.

OPTIMSE: reported results

Focusing on the primary outcome measure, there were 158/364 (43.3%) and 134/366 (36.6%) patients with complication/mortality in the control and intervention group respectively. Numerically at least, the results appear better in the intervention group compared with controls.

Using the standard statistical approach, the relative risk (95% confidence interval) = 0.84 (0.70-1.01), p=0.07 and absolute risk difference = 6.8% (−0.3% to 13.9%), p=0.07. This is interpreted as there being insufficient evidence that the relative risk for complication/death is different to 1.0 (all analyses replicated below). The authors reasonably concluded that:

In a randomized trial of high-risk patients undergoing major gastrointestinal surgery, use of a cardiac output–guided hemodynamic therapy algorithm compared with usual care did not reduce a composite outcome of complications and 30-day mortality.

A difference does exist between the groups, but is not judged to be a sufficient difference using this conventional approach.

OPTIMSE: Bayesian analysis

Repeating the same analysis using Bayesian inference provides an alternative way to think about this result. What are the chances the two groups actually do have different results? What are the chances that the two groups have clinically meaningful differences in results? What proportion of patients stand to benefit from the new intervention compared with usual care?

With regard to prior knowledge, this analysis will not presume any prior information. This makes the point that prior information is not always necessary to draw a robust conclusion. It may be very reasonable to use results from pre-existing meta-analyses to specify a weak prior, but this has not been done here. Very grateful to John Kruschke for the excellent scripts and book, Doing Bayesian Data Analysis.

The results of the analysis are presented in the graph below. The top panel is the prior distribution. All proportions for the composite outcome in both the control and intervention group are treated as equally likely.

The middle panel contains the main findings. This is the posterior distribution generated in the analysis for the relative risk of the composite primary outcome (technical details in script below).

The mean relative risk = 0.84 which as expected is the same as the frequentist analysis above. Rather than confidence intervals, in Bayesian statistics a credible interval or region is quoted (HDI = highest density interval is the same). This is philosphically different to a confidence interval and says:

Given the observed data, there is a 95% probability that the true RR falls within this credible interval.

This is a subtle distinction to the frequentist interpretation of a confidence interval:

Were I to repeat this trial multiple times and compute confidence intervals, there is a 95% probability that the true RR would fall within these confidence intervals.

This is an important distinction and can be extended to make useful probabilistic statements about the result.

The figures in green give us the proportion of the distribution above and below 1.0. We can therefore say:

The probability that the intervention group has a lower incidence of the composite endpoint is 97.3%.

It may be useful to be more specific about the size of difference between the control and treatment group that would be considered equivalent, e.g. 10% above and below a relative risk = 1.0. This is sometimes called the region of practical equivalence (ROPE; red text on plots). Experts would determine what was considered equivalent based on many factors. We could therefore say:

The probability of the composite end-point for the control and intervention group being equivalent is 22%.

Or, the probability of a clinically relevant difference existing in the composite endpoint between control and intervention groups is 78%

optimise_primary_bayesFinally, we can use the 200 000 estimates of the probability of complication/death in the control and intervention groups that were generated in the analysis (posterior prediction). In essence, we can act like these are 2 x 200 000 patients. For each “patient pair”, we can use their probability estimates and perform a random draw to simulate the occurrence of complication/death. It may be useful then to look at the proportion of “patients pairs” where the intervention patient didn’t have a complication but the control patient did:

Using posterior prediction on the generated Bayesian model, the probability that a patient in the intervention group did not have a complication/death when a patient in the control group did have a complication/death is 28%.

Conclusion

On the basis of a standard statistical analysis, the OPTIMISE trial authors reasonably concluded that the use of the intervention compared with usual care did not reduce a composite outcome of complications and 30-day mortality.

Using a Bayesian approach, it could be concluded with 97.3% certainty that use of the intervention compared with usual care reduces the composite outcome of complications and 30-day mortality; that with 78% certainty, this reduction is clinically significant; and that in 28% of patients where the intervention is used rather than usual care, complication or death may be avoided.

Considerations in the Early Termination of Clinical Trials in Surgery

One of the most difficult situations when running a clinical trial is the decision to terminate the trial early. But it shouldn’t be a difficult decision. With clear stopping rules defined before the trial starts, it should be straightforward to determine when the effect size is large enough that no further patients require to be randomised to definitively answer the question.

Whether there is benefit to leaving a temporary plastic tube drain in the belly after an operation to remove the head of the pancreas is controversial. It may help diagnose and treat the potential disaster that occurs when the join between pancreas and bowel leaks. Others think that the presence of the drain may in fact make a leak more likely.

This question was tackled in an important randomised clinical trial.

A randomised prospective multicenter trial of pancreaticoduodenectomy with and without routine intraperitoneal drainage

The trial was stopped early because there were more deaths in the group who didn’t have a drain. The question that remains: was it the absence of the drain which caused the deaths? As important, was stopping the trial at this point the correct course of action?

My feeling, the lack of a drain was not definitively demonstrated to be the cause of the deaths. And I think the trial was stopped too early. Difficult issues discussed in our letter in Annals of Surgery about it.

Ethics and statistics collide in decisions relating to the early termination of clinical trials. Investigators have a fundamental responsibility to stop a trial where an excess of harm is seen in one of the arms. Decisions on stopping are not straightforward and must balance the potential risk to trial patients against the likelihood that in fact there is no difference in outcome between groups. Indeed, in early termination, the potential loss of generalizable knowledge may itself harm future patients.

We therefore read with interest the article by Van Buren and colleagues (1) and congratulate the authors on the first multicenter randomized trial on the controversial topic of surgical drains after pancreaticoduodenectomy. As the authors report, the trial was stopped by the Data Safety Monitoring Board after only 18% recruitment due to a numerical excess of deaths in the “no-drain” arm.

We would be interested in learning from the process that led to the decision to terminate the trial. A common method to monitor adverse events advocated by the CONSORT group is to define formal sequential stopping rules based on the limit of acceptable adverse event rates (2). These guidelines suggest that authors report the number of planned “looks” at the data, the statistical methods used including any formal stopping rules, and whether these were planned before trial commencement.

This information is often not included in published trial reports, even when early termination has occurred (3). We feel that in the context of important surgical trials, these guidelines should be adhered to.

Early termination can reduce the statistical power of a trial. This can be addressed by examining results as data accumulate, preferably by an independent data monitoring committee. However, performing multiple statistical examinations of accumulating data without appropriate correction can lead to erroneous results and interpretation (4). For example, if accumulating data from a trial are examined at 5 interim analyses that use a P value of 0.05, the overall false-positive rate is nearer to 19% than to the nominal 5%.

Several group sequential statistical methods are available to adjust for multiple analyses (5,6) and their use should be prespecified in the trial protocol. Stopping rules may be formed by 2 broad methods, either using a Bayesian approach to evaluate the proportion of patients with adverse effects or using a hypothesis testing approach with a sequential probability ratio test to determine whether the acceptable adverse effects rate has been exceeded. Data are compared at each interim analysis and decisions based on prespecified criteria. As an example, stopping rules for harm from a recent study used modified Haybittle-Peto boundaries of 3 SDs in the first half of the study and 2 SDs in the second half (7). The study of Van Buren and colleagues is reported to have been stopped after 18% recruitment due to an excess of 6 deaths in the “no-drain” arm. The relative risk of death at 90 days in the “no-drain” group versus the “drain” group was 3.94 (95% confidence interval, 0.87–17.90), equivalent to a difference of 1.78 SD. The primary outcome measure was any grade 2 complication or more and had a relative risk of 1.32 (5% confidence interval, 1.00–1.75), or 1.95 SD.

The decision to terminate a trial early is not based on statistics alone. Judgements must be made using all the available evidence, including the biological and clinical plausibility of harm and the findings of previous studies. Statistical considerations should therefore be used as a starting point for decisions, rather than a definitive rule.

The Data Safety Monitoring Board for the study of Van Buren and colleagues clearly felt that there was no option other than to terminate the trial. However, at least on statistical grounds, this occurred very early in the trial using conservative criteria. The question remains therefore is the totality of evidence convincing that the question posed has been unequivocally answered? We would suggest that this is not the case. In general terms, stopping a clinical trial early is a rare event that sends out a message that, because of the “sensational” effect, may have greater impact on the medical community than intended, making future studies in that area challenging.

1. Van Buren G, Bloomston M, Hughes SJ, et al. A randomised prospective multicenter trial of pancreaticoduodenectomy with and without routine intraperitoneal drainage. Ann Surg. 2014;259: 605–612.

2. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trial. BMJ. 2010;340:c869.

3. Montori VM, Devereaux PJ, Adhikari NK, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294:2203–2209.

4. Geller NL, Pocock SJ. Interim analyses in randomized clinical trials: ramifications and guidelines for practitioners. Biometrics. 1987;43:213–223.

5. Pocock SJ. When to stop a clinical trial. BMJ. 1992;305:235–240.

6. Berry DA. Interim analyses in clinical trials: classical vs. Bayesian approaches. Stat Med. 1985;4:521– 526.

7. Connolly SJ, Pogue J, Hart RG, et al. Effect of clopidogrel added to aspirin in patients with atrial fibrillation. N Engl J Med. 2009;360:2066– 2078.

Investigator contact details should continue to be available after completion of clinical trials

ClinicalTrials.gov front page

Why oh why does the National Library for Medicine remove investigator contact details from clinicaltrials.gov after completion of a trial. We need to contact them  to ask why their trial is not published!

BMJ letter from us on the subject:

For successful translation of results from research into practice there must also be timely dissemination of research findings (1). The Food and Drug Administration Amendments Act requires trials that are subject to mandatory reporting to post results within 12 months of study completion on ClinicalTrials.gov (2). Despite this initiative, less than a quarter of trial investigators comply (3).

Maruani and colleagues report that email reminders of the legal requirement to post results significantly improve reporting at six months (4). Any intervention that increases dissemination of clinical trial results is welcome and the authors should be commended for their efforts.

However, we do not understand the authors’ described methods. They report that the cohort included trials “that had available contact details (email addresses) of responsible parties”, and go on to state that they “extracted the email addresses of responsible parties from ClinicalTrials.gov”. In the discussion they highlight “the need for updating email addresses of responsible parties in ClinicalTrials.gov”.

We would be interested to know how this is possible as all email addresses for completed trials are removed from ClinicalTrials.gov as a matter of policy by the National Library of Medicine (Table 1).

We asked the National Library of Medicine (NLM) to comment on this. In addition, we asked what advice they would give a patient who had taken part in a completed clinical trial, and wished to contact the investigators to enquire about trial results. They responded:
“If the record is closed or completed, we remove all contact information in the location and contact section since there is no reason why a potential patient would need to contact them.”

The NLM will not provide contact email addresses on request, despite these previously being available on ClinicalTrials.gov while the trial was recruiting.

The removal of previously published contact information from ClinicalTrials.gov has important implications for transparency in trial reporting. Interventions, such as that proposed by Maruani, cannot be delivered at scale while this practice exists. Searching for contact details manually with Google or Pubmed is difficult at best and impossible for many, who may include patients that have participated in a study and wish to contact the investigators about trial results.

References
1. Ross JS, Tse T, Zarin DA, Xu H, Zhou L, Krumholz HM. Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis. Bmj 2012;344: d7292.
2. Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database–update and key issues. N Engl J Med 2011;364(9): 852-860.
3. Prayle AP, Hurley MN, Smyth AR. Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study. Bmj 2012;344: d7373.
4. Maruani A, Boutron I, Baron G, Ravaud P. Impact of sending email reminders of the legal requirement for posting results on ClinicalTrials.gov: cohort embedded pragmatic randomized controlled trial. Bmj 2014;349: g5579.

GlobalSurg recruitment starting soon

I’m excited to be involved in an enthusiastic young collaborative called GlobalSurg. Research in surgery has been a predominately first world affair and it is absolutely essential to see international collaboration including developing nations. Our study focusses on emergency abdominal surgery and complements a similar initiative looking at elective abdominal surgery, ISOS.

Why this study and why now? Surgery has been referred to as the neglected step child of global public health, a sentiment I completely agree with. Diseases effectively treated with surgery are becoming the public health priority for developing nations, a fact highlighted by the excellent International Collaboration for Essential Surgery (ICES) and important Right to Heal campaign.

Least wealthy countries account for 35% of the global population yet undertook only 3.5% of all surgical procedures in 2004.

This GlobalSurg project aims to be the first of many. It will establish what happens to patients across the world after emergency abdominal surgery

The primary outcome measure here is pragmatic: which patients are still alive 24 h following emergency surgery? A number of secondary measures will provide depth. Case mix will be determined as far as is possible and an analysis of facilities included.globalsurg_contact_250

Anyone can still get involved in GlobalSurg and I would encourage you to do so. We have everyone from professors of surgery in large first-world urban centres to small community hospitals in developing countries.

It only requires data collection over any two week period in July-November 2014. Patients are easy to identify and there are only 30 patient related data-points to collect. Data can be collected on paper or directly into our REDCap system, which I will write more about in the future.