Publishing mortality rates for individual surgeons

This is our new analysis of an old topic.In Scotland, individual surgeon outcomes were published as far back as 2006. It wasn’t pursued in Scotland, but has been mandated for surgeons in England since 2013.

This new analysis took the current mortality data and sought to answer a simple question: how useful is this information in detecting differences in outcome at the individual surgeon level?

Well the answer, in short, is not very useful.

We looked at mortality after planned bowel and gullet cancer surgery, hip replacement, and thyroid, obesity and aneurysm surgery. Death rates are relatively low after planned surgery which is testament to hard working NHS teams up and down the country. This together with the fact that individual surgeons perform a relatively small proportion of all these procedures means that death rates are not a good way to detect under performance.

At the mortality rates reported for thyroid (0.08%) and obesity (0.07%) surgery, it is unlikely a surgeon would perform a sufficient number of procedures in his/her entire career to stand a good chance of detecting a mortality rate 5 times the national average.

Surgeon death rates are problematic in more fundamental ways. It is the 21st century and much of surgical care is delivered by teams of surgeons, other doctors, nurses, physiotherapists, pharmacists, dieticians etc. In liver transplantation it is common for one surgeon to choose the donor/recipient pair, for a second surgeon to do the transplant, and for a third surgeon to look after the patient after the operation. Does it make sense to look at the results of individuals? Why not of the team?

It is also important to ensure that analyses adequately account for the increased risk faced by some patients undergoing surgery. If my granny has had a heart attack and has a bad chest, I don’t want her to be deprived of much needed surgery because a surgeon is worried that her high risk might impact on the public perception of their competence. As Harry Burns the former Chief Medical Officer of Scotland said, those with the highest mortality rates may be the heroes of the health service, taking on patients with difficult disease that no one else will face.

We are only now beginning to understand the results of surgery using measures that are more meaningful to patients. These sometimes get called patient-centred outcome measures. Take a planned hip replacement, the aim of the operation is to remove pain and increase mobility. If after 3 months a patient still has significant pain and can’t get out for the groceries, the operation has not been a success. Thankfully death after planned hip replacement is relatively rare and in any case, might have little to do with the quality of the surgery.

Transparency in the results of surgery is paramount and publishing death rates may be a step towards this, even if they may in fact be falsely reassuring. We must use these data as part of a much wider initiative to capture the success and failures of surgery. Only by doing this will we improve the results of surgery and ensure every patient receives the highest quality of care.

Read the full article for free here.

All the transplant statistics you need

If you have a hunger for statistics on organ transplantation, check out NHS Blood and Transplant. There are regularly updated and reflect what is actually happening in UK transplant today. We should have a competition for novel ways of presenting these visually. Ideas?!

Having a low blood count increase complications from liver surgery

A low blood count is common with cancer. There are now more studies showing that this can contribute to complications after surgery. Blood transfusion increases blood count but is best avoided in cancer unless the blood count is very low. This new study in the journal HPB shows the effect of anaemia after liver surgery. Here is the editorial highlight I wrote for the journal.

Preoperative anaemia is common and affects 30-60% of patients undergoing major elective surgery. In major non-cardiac surgery, anaemia is associated with increased morbidity and mortality, as well as higher blood transfusion rates.

The importance of preoperative anaemia in liver resection patients is becoming recognised. In this issue, Tohme and colleagues present an evaluation of the American College of Surgeons’ National Surgical Quality Improvement Program (ACS-NSQIP) database.

Of around 13000 patients who underwent elective liver resection from 2005 to 2012, one third were anaemic prior to surgery. After adjustment, anaemia was associated with major complications after surgery (OR 1.21, 1.09-1.33) but not death.

Patients who are anaemic have different characteristics to those who are not, characteristics that are likely to make them more susceptible to complications. While this analysis extensively adjusts for observed factors, residual confounding almost certainly exists.

The question remains, does anaemia itself contribute to the occurrence of complications, or is it just a symptom of greater troubles? The authors rightly highlight the importance of identifying anaemia prior to surgery, but it remains to be seen whether treatment is possible and whether it will result in better patient outcomes.

Perioperative transfusion is independently associated with major complications. Although there is no additive effect in anaemic patients, the benefits of treating anaemia may be offset by the detrimental effect of transfusion. For those with iron deficiency, treatment with intravenous iron may be of use and is currently being studied in an RCT of all major surgery (preventt.lshtm.ac.uk). Results of studies such as these will help determine causal relationships and whether intervention is possible and beneficial.

Keyhole or open surgery for bowel cancer metastases in the liver?

Laparoscopic (keyhole) approaches for liver resection are well described and in common use, but we do not yet have robust randomised controlled trial data comparing safety and effectiveness (a number of randomised trials are on-going). Observational studies have been published suggesting approaches are comparable, but as always with studies of this type, bias in treatment allocation limits conclusions.

In the February 2016 issue of HPB, Lewin and colleagues present a retrospective observational study comparing survival in laparoscopic versus open resection for colorectal liver metastases.

Selection bias always exists in non-randomised studies, but the authors have tried to reduce this with a propensity score based technique. These approaches aim to reduce the bias between treatment groups by accounting for differences in measured variables. Unmeasured variables are clearly not accounted for, while in an RCT these would be expected to be distributed equally between groups.

The authors use an “inverse probability of treatment weighting” method, which creates a synthetic sample in which treatment assignment is independent of measured baseline variables. It has the advantage of handling censored survival data better than alternatives and produces estimates of average treatment effects for the entire population, rather than just the treated group.

The actual 5 year overall survival of 59% is impressive and the analysis showed equivalent outcomes between open and laparoscopic groups. Residual confounding is likely to exist and as the authors point out, the open group had twice the positive margin rate (18% vs 8%) suggesting these procedures were technically more challenging. We look forward to seeing how these results compare to the outcomes of on-going RCTs.

Predicting liver failure and death after liver surgery

There have been many attempts to define predictive models for the identification of patients at risk of liver failure after surgery (posthepatectomy liver failure (PHLF)) and death. These have previously been hindered by the lack of a robust definition of PHLF and the two most commonly used definitions – the 50-50 and International Study Group of Liver Surgery (ISGLS) criteria – have now helped with this. These definitions are based on a measure of blood clotting (prothrombin time) and the serum bilirubin concentration, reflecting the synthetic and excretory/detoxifying functions of the liver. One criticism of these is that the criteria are taken on day 5 after surgery, a time-point some have argued is too late to intervene upon.

In a new analysis, Herbert and colleagues present an analysis of 1528 major liver resection patients and examine the changes in serum phosphate levels and creatinine immediate after surgery. It was previously shown a failure of phosphate levels to fall after surgery is associated with liver failure and death (Squires, HPB, 2014). Low serum phosphate after liver resection is well recognised and originally thought to be a consequence of consumption during liver growth (hypertrophy). However, while active take-up of phosphate into the liver after surgery does happen, this is insufficient to fully explain low phosphate levels. The authors point to studies demonstrating a significant increase in the urinary excretion of phosphate following hepatectomy which may also contribute.

Herbert provides a practical definition: creatinine on day 1 post surgery (PoD1) > day of surgery (DoS) and phosphate fails to decrease by 20% from DoS to PoD1. There is a strong association in multivariable analyses with death (Odds ratio 2.53, 1.36–4.71) and PHLF (3.89, 1.85–8.37).

The serum phosphate/creatinine definition identified 52% of those that died, but also 25% that survived without evidence of PHLF. It may be that this can be improved by incorporating other parameters, or my identifying a high risk group a priori. Given the lack of specific therapies beyond that of high quality intensive care, whether death can actually be averted is separate question.

An alternative presentation of the ProPublica Surgeon Scorecard

ProPublica, an independent investigative journalism organisation, have published surgeon-level complications rates based on Medicare data. I have already highlighted problems with the reporting of the data: surgeons are described as having a “high adjusted rate of complications” if they fall in the red-zone, despite there being too little data to say whether this has happened by chance.

4
This surgeon should not be identified as having a “high adjusted rate of complications” as there are too few cases to estimate the complication rate accurately.

I say again, I fully support transparency and public access to healthcare. But the ProPublica reporting has been quite shocking. I’m not aware of them publishing the number of surgeons out of the 17000 that are statistically different to the average. This is a small handful.

ProPublica could have chosen a different approach. This is a funnel plot and I’ve written about them before.

A funnel plot is a summary of an estimate (such as complication rate) against a measure of the precision of that estimate. In the context of healthcare, a centre or individual outcome is often plotted against patient volume. A horizontal line parallel to the x-axis represents the outcome for the entire population and outcomes for individual surgeons are displayed as points around this. This allows a comparison of individuals with that of the population average, while accounting for the increasing certainty surrounding that outcome as the sample size increases. Limits can be determined, beyond which the chances of getting an individual outcome are low if that individual were really part of the whole population.

In other words, a surgeon above the line has a complication rate different to the average.

I’ve scraped the ProPublica data for gallbladder removal (laparoscopic cholecystectomy) from California, New York and Texas for surgeons highlighted in the red-zone. These are surgeons ProPublica says have high complication rates.

As can be seen from the funnel plot, these surgeons are no where near being outliers. There is insufficient information to say whether any of them are different to average. ProPublica decided to ignore the imprecision with which the complication rates are determined. For red-zone surgeons from these 3 states, none of them have complication rates different to average.

ProPublica_lap_chole_funnel
Black line, population average (4.4%), blue line 95% control limit, red line 99% control limit.

How likely is it that a surgeon with an average complication rate (4.4%) will appear in the red-zone just by chance (>5.2%)? The answer is, pretty likely given the small numbers of cases here: anything up to a 25% chance depending on the number of cases performed. Even at the top of the green-zone (low ACR, 3.9%), there is still around a 1 in 6 chance a surgeon will appear to have a high complication rate just by chance.

chance_of_being_in_redzoneProPublica have failed in their duty to explain these data in a way that can be understood. The surgeon score card should be revised. All “warning explanation points” should be removed for those other than the truly outlying cases.

Data

Download

Git

Link to repository.

Code

The problem with ProPublica’s surgeon scorecards

ProPublica is an organisation performing independent, non-profit investigative journalism in the public interest. Yesterday it published an analysis of surgeon-level complications rates based on Medicare data.

Publication of individual surgeons results is well established in the UK. Transparent, easily accessible healthcare data is essential and initiatives like this are welcomed.

It is important that data are presented in a way that can be clearly understood. Communicating risk is notoriously difficult. This is particularly difficult when it is necessary to describe the precision with which a risk has been estimated.

Unfortunately that is where ProPublica have got it all wrong.

There is an inherent difficulty faced when we dealing with individual surgeon data. In order to be sure that a surgeon has a complication rate higher than average, that surgeon needs to have performed a certain number of that particular procedure. If data are only available on a small number of cases, we can’t be certain whether the surgeon’s complication rate is truly high, or just appears to be high by chance.

If you tossed a coin 10 times and it came up with 7 heads, could you say whether the coin was fair or biased? With only 10 tosses we don’t know.

Similarly, if a surgeon performs 10 operations and has 1 complication, can we sure that their true complication rate is 10%, rather than 5% or 20%? With only 10 operations we don’t know.

The presentation of the ProPublica data is really concerning. Here’s why.

For a given hospital, data are presented for individual surgeons. Bands are provided which define “low”, “medium” and “high” adjusted complication rates. If the adjusted complication rate for an individual surgeon falls within the red-zone, they are described as having a “high adjusted rate of complications”.

1How confident can we be that a surgeon in the red-zone truly has a high complication rate? To get a handle on this, we need to turn to an off-putting statistical concept called a “confidence interval”. As it’s name implies, a confidence interval tells us what degree of confidence we can treat the estimated complication rate.

2If the surgeon has done many procedures, the confidence interval will be narrow. If we only have data on a few procedures, the confidence interval will be wide.

To be confident that a surgeon has a high complication rate, the 95% confidence interval needs to entirely lie in the red-zone.

A surgeon should be highlighted as having a high complication rate if and only if the confidence interval lies entirely in the red-zone.

Here is an example. This surgeon performs the procedure to remove the gallbladder (cholecystectomy). There are data on 20 procedures for this individual surgeon. The estimated complication rate is 4.7%. But the 95% confidence interval goes from the green-zone all the way to the red-zone. Due to the small number of procedures, all we can conclude is that this surgeon has either a low, medium, or high adjusted complication rate. Not very useful.

8Here are some other examples.

Adjusted complication rate: 1.5% on 339 procedures. Surgeon has low or medium complication rate. They are unlikely to have a high complication rate.

5Adjusted complication rate: 4.0% on 30 procedures. Surgeon has low or medium or high complication rate. Note due to the low numbers of cases, the analysis correctly suggests an estimated complication rate, despite the fact this surgeon has not had any complications for the 30 procedures.
3Adjusted complication rate: 5.4% on 21 procedures. ProPublica conclusion: surgeon has high adjusted complication rate. Actual conclusion: surgeon has low, medium or high complication rate.
4Adjusted complication rate: 6.6% on 22 procedures. ProPublica conclusion: surgeon has high adjusted complication rate. Actual conclusion: surgeon has medium or high complication rate, but is unlikely to have a low complication rate.
6Adjusted complication rate: 7.6% on 86 procedures. ProPublica conclusion: surgeon has high adjusted complication rate. Actual conclusion: surgeon has high complication rate. This is one of the few examples in the dataset, where the analysis suggest this surgeon does have a high likelihood of having a high complication rate.

7In the UK, only this last example would to highlighted as concerning. That is because we have no idea whether surgeons who happen to fall into the red-zone are truly different to average.

The analysis above does not deal with issues others have highlighted: that this is Medicare data only, that important data may be missing , that the adjustment for patient case mix may be inadequate, and that the complications rates seem different to what would be expected.

ProPublica have not moderated the language used in reporting these data. My view is that the data are being misrepresented.

ProPublica should highlight cases like the last mentioned above. For all the others, all that can be concluded is that there are too few cases to be able to make a judgement on whether the surgeon’s complication rate is different to average.

RStudio and GitHub

Version control has become essential for me keeping track of projects, as well as collaborating. It allows backup of scripts and easy collaboration on complex projects. RStudio works really well with Git, an open source open source distributed version control system, and GitHub, a web-based Git repository hosting service. I was always forget how to set up a repository, so here’s a reminder.

This example is done on RStudio Server, but the same procedure can be used for RStudio desktop. Git or similar needs to be installed first, which is straight forward to do.

Setup Git on RStudio and Associate with GitHub

In RStudio, Tools -> Version Control, select Git.

In RStudio, Tools -> Global Options, select Git//SVN tab. Ensure the path to the Git executable is correct. This is particularly important in Windows where it may not default correctly (e.g. C:/Program Files (x86)/Git/bin/git.exe).
1Now hit, Create RSA Key …

2_rsaClose this window.

Click, View public key, and copy the displayed public key.

4_rsaIf you haven’t already, create a GitHub account. Open your account settings and click the SSH keys tab. Click Add SSH key. Paste in the public key you have copied from RStudio.

6_add_keyTell Git who you are. Remember Git is a piece of software running on your own computer. This is distinct to GitHub, which is the repository website. In RStudio, click Tools -> Shell … . Enter:

git config --global user.email "mail@ewenharrison.com"
git config --global user.name "ewenharrison"

Use your GitHub username.

10_who_are_you

Create New project AND git

In RStudio, click New project as normal. Click New Directory.

7_new_project

Name the project and check Create a git repository.

8_new_project_with_git

Now in RStudio, create a new script which you will add to your repository.

9_test_scriptAfter saving your new script (test.R), it should appear in the Git tab on the Environment / history panel.

11_initial_commitClick the file you wish to add, and the status should turn to a green ‘A’. Now click Commit and enter an identifying message in Commit message.

12_inital_commit2You have now committed the current version of this file to your repository on your computer/server. In the future you may wish to create branches to organise your work and help when collaborating.

Now you want to push the contents of this commit to GitHub, so it is also backed-up off site and available to collaborators. In GitHub, create a New repository, called here test.

5_create_git In RStudio, again click Tools -> Shell … . Enter:

git remote add origin https://github.com/ewenharrison/test.git
git config remote.origin.url git@github.com:ewenharrison/test.git
git pull -u origin master
git push -u origin master

13_push_pullYou have now pushed your commit to GitHub, and should be able to see your files in your GitHub account. The Pull Push buttons in RStudio will now also work. Remember, after each Commit, you have to Push to GitHub, this doesn’t happen automatically.

Clone an existing GitHub project to new RStudio project

In RStudio, click New project as normal. Click Version Control.

7_new_projectIn Clone Git Repository, enter the GitHub repository URL as per below. Change the project directory name if necessary.

14_new_version_controlIn RStudio, again click Tools -> Shell … . Enter:

git config remote.origin.url git@github.com:ewenharrison/test.git

Interested in international trials? Take part in GlobalSurg.

GlobalSurg: Global Surgery results to be published soon

The GlobalSurg project is continuing to be a great experience.

The first cohort study reports data from patients undergoing emergency abdominal surgery across the world.

10,745 patients from 357 centres across 58 countries.

Around half are from high (n=6538, 60.8%) and half from low/middle (n=4207, 39.2%) human development index (HDI) settings. We are limited in what findings we can release prior to formal publication. But here are some of the areas we have looked at.

  • Appendicectomy is the most commonly performed operation across all countries (HDI: high 38.1%, middle 52.8%, low 38.2%)
  • Trauma is the indication for surgery in a higher proportion of cases in middle and low HDI countries (10.0% and 12.1% respectively) than in HDI countries (2.2%).
  • Use of midline laparotomy increased across development indices (high 27.3%, middle 29.0%, low 41.4%).

R Graphics Output

7 day NHS

High quality care for patients seven days a week seems like a good idea to me. There is nothing worse than going round the ward on Saturday or Sunday and having to tell patients that they will get their essential test or treatment on Monday.

It was stated in the Queen’s Speech this year that seven day services would be implemented in England as part of a new five-year plan.

In England my Government will secure the future of the National Health Service by implementing the National Health Service’s own five-year plan, by increasing the health budget, integrating healthcare and social care, and ensuring the National Health Service works on a seven day basis.

Work has started in pilot trusts. Of course funding is the biggest issue and details are sketchy. Some hope that the provision of weekend services will allow patients to be discharged quicker and so save money. With the high capital cost of expensive equipment like MRI scanners, it makes financial sense to ‘sweat the assets’ more at weekends where workload is growing or consolidated across fewer providers.

But that may be wishful thinking. The greatest cost to the NHS is staffing and weekend working inevitably means more staff. Expensive medically qualified staff at that. It is in this regard that the plan seems least developed: major areas of the NHS cannot recruit to posts at the moment. Emergency medicine and acute medicine for instance. Where are these weekend working individuals going to come from?

I thought I’d look at our operating theatre utilisation across the week. These are data from the middle of 2010 to present and do not include emergency/unplanned operating. The first plot shows the spread of total hours of operating by day of the week. How close are we to a 7 day NHS?

Well, 3 days short.

I don’t know why we are using are operating theatres less on Fridays. Surgeons in the past may have preferred not to operate on a Friday, avoiding those crucial first post-operative days being on the weekend. But surely that is not still the case? Yet there has been no change in this pattern over the last 4 years.

Here’s a thought. Perhaps until weekend NHS services are equivalent to weekdays, it is safer not to perform elective surgery on a Friday? It is worse than I thought.

elective_theatre_by_wdayelective_theatre_mon_fri

Journal bans p-values

Editors from the journal Basic and Applied Social Psychology have banned p-values. Or rather null hypothesis significance testing – which includes all the common statistical tests usually reported in studies.

A bold move and an interesting one. In an editorial, the new editor David Trafimow states,

null hypothesis significance testing procedure has been shown to be logically invalid and to provide little information about the actual likelihood of either the null or experimental hypothesis.

He seems to be on a mission and cites his own paper from 12 years ago in support of the position.

So what should authors provide instead to support or refute a hypothesis? Strong descriptive statistics including effect sizesl and the presentation of frequency or distributional data is encouraged. Which sounds reasonable. And larger sample sizes are also required. Ah, were it that easy.

Bayesian approaches are encouraged but not required.

Challenging the dominance of poorly considered p-value is correct. I’d like to see a medical journal do the same.

Bayesian statistics and clinical trial conclusions: Why the OPTIMSE study should be considered positive

Statistical approaches to randomised controlled trial analysis

The statistical approach used in the design and analysis of the vast majority of clinical studies is often referred to as classical or frequentist. Conclusions are made on the results of hypothesis tests with generation of p-values and confidence intervals, and require that the correct conclusion be drawn with a high probability among a notional set of repetitions of the trial.

Bayesian inference is an alternative, which treats conclusions probabilistically and provides a different framework for thinking about trial design and conclusions. There are many differences between the two, but for this discussion there are two obvious distinctions with the Bayesian approach. The first is that prior knowledge can be accounted for to a greater or lesser extent, something life scientists sometimes have difficulty reconciling. Secondly, the conclusions of a Bayesian analysis often focus on the decision that requires to be made, e.g. should this new treatment be used or not.

There are pros and cons to both sides, nicely discussed here, but I would argue that the results of frequentist analyses are too often accepted with insufficient criticism. Here’s a good example.

OPTIMSE: Optimisation of Cardiovascular Management to Improve Surgical Outcome

Optimising the amount of blood being pumped out of the heart during surgery may improve patient outcomes. By specifically measuring cardiac output in the operating theatre and using it to guide intravenous fluid administration and the use of drugs acting on the circulation, the amount of oxygen that is delivered to tissues can be increased.

It sounds like common sense that this would be a good thing, but drugs can have negative effects, as can giving too much intravenous fluid. There are also costs involved, is the effort worth it? Small trials have suggested that cardiac output-guided therapy may have benefits, but the conclusion of a large Cochrane review was that the results remain uncertain.

A well designed and run multi-centre randomised controlled trial was performed to try and determine if this intervention was of benefit (OPTIMSE: Optimisation of Cardiovascular Management to Improve Surgical Outcome).

Patients were randomised to a cardiac output–guided hemodynamic therapy algorithm for intravenous fluid and a drug to increase heart muscle contraction (the inotrope, dopexamine) during and 6 hours following surgery (intervention group) or to usual care (control group).

The primary outcome measure was the relative risk (RR) of a composite of 30-day moderate or major complications and mortality.

OPTIMSE: reported results

Focusing on the primary outcome measure, there were 158/364 (43.3%) and 134/366 (36.6%) patients with complication/mortality in the control and intervention group respectively. Numerically at least, the results appear better in the intervention group compared with controls.

Using the standard statistical approach, the relative risk (95% confidence interval) = 0.84 (0.70-1.01), p=0.07 and absolute risk difference = 6.8% (−0.3% to 13.9%), p=0.07. This is interpreted as there being insufficient evidence that the relative risk for complication/death is different to 1.0 (all analyses replicated below). The authors reasonably concluded that:

In a randomized trial of high-risk patients undergoing major gastrointestinal surgery, use of a cardiac output–guided hemodynamic therapy algorithm compared with usual care did not reduce a composite outcome of complications and 30-day mortality.

A difference does exist between the groups, but is not judged to be a sufficient difference using this conventional approach.

OPTIMSE: Bayesian analysis

Repeating the same analysis using Bayesian inference provides an alternative way to think about this result. What are the chances the two groups actually do have different results? What are the chances that the two groups have clinically meaningful differences in results? What proportion of patients stand to benefit from the new intervention compared with usual care?

With regard to prior knowledge, this analysis will not presume any prior information. This makes the point that prior information is not always necessary to draw a robust conclusion. It may be very reasonable to use results from pre-existing meta-analyses to specify a weak prior, but this has not been done here. Very grateful to John Kruschke for the excellent scripts and book, Doing Bayesian Data Analysis.

The results of the analysis are presented in the graph below. The top panel is the prior distribution. All proportions for the composite outcome in both the control and intervention group are treated as equally likely.

The middle panel contains the main findings. This is the posterior distribution generated in the analysis for the relative risk of the composite primary outcome (technical details in script below).

The mean relative risk = 0.84 which as expected is the same as the frequentist analysis above. Rather than confidence intervals, in Bayesian statistics a credible interval or region is quoted (HDI = highest density interval is the same). This is philosphically different to a confidence interval and says:

Given the observed data, there is a 95% probability that the true RR falls within this credible interval.

This is a subtle distinction to the frequentist interpretation of a confidence interval:

Were I to repeat this trial multiple times and compute confidence intervals, there is a 95% probability that the true RR would fall within these confidence intervals.

This is an important distinction and can be extended to make useful probabilistic statements about the result.

The figures in green give us the proportion of the distribution above and below 1.0. We can therefore say:

The probability that the intervention group has a lower incidence of the composite endpoint is 97.3%.

It may be useful to be more specific about the size of difference between the control and treatment group that would be considered equivalent, e.g. 10% above and below a relative risk = 1.0. This is sometimes called the region of practical equivalence (ROPE; red text on plots). Experts would determine what was considered equivalent based on many factors. We could therefore say:

The probability of the composite end-point for the control and intervention group being equivalent is 22%.

Or, the probability of a clinically relevant difference existing in the composite endpoint between control and intervention groups is 78%

optimise_primary_bayesFinally, we can use the 200 000 estimates of the probability of complication/death in the control and intervention groups that were generated in the analysis (posterior prediction). In essence, we can act like these are 2 x 200 000 patients. For each “patient pair”, we can use their probability estimates and perform a random draw to simulate the occurrence of complication/death. It may be useful then to look at the proportion of “patients pairs” where the intervention patient didn’t have a complication but the control patient did:

Using posterior prediction on the generated Bayesian model, the probability that a patient in the intervention group did not have a complication/death when a patient in the control group did have a complication/death is 28%.

Conclusion

On the basis of a standard statistical analysis, the OPTIMISE trial authors reasonably concluded that the use of the intervention compared with usual care did not reduce a composite outcome of complications and 30-day mortality.

Using a Bayesian approach, it could be concluded with 97.3% certainty that use of the intervention compared with usual care reduces the composite outcome of complications and 30-day mortality; that with 78% certainty, this reduction is clinically significant; and that in 28% of patients where the intervention is used rather than usual care, complication or death may be avoided.