7 day NHS

High quality care for patients seven days a week seems like a good idea to me. There is nothing worse than going round the ward on Saturday or Sunday and having to tell patients that they will get their essential test or treatment on Monday.

It was stated in the Queen’s Speech this year that seven day services would be implemented in England as part of a new five-year plan.

In England my Government will secure the future of the National Health Service by implementing the National Health Service’s own five-year plan, by increasing the health budget, integrating healthcare and social care, and ensuring the National Health Service works on a seven day basis.

Work has started in pilot trusts. Of course funding is the biggest issue and details are sketchy. Some hope that the provision of weekend services will allow patients to be discharged quicker and so save money. With the high capital cost of expensive equipment like MRI scanners, it makes financial sense to ‘sweat the assets’ more at weekends where workload is growing or consolidated across fewer providers.

But that may be wishful thinking. The greatest cost to the NHS is staffing and weekend working inevitably means more staff. Expensive medically qualified staff at that. It is in this regard that the plan seems least developed: major areas of the NHS cannot recruit to posts at the moment. Emergency medicine and acute medicine for instance. Where are these weekend working individuals going to come from?

I thought I’d look at our operating theatre utilisation across the week. These are data from the middle of 2010 to present and do not include emergency/unplanned operating. The first plot shows the spread of total hours of operating by day of the week. How close are we to a 7 day NHS?

Well, 3 days short.

I don’t know why we are using our operating theatres less on Fridays. Surgeons in the past may have preferred not to operate on a Friday, avoiding those crucial first post-operative days falling on the weekend. But surely that is not still the case? Yet there has been no change in this pattern over the last 4 years.

Here’s a thought. Perhaps until weekend NHS services are equivalent to weekdays, it is safer not to perform elective surgery on a Friday? It is worse than I thought.

Figures: total hours of elective operating by day of the week; elective operating, Monday to Friday.
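For anyone wanting to repeat this for their own unit, a minimal sketch of the summary is below. The data frame and column names are assumptions for illustration; any theatre system export with case start and end times would do, with emergency cases already excluded.

# Minimal sketch: total hours of elective operating per day, by weekday.
# Assumes a data frame 'theatre' with POSIXct columns 'start' and 'end'
# (illustrative names only).
theatre$hours <- as.numeric(difftime(theatre$end, theatre$start, units = "hours"))
theatre$date  <- as.Date(theatre$start)
theatre$wday  <- factor(weekdays(theatre$date),
                        levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                   "Friday", "Saturday", "Sunday"))

# Total elective operating hours for each calendar day, then spread by weekday
daily <- aggregate(hours ~ date + wday, data = theatre, FUN = sum)
boxplot(hours ~ wday, data = daily,
        xlab = "Day of week", ylab = "Total hours of elective operating per day")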

Bayesian statistics and clinical trial conclusions: Why the OPTIMSE study should be considered positive

Statistical approaches to randomised controlled trial analysis

The statistical approach used in the design and analysis of the vast majority of clinical studies is often referred to as classical or frequentist. Conclusions are made on the results of hypothesis tests with generation of p-values and confidence intervals, and require that the correct conclusion be drawn with a high probability among a notional set of repetitions of the trial.

Bayesian inference is an alternative, which treats conclusions probabilistically and provides a different framework for thinking about trial design and conclusions. There are many differences between the two, but for this discussion there are two obvious distinctions with the Bayesian approach. The first is that prior knowledge can be accounted for to a greater or lesser extent, something life scientists sometimes have difficulty reconciling. Secondly, the conclusions of a Bayesian analysis often focus on the decision that requires to be made, e.g. should this new treatment be used or not.

There are pros and cons to both sides, nicely discussed here, but I would argue that the results of frequentist analyses are too often accepted with insufficient criticism. Here’s a good example.

OPTIMSE: Optimisation of Cardiovascular Management to Improve Surgical Outcome

Optimising the amount of blood being pumped out of the heart during surgery may improve patient outcomes. By specifically measuring cardiac output in the operating theatre and using it to guide intravenous fluid administration and the use of drugs acting on the circulation, the amount of oxygen that is delivered to tissues can be increased.

It sounds like common sense that this would be a good thing, but drugs can have negative effects, as can giving too much intravenous fluid. There are also costs involved: is the effort worth it? Small trials have suggested that cardiac output-guided therapy may have benefits, but the conclusion of a large Cochrane review was that the results remain uncertain.

A well designed and run multi-centre randomised controlled trial was performed to try and determine if this intervention was of benefit (OPTIMSE: Optimisation of Cardiovascular Management to Improve Surgical Outcome).

Patients were randomised to a cardiac output–guided hemodynamic therapy algorithm for intravenous fluid and a drug to increase heart muscle contraction (the inotrope, dopexamine) during and 6 hours following surgery (intervention group) or to usual care (control group).

The primary outcome measure was a composite of 30-day moderate or major complications and mortality, with the treatment effect expressed as a relative risk (RR).

OPTIMSE: reported results

Focusing on the primary outcome measure, there were 158/364 (43.4%) and 134/366 (36.6%) patients with complication/mortality in the control and intervention groups respectively. Numerically at least, the results appear better in the intervention group compared with controls.

Using the standard statistical approach, the relative risk (95% confidence interval) = 0.84 (0.70-1.01), p=0.07 and absolute risk difference = 6.8% (−0.3% to 13.9%), p=0.07. This is interpreted as there being insufficient evidence that the relative risk for complication/death is different to 1.0 (all analyses replicated below). The authors reasonably concluded that:

In a randomized trial of high-risk patients undergoing major gastrointestinal surgery, use of a cardiac output–guided hemodynamic therapy algorithm compared with usual care did not reduce a composite outcome of complications and 30-day mortality.

A difference does exist between the groups, but is not judged to be a sufficient difference using this conventional approach.

OPTIMSE: Bayesian analysis

Repeating the same analysis using Bayesian inference provides an alternative way to think about this result. What are the chances the two groups actually do have different results? What are the chances that the two groups have clinically meaningful differences in results? What proportion of patients stand to benefit from the new intervention compared with usual care?

With regard to prior knowledge, this analysis will not presume any prior information. This makes the point that prior information is not always necessary to draw a robust conclusion. It may be very reasonable to use results from pre-existing meta-analyses to specify a weak prior, but this has not been done here. Very grateful to John Kruschke for the excellent scripts and book, Doing Bayesian Data Analysis.

The results of the analysis are presented in the graph below. The top panel is the prior distribution. All proportions for the composite outcome in both the control and intervention group are treated as equally likely.

The middle panel contains the main findings. This is the posterior distribution generated in the analysis for the relative risk of the composite primary outcome (technical details in script below).

The mean relative risk = 0.84, which as expected is the same as in the frequentist analysis above. Rather than a confidence interval, in Bayesian statistics a credible interval or region is quoted (the HDI, or highest density interval, is one such interval). This is philosophically different to a confidence interval and says:

Given the observed data, there is a 95% probability that the true RR falls within this credible interval.

This is a subtle but real distinction from the frequentist interpretation of a confidence interval:

Were I to repeat this trial many times and compute a confidence interval each time, 95% of those intervals would contain the true RR.

This is an important distinction and can be extended to make useful probabilistic statements about the result.

The figures in green give us the proportion of the distribution above and below 1.0. We can therefore say:

The probability that the intervention group has a lower incidence of the composite endpoint is 97.3%.

It may be useful to be more specific about the size of difference between the control and treatment group that would be considered equivalent, e.g. 10% above and below a relative risk = 1.0. This is sometimes called the region of practical equivalence (ROPE; red text on plots). Experts would determine what was considered equivalent based on many factors. We could therefore say:

The probability of the composite end-point for the control and intervention group being equivalent is 22%.

Or, the probability of a clinically relevant difference existing in the composite endpoint between control and intervention groups is 78%
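These summaries fall directly out of the posterior samples. As a rough sketch, using the vector thetaRR of posterior relative risk samples generated by the script at the end of this post:

# Probability the intervention group has a lower incidence of the composite outcome
mean(thetaRR < 1.0)                    # about 0.973 in the analysis above

# Probability the groups are practically equivalent (RR within the ROPE, 0.9 to 1.1)
mean(thetaRR > 0.9 & thetaRR < 1.1)    # about 0.22

# Probability of a clinically relevant difference (RR outside the ROPE)
1 - mean(thetaRR > 0.9 & thetaRR < 1.1)  # about 0.78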

Finally, we can use the 200 000 estimates of the probability of complication/death in the control and intervention groups that were generated in the analysis (posterior prediction). In essence, we can act as if these were 2 x 200 000 patients. For each “patient pair”, we can use their probability estimates and perform a random draw to simulate the occurrence of complication/death. It may then be useful to look at the proportion of “patient pairs” where the intervention patient did not have a complication but the control patient did:

Using posterior prediction on the generated Bayesian model, the probability that a patient in the intervention group did not have a complication/death when a patient in the control group did have a complication/death is 28%.

Conclusion

On the basis of a standard statistical analysis, the OPTIMISE trial authors reasonably concluded that the use of the intervention compared with usual care did not reduce a composite outcome of complications and 30-day mortality.

Using a Bayesian approach, it could be concluded with 97.3% certainty that use of the intervention compared with usual care reduces the composite outcome of complications and 30-day mortality; that with 78% certainty, this reduction is clinically significant; and that in 28% of patients where the intervention is used rather than usual care, complication or death may be avoided.

# OPTIMISE trial in a Bayesian framework
# JAMA. 2014;311(21):2181-2190. doi:10.1001/jama.2014.5305
# Ewen Harrison
# 15/02/2015

# Primary outcome: composite of 30-day moderate or major complications and mortality
N1 <- 366
y1 <- 134
N2 <- 364
y2 <- 158
# N1 is total number in the Cardiac Output–Guided Hemodynamic Therapy Algorithm (intervention) group
# y1 is number with the outcome in the Cardiac Output–Guided Hemodynamic Therapy Algorithm (intervention) group
# N2 is total number in usual care (control) group
# y2 is number with the outcome in usual care (control) group

# Risk ratio
(y1/N1)/(y2/N2)

library(epitools)
riskratio(c(N1-y1, y1, N2-y2, y2), rev="rows", method="boot", replicates=100000)

# Using standard frequentist approach
# Risk ratio (bootstrapped 95% confidence intervals) = 0.84 (0.70-1.01) 
# p=0.07 (Fisher exact p-value)

# Reasonably reported as no difference between groups.

# But there is a difference, it is just not judged significant using conventional
# (and much criticised) wisdom.

# Bayesian analysis of same ratio
# Base script from John Kruschke, Doing Bayesian Data Analysis

#------------------------------------------------------------------------------
source("~/Doing_Bayesian_Analysis/openGraphSaveGraph.R")
source("~/Doing_Bayesian_Analysis/plotPost.R")
require(rjags) # Kruschke, J. K. (2011). Doing Bayesian Data Analysis, Academic Press / Elsevier.
#------------------------------------------------------------------------------
# Important
# The model will be specified with completely uninformative prior distributions (beta(1,1)).
# This presupposes that no pre-existing knowledge exists as to whether a difference
# may or may not exist between these two interventions.

# Plot Beta(1,1)
# 3x1 plots
par(mfrow=c(3,1))
# Adjust size of prior plot
par(mar=c(5.1,7,4.1,7))
plot(seq(0, 1, length.out=100), dbeta(seq(0, 1, length.out=100), 1, 1), 
         type="l", xlab="Proportion",
         ylab="Probability", 
         main="OPTIMSE Composite Primary Outcome\nPrior distribution", 
         frame=FALSE, col="red")
legend("topright", legend="beta(1,1)", lty=1, col="red", inset=0.05)

# THE MODEL.
modelString = "
# JAGS model specification begins here...
model {
# Likelihood. Each complication/death is Bernoulli. 
for ( i in 1 : N1 ) { y1[i] ~ dbern( theta1 ) }
for ( i in 1 : N2 ) { y2[i] ~ dbern( theta2 ) }
# Prior. Independent beta distributions.
theta1 ~ dbeta( 1 , 1 )
theta2 ~ dbeta( 1 , 1 )
}
# ... end JAGS model specification
" # close quote for modelstring

# Write the modelString to a file, using R commands:
writeLines(modelString,con="model.txt")


#------------------------------------------------------------------------------
# THE DATA.

# Specify the data in a form that is compatible with JAGS model, as a list:
dataList =  list(
    N1 = N1 ,
    y1 = c(rep(1, y1), rep(0, N1-y1)),
    N2 = N2 ,
    y2 = c(rep(1, y2), rep(0, N2-y2))
)

#------------------------------------------------------------------------------
# INTIALIZE THE CHAIN.

# Can be done automatically in jags.model() by commenting out inits argument.
# Otherwise could be established as:
# initsList = list( theta1 = sum(dataList$y1)/length(dataList$y1) , 
#                   theta2 = sum(dataList$y2)/length(dataList$y2) )

#------------------------------------------------------------------------------
# RUN THE CHAINS.

parameters = c( "theta1" , "theta2" )     # The parameter(s) to be monitored.
adaptSteps = 500              # Number of steps to "tune" the samplers.
burnInSteps = 1000            # Number of steps to "burn-in" the samplers.
nChains = 3                   # Number of chains to run.
numSavedSteps=200000           # Total number of steps in chains to save.
thinSteps=1                   # Number of steps to "thin" (1=keep every step).
nIter = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "model.txt" , data=dataList , # inits=initsList , 
        n.chains=nChains , n.adapt=adaptSteps )
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
codaSamples = coda.samples( jagsModel , variable.names=parameters , 
        n.iter=nIter , thin=thinSteps )
# resulting codaSamples object has these indices: 
#   codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]

#------------------------------------------------------------------------------
# EXAMINE THE RESULTS.

# Convert coda-object codaSamples to matrix object for easier handling.
# But note that this concatenates the different chains into one long chain.
# Result is mcmcChain[ stepIdx , paramIdx ]
mcmcChain = as.matrix( codaSamples )

theta1Sample = mcmcChain[,"theta1"] # Put sampled values in a vector.
theta2Sample = mcmcChain[,"theta2"] # Put sampled values in a vector.

# Plot the chains (trajectory of the last 500 sampled values).
par( pty="s" )
chainlength=NROW(mcmcChain)
plot( theta1Sample[(chainlength-500):chainlength] ,
            theta2Sample[(chainlength-500):chainlength] , type = "o" ,
            xlim = c(0,1) , xlab = bquote(theta[1]) , ylim = c(0,1) ,
            ylab = bquote(theta[2]) , main="JAGS Result" , col="skyblue" )

# Display means in plot.
theta1mean = mean(theta1Sample)
theta2mean = mean(theta2Sample)
if (theta1mean > .5) { xpos = 0.0 ; xadj = 0.0
} else { xpos = 1.0 ; xadj = 1.0 }
if (theta2mean > .5) { ypos = 0.0 ; yadj = 0.0
} else { ypos = 1.0 ; yadj = 1.0 }
text( xpos , ypos ,
            bquote(
                "M=" * .(signif(theta1mean,3)) * "," * .(signif(theta2mean,3))
            ) ,adj=c(xadj,yadj) ,cex=1.5  )

# Plot a histogram of the posterior differences of theta values.
thetaRR = theta1Sample / theta2Sample # Relative risk
thetaDiff = theta1Sample - theta2Sample # Absolute risk difference

par(mar=c(5.1, 4.1, 4.1, 2.1))
plotPost( thetaRR , xlab= expression(paste("Relative risk (", theta[1]/theta[2], ")")) , 
    compVal=1.0, ROPE=c(0.9, 1.1),
    main="OPTIMSE Composite Primary Outcome\nPosterior distribution of relative risk")
plotPost( thetaDiff , xlab=expression(paste("Absolute risk difference (", theta[1]-theta[2], ")")) ,
    compVal=0.0, ROPE=c(-0.05, 0.05),
    main="OPTIMSE Composite Primary Outcome\nPosterior distribution of absolute risk difference")

#-----------------------------------------------------------------------------
# Use posterior prediction to determine proportion of cases in which 
# using the intervention would result in no complication/death 
# while not using the intervention would result in complication/death 

chainLength = length( theta1Sample )

# Create matrix to hold results of simulated patients:
yPred = matrix( NA , nrow=2 , ncol=chainLength ) 

# For each step in chain, use posterior prediction to determine outcome
for ( stepIdx in 1:chainLength ) { # step through the chain
    # Probability for complication/death for each "patient" in intervention group:
    pDeath1 = theta1Sample[stepIdx]
    # Simulated outcome for each intervention "patient"
    yPred[1,stepIdx] = sample( x=c(0,1), prob=c(1-pDeath1,pDeath1), size=1 )
    # Probability for complication/death for each "patient" in control group:
    pDeath2 = theta2Sample[stepIdx]
    # Simulated outcome for each control "patient"
    yPred[2,stepIdx] = sample( x=c(0,1), prob=c(1-pDeath2,pDeath2), size=1 )
}

# Now determine the proportion of times that the intervention group has no complication/death
# (y1 == 0) and the control group does have a complication or death (y2 == 1))
(pY1eq0andY2eq1 = sum( yPred[1,]==0 & yPred[2,]==1 ) / chainLength)
(pY1eq1andY2eq0 = sum( yPred[1,]==1 & yPred[2,]==0 ) / chainLength)
(pY1eq0andY2eq0 = sum( yPred[1,]==0 & yPred[2,]==0 ) / chainLength)
(pY1eq1andY2eq1 = sum( yPred[1,]==1 & yPred[2,]==1 ) / chainLength)

# Conclusion: in 27% of cases based on these probabilities,
# a patient in the intervention group would not have a complication,
# when a patient in control group did. 

House of God and bile leak after liver resection

While death after liver resection is reported at ever lower levels, complication rates remain stubbornly high. Morbidity is associated with longer intensive care and hospital stay, and poorer oncological outcomes. Variability in the reported rate of complications may partly be due to differences in definitions. The International Study Group for Liver Surgery (ISGLS) has now published definitions in three areas: liver failure and haemorrhage after hepatectomy, and bile leak after liver and pancreas surgery. These have stimulated debate and different predictive models vie for supremacy. In HPB January 2015, the ISGLS use their definition and grading system to prospectively evaluate bile leak after liver resection.

Of 949 patients in 11 centres undergoing liver resection for predominantly colorectal liver metastases, 7.3% were diagnosed with a bile leak. Of these, just over half required something done about it. “If you don’t take a temperature you can’t find a fever”, a medical truism from Samuel Shem’s 1978 novel The House of God, equally applies here: grade A bile leaks requiring no/little change in patients’ management are only diagnosed in the presence of an abdominal drain. Of course, a patient without a drain found to have a bile leak, by definition, has a grade B leak. Yet, even in those with seemingly inconsequential grade A bile leaks, a greater number and severity of other complications were seen, together with a longer hospital stay (median 14 versus 7 days). Indeed, bile leak was significantly associated with intra-operative blood loss, which may explain these poor outcomes.

There is little strong evidence supporting drainage after liver resection, yet in this series drains were used in 64% of patients. In nearly half of patients with a bile leak and a drain, there was no significant change in the clinical course; the authors suggest that up to 94% of patients did not benefit from intra-operative drainage.

In this up-to-date series, the overall complication rate of 38% is striking. Although only 8.8% of complications were classified as severe, this rate is not improving. Interventions to reduce this rate should surely be a priority in seeking to improve long-term liver resection outcomes.

From HPB January 2015

Considerations in the Early Termination of Clinical Trials in Surgery

One of the most difficult situations when running a clinical trial is the decision to terminate the trial early. But it shouldn’t be a difficult decision. With clear stopping rules defined before the trial starts, it should be straightforward to determine when the effect size is large enough that no further patients require to be randomised to definitively answer the question.

Whether there is benefit to leaving a temporary plastic tube drain in the belly after an operation to remove the head of the pancreas is controversial. It may help diagnose and treat the potential disaster that occurs when the join between pancreas and bowel leaks. Others think that the presence of the drain may in fact make a leak more likely.

This question was tackled in an important randomised clinical trial.

A randomised prospective multicenter trial of pancreaticoduodenectomy with and without routine intraperitoneal drainage

The trial was stopped early because there were more deaths in the group who didn’t have a drain. The question that remains: was it the absence of the drain which caused the deaths? As important, was stopping the trial at this point the correct course of action?

My feeling is that the lack of a drain was not definitively demonstrated to be the cause of the deaths. And I think the trial was stopped too early. These difficult issues are discussed in our letter to Annals of Surgery, reproduced below.

Ethics and statistics collide in decisions relating to the early termination of clinical trials. Investigators have a fundamental responsibility to stop a trial where an excess of harm is seen in one of the arms. Decisions on stopping are not straightforward and must balance the potential risk to trial patients against the likelihood that in fact there is no difference in outcome between groups. Indeed, in early termination, the potential loss of generalizable knowledge may itself harm future patients.

We therefore read with interest the article by Van Buren and colleagues (1) and congratulate the authors on the first multicenter randomized trial on the controversial topic of surgical drains after pancreaticoduodenectomy. As the authors report, the trial was stopped by the Data Safety Monitoring Board after only 18% recruitment due to a numerical excess of deaths in the “no-drain” arm.

We would be interested in learning from the process that led to the decision to terminate the trial. A common method to monitor adverse events advocated by the CONSORT group is to define formal sequential stopping rules based on the limit of acceptable adverse event rates (2). These guidelines suggest that authors report the number of planned “looks” at the data, the statistical methods used including any formal stopping rules, and whether these were planned before trial commencement.

This information is often not included in published trial reports, even when early termination has occurred (3). We feel that in the context of important surgical trials, these guidelines should be adhered to.

Early termination can reduce the statistical power of a trial. This can be addressed by examining results as data accumulate, preferably by an independent data monitoring committee. However, performing multiple statistical examinations of accumulating data without appropriate correction can lead to erroneous results and interpretation (4). For example, if accumulating data from a trial are examined at 5 interim analyses that use a P value of 0.05, the overall false-positive rate is nearer to 19% than to the nominal 5%.
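This inflation is easy to demonstrate by simulation. The sketch below repeatedly simulates a two-arm trial with no true difference in event rate, tests the accumulating data at five equally spaced looks, and counts how often at least one look reaches p < 0.05. The trial size and event rate are arbitrary choices for illustration, and the exact inflation depends on the number and spacing of looks and the test used.

set.seed(42)
n_sim     <- 2000   # simulated trials
n_per_arm <- 1000   # patients per arm (arbitrary)
p_event   <- 0.20   # true event rate, identical in both arms (no real difference)
looks     <- round(seq(0.2, 1, by = 0.2) * n_per_arm)  # five equally spaced analyses

false_positive <- replicate(n_sim, {
  a <- rbinom(n_per_arm, 1, p_event)
  b <- rbinom(n_per_arm, 1, p_event)
  p <- sapply(looks, function(n)
    prop.test(c(sum(a[1:n]), sum(b[1:n])), c(n, n))$p.value)
  any(p < 0.05)   # "significant" at any interim look?
})

mean(false_positive)  # overall false-positive rate: well above the nominal 0.05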

Several group sequential statistical methods are available to adjust for multiple analyses (5,6) and their use should be prespecified in the trial protocol. Stopping rules may be formed by 2 broad methods, either using a Bayesian approach to evaluate the proportion of patients with adverse effects or using a hypothesis testing approach with a sequential probability ratio test to determine whether the acceptable adverse effects rate has been exceeded. Data are compared at each interim analysis and decisions made on prespecified criteria. As an example, stopping rules for harm from a recent study used modified Haybittle-Peto boundaries of 3 SDs in the first half of the study and 2 SDs in the second half (7). The study of Van Buren and colleagues is reported to have been stopped after 18% recruitment due to an excess of 6 deaths in the “no-drain” arm. The relative risk of death at 90 days in the “no-drain” group versus the “drain” group was 3.94 (95% confidence interval, 0.87–17.90), equivalent to a difference of 1.78 SD. The primary outcome measure was any grade 2 or higher complication and had a relative risk of 1.32 (95% confidence interval, 1.00–1.75), or 1.95 SD.
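The “SD” figures quoted above can be recovered from the published relative risks and 95% confidence intervals: on the log scale the standard error is the width of the interval divided by 2 × 1.96, and dividing the log relative risk by this standard error gives the difference in standard deviation (z) units. A quick check in R:

# Convert a relative risk and its 95% CI into a difference in SD (z) units
rr_to_sd <- function(rr, lower, upper) {
  se <- (log(upper) - log(lower)) / (2 * qnorm(0.975))  # SE of log(RR) from the 95% CI
  log(rr) / se
}

rr_to_sd(3.94, 0.87, 17.90)  # 90-day mortality: about 1.78 SD
rr_to_sd(1.32, 1.00, 1.75)   # primary outcome:  about 1.95 SD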

The decision to terminate a trial early is not based on statistics alone. Judgements must be made using all the available evidence, including the biological and clinical plausibility of harm and the findings of previous studies. Statistical considerations should therefore be used as a starting point for decisions, rather than a definitive rule.

The Data Safety Monitoring Board for the study of Van Buren and colleagues clearly felt that there was no option other than to terminate the trial. However, at least on statistical grounds, this occurred very early in the trial using conservative criteria. The question therefore remains: is the totality of evidence convincing that the question posed has been unequivocally answered? We would suggest that this is not the case. In general terms, stopping a clinical trial early is a rare event that sends out a message that, because of the “sensational” effect, may have greater impact on the medical community than intended, making future studies in that area challenging.

1. Van Buren G, Bloomston M, Hughes SJ, et al. A randomised prospective multicenter trial of pancreaticoduodenectomy with and without routine intraperitoneal drainage. Ann Surg. 2014;259: 605–612.

2. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.

3. Montori VM, Devereaux PJ, Adhikari NK, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294:2203–2209.

4. Geller NL, Pocock SJ. Interim analyses in randomized clinical trials: ramifications and guidelines for practitioners. Biometrics. 1987;43:213–223.

5. Pocock SJ. When to stop a clinical trial. BMJ. 1992;305:235–240.

6. Berry DA. Interim analyses in clinical trials: classical vs. Bayesian approaches. Stat Med. 1985;4:521– 526.

7. Connolly SJ, Pogue J, Hart RG, et al. Effect of clopidogrel added to aspirin in patients with atrial fibrillation. N Engl J Med. 2009;360:2066– 2078.

Adverse outcomes demand clear justification when introducing new surgical procedure

The introduction of new surgical procedures is fraught with difficulty. Determining that a procedure is safe to perform while surgeons are still learning how to do it has obvious problems. Comparing a new procedure to existing treatments requires the surgery to be performed on a scale rarely available at early stages of development. The IDEAL framework helps greatly with this process.

When performing liver surgery, it is crucial that sufficient liver is left behind at the end of the operation to do the necessary job of the liver. This is particularly important in the first days and weeks following surgery. When disease demands that a large proportion of the liver is removed, manoeuvres can be performed before surgery to increase the size of the liver left behind. The disease is invariably cancer and the manoeuvre usually involves blocking the vein supplying the part of the liver to be removed, a procedure called portal vein embolisation. This causes the liver to think part of it has already been removed. The part which will stay behind after surgery increases in size, hopefully sufficient to do the job of the liver after surgery. This often works but does require a delay in definitive surgery and in some patients does not work sufficiently well.

An alternative procedure has come to the fore recently. The ALPPS procedure (Associating Liver Partition and Portal vein Ligation for Staged Liver resection) combines this embolisation procedure with an operation to cut the liver along the line required to remove the diseased portion. But after making the cut, the operation is stopped and the patient woken up. Over the course of the following week the liver being left behind increases in size – quicker and more effectively, say proponents of the ALPPS procedure. After a week, the patient is taken back to the operating room and the diseased portion of the liver removed.

So should we start using the procedure to treat cancer which is widely spread in the liver?

The difficulty is knowing whether the new procedure is safe and effective. Early results suggest quite a high mortality associated with the procedure. But of course for patients with untreated cancer in the liver who do not have surgery, the mortality rate is high.

A study has been published which contains some positive data: ALPPS offers a better chance of complete resection in patients with primarily unresectable liver tumors compared with conventional-staged hepatectomies: results of a multicenter analysis.

However, it is still my feeling that the results of the procedure are not good and that traditional portal vein embolisation seems to work well in our patients. Here is our letter in response, outlining our concerns.

We read with interest the multicenter study by Schadde and colleagues in the April issue regarding the novel procedure of Associating Liver Partition and Portal vein Ligation for Staged Liver resection (ALPPS) [1]. Since the initial description 2 years ago [2], ALPPS has gained popularity as a surgical option for treating patients with advanced liver lesions not considered amenable to conventional two-stage or future liver remnant-enhancing procedures propagated by Rene Adam et al. [3] a decade ago. Indeed, the explosion of interest in ALPPS by surgeons and its adoption as a procedure of choice is concerning, given that the procedure appears to come with considerable cost to the patient, as shown in this study. The increased severe morbidity of 27% versus 15% and the mortality of 15% versus 6% may not achieve traditional measures of statistical significance in this study, but the effect size is concerning, and the direction of effect is consistent across outcome measures and studies. Is ALPPS in its current form safe enough for the widespread adoption that has occurred given increasingly effective nonsurgical approaches, including ablation, chemotherapy, selective internal radiation therapy [4], and growth factor/receptor inhibition?

As the authors rightly point out, the risk of selection bias is significant given the study design. It is unclear whether the logistic regression analysis adequately adjusts for the imbalance in baseline risk in favor of the ALPPS group: why, for instance, was operative risk (ASA grade) not controlled for in the multivariate analysis?

One of the potential benefits of a two-stage procedure is that it may disclose biologically unfavorable disease. By its very nature, ALPPS does not lend itself to such selection given the short time interval between the first and second stages. The authors appear to reject this argument, citing a similar overall recurrence rate seen in this study. We were puzzled by this position given that the study highlights an interesting observation: in the PVE/PVL group, 11% of patients had systemic progression prior to the second stage. Presumably this group of patients would not have benefitted from ALPPS.

In our practice, patients who may be deemed by others to be ideal candidates for ALPPS are seldom not amenable to either a two-stage liver resection or a single-stage resection with prior volume-enhancing maneuvers. Indeed, it is difficult to understand why an ALPPS approach was used at all in some of the cases presented at recent international conferences. We wonder what proportion and kind of patients with advanced liver lesions would really benefit from the ALPPS approach. The international ALPPS registry will perhaps provide clearer evidence for the role of this challenging approach to liver resection.

1. Schadde E, Ardiles V, Slankamenac K et al (2014) ALPPS offers a better chance of complete resection in patients with primarily unresectable liver tumors compared with conventional-staged hepatectomies: results of a multicenter analysis. World J Surg 38:1510–1519. doi:10.1007/s00268-014-2513-3

2. Schnitzbauer AA, Lang SA, Goessmann H et al (2012) Right portal vein ligation combined with in situ splitting induces rapid left lateral liver lobe hypertrophy enabling 2-staged extended right hepatic resection in small-for-size settings. Ann Surg 255:405–414

3. Adam R, Delvart V, Pascal G et al (2004) Rescue surgery for unresectable colorectal liver metastases downstaged by chemotherapy: a model to predict long-term survival. Ann Surg 240:644–657 discussion 657–658

4. Gulec SA, Pennington K, Wheeler J et al (2013) Yttrium-90 microsphere-selective internal radiation therapy with chemotherapy (chemo-SIRT) for colorectal cancer liver metastases: an in vivo double-arm-controlled phase II trial. Am J Clin Oncol 36:455–460

 

Full letter (PDF): http://www.datasurg.net/wp-content/uploads/2014/11/Rohatgi-et-al.-2014-ALPPS-Adverse-Outcomes-Demand-Clear-Justification.pdf

GlobalSurg recruitment starting soon

I’m excited to be involved in an enthusiastic young collaborative called GlobalSurg. Research in surgery has been a predominantly first-world affair and it is absolutely essential to see international collaboration including developing nations. Our study focusses on emergency abdominal surgery and complements a similar initiative looking at elective abdominal surgery, ISOS.

Why this study and why now? Surgery has been referred to as the neglected stepchild of global public health, a sentiment I completely agree with. Diseases effectively treated with surgery are becoming the public health priority for developing nations, a fact highlighted by the excellent International Collaboration for Essential Surgery (ICES) and important Right to Heal campaign.

Least wealthy countries account for 35% of the global population yet undertook only 3.5% of all surgical procedures in 2004.

This GlobalSurg project aims to be the first of many. It will establish what happens to patients across the world after emergency abdominal surgery.

The primary outcome measure here is pragmatic: which patients are still alive 24 h following emergency surgery? A number of secondary measures will provide depth. Case mix will be determined as far as is possible and an analysis of facilities included.

Anyone can still get involved in GlobalSurg and I would encourage you to do so. We have everyone from professors of surgery in large first-world urban centres to small community hospitals in developing countries.

It only requires data collection over any two-week period in July-November 2014. Patients are easy to identify and there are only 30 patient-related data points to collect. Data can be collected on paper or directly into our REDCap system, which I will write more about in the future.

 

Introduction of Surgical Safety Checklists in Ontario, Canada – don’t blame the study size

The recent publication of the Ontario experience in the introduction of Surgical Safety Checklists has caused a bit of a stooshie.

Checklists have consistently been shown to be associated with a reduction in death and complications following surgery. Since the publication of Atul Gawande’s seminal paper in 2009, checklists have been successfully introduced in a number of countries including Scotland. David Urbach and Nancy Baxter’s New England Journal of Medicine publication stands apart: the checklist made no difference.

Atul Gawande himself responded quickly, asking two important questions. Firstly, were there sufficient patients included in the study to show a difference? Secondly, was the implementation robust, and was the programme in place for long enough for a difference to be seen?

He and others have reported the power of the study to be low – about 40% – meaning that, if a true difference in mortality actually did exist, a study of this size would detect it only 40% of the time. But power calculations performed after the event (post hoc) are completely meaningless – when no effect is seen in a study, the power is low by definition (mathsy explanation here).

There is no protocol provided with the Ontario study, so it is not clear if an estimate of the required sample size had been performed. Were it done, it may have gone something like this.

The risk of death in the Ontario population is 0.71%. This could have been determined from the same administrative dataset that the study used. Say we expect a similar reduction in death following checklist introduction as Gawande showed in 2009, 1.5% to 0.8%. For the Ontario population, this would be equivalent to an expected risk of death of 0.38%. This may or may not be reasonable. It is not clear that the “checklist effect” is the same across patients or procedures of different risks. Accepting this assumption for now, the study would have only required around 8000 patients per group to show a significant difference. The study actually included over 100000 patients per group. In fact, it was powered to show very small differences in the risk of death – a reduction of around 0.1% would probably have been detected.

Sample size for Ontario study.
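The numbers above can be checked with base R’s power.prop.test; a two-sided alpha of 0.05 is assumed throughout, and the 80% power figure is also an assumption, since the post does not state one.

# Patients per group needed to detect a fall in mortality from 0.71% to 0.38%
power.prop.test(p1 = 0.0071, p2 = 0.0038, sig.level = 0.05, power = 0.80)
# n is approximately 7800 per group

# Power of the actual study (roughly 100 000 patients per group) to detect
# an absolute reduction in mortality of about 0.1%
power.prop.test(n = 100000, p1 = 0.0071, p2 = 0.0061, sig.level = 0.05)
# power is approximately 0.79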

Similar conclusions can be drawn for the complication rate. Gawande showed a reduction from 11% to 7%, equivalent in Ontario to a reduction from 3.86% to 2.46%. The Ontario study had around 90% power to detect a reduction to just 3.59%.
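And the same check for the complication rate:

# Gawande's reduction from 11% to 7%, scaled to the Ontario baseline of 3.86%
0.0386 * 7 / 11   # approximately 0.0246, i.e. 2.46%

# Power of the Ontario study to detect a fall from 3.86% to 3.59%
power.prop.test(n = 100000, p1 = 0.0386, p2 = 0.0359, sig.level = 0.05)
# power is approximately 0.89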

The explanation for the failure to show a difference does not lie in the numbers.

So assuming then that checklists do work, this negative result stems either from a failure of implementation – checklists were not being used or not being used properly – or a difference in the effect of checklists in this population. The former seems most likely. The authors report that …

… available data did not permit us to determine whether a checklist was used in a particular procedure, and we were unable to measure compliance with checklists at monthly intervals in our analysis. However, reported compliance with checklists is extraordinarily high …

Quality improvement interventions need sufficient time for introduction. In this study, only a minimum of 3 months was allowed which seems crazily short. Teams need to want to do it. In my own hospital there was a lot of grumbling (including from me) before acceptance. When I worked in the Netherlands, SURPASS was introduced. In this particular hospital it was delivered via the electronic patient record. A succession of electronic “baton passes” meant that a patient could not get to the operating theatre without a comprehensive series of checklists being completed. I like this use of technology to deliver safety. With robust implementation, training, and acceptance by staff, maybe the benefits of checklists will also be seen in Ontario.

Hepatitis C virus, tumour and liver transplantation

HCV virus exploding (iStock)

From my HPB highlights this month.

Do patients with hepatocellular carcinoma (HCC) on a background of hepatitis C virus (HCV) have worse outcomes after liver transplantation than non-HCV patients? This relatively straightforward question continues to vex and published studies are contradictory. Molecular features of HCC which are associated with aggressive behaviour are up-regulated in the presence of HCV, providing a biological mechanism to support the hypothesis. The theory is borne out in early single centre studies, but the largest analysis to date, published by Thuluvath in 2009 using the United Network for Organ Sharing database, contradicted these. HCV+ patients were shown to have a lower survival rate than HCV- patients, regardless of their HCC status. This is to be expected. However, HCV had no additional negative impact on survival in patients with HCC.

In this edition of HPB, Dumitra and colleagues describe a further single-centre study from Montreal. They conclude that HCC+/HCV+ patients have a significantly worse outcome than those with HCC or HCV alone. So why the contradiction? It may be that length of follow-up is important. This study provides survival curves out to 10 years. A cluster of deaths after 5 years in the HCV+/HCC+ group results in a significantly worse outcome in this group, although the numbers at risk are low. However, loss to follow-up is an unusually low 1.2% and explant pathology is available for almost all patients – detail not often available in studies using administrative databases. In a multivariable analysis controlling for recipient age, gender, MELD score and donor risk index (DRI), the combined effect of HCC+/HCV+ gives a hazard twice that of HCC+/HCV-.

HCV graft infection after liver transplantation is universal and the course of recurrent cirrhosis accelerated. Controlling HCV recurrence with newer antiviral agents will improve long-term survival and this study suggests the possibility of additional benefits in HCC+/HCV+ patients. Other modifiable variables such as donor age and DRI are unlikely to have an impact, given HCC patients rarely have the luxury of a wide choice of donor grafts.

Images in operative ultrasound

Mickey Mouse and the tubes connecting the liver

In liver surgery, it’s often important to know the exact layout of the connections the liver has to the rest of the body. Here are some images which hopefully make it clear. The liver is unusual because it has two blood supplies. The first is an artery, the hepatic artery, which carries oxygen to the liver. The other is the portal vein, which carries blood from the guts to the liver and contains the nutrients from food. The portal vein carries 3 times as much blood as the artery and is not to be messed with – 34% of patients with a portal vein injury do not survive.

The other important tube is the bile duct. This drains bile from the liver to the guts. If it gets blocked – by a gallstone or cancer – the patient becomes jaundiced (the skin going yellow).

We use an ultrasound machine to visualise the vessels and the bile duct. It can be tricky and difficult to interpret. The boss has a good technique for getting orientated – the Mickey Mouse sign. When seen in the transverse plane – imagine sitting at the patient’s feet looking up through the body towards the head – the large portal vein with the artery and bile duct in front looks like Mickey. I use this technique every time.

Intraoperative ultrasound to portal pedicle. Patient consent for publication obtained.
Portal pedicle on ultrasound. CHD, common hepatic duct; LHA, left hepatic artery; RHA, right hepatic artery; PV, portal vein; CBD, common bile duct; PHA, proper hepatic artery; aRHA, accessory right hepatic artery (if present); CHA, common hepatic artery; GDA, gastroduodenal artery; SMV/SV, superior mesenteric vein / splenic vein

Tweets of Surgical Colleges – what does it say about them?

What do the UK and Ireland Surgical Royal Colleges tweet about and how do they compare to the American College of Surgeons?

Twitter allows retrieval of the last 3200 tweets of a given user. Here are all tweets ever sent by the Royal Colleges, retrieved a few days ago. The American College has tweeted over 6000 times, so only the latest 3200 are included. The Glasgow College is just getting going.

There is a bit of processing first. Charts are generated after removal of “stop words” – all the little words that go in between. Words then have common endings removed (e.g. -ing; stemming) and the most common ending for the group replaced (stem completion).
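Something like the following reproduces this kind of processing in R using the tm package. It is a sketch rather than the exact pipeline used here, and tweets is assumed to be a character vector of tweet text for one college (retrieval itself can be done through the Twitter API, for example with the twitteR package).

# Sketch of the text processing described above (illustrative only).
# 'tweets' is assumed to be a character vector of tweet text.
library(tm)
library(SnowballC)

corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop the "stop words"
corpus <- tm_map(corpus, stripWhitespace)

stemmed <- tm_map(corpus, stemDocument)  # strip common endings (e.g. -ing)
# stemCompletion() can then map each stem back to its most frequent full form
# in the unstemmed corpus (stem completion).

# Term frequencies for the bar charts
tdm <- TermDocumentMatrix(stemmed)
head(sort(rowSums(as.matrix(tdm)), decreasing = TRUE), 20)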

So what can be said? I was interested in whether Colleges tweet about training. I was pleased to see that the UK colleges do – a fair amount. Terms that are associated with training were less apparent in tweets from the RCSI and ACS.

Frequency of words in tweets from five surgical colleges

The figures below show clustering of terms within tweets, with term frequency increasing from left to right. There are some nice themes that emerge. In the RCSEng tweets there are themes relating to “training”, “events”, “working time”, and “the NHS”. Similar subjects are apparent in RCSEd tweets, with prominence of their surgical skills competition for medical students and issues specifically relating to the NHS in Scotland. As the RCPSG have only started tweeting, associations are greatly influenced by individual tweets. The RCSI’s “MiniMed School Open Lecture Series” (updated 22/04/13) can be seen together with conference promotion. The ACS appear to use Twitter to communicate issues relating to patient health improvement programmes more prominently than other Colleges.

Term association in surgical college tweets (cluster dendrogram)
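The dendrograms are built from the same term-document matrix (tdm) as in the sketch above; a minimal version follows, with the sparsity threshold an arbitrary choice.

# Cluster terms by their pattern of occurrence across tweets
tdm_common <- removeSparseTerms(tdm, 0.98)   # keep only reasonably frequent terms
d <- dist(as.matrix(tdm_common))             # distances between terms (rows)
plot(hclust(d, method = "ward.D2"),
     main = "Term association in surgical college tweets")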

Network plots illustrate the strength of association of terms (weight of edges) and frequency of terms (font size of vertices). Do the terms in these plots represent the core values of these organisations?

Term network of College tweets