Finalfit, knitr and R Markdown for quick results

Thank you for the many requests to provide some extra info on how best to get `finalfit` results out of RStudio, and particularly into Microsoft Word.

Here is how.

Make sure you are on the most up-to-date version of `finalfit`.

devtools::install_github("ewenharrison/finalfit")

What follows is for demonstration purposes and is not meant to illustrate model building.

Does a tumour characteristic (differentiation) predict 5-year survival?

Demographics table

First explore the variable of interest (exposure) by making it the dependent variable.

library(finalfit)
library(dplyr)

dependent = "differ.factor"

# Specify explanatory variables of interest
explanatory = c("age", "sex.factor", 
  "extent.factor", "obstruct.factor", 
  "nodes")

Note this useful alternative way of specifying explanatory variable lists:

colon_s %>% 
  select(age, sex.factor, 
  extent.factor, obstruct.factor, nodes) %>% 
  names() -> explanatory

Look at associations between our exposure and other explanatory variables. Include missing data.

colon_s %>% 
  summary_factorlist(dependent, explanatory, 
  p=TRUE, na_include=TRUE)
 
              label              levels        Well    Moderate       Poor      p
       Age (years)           Mean (SD) 60.2 (12.8) 59.9 (11.7)  59 (12.8)  0.788
               Sex              Female   51 (11.6)  314 (71.7)  73 (16.7)  0.400
                                  Male    42 (9.0)  349 (74.6)  77 (16.5)       
  Extent of spread           Submucosa    5 (25.0)   12 (60.0)   3 (15.0)  0.081
                                Muscle   12 (11.8)   78 (76.5)  12 (11.8)       
                                Serosa   76 (10.2)  542 (72.8) 127 (17.0)       
                   Adjacent structures     0 (0.0)   31 (79.5)   8 (20.5)       
       Obstruction                  No    69 (9.7)  531 (74.4) 114 (16.0)  0.110
                                   Yes   19 (11.0)  122 (70.9)  31 (18.0)       
                               Missing    5 (25.0)   10 (50.0)   5 (25.0)       
             nodes           Mean (SD)   2.7 (2.2)   3.6 (3.4)  4.7 (4.4) <0.001
Warning messages:
1: In chisq.test(tab, correct = FALSE) :
  Chi-squared approximation may be incorrect
2: In chisq.test(tab, correct = FALSE) :
  Chi-squared approximation may be incorrect
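
These warnings occur when some expected cell counts are small. If you want to see which cells are responsible, the expected counts can be inspected directly; a minimal check of my own (illustrative only, using `extent.factor` as an example):

with(colon_s, chisq.test(table(extent.factor, differ.factor))$expected)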
 

Note the missing data in `obstruct.factor`. We will drop this variable for now (again, this is for demonstration only). Also see that `nodes` has not been labelled.
The small numbers in some cells generate the `chisq.test` warnings (expected counts of less than 5). Now generate the final table.

Hmisc::label(colon_s$nodes) = "Lymph nodes involved"
explanatory = c("age", "sex.factor", 
  "extent.factor", "nodes")

colon_s %>% 
  summary_factorlist(dependent, explanatory, 
  p=TRUE, na_include=TRUE, 
  add_dependent_label=TRUE) -> table1
table1
 
 
 Dependent: Differentiation                            Well    Moderate       Poor      p
                Age (years)           Mean (SD) 60.2 (12.8) 59.9 (11.7)  59 (12.8)  0.788
                        Sex              Female   51 (11.6)  314 (71.7)  73 (16.7)  0.400
                                           Male    42 (9.0)  349 (74.6)  77 (16.5)       
           Extent of spread           Submucosa    5 (25.0)   12 (60.0)   3 (15.0)  0.081
                                         Muscle   12 (11.8)   78 (76.5)  12 (11.8)       
                                         Serosa   76 (10.2)  542 (72.8) 127 (17.0)       
                            Adjacent structures     0 (0.0)   31 (79.5)   8 (20.5)       
       Lymph nodes involved           Mean (SD)   2.7 (2.2)   3.6 (3.4)  4.7 (4.4) <0.001
 

Logistic regression table

Now examine the explanatory variables against the outcome. Check the plot runs OK.

 
explanatory = c("age", "sex.factor", 
  "extent.factor", "nodes", 
  "differ.factor")
dependent = "mort_5yr"
colon_s %>% 
  finalfit(dependent, explanatory, 
  dependent_label_prefix = "") -> table2
 
 
     Mortality 5 year                           Alive        Died           OR (univariable)         OR (multivariable)
          Age (years)           Mean (SD) 59.8 (11.4) 59.9 (12.5)  1.00 (0.99-1.01, p=0.986)  1.01 (1.00-1.02, p=0.195)
                  Sex              Female  243 (47.6)  194 (48.0)                          -                          -
                                     Male  268 (52.4)  210 (52.0)  0.98 (0.76-1.27, p=0.889)  0.98 (0.74-1.30, p=0.885)
     Extent of spread           Submucosa    16 (3.1)     4 (1.0)                          -                          -
                                   Muscle   78 (15.3)    25 (6.2)  1.28 (0.42-4.79, p=0.681)  1.28 (0.37-5.92, p=0.722)
                                   Serosa  401 (78.5)  349 (86.4) 3.48 (1.26-12.24, p=0.027) 3.13 (1.01-13.76, p=0.076)
                      Adjacent structures    16 (3.1)    26 (6.4) 6.50 (1.98-25.93, p=0.004) 6.04 (1.58-30.41, p=0.015)
 Lymph nodes involved           Mean (SD)   2.7 (2.4)   4.9 (4.4)  1.24 (1.18-1.30, p<0.001)  1.23 (1.17-1.30, p<0.001)
      Differentiation                Well   52 (10.5)   40 (10.1)                          -                          -
                                 Moderate  382 (76.9)  269 (68.1)  0.92 (0.59-1.43, p=0.694)  0.70 (0.44-1.12, p=0.132)
                                     Poor   63 (12.7)   86 (21.8)  1.77 (1.05-3.01, p=0.032)  1.08 (0.61-1.90, p=0.796)
 

Odds ratio plot

colon_s %>% 
  or_plot(dependent, explanatory, 
  breaks = c(0.5, 1, 5, 10, 20, 30))

To MS Word via knitr/R Markdown

Important. In most R Markdown set-ups, environment objects need to be saved and then loaded into the R Markdown document.

# Save objects for knitr/markdown
save(table1, table2, dependent, explanatory, file = "out.rda")

We use an RStudio Server Pro set-up on Ubuntu, but these instructions should work fine for most, if not all, default RStudio/R Markdown set-ups.

In RStudio, select `File > New File > R Markdown`.

A useful template file is produced by default. Try hitting `Knit to Word` on the `Knit` button at the top of the `.Rmd` script window.
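
If you prefer to knit from the console rather than the button, `rmarkdown::render()` does the same job (the filename `example.Rmd` is just an illustration):

rmarkdown::render("example.Rmd", output_format = "word_document")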

Now paste this into the file:

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "22/5/2018"
output:
  word_document: default
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
load("out.rda")
```

## Table 1 - Demographics
```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Table 2 - Association between tumour factors and 5 year mortality
```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Figure 1 - Association between tumour factors and 5 year mortality
```{r figure1, echo = FALSE}
colon_s %>% 
  or_plot(dependent, explanatory)
```

It's ok, but not great.

Create Word template file

Now, edit the Word template. Click on a table. The style should be `Compact`. Right click > `Modify... > font size = 9`. Alter heading and text styles in the same way as desired. Save this as `template.docx`. Upload it to your project folder. Add this reference to the `.Rmd` YAML header, as below. Make sure you get the indentation correct, as YAML is space-sensitive.

The plot also doesn't look quite right and it prints with warning messages. Experiment with `fig.width` to get it looking right.

Now paste this into your `.Rmd` file and run:

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "21/5/2018"
output:
  word_document:
    reference_docx: template.docx  
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
load("out.rda")
```

## Table 1 - Demographics
```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Table 2 - Association between tumour factors and 5 year mortality
```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Figure 1 - Association between tumour factors and 5 year mortality
```{r figure1, echo = FALSE, warning=FALSE, message=FALSE, fig.width=10}
colon_s %>% 
  or_plot(dependent, explanatory)
```

This is now looking good for me, and further tweaks can be made.

To PDF via knitr/R Markdown

Default settings for PDF:

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "21/5/2018"
output:
  pdf_document: default
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
load("out.rda")
```

## Table 1 - Demographics
```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Table 2 - Association between tumour factors and 5 year mortality
```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```

## Figure 1 - Association between tumour factors and 5 year mortality
```{r figure1, echo = FALSE}
colon_s %>% 
  or_plot(dependent, explanatory)
```

Again, ok but not great.
Example PDF output: http://www.datasurg.net/wp-content/uploads/2018/05/example.pdf

We can fix the plot in exactly the same way. But the table is off the side of the page. For this we use the `kableExtra` package. Install this in the normal manner. You may also want to alter the margins of your page using `geometry` in the preamble.
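
If you haven't used `kableExtra` before, installation is the usual one-liner:

install.packages("kableExtra")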

---
title: "Example knitr/R Markdown document"
author: "Ewen Harrison"
date: "21/5/2018"
output:
  pdf_document: default
geometry: margin=0.75in
---

```{r setup, include=FALSE}
# Load data into global environment. 
library(finalfit)
library(dplyr)
library(knitr)
library(kableExtra)
load("out.rda")
```

## Table 1 - Demographics
```{r table1, echo = FALSE, results='asis'}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"),
      booktabs=TRUE)
```

## Table 2 - Association between tumour factors and 5 year mortality
```{r table2, echo = FALSE, results='asis'}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"),
      booktabs=TRUE) %>% 
  kable_styling(font_size=8)
```

## Figure 1 - Association between tumour factors and 5 year mortality
```{r figure1, echo = FALSE, warning=FALSE, message=FALSE, fig.width=10}
colon_s %>% 
  or_plot(dependent, explanatory)
```

This is now looking pretty good for me as well.
Example PDF output: http://www.datasurg.net/wp-content/uploads/2018/05/example2.pdf

There you have it. A pretty quick workflow to get final results into Word and a PDF.

P-values from random effects linear regression models

`lme4::lmer()` is a useful frequentist approach to hierarchical/multilevel linear regression modelling. For good reason, the model output only includes t-values and doesn’t include p-values (partly due to the difficulty in estimating the degrees of freedom, as discussed here).

Yes, p-values are evil and we should continue to try and expunge them from our analyses. But I keep getting asked about this. So here is a simple bootstrap method to generate two-sided parametric p-values on the fixed effects coefficients. Interpret with caution.

library(lme4)

# Run model with lme4 example data
fit = lmer(angle ~ recipe + temp + (1|recipe:replicate), cake)

# Model summary
summary(fit)

# lme4 profile method confidence intervals
confint(fit)

# Bootstrapped parametric p-values
# nsim determines the resolution of the p-value (the smallest non-zero p is 1/nsim)
boot.out = bootMer(fit, fixef, nsim=1000)

# For each fixed effect, take the smaller tail of the bootstrap distribution
# either side of zero and double it to give a two-sided p-value
p = rbind(
  (1-apply(boot.out$t<0, 2, mean))*2,
  (1-apply(boot.out$t>0, 2, mean))*2)
apply(p, 2, min)

# Alternative "pipe" syntax
library(magrittr)

lmer(angle ~ recipe + temp + (1|recipe:replicate), cake) %>% 
  bootMer(fixef, nsim=100) %$% 
  rbind(
  (1-apply(t<0, 2, mean))*2,
  (1-apply(t>0, 2, mean))*2) %>% 
  apply(2, min)
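
As an aside, if confidence intervals rather than p-values are what you are after, `lme4` can bootstrap these directly through `confint()` (this uses `bootMer()` internally, so it is the same parametric bootstrap as above):

confint(fit, method = "boot", nsim = 1000)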

 

Having a low blood count increases complications from liver surgery

A low blood count is common with cancer. There are now more studies showing that this can contribute to complications after surgery. Blood transfusion increases blood count but is best avoided in cancer unless the blood count is very low. This new study in the journal HPB shows the effect of anaemia after liver surgery. Here is the editorial highlight I wrote for the journal.

Preoperative anaemia is common and affects 30-60% of patients undergoing major elective surgery. In major non-cardiac surgery, anaemia is associated with increased morbidity and mortality, as well as higher blood transfusion rates.

The importance of preoperative anaemia in liver resection patients is becoming recognised. In this issue, Tohme and colleagues present an evaluation of the American College of Surgeons’ National Surgical Quality Improvement Program (ACS-NSQIP) database.

Of around 13000 patients who underwent elective liver resection from 2005 to 2012, one third were anaemic prior to surgery. After adjustment, anaemia was associated with major complications after surgery (OR 1.21, 1.09-1.33) but not death.

Patients who are anaemic have different characteristics to those who are not, characteristics that are likely to make them more susceptible to complications. While this analysis extensively adjusts for observed factors, residual confounding almost certainly exists.

The question remains, does anaemia itself contribute to the occurrence of complications, or is it just a symptom of greater troubles? The authors rightly highlight the importance of identifying anaemia prior to surgery, but it remains to be seen whether treatment is possible and whether it will result in better patient outcomes.

Perioperative transfusion is independently associated with major complications. Although there is no additive effect in anaemic patients, the benefits of treating anaemia may be offset by the detrimental effect of transfusion. For those with iron deficiency, treatment with intravenous iron may be of use and is currently being studied in an RCT of all major surgery (preventt.lshtm.ac.uk). Results of studies such as these will help determine causal relationships and whether intervention is possible and beneficial.

An alternative presentation of the ProPublica Surgeon Scorecard

ProPublica, an independent investigative journalism organisation, have published surgeon-level complication rates based on Medicare data. I have already highlighted problems with the reporting of the data: surgeons are described as having a “high adjusted rate of complications” if they fall in the red-zone, despite there being too little data to say whether this has happened by chance.

This surgeon should not be identified as having a “high adjusted rate of complications” as there are too few cases to estimate the complication rate accurately.

I say again, I fully support transparency and public access to healthcare. But the ProPublica reporting has been quite shocking. I’m not aware of them publishing the number of surgeons out of the 17000 that are statistically different to the average. This is a small handful.

ProPublica could have chosen a different approach. This is a funnel plot and I’ve written about them before.

A funnel plot is a summary of an estimate (such as complication rate) against a measure of the precision of that estimate. In the context of healthcare, a centre or individual outcome is often plotted against patient volume. A horizontal line parallel to the x-axis represents the outcome for the entire population and outcomes for individual surgeons are displayed as points around this. This allows a comparison of individuals with that of the population average, while accounting for the increasing certainty surrounding that outcome as the sample size increases. Limits can be determined, beyond which the chances of getting an individual outcome are low if that individual were really part of the whole population.

In other words, a surgeon above the line has a complication rate different to the average.

I’ve scraped the ProPublica data for gallbladder removal (laparoscopic cholecystectomy) from California, New York and Texas for surgeons highlighted in the red-zone. These are surgeons ProPublica says have high complication rates.

As can be seen from the funnel plot, these surgeons are nowhere near being outliers. There is insufficient information to say whether any of them are different to average. ProPublica decided to ignore the imprecision with which the complication rates are determined. For red-zone surgeons from these 3 states, none of them have complication rates different to average.

Black line, population average (4.4%), blue line 95% control limit, red line 99% control limit.

How likely is it that a surgeon with an average complication rate (4.4%) will appear in the red-zone just by chance (>5.2%)? The answer is, pretty likely given the small numbers of cases here: anything up to a 25% chance depending on the number of cases performed. Even at the top of the green-zone (low ACR, 3.9%), there is still around a 1 in 6 chance a surgeon will appear to have a high complication rate just by chance.

ProPublica have failed in their duty to explain these data in a way that can be understood. The surgeon scorecard should be revised. All “warning explanation points” should be removed for those other than the truly outlying cases.

Data

Download

Git

Link to repository.

Code

# ProPublica Surgeon Scorecard 
# https://projects.propublica.org/surgeons

# Laparoscopic cholecystectomy (gallbladder removal) data
# Surgeons with "high adjusted rate of complications"
# CA, NY, TX only

# Libraries needed ----
library(ggplot2)
library(binom)
library(reshape2) # for melt(), used below

# Upload dataframe ----
dat = read.csv("http://www.datasurg.net/wp-content/uploads/2015/07/ProPublica_CA_NY_TX.csv")

# Total number reported
dim(dat)[1] # 59

# Remove duplicate surgeons who operate in more than one hospital
duplicates = which(
    duplicated(dat$Surgeon)
)

dat_unique = dat[-duplicates,]
dim(dat_unique) # 27

# Funnel plot for gallbladder removal adjusted complication rate -------------------------
# Set up blank funnel plot ----
# Set control limits
pop.rate = 0.044 # Mean population ACR, 4.4%
binom_n = seq(5, 100, length.out=40)
ci.90 = binom.confint(pop.rate*binom_n, binom_n, conf.level = 0.90, methods = "wilson")
ci.95 = binom.confint(pop.rate*binom_n, binom_n, conf.level = 0.95, methods = "wilson")
ci.99 = binom.confint(pop.rate*binom_n, binom_n, conf.level = 0.99, methods = "wilson")

theme_set(theme_bw(24))
g1 = ggplot()+
    geom_line(data=ci.95, aes(ci.95$n, ci.95$lower*100), colour = "blue")+ 
    geom_line(data=ci.95, aes(ci.95$n, ci.95$upper*100), colour = "blue")+
    geom_line(data=ci.99, aes(ci.99$n, ci.99$lower*100), colour = "red")+ 
    geom_line(data=ci.99, aes(ci.99$n, ci.99$upper*100), colour = "red")+
    geom_line(aes(x=ci.90$n, y=pop.rate*100), colour="black", size=1)+
    xlab("Case volume")+
    ylab("Adjusted complication rate (%)")+
    scale_colour_brewer("", type = "qual", palette = 6)+
    theme(legend.justification=c(1,1), legend.position=c(1,1))
g1

g1 + 
    geom_point(data=dat_unique, aes(x=Volume, y=ACR), colour="black", alpha=0.6, size = 6, 
                         show_guide=TRUE)+
    geom_point(data=dat_unique, aes(x=Volume, y=ACR, colour=State), alpha=0.6, size=4) +
    ggtitle("Funnel plot of adjusted complication rate in CA, NY, TX")


# Probability of being shown as having high complication rate ----
# At a true rate of 4.4%, what are the chances of appearing above 5.2% by chance?
n <- seq(15, 150, 1)
average = 1-pbinom(ceiling(n*0.052), n, 0.044)
low = 1-pbinom(ceiling(n*0.052), n, 0.039)

dat_prob = data.frame(n, average, low)

ggplot(melt(dat_prob, id="n"))+
    geom_point(aes(x=n, y=value*100, colour=variable), size=4)+
    scale_x_continuous("Case volume", breaks=seq(10, 150, 10))+
    ylab("Adjusted complication rate (%)")+
    scale_colour_brewer("True complication rate", type="qual", palette = 2, labels=c("Average (4.4%)", "Low (3.9%)"))+
    ggtitle("ProPublica chance of being in high complication rate zone by\nchance when true complication rate \"average\" or \"low\"")+
    theme(legend.position=c(1,0), legend.justification=c(1,0))

Considerations in the Early Termination of Clinical Trials in Surgery

One of the most difficult situations when running a clinical trial is the decision to terminate the trial early. But it shouldn’t be a difficult decision. With clear stopping rules defined before the trial starts, it should be straightforward to determine when the effect size is large enough that no further patients need to be randomised to definitively answer the question.

Whether there is benefit to leaving a temporary plastic tube drain in the belly after an operation to remove the head of the pancreas is controversial. It may help diagnose and treat the potential disaster that occurs when the join between pancreas and bowel leaks. Others think that the presence of the drain may in fact make a leak more likely.

This question was tackled in an important randomised clinical trial.

A randomised prospective multicenter trial of pancreaticoduodenectomy with and without routine intraperitoneal drainage

The trial was stopped early because there were more deaths in the group who didn’t have a drain. The question that remains: was it the absence of the drain which caused the deaths? As important, was stopping the trial at this point the correct course of action?

My feeling is that the lack of a drain was not definitively demonstrated to be the cause of the deaths, and I think the trial was stopped too early. These difficult issues are discussed in our letter in Annals of Surgery.

Ethics and statistics collide in decisions relating to the early termination of clinical trials. Investigators have a fundamental responsibility to stop a trial where an excess of harm is seen in one of the arms. Decisions on stopping are not straightforward and must balance the potential risk to trial patients against the likelihood that in fact there is no difference in outcome between groups. Indeed, in early termination, the potential loss of generalizable knowledge may itself harm future patients.

We therefore read with interest the article by Van Buren and colleagues (1) and congratulate the authors on the first multicenter randomized trial on the controversial topic of surgical drains after pancreaticoduodenectomy. As the authors report, the trial was stopped by the Data Safety Monitoring Board after only 18% recruitment due to a numerical excess of deaths in the “no-drain” arm.

We would be interested in learning from the process that led to the decision to terminate the trial. A common method to monitor adverse events advocated by the CONSORT group is to define formal sequential stopping rules based on the limit of acceptable adverse event rates (2). These guidelines suggest that authors report the number of planned “looks” at the data, the statistical methods used including any formal stopping rules, and whether these were planned before trial commencement.

This information is often not included in published trial reports, even when early termination has occurred (3). We feel that in the context of important surgical trials, these guidelines should be adhered to.

Early termination can reduce the statistical power of a trial. This can be addressed by examining results as data accumulate, preferably by an independent data monitoring committee. However, performing multiple statistical examinations of accumulating data without appropriate correction can lead to erroneous results and interpretation (4). For example, if accumulating data from a trial are examined at 5 interim analyses that use a P value of 0.05, the overall false-positive rate is nearer to 19% than to the nominal 5%.

Several group sequential statistical methods are available to adjust for multiple analyses (5,6) and their use should be prespecified in the trial protocol. Stopping rules may be formed by 2 broad methods, either using a Bayesian approach to evaluate the proportion of patients with adverse effects or using a hypothesis testing approach with a sequential probability ratio test to determine whether the acceptable adverse effects rate has been exceeded. Data are compared at each interim analysis and decisions are based on prespecified criteria. As an example, stopping rules for harm from a recent study used modified Haybittle-Peto boundaries of 3 SDs in the first half of the study and 2 SDs in the second half (7). The study of Van Buren and colleagues is reported to have been stopped after 18% recruitment due to an excess of 6 deaths in the “no-drain” arm. The relative risk of death at 90 days in the “no-drain” group versus the “drain” group was 3.94 (95% confidence interval, 0.87–17.90), equivalent to a difference of 1.78 SD. The primary outcome measure was any grade 2 complication or more and had a relative risk of 1.32 (95% confidence interval, 1.00–1.75), or 1.95 SD.
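
As an aside (not part of the original letter), the "SD" figures quoted above can be reproduced from the published relative risks and confidence intervals, assuming the intervals are symmetric on the log scale:

# Convert a relative risk and its 95% CI into a z-score ("number of SDs")
rr_to_z = function(rr, lower, upper) {
  se = (log(upper) - log(lower)) / (2 * qnorm(0.975))
  log(rr) / se
}
rr_to_z(3.94, 0.87, 17.90) # ~1.78 SD (90-day mortality)
rr_to_z(1.32, 1.00, 1.75)  # ~1.95 SD (primary outcome)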

The decision to terminate a trial early is not based on statistics alone. Judgements must be made using all the available evidence, including the biological and clinical plausibility of harm and the findings of previous studies. Statistical considerations should therefore be used as a starting point for decisions, rather than a definitive rule.

The Data Safety Monitoring Board for the study of Van Buren and colleagues clearly felt that there was no option other than to terminate the trial. However, at least on statistical grounds, this occurred very early in the trial using conservative criteria. The question therefore remains: is the totality of evidence convincing that the question posed has been unequivocally answered? We would suggest that this is not the case. In general terms, stopping a clinical trial early is a rare event that sends out a message that, because of the “sensational” effect, may have greater impact on the medical community than intended, making future studies in that area challenging.

1. Van Buren G, Bloomston M, Hughes SJ, et al. A randomised prospective multicenter trial of pancreaticoduodenectomy with and without routine intraperitoneal drainage. Ann Surg. 2014;259: 605–612.

2. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.

3. Montori VM, Devereaux PJ, Adhikari NK, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294:2203–2209.

4. Geller NL, Pocock SJ. Interim analyses in randomized clinical trials: ramifications and guidelines for practitioners. Biometrics. 1987;43:213–223.

5. Pocock SJ. When to stop a clinical trial. BMJ. 1992;305:235–240.

6. Berry DA. Interim analyses in clinical trials: classical vs. Bayesian approaches. Stat Med. 1985;4:521– 526.

7. Connolly SJ, Pogue J, Hart RG, et al. Effect of clopidogrel added to aspirin in patients with atrial fibrillation. N Engl J Med. 2009;360:2066– 2078.

Introduction of Surgical Safety Checklists in Ontario, Canada – don’t blame the study size

The recent publication of the Ontario experience in the introduction of Surgical Safety Checklists has caused a bit of a stooshie.

Checklists have consistently been shown to be associated with a reduction in death and complications following surgery. Since the publication of Atul Gawande’s seminal paper in 2009, checklists have been successfully introduced in a number of countries including Scotland. David Urbach and Nancy Baxter’s New England Journal of Medicine publication stands apart: the checklist made no difference.

Atul Gawande himself responded quickly, asking two important questions. Firstly, were there sufficient patients included in the study to show a difference? Secondly, was the implementation robust, and was the programme in place for long enough to expect a difference to be seen?

He and others have reported the power of the study to be low – about 40% – meaning that were the study to be repeated multiple times and a true difference in mortality actually did exist, the chance of detecting it would be 40%. But power calculations performed after the event (post hoc) are completely meaningless – when no effect is seen in a study, the power is low by definition (mathsy explanation here).

There is no protocol provided with the Ontario study, so it is not clear if an estimate of the required sample size had been performed. Were it done, it may have gone something like this.

The risk of death in the Ontario population is 0.71%. This could have been determined from the same administrative dataset that the study used. Say we expect a similar reduction in death following checklist introduction as Gawande showed in 2009, 1.5% to 0.8%. For the Ontario population, this would be equivalent to an expected risk of death of 0.38%. This may or may not be reasonable. It is not clear that the “checklist effect” is the same across patients or procedures of different risks. Accepting this assumption for now, the study would have only required around 8000 patients per group to show a significant difference. The study actually included over 100000 patients per group. In fact, it was powered to show very small differences in the risk of death – a reduction of around 0.1% would probably have been detected.

Sample size for Ontario study.

Similar conclusions can be drawn for complication rate. Gawande showed a reduction from 11% to 7%, equivalent in Ontario to a reduction from 3.86% to 2.46%. The Ontario study was likely to show a reduction to 3.59% (at 90% power).
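
For anyone wanting to reproduce these numbers, `power.prop.test()` gets close. This is a sketch of my own, assuming a two-sided 5% significance level; the post does not state the power used for the mortality sample size, but 80% reproduces the ~8000 per group figure:

# Sample size to detect a fall in mortality from 0.71% to 0.38% (a Gawande-sized effect)
power.prop.test(p1 = 0.0071, p2 = 0.0038, power = 0.80) # roughly 8000 per group

# Power to detect a 0.1% absolute reduction in mortality with ~100000 patients per group
power.prop.test(n = 100000, p1 = 0.0071, p2 = 0.0061) # roughly 80% power

# Power to detect a fall in complications from 3.86% to 3.59%
power.prop.test(n = 100000, p1 = 0.0386, p2 = 0.0359) # roughly 90% power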

The explanation for the failure to show a difference does not lie in the numbers.

So assuming then that checklists do work, this negative result stems either from a failure of implementation – checklists were not being used or not being used properly – or a difference in the effect of checklists in this population. The former seems most likely. The authors report that …

… available data did not permit us to determine whether a checklist was used in a particular procedure, and we were unable to measure compliance with checklists at monthly intervals in our analysis. However, reported compliance with checklists is extraordinarily high …

Quality improvement interventions need sufficient time for introduction. In this study, only a minimum of 3 months was allowed which seems crazily short. Teams need to want to do it. In my own hospital there was a lot of grumbling (including from me) before acceptance. When I worked in the Netherlands, SURPASS was introduced. In this particular hospital it was delivered via the electronic patient record. A succession of electronic “baton passes” meant that a patient could not get to the operating theatre without a comprehensive series of checklists being completed. I like this use of technology to deliver safety. With robust implementation, training, and acceptance by staff, maybe the benefits of checklists will also be seen in Ontario.

Mortality after paediatric heart surgery using public domain data

This post comes with some big health warnings.

The recent events in Leeds highlight the difficulties faced in judging the results of surgery by individual hospital. A clear requirement is timely access to data in a form easily digestible by the public.

Here I’ve scraped the publicly available data from the central cardiac audit database (CCAD). All the data are available at the links provided and are as presented this afternoon (06/04/2013). Please read the caveats carefully.

Hospital-specific 30-day mortality data are available for certain paediatric heart surgery procedures for 2009-2012. These data are not complete for 2011-12 and there may be missing data for earlier years. There may be important data for procedures not included here that should be accounted for. There is no case-mix adjustment.

All data are included in spreadsheets below as well as the code to run the analysis yourself, to ensure no mistakes have been made. Hopefully these data will be quickly superseded with a quality-assured update.

Mortality by centre

The funnel plot below has been generated by taking all surgical procedures performed from pages such as this and expressing all deaths within 30 days as a proportion of the total procedures performed by hospital.

The red horizontal line is the mean mortality rate for these procedures – 2.3%. The green, blue and red curved lines are decreasingly stringent control limits within which unit results may vary by chance.

Observed mortality rate following paediatric heart surgery by centre, CCAD public domain data 2009-2012.

Mortality by procedure

The mortality associated with different procedures can be explored with this Google motion chart. Note the great variation seen year to year when a procedure is uncommon (to the left of the chart). These bouncing balls trace out the limits of a funnel plot. They highlight why year-to-year differences in mortality rates for rare procedures must be interpreted with caution.


 

Data

ccad_public_data_april_2013.xls

ccad_public_data_april_2013_centre

ccad_public_data_april_2013_aggregate

ccad_public_data_april_2013_lookup

 

Script

####################################
# CCAD public domain data analysis #
# 6 April 2013                     #
# Ewen Harrison                    #
# www.datasurg.net                 #
#################################### 

# Read data
data<-read.table("ccad_public_data_april_2013_centre.csv", sep=",", header=TRUE)

# Correct variable-type
data$centre_code<-factor(data$centre_code)

# Read centre names table
centre<-read.table("ccad_public_data_april_2013_lookup.csv", sep=",", header=TRUE)

# Combine
data<-merge(data, centre, by="centre_code")

# Subset by only procedures termed "Surgery" and remove procedures with no data. 
surg<-subset(data, type=="Surgery" & !is.na(data$centre_code))

# Display data
surg
str(surg)

# install.packages("plyr") # remove "#" first time to install
library(plyr)

# Aggregate data by centre
agg.surg<-ddply(surg, .(centre_code), summarise, observed_mr=sum(death_30d)/sum(count), 
  sum_death=sum(death_30d), count=sum(count))

# Overall mortality rate for procedures listed in all centres
mean.mort<-sum(surg$death_30d)/sum(surg$count)
mean.mort #2.3%

# Generate binomial confidence limits

# install.packages("binom") # remove "#" first time to install
library(binom)
binom_n<-seq(90, 1100, length.out=40)
ci.90<-binom.confint(mean.mort*binom_n, binom_n, conf.level = 0.90, methods = "agresti-coull")
ci.95<-binom.confint(mean.mort*binom_n, binom_n, conf.level = 0.95, methods = "agresti-coull")
ci.99<-binom.confint(mean.mort*binom_n, binom_n, conf.level = 0.997, methods = "agresti-coull")

# Plot chart
# install.packages("ggplot2") # remove "#" first time to install
library(ggplot2)
ggplot()+
	geom_point(data=agg.surg, aes(count,observed_mr))+
	geom_line(aes(ci.90$n, ci.90$lower, colour = "90% CI"))+ #hack to get legend
	geom_line(aes(ci.90$n, ci.90$upper), colour = "green")+
	geom_line(aes(ci.95$n, ci.95$lower, colour = "95% CI"))+ 
	geom_line(aes(ci.95$n, ci.95$upper), colour = "blue")+
	geom_line(aes(ci.99$n, ci.99$lower, colour = "99.7% CI"))+ 
	geom_line(aes(ci.99$n, ci.99$upper), colour = "red")+
	geom_text(data=agg.surg, aes(count,observed_mr,label=centre_code), size=3,vjust=-1)+
	geom_line(aes(x=90:1100, y=mean.mort), colour="red")+
	ggtitle("Observed mortality rate following paediatric heart surgery\nby centre using CCAD public domain data 2009-2012 (incomplete)")+
	scale_x_continuous(name="Number cases performed per centre 2009-2012", limits=c(90,1100))+
	scale_y_continuous(name="Observed mortality rate")+
	scale_colour_manual("",
		breaks=c("90% CI", "95% CI", "99.7% CI"),
		values=c("green", "blue", "red"))+
	theme_bw()+
	theme(legend.position=c(.9, .9))

# Google motion chart
# Load national aggregate data by procedure
agg_data<-read.table("ccad_public_data_april_2013_aggregate.csv", sep=",", header=TRUE)

# check
str(agg_data)

# install.packages("googleVis") # remove "#" first time to install
library(googleVis)
Motion=gvisMotionChart(agg_data, idvar="procedure", timevar="year",
	options=list(height=500, width=600,
		state='{"showTrails":true,"yZoomedDataMax":1,"iconType":"BUBBLE","orderedByY":false,"playDuration":9705.555555555555,"xZoomedIn":false,"yLambda":1,"xAxisOption":"3","nonSelectedAlpha":0.4,"xZoomedDataMin":0,"iconKeySettings":[],"yAxisOption":"5","orderedByX":false,"yZoomedIn":false,"xLambda":1,"colorOption":"2","dimensions":{"iconDimensions":["dim0"]},"duration":{"timeUnit":"Y","multiplier":1},"xZoomedDataMax":833,"uniColorForNonSelected":false,"sizeOption":"3","time":"2000","yZoomedDataMin":0.33};'),
		chartid="Survival_by_procedure_following_congenital_cardiac_surgery_in_UK_2000_2010")
plot(Motion)

 

 

Statistical errors in published medical studies

I do a fair amount of peer-review for journals. My totally subjective impression – which I can’t back up with figures – is that fundamental errors in data analysis occur on a fairly frequent basis. Opaque descriptions of methods and no access to raw data often make errors difficult to detect.

We’re performing a meta-analysis at the moment. This is a study in which two or more clinical trials of the same treatment are combined. This can be useful when there is uncertainty about the effectiveness of a treatment.

Relevant trials are rigorously searched for and their quality assessed. The results of good quality trials are then combined, usually with more weight being given to the more reliable trials. This weight reflects the number of patients in the trial and, for some measures, the variability in the results. This variation is important – trials with low variability are greatly influential in the final results of the meta-analysis.

What are we doing the meta-analysis on? We often operate to remove a piece of liver due to cancer. Sometimes we have to clamp the blood supply to the liver to prevent bleeding. An obvious consequence to this is damage to the liver tissue.

Multiple local liver resections. Patient provided consent for image publication.

It may be possible to protect the liver (and any organ) from these damaging effects by temporarily clamping the blood supply for a short time, then releasing the clamp and allowing blood to flow back in. The clamp is then replaced and the liver resection performed. This is called “ischemic preconditioning” and may work by stimulating liver cells to protect themselves. “Batten down the hatches boys, there’s a storm coming!”

Results of this technique are controversial – when used in patients some studies show it works, some show no benefit. So should we be using it in our day-to-day practice?

We searched for studies examining ischemic preconditioning and found quite a few.

One in particular, performed by surgeons in Hungary, seemed to show that the technique worked very well (1). The variability in this study was low as well, so it seemed reliable. Actually the variability was very low – lower than in all the other trials we found.

 

Variation in biochemical outcome measures in studies of ischemic preconditioning.

The graph shows 3 of the measures used to determine success of the preconditioning. The first two are enzymes released from damaged liver cells and the third, bilirubin, is processed by the liver. All the studies show some lowering of these measures signifying potential improvement with the treatment. But most trials show a lot of variation between different patients (the vertical lines).

Except the Hungarian study, which shows almost no variation.

Even compared with a study in which these tests were repeated in healthy individuals in the US (9), the variation was low. That seemed strange. Surely the day-to-day variation in your or my liver tests should be lower than that of a group of patients undergoing surgery?

It looks like a mistake.

It may be that the authors wrote that they used one measure of variation when they actually used another (standard error of the mean vs. standard deviation). This could be a simple mistake, the details are here.
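
As a toy illustration of why this matters (my own made-up numbers, not the trial data): for a sample of 50 patients, the standard error of the mean is about seven times smaller than the standard deviation, so quoting one as the other makes results look far less variable than they really are.

set.seed(1)
alt = rnorm(50, mean = 300, sd = 150) # hypothetical post-operative ALT values
sd(alt)            # standard deviation: spread between patients, around 150
sd(alt) / sqrt(50) # standard error of the mean: about 7 times smaller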

 

But we don’t know. We wrote three times, but they didn’t get back to us. We asked the journal and they are looking into it.


1 Hahn O, Blázovics A, Váli L, et al. The effect of ischemic preconditioning on redox status during liver resections-randomized controlled trial. Journal of Surgical Oncology 2011;104:647–53.
2 Clavien P-A, Selzner M, Rüdiger HA, et al. A Prospective Randomized Study in 100 Consecutive Patients Undergoing Major Liver Resection With Versus Without Ischemic Preconditioning. Annals of Surgery 2003;238:843–52.
3 Li S-Q, Liang L-J, Huang J-F, et al. Ischemic preconditioning protects liver from hepatectomy under hepatic inflow occlusion for hepatocellular carcinoma patients with cirrhosis. World J Gastroenterol 2004;10:2580–4.
4 Choukèr A, Martignoni A, Schauer R, et al. Beneficial effects of ischemic preconditioning in patients undergoing hepatectomy: the role of neutrophils. Arch Surg 2005;140:129–36.
5 Petrowsky H, McCormack L, Trujillo M, et al. A Prospective, Randomized, Controlled Trial Comparing Intermittent Portal Triad Clamping Versus Ischemic Preconditioning With Continuous Clamping for Major Liver Resection. Annals of Surgery 2006;244:921–30.
6 Heizmann O, Loehe F, Volk A, et al. Ischemic preconditioning improves postoperative outcome after liver resections: a randomized controlled study. European journal of medical research 2008;13:79.
7 Arkadopoulos N, Kostopanagiotou G, Theodoraki K, et al. Ischemic Preconditioning Confers Antiapoptotic Protection During Major Hepatectomies Performed Under Combined Inflow and Outflow Exclusion of the Liver. A Randomized Clinical Trial. World J Surg 2009;33:1909–15.
8 Scatton O, Zalinski S, Jegou D, et al. Randomized clinical trial of ischaemic preconditioning in major liver resection with intermittent Pringle manoeuvre. Br J Surg 2011;98:1236–43.
9 Lazo M, Selvin E, Clark JM. Brief communication: clinical implications of short-term variability in liver function test results. Ann Intern Med 2008;148:348–52.

Leeds paediatric heart surgery: managing outliers

Children’s heart surgery in Leeds has been suspended. Concerns about an excess in mortality have been raised and denied, and I have written about seemingly large variations in mortality (“twice the national average”) being explained by chance.

In June 1998, the then Secretary of State for Health announced the establishment of an inquiry into the management of the care of children receiving complex cardiac surgery at Bristol Royal Infirmary between 1984 and 1995. The inquiry identified failures that contributed to the death of children undergoing heart surgery, and the 529-page report was a blueprint for wider reform of the NHS.

Funnel plots are useful for comparing the results of surgery between hospitals. The funnel plots below are from here and are for open cardiac surgery in children under one year in the UK, 1991-1995. The Cardiac Surgery Registry (CSR) and Hospital Episode Statistics (HES) data were used to compare institutions. The horizontal dotted line is the national average and the curved dotted line the limit of variation which might be expected by chance (95% confidence interval). The “O” is Bristol Royal Infirmary and “*” the eleven other UK centres. Bristol, as became apparent, was a clear outlier.

Funnel plots for open congenital heart surgery in children under 1 year in the UK, 1991-1995

How should we deal with outliers?

The question is pertinent given the recent suspension of children’s cardiac surgery at Leeds General Infirmary. The UK Department of Health produced guidelines in 2011 on the recommended process should a unit hit the dotted line, summarised below.

Stage 1 | 10 days

Hospitals with a performance indicator ‘alert’ or ‘alarm’ require scrutiny of the data handling and analyses performed to determine whether there is:

‘No case to answer’

  • potential outlier status not confirmed;
  • data and results revised in clinical audit records;
  • details formally recorded.

‘Case to answer’

  • potential outlier status;
  • proceed to stage 2.

Stage 2 | 5 days

The Lead Clinician in the hospital is informed about the potential outlier status and requested to identify any data errors or justifiable explanations. All relevant data and analyses should be made available to the Lead Clinician.

A copy of the request should also be sent to the Clinical Governance Lead of the hospital.

Stage 3 | 25 days

Lead Clinician to provide written response to national clinical audit team.

Stage 4 | 30 days

Review of Lead Clinician’s response to determine:

‘No case to answer’

  • It is confirmed that the data originally supplied by the provider contained inaccuracies. Reanalysis of accurate data no longer indicates outlier status;
  • Data and results should be revised in clinical audit records. Details of the hospital’s response and the review result recorded;
  • Lead Clinician notified in writing.

‘Case to answer’

  • It is confirmed that although the data originally supplied by the provider were inaccurate, analysis still indicates outlier status; or
  • It is confirmed that the originally supplied data were accurate, thus confirming the initial designation of outlier status;
  • proceed to stage 5.

Stage 5 | 5 days

Contact Lead Clinician by telephone, prior to written confirmation of potential outlier status; copied to clinical governance lead, medical director and chief executive. All relevant data and statistical analyses, including previous response from the lead clinician, made available to the medical director and chief executive.

Chief executive advised to inform relevant bodies about the concerns: primary care trusts, Strategic Health Authority, professional society/association, and Care Quality Commission. Informed that the audit body will proceed to publishing information of comparative performance that will identify providers.

Stage 6 | 10 days

Chief executive acknowledgement of receipt of the letter.

Stage 7

Public disclosure of comparative information that identifies providers (eg annual report of NCA).

The Situation in Leeds

It appears that in Leeds the process is at stage 2 – the local doctors have just been informed. The guidance suggests that the identity of statistical outliers should remain anonymous at this stage. It may be that concerns were so great that special circumstances dictated the dramatic public announcement. We should find out in the next few weeks.