Elegant regression results tables and plots in R: the finalfit package

The finalfit package brings together the day-to-day functions we use to generate final results tables and plots when modelling. I spent many years manually copying results from R analyses, and built these functions to automate our standard healthcare data workflow. It is particularly useful when undertaking a large study involving multiple different regression analyses. When combined with RMarkdown, the reporting becomes entirely automated. Its design follows Hadley Wickham’s tidy tool manifesto.

Installation and Documentation

The full documentation is now here: finalfit.org

The code lives on GitHub.

You can install finalfit from CRAN with:

install.packages("finalfit")

It is recommended that this package is used together with dplyr, which is a dependency.

Some of the functions require rstan and boot. These have been left as Suggests rather than Depends to avoid unnecessary installation. If needed, they can be installed in the normal way:

install.packages("rstan")
install.packages("boot")

To install off-line (or in a Safe Haven), download the zip file and use devtools::install_local().

Main Features

1. Summarise variables/factors by a categorical variable

summary_factorlist() is a wrapper used to aggregate any number of explanatory variables by a single variable of interest. This is often “Table 1” of a published study. When categorical, the variable of interest can have a maximum of five levels. It uses Hmisc::summary.formula().

library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics by variable of interest ----
explanatory = c("age", "age.factor", 
  "sex.factor", "obstruct.factor")
dependent = "perfor.factor" # Bowel perforation
colon_s %>%
  summary_factorlist(dependent, explanatory,
  p=TRUE, add_dependent_label=TRUE)

See other options relating to inclusion of missing data, mean vs. median for continuous variables, column vs. row proportions, include a total column etc.
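To make concrete what a “Table 1” cross-tabulation involves, here is a minimal base R sketch of the underlying steps: counts, column percentages and a chi-squared p-value. It is an illustration only, not how summary_factorlist() is implemented. It uses the unmodified survival::colon data, and the 0/1 codings and factor labels below are assumptions taken from that dataset's documentation.

```r
library(survival)

# One row per patient: keep the death records only (etype == 2)
d <- subset(survival::colon, etype == 2)

# Assumed codings: sex 1 = male, perfor 1 = perforation present
d$sex.factor <- factor(d$sex, levels = c(0, 1), labels = c("Female", "Male"))
d$perfor.factor <- factor(d$perfor, levels = c(0, 1), labels = c("No", "Yes"))

# Counts of an explanatory variable by the variable of interest
counts <- table(d$sex.factor, d$perfor.factor, dnn = c("Sex", "Perforation"))

# Column percentages (each level of the variable of interest sums to 100)
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Chi-squared test for association
p_value <- chisq.test(counts)$p.value

counts
col_pct
p_value
```

The wrapper repeats this kind of step for every explanatory variable and binds the results into one data frame.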

summary_factorlist() is also commonly used to summarise any number of variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", 
  "sex.factor",
  "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary_factorlist(dependent, explanatory, 
  p=TRUE, add_dependent_label=TRUE)

Tables can be knitted to PDF, Word or HTML documents. We do this in RStudio from a .Rmd document. Example chunk:

```{r, echo = FALSE, results='asis'}
knitr::kable(example_table, row.names=FALSE, 
    align=c("l", "l", "r", "r", "r", "r"))
```

2. Summarise regression model results in final table format

The second main feature is the ability to create final tables for linear (lm()), logistic (glm()), hierarchical logistic (lme4::glmer()) and Cox proportional hazards (survival::coxph()) regression models.

The finalfit() “all-in-one” function takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a final table for publication including summary statistics, univariable and multivariable regression analyses. The first columns are those produced by summary_factorlist(). The appropriate regression model is chosen on the basis of the dependent variable type and other arguments passed.

Logistic regression: glm()

Of the form: glm(dependent ~ explanatory, family="binomial")

explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory)
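As a rough picture of what the multivariable column corresponds to, the sketch below builds the same kind of formula from character vectors and fits it with base R glm(). It is not finalfit's internal code. It uses the unmodified survival::colon data, with status (death) standing in for mort_5yr, and Wald confidence intervals chosen for simplicity.

```r
library(survival)

# One row per patient: keep the death records only (etype == 2)
d <- subset(survival::colon, etype == 2)

explanatory <- c("age", "sex", "obstruct", "perfor")
dependent <- "status"  # stand-in for mort_5yr (assumption)

# Build "status ~ age + sex + obstruct + perfor" from the strings
glm_formula <- as.formula(
  paste(dependent, "~", paste(explanatory, collapse = " + "))
)

fit <- glm(glm_formula, data = d, family = "binomial")

# Odds ratios with Wald 95% confidence intervals
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(odds_ratios, 2)
```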

Logistic regression with reduced model: glm()

Where the multivariable model should contain only a subset of the variables in the full univariable set, this can be specified with a second vector of variable names (explanatory_multi in the example below).

explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", 
  "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, 
  explanatory_multi)

Mixed effects logistic regression: lme4::glmer()

Of the form: lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")

Hierarchical/mixed effects/multilevel logistic regression models can be specified using the argument random_effect. At the moment it is just set up for random intercepts, i.e. (1 | random_effect), but in the future I’ll adjust this to accommodate random gradients if needed, i.e. (variable1 | variable2).

explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, 
  explanatory_multi, random_effect)
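The difference between the random-intercept and random-gradient forms is easiest to see in the formula itself. The helper below is hypothetical (it is not part of finalfit) and only assembles the formula from strings, so it runs in base R without fitting anything.

```r
# Hypothetical helper: build a mixed-model formula from character vectors.
# slope = "1" gives a random intercept; a variable name gives a random gradient.
mixed_formula <- function(dependent, explanatory, random_effect, slope = "1") {
  fixed <- paste(explanatory, collapse = " + ")
  random <- paste0("(", slope, " | ", random_effect, ")")
  as.formula(paste(dependent, "~", fixed, "+", random))
}

explanatory_multi <- c("age.factor", "obstruct.factor")

# Random intercept for each hospital
f_intercept <- mixed_formula("mort_5yr", explanatory_multi, "hospital")

# Random gradient for age within each hospital
f_slope <- mixed_formula("mort_5yr", explanatory_multi, "hospital",
                         slope = "age.factor")

f_intercept
f_slope
```

Either formula could then be passed to lme4::glmer() with family="binomial", assuming lme4 is installed and the variables exist in the data.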

Cox proportional hazards: survival::coxph()

Of the form: survival::coxph(dependent ~ explanatory)

explanatory = c("age.factor", "sex.factor", 
"obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  finalfit(dependent, explanatory)
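For comparison, a minimal sketch of the equivalent direct call to survival::coxph(), with the formula built from the same kind of character vectors. This illustrates the model form only and is not finalfit's implementation. It uses the unmodified survival::colon data with numeric covariates rather than the labelled factors in colon_s.

```r
library(survival)

# One row per patient: keep the death records only (etype == 2)
d <- subset(survival::colon, etype == 2)

explanatory <- c("age", "sex", "obstruct", "perfor")
dependent <- "Surv(time, status)"

# Build "Surv(time, status) ~ age + sex + obstruct + perfor"
cox_formula <- as.formula(
  paste(dependent, "~", paste(explanatory, collapse = " + "))
)

fit <- coxph(cox_formula, data = d)

# Hazard ratios with 95% confidence intervals
hazard_ratios <- exp(cbind(HR = coef(fit), confint(fit)))
round(hazard_ratios, 2)
```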

Add common model metrics to output

metrics=TRUE provides common model metrics. The output is a list of two dataframes. Note chunk specification for output below.

explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, 
  metrics=TRUE)

```{r, echo=FALSE, results="asis"}
knitr::kable(table7[[1]], row.names=FALSE, align=c("l", "l", "r", "r", "r"))
knitr::kable(table7[[2]], row.names=FALSE)
```
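Which metrics are reported is not listed here, so the sketch below simply shows how two commonly quoted ones, AIC and the C-statistic, can be computed by hand for a logistic model in base R. The c_statistic() helper is hypothetical and the built-in mtcars data is used purely for illustration.

```r
# Example logistic model on a built-in dataset
fit <- glm(vs ~ mpg, data = mtcars, family = "binomial")

# Hypothetical helper: C-statistic (area under the ROC curve) from the
# rank-sum of predicted probabilities in those with the outcome
c_statistic <- function(outcome, predicted) {
  r <- rank(predicted)
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

metrics <- data.frame(
  n = nobs(fit),
  AIC = round(AIC(fit), 1),
  C_statistic = round(c_statistic(mtcars$vs, fitted(fit)), 2)
)
metrics
```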

Rather than going all-in-one, any number of subset models can be manually added on to a summary_factorlist() table using finalfit_merge(). This is particularly useful when models take a long time to run or are complicated.

Note the requirement for fit_id=TRUE in summary_factorlist(). fit2df() extracts, condenses, and adds metrics to supported models.

explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
explanatory_multi = c("age.factor", "obstruct.factor")
random_effect = "hospital"
dependent = 'mort_5yr'

# Separate tables
colon_s %>%
  summary_factorlist(dependent, 
  explanatory, fit_id=TRUE) -> example.summary

colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (univariable)") -> example.univariable

colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate_suffix=" (multivariable)") -> example.multivariable

colon_s %>%
  glmmixed(dependent, explanatory, random_effect) %>%
  fit2df(estimate_suffix=" (multilevel)") -> example.multilevel

# Pipe together
example.summary %>%
  finalfit_merge(example.univariable) %>%
  finalfit_merge(example.multivariable) %>%
  finalfit_merge(example.multilevel) %>%
  select(-c(fit_id, index)) %>% # remove unnecessary columns
  dependent_label(colon_s, dependent, prefix="") # place dependent variable label

Bayesian logistic regression: with `stan`

Our own particular rstan models are supported and will be documented in the future. Broadly, if you are running (hierarchical) logistic regression models in Stan with coefficients specified as a vector labelled beta, then fit2df() will work directly on the stanfit object in a similar manner to if it was a glm or glmerMod object.

3. Summarise regression model results in plot

Models can be summarized with odds ratio/hazard ratio plots using or_plot, hr_plot and surv_plot.

OR plot

# OR plot
explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  or_plot(dependent, explanatory)
# Previously fitted models (`glmmulti()` or 
# `glmmixed()`) can be provided directly to `glmfit`

HR plot

# HR plot
explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  hr_plot(dependent, explanatory, dependent_label = "Survival")
# Previously fitted models (`coxphmulti`) can be provided directly using `coxfit`

Kaplan-Meier survival plots

KM plots can be produced using the survminer package.

# KM plot
explanatory = c("perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  surv_plot(dependent, explanatory, 
  xlab="Time (days)", pval=TRUE, legend="none")

Notes

Use Hmisc::label() to assign labels to variables for tables and plots.

label(colon_s$age.factor) = "Age (years)"

Export dataframe tables directly or to R Markdown knitr::kable().

Note wrapper summary_missing() is also useful. Wraps mice::md.pattern.

colon_s %>%
  summary_missing(dependent, explanatory)

Development will be on-going, but any input appreciated.

Prediction is very difficult, especially about the future

As Niels Bohr, the Danish physicist, put it, “prediction is very difficult, especially about the future”. Prognostic models are commonplace and seek to help patients and the surgical team estimate the risk of a specific event, for instance, the recurrence of disease or a complication of surgery. “Decision-support tools” aim to help patients make difficult choices, with the most useful providing personalized estimates to assist in balancing the trade-offs between risks and benefits. As we enter the world of precision medicine, these tools will become central to all our practice.

In the meantime, there are limitations. Overwhelming evidence shows that the quality of reporting of prediction model studies is poor. In some instances, the details of the actual model are considered commercially sensitive and are not published, making the assessment of the risk of bias and potential usefulness of the model difficult.

In this edition of HPB, Beal and colleagues aim to validate the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) Surgical Risk Calculator (SRC) using data from 854 gallbladder cancer and extrahepatic cholangiocarcinoma patients from the US Extrahepatic Biliary Malignancy Consortium. The authors conclude that the “estimates of risk were variable in terms of accuracy and generally calculator performance was poor”. The SRC underpredicted the occurrence of all examined end-points (death, readmission, reoperation and surgical site infection) and discrimination and calibration were particularly poor for readmission and surgical site infection. This is not the first report of predictive failures of the SRC. Possible explanations cited previously include small sample size, homogeneity of patients, and too few institutions in the validation set. That does not seem to be the case in the current study.

The SRC is a general-purpose risk calculator and while it may be applicable across many surgical domains, it should be used with caution in extrahepatic biliary cancer. It is not clear why the calculator does not provide measures of uncertainty around estimates. This would greatly help patients interpret its output and would go a long way to addressing some of the broader concerns around accuracy.

Radical but conservative liver surgery

Cutting-edge liver surgery is often associated with modern technology such as the robot. In this edition of HPB, Torzilli and colleagues provide a fascinating account of 12 years of “radical but conservative” open liver surgery.

This is extreme parenchymal-sparing hepatectomy (PSH) in 169 patients with colorectal liver metastases. In all cases, tumour was touching or infiltrating portal pedicles or hepatic veins, a situation where most surgeons would advocate a major hepatectomy where possible. The PSH by its nature results in a 0 mm resection margin when the vessel is preserved, which was the aim in many of these procedures. Although this is off-putting, the cut-edge recurrence rate was no higher than average.

PSH in the form of “easy atypicals” is performed by all HPB surgeons. There are two main differences here. First is the aim to detach tumours from intrahepatic vascular structures. For instance, hepatic veins in contact with tumour were preserved and only resected if infiltrated. Even then, they were tangentially incised if possible and reconstructed with a bovine pericardial patch. Second is the careful attention paid to identifying and using communicating hepatic veins. This is well described but used extensively here to allow complete resection of segments while avoiding congestion in the draining region.

Short-term mortality and morbidity rates are comparable with other published series. A median survival of 36 months and 5-year overall survival of around 30% is reasonable given some of these patients may not be offered surgery in certain centres. The authors describe the parenchymal sparing approach “failing” in 14 (10%) patients: 7 (5%) had recurrence at the cut edge and 8 (6%) within segments which would have been removed using a standard approach. 44% of the 55 patients with liver-only recurrence underwent re-resection.

This is not small surgery. The average operating time is 8.5 h with the longest taking 18.5 h. The 66% thoracotomy rate is also notable in an era of minimally invasive surgery and certainly differs from my own practice. This study is challenging and I look forward to the debates that should arise from it.

Preserving liver while removing all the cancer

“Radical-but-conservative” parenchymal-sparing hepatectomy (PSH) for colorectal liver metastases (Torzilli 2017) is increasingly reported. The PSH revolution has two potential advantages: avoiding postoperative hepatic failure (POHF) and increasing the possibility of re-do surgery in the common event of future recurrence. However, early series reported worse long-term survival and higher positive margin rates with a parenchymal-sparing approach, with a debate ensuing about the significance of the latter in an era where energy-devices are more commonly employed in liver transection. No randomised controlled trials exist comparing PSH with major hepatectomy and case series are naturally biased by selection.

In this issue of HPB, Lordan and colleagues report a propensity-score matched case-control series of PSH vs. major hepatectomy. The results are striking. The PSH approach was associated with less blood transfusion (10.1 vs 27.7%), fewer major complications (3.8 vs 9.2%), and lower rates of POHF (0 vs 5.5%). Unusually, perioperative mortality (0.8 vs 3.8%) was also lower in the PSH group and longer-term oncologic and survival outcomes were similar.

Results of propensity-matched analyses must always be interpreted with selection bias in mind. Residual confounding always exists: the patients undergoing major hepatectomy almost certainly had undescribed differences from the PSH group and may not have been technically suitable for PSH. Matching did not account for year of surgery, so with PSH becoming more common the generally improved outcomes over time will bias in favour of the parenchymal-sparing approach. Yet putting those concerns aside, there are two salient results. Firstly, PSH promises less POHF and in this series, there was none. Secondly, PSH promises greater opportunity for redo liver surgery. There was 50% liver-only recurrence in both groups. Although not reported by the authors, a greater proportion of PSH patients underwent redo surgery (35/119 (29.4%) vs. 23/130 (17.7%), p=0.03). Perhaps for some patients, the PSH revolution is delivering some of its promised advantages.
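The redo surgery comparison can be checked from the counts quoted above. The test behind the quoted p-value is not stated, so the sketch below assumes a Pearson chi-squared test without continuity correction. Other choices, such as Fisher's exact test, would give a slightly different value.

```r
# Redo liver surgery by approach, from the counts quoted in the text
redo <- matrix(
  c(35, 119 - 35,
    23, 130 - 23),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Approach = c("PSH", "Major hepatectomy"),
    Redo = c("Yes", "No")
  )
)

# Percentage undergoing redo surgery in each group
redo_pct <- round(100 * redo[, "Yes"] / rowSums(redo), 1)

# Assumed test: Pearson chi-squared without continuity correction
redo_test <- chisq.test(redo, correct = FALSE)

redo_pct
redo_test$p.value
```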

Realistic medicine

Realistic medicine is a useful concept describing healthcare that puts patients at the centre of decision making and treatment, with an aim to reduce harm, waste and unwarranted variation. One of the great challenges in medicine today is supporting patients with incurable disease in their treatment choices. Advising patients on interventions that offer reducing benefits in the face of increasing potential harms, when they may feel obliged to “take all treatments going”, requires honesty, candour and data. Realism is a better term than futility, but they are two sides of the same coin.

In HPB, Kim and colleagues examine survival after recurrence of bile duct cancer. The facts of this disease are always sobering: the median survival after diagnosis of recurrence is 7 months. The study is useful in that the authors have sufficient numbers to examine subgroups of those with recurrence to identify which patients may potentially benefit from salvage treatment (which was mostly chemotherapy). For those with poorly differentiated primary tumours, a short time to recurrence, poor performance status and elevated CA19-9, survival was only a handful of months.

This is a pragmatic non-randomised study with inherent selection bias, but the aim was not to determine the potential benefit of salvage treatment (we await the full publication of studies such as BILCAP). Also, the predictive ability of the model was not particularly high (c-statistic= 0.65). However, it does serve to illustrate the important point that for some very unfortunate patients with poor-prognosis recurrence, survival will be short and they may be better advised to focus on priorities other than chemotherapy. As Atul Gawande remarks in Being Mortal, “We’ve been wrong about what our job is in medicine. We think our job is to ensure health and survival. But really it is larger than that. It is to enable well-being.”

Effect of day of the week on mortality after emergency general surgery

Our latest paper published in the BJS describes short- and long-term outcomes after emergency surgery in Scotland. We looked for a weekend effect and didn’t find one.

  • In around 50,000 emergency general surgery patients, we didn’t find an association between day of surgery or day of admission and death rates;
  • In around 100,000 emergency surgery patients including orthopaedic and gynaecology procedures, we didn’t find an association between day of surgery or day of admission and death rates;
  • In around 500,000 emergency and planned surgery patients, we didn’t find an association between day of surgery or day of admission and death rates.

We also found that emergency surgery performed at weekends, or in those admitted at weekends, was performed a little quicker compared with weekdays.

More details can be found here:

Effect of day of the week on short- and long-term mortality after emergency general surgery
http://onlinelibrary.wiley.com/doi/10.1002/bjs.10507/full

Press coverage

Broadcast: BBC GOOD MORNING SCOTLAND, HEART FM

Print: DAILY TELEGRAPH, DAILY MIRROR, METRO, HERALD, HERALD (Leader), SCOTSMAN, THE NATIONAL, YORKSHIRE POST, GLASGOW EVENING TIMES

Online: BBC NEWS ONLINE, DAILY MAIL, EXPRESS.CO.UK, MIRROR.CO.UK, HERALD SCOTLAND, THE COURIER, WEBMD.BOOTS.COM, NEWS-MEDICAL.NET, NEW KERALA (India), BUSINESS STANDARD, YAHOO NEWS, ABERDEEN EVENING EXPRESS, BT.COM, MEDICAL XPRESS.

Publishing mortality rates for individual surgeons

This is our new analysis of an old topic. In Scotland, individual surgeon outcomes were published as far back as 2006. It wasn’t pursued in Scotland, but has been mandated for surgeons in England since 2013.

This new analysis took the current mortality data and sought to answer a simple question: how useful is this information in detecting differences in outcome at the individual surgeon level?

Well the answer, in short, is not very useful.

We looked at mortality after planned bowel and gullet cancer surgery, hip replacement, and thyroid, obesity and aneurysm surgery. Death rates are relatively low after planned surgery, which is testament to hard-working NHS teams up and down the country. This, together with the fact that individual surgeons perform a relatively small proportion of all these procedures, means that death rates are not a good way to detect underperformance.

At the mortality rates reported for thyroid (0.08%) and obesity (0.07%) surgery, it is unlikely a surgeon would perform a sufficient number of procedures in his/her entire career to stand a good chance of detecting a mortality rate 5 times the national average.
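To give a feel for the scale of the problem, here is a minimal power sketch using an exact one-sided binomial test. It is not the method used in the paper. The 5% significance level, the 80% power threshold and the grid of caseloads are all assumptions chosen for illustration, and power_to_detect() is a hypothetical helper.

```r
# Hypothetical helper: power to detect a surgeon whose true mortality is
# `ratio` times the national rate p0, after n procedures
power_to_detect <- function(n, p0, ratio = 5, alpha = 0.05) {
  # Smallest number of deaths that would be significantly above p0
  k <- qbinom(1 - alpha, n, p0) + 1
  # Probability of observing at least k deaths at the raised rate
  pbinom(k - 1, n, ratio * p0, lower.tail = FALSE)
}

thyroid_rate <- 0.0008  # 0.08%, as quoted in the text

cases <- seq(100, 10000, by = 100)
power <- power_to_detect(cases, p0 = thyroid_rate)

# First caseload on this grid at which power reaches 80%
cases[which(power >= 0.8)[1]]
```

Because deaths are counted in whole numbers, power does not rise smoothly with caseload, so the result should be read as an order of magnitude rather than a precise threshold.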

Surgeon death rates are problematic in more fundamental ways. It is the 21st century and much of surgical care is delivered by teams of surgeons, other doctors, nurses, physiotherapists, pharmacists, dieticians etc. In liver transplantation it is common for one surgeon to choose the donor/recipient pair, for a second surgeon to do the transplant, and for a third surgeon to look after the patient after the operation. Does it make sense to look at the results of individuals? Why not of the team?

It is also important to ensure that analyses adequately account for the increased risk faced by some patients undergoing surgery. If my granny has had a heart attack and has a bad chest, I don’t want her to be deprived of much needed surgery because a surgeon is worried that her high risk might impact on the public perception of their competence. As Harry Burns, the former Chief Medical Officer of Scotland, said, those with the highest mortality rates may be the heroes of the health service, taking on patients with difficult disease that no one else will face.

We are only now beginning to understand the results of surgery using measures that are more meaningful to patients. These sometimes get called patient-centred outcome measures. Take a planned hip replacement: the aim of the operation is to remove pain and increase mobility. If after 3 months a patient still has significant pain and can’t get out for the groceries, the operation has not been a success. Thankfully death after planned hip replacement is relatively rare and in any case, might have little to do with the quality of the surgery.

Transparency in the results of surgery is paramount and publishing death rates may be a step towards this, even if they may in fact be falsely reassuring. We must use these data as part of a much wider initiative to capture the success and failures of surgery. Only by doing this will we improve the results of surgery and ensure every patient receives the highest quality of care.

Read the full article for free here.

Press coverage

Radio: LBC, Radio Forth

Print:

  • New Scientist
  • Scotsman
  • Daily Mail
  • Express
  • the I

Online:

ONMEDICA, SHROPSHIRESTAR.COM, THE BOLTON NEWS, EXPRESSANDSTAR.COM, BELFAST TELEGRAPH, AOL UK, MEDICAL XPRESS, BT.COM, EXPRESS.CO.UK

Having a low blood count increases complications from liver surgery

A low blood count is common with cancer. There are now more studies showing that this can contribute to complications after surgery. Blood transfusion increases blood count but is best avoided in cancer unless the blood count is very low. This new study in the journal HPB shows the effect of anaemia after liver surgery. Here is the editorial highlight I wrote for the journal.

Preoperative anaemia is common and affects 30-60% of patients undergoing major elective surgery. In major non-cardiac surgery, anaemia is associated with increased morbidity and mortality, as well as higher blood transfusion rates.

The importance of preoperative anaemia in liver resection patients is becoming recognised. In this issue, Tohme and colleagues present an evaluation of the American College of Surgeons’ National Surgical Quality Improvement Program (ACS-NSQIP) database.

Of around 13000 patients who underwent elective liver resection from 2005 to 2012, one third were anaemic prior to surgery. After adjustment, anaemia was associated with major complications after surgery (OR 1.21, 1.09-1.33) but not death.

Patients who are anaemic have different characteristics to those who are not, characteristics that are likely to make them more susceptible to complications. While this analysis extensively adjusts for observed factors, residual confounding almost certainly exists.

The question remains, does anaemia itself contribute to the occurrence of complications, or is it just a symptom of greater troubles? The authors rightly highlight the importance of identifying anaemia prior to surgery, but it remains to be seen whether treatment is possible and whether it will result in better patient outcomes.

Perioperative transfusion is independently associated with major complications. Although there is no additive effect in anaemic patients, the benefits of treating anaemia may be offset by the detrimental effect of transfusion. For those with iron deficiency, treatment with intravenous iron may be of use and is currently being studied in an RCT of all major surgery (preventt.lshtm.ac.uk). Results of studies such as these will help determine causal relationships and whether intervention is possible and beneficial.

Keyhole or open surgery for bowel cancer metastases in the liver?

Laparoscopic (keyhole) approaches for liver resection are well described and in common use, but we do not yet have robust randomised controlled trial data comparing safety and effectiveness (a number of randomised trials are on-going). Observational studies have been published suggesting approaches are comparable, but as always with studies of this type, bias in treatment allocation limits conclusions.

In the February 2016 issue of HPB, Lewin and colleagues present a retrospective observational study comparing survival in laparoscopic versus open resection for colorectal liver metastases.

Selection bias always exists in non-randomised studies, but the authors have tried to reduce this with a propensity score based technique. These approaches aim to reduce the bias between treatment groups by accounting for differences in measured variables. Unmeasured variables are clearly not accounted for, while in an RCT these would be expected to be distributed equally between groups.

The authors use an “inverse probability of treatment weighting” method, which creates a synthetic sample in which treatment assignment is independent of measured baseline variables. It has the advantage of handling censored survival data better than alternatives and produces estimates of average treatment effects for the entire population, rather than just the treated group.
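The weighting idea can be illustrated on simulated data. This is a minimal sketch under made-up assumptions, not the authors' analysis: a single confounder (age), a made-up treatment allocation model, and weights of 1/ps for treated and 1/(1 - ps) for untreated patients.

```r
set.seed(1)
n <- 5000

# Simulated confounder: younger patients are more likely to be treated
age <- rnorm(n, mean = 60, sd = 10)
treated <- rbinom(n, 1, plogis(-0.08 * (age - 60)))

# Propensity score: estimated probability of treatment given age
ps <- fitted(glm(treated ~ age, family = "binomial"))

# Inverse probability of treatment weights
w <- ifelse(treated == 1, 1 / ps, 1 / (1 - ps))

# Difference in (weighted) mean age between treated and untreated
mean_diff <- function(x, group, weights = rep(1, length(x))) {
  weighted.mean(x[group == 1], weights[group == 1]) -
    weighted.mean(x[group == 0], weights[group == 0])
}

raw_diff <- mean_diff(age, treated)     # imbalance before weighting
ipw_diff <- mean_diff(age, treated, w)  # imbalance in the weighted sample

c(raw = raw_diff, weighted = ipw_diff)
```

In the weighted sample the age difference between groups shrinks towards zero, which is what "treatment assignment independent of measured baseline variables" means in practice. Unmeasured variables are not balanced by this.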

The actual 5 year overall survival of 59% is impressive and the analysis showed equivalent outcomes between open and laparoscopic groups. Residual confounding is likely to exist and as the authors point out, the open group had twice the positive margin rate (18% vs 8%) suggesting these procedures were technically more challenging. We look forward to seeing how these results compare to the outcomes of on-going RCTs.

Predicting liver failure and death after liver surgery

There have been many attempts to define predictive models for the identification of patients at risk of liver failure after surgery (posthepatectomy liver failure (PHLF)) and death. These have previously been hindered by the lack of a robust definition of PHLF and the two most commonly used definitions – the 50-50 and International Study Group of Liver Surgery (ISGLS) criteria – have now helped with this. These definitions are based on a measure of blood clotting (prothrombin time) and the serum bilirubin concentration, reflecting the synthetic and excretory/detoxifying functions of the liver. One criticism of these is that the criteria are taken on day 5 after surgery, a time-point some have argued is too late to intervene upon.

In a new analysis of 1528 major liver resection patients, Herbert and colleagues examine the changes in serum phosphate and creatinine levels immediately after surgery. It was previously shown that a failure of phosphate levels to fall after surgery is associated with liver failure and death (Squires, HPB, 2014). Low serum phosphate after liver resection is well recognised and was originally thought to be a consequence of consumption during liver growth (hypertrophy). However, while active take-up of phosphate into the liver after surgery does happen, this is insufficient to fully explain low phosphate levels. The authors point to studies demonstrating a significant increase in the urinary excretion of phosphate following hepatectomy which may also contribute.

Herbert provides a practical definition: creatinine on day 1 post surgery (PoD1) > day of surgery (DoS) and phosphate fails to decrease by 20% from DoS to PoD1. There is a strong association in multivariable analyses with death (Odds ratio 2.53, 1.36–4.71) and PHLF (3.89, 1.85–8.37).
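Written as a rule, the definition might look like the sketch below. The function name, the argument names and the example values are all made up for illustration, and reading "fails to decrease by 20%" as a strict "less than 20%" threshold is an assumption.

```r
# Hypothetical helper: TRUE when creatinine rises from day of surgery (DoS)
# to postoperative day 1 (PoD1) AND phosphate falls by less than 20%
phosphate_creatinine_flag <- function(creat_dos, creat_pod1,
                                      phos_dos, phos_pod1) {
  creatinine_rose <- creat_pod1 > creat_dos
  phosphate_drop <- (phos_dos - phos_pod1) / phos_dos
  creatinine_rose & phosphate_drop < 0.20
}

# Made-up example values (not patient data)
patients <- data.frame(
  creat_dos  = c(70, 70, 70),
  creat_pod1 = c(85, 85, 65),
  phos_dos   = c(1.00, 1.00, 1.00),
  phos_pod1  = c(0.90, 0.70, 0.95)
)

patients$flag <- with(
  patients,
  phosphate_creatinine_flag(creat_dos, creat_pod1, phos_dos, phos_pod1)
)
patients
```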

The serum phosphate/creatinine definition identified 52% of those that died, but also 25% that survived without evidence of PHLF. It may be that this can be improved by incorporating other parameters, or by identifying a high risk group a priori. Given the lack of specific therapies beyond that of high quality intensive care, whether death can actually be averted is a separate question.

An alternative presentation of the ProPublica Surgeon Scorecard

ProPublica, an independent investigative journalism organisation, have published surgeon-level complication rates based on Medicare data. I have already highlighted problems with the reporting of the data: surgeons are described as having a “high adjusted rate of complications” if they fall in the red-zone, despite there being too little data to say whether this has happened by chance.

This surgeon should not be identified as having a “high adjusted rate of complications” as there are too few cases to estimate the complication rate accurately.

I say again, I fully support transparency and public access to healthcare. But the ProPublica reporting has been quite shocking. I’m not aware of them publishing the number of surgeons out of the 17000 that are statistically different to the average. This is a small handful.

ProPublica could have chosen a different approach. This is a funnel plot and I’ve written about them before.

A funnel plot is a summary of an estimate (such as complication rate) against a measure of the precision of that estimate. In the context of healthcare, a centre or individual outcome is often plotted against patient volume. A horizontal line parallel to the x-axis represents the outcome for the entire population and outcomes for individual surgeons are displayed as points around this. This allows a comparison of individuals with that of the population average, while accounting for the increasing certainty surrounding that outcome as the sample size increases. Limits can be determined, beyond which the chances of getting an individual outcome are low if that individual were really part of the whole population.

In other words, a surgeon plotted outside the control limits has a complication rate that differs from the average by more than chance alone would readily explain.
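The narrowing of the funnel with case volume can be sketched with exact binomial quantiles in base R. This treats the complication rate as a raw proportion, which is a simplifying assumption because the published rates are model-adjusted. The volumes are arbitrary and control_limits() is a hypothetical helper.

```r
pop_rate <- 0.044  # population average complication rate, 4.4%
volume <- c(20, 50, 100, 200, 500)

# Hypothetical helper: binomial control limits around the population rate
control_limits <- function(volume, rate, level) {
  tail_prob <- (1 - level) / 2
  data.frame(
    volume = volume,
    lower = qbinom(tail_prob, volume, rate) / volume,
    upper = qbinom(1 - tail_prob, volume, rate) / volume
  )
}

limits_95 <- control_limits(volume, pop_rate, 0.95)
limits_99 <- control_limits(volume, pop_rate, 0.99)

# The funnel narrows as case volume increases
limits_95$width <- limits_95$upper - limits_95$lower
limits_95
```

A surgeon's point is only unusual if it lies outside these limits at their own case volume. At low volumes the limits are wide, so even a high observed rate is compatible with average performance.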

I’ve scraped the ProPublica data for gallbladder removal (laparoscopic cholecystectomy) from California, New York and Texas for surgeons highlighted in the red-zone. These are surgeons ProPublica says have high complication rates.

As can be seen from the funnel plot, these surgeons are nowhere near being outliers. There is insufficient information to say whether any of them are different to average. ProPublica decided to ignore the imprecision with which the complication rates are determined. For red-zone surgeons from these 3 states, none of them have complication rates different to average.

Figure: funnel plot of adjusted complication rate in CA, NY, TX. Black line, population average (4.4%); blue line, 95% control limit; red line, 99% control limit.

How likely is it that a surgeon with an average complication rate (4.4%) will appear in the red-zone just by chance (>5.2%)? The answer is, pretty likely given the small numbers of cases here: anything up to a 25% chance depending on the number of cases performed. Even at the top of the green-zone (low ACR, 3.9%), there is still around a 1 in 6 chance a surgeon will appear to have a high complication rate just by chance.

ProPublica have failed in their duty to explain these data in a way that can be understood. The surgeon scorecard should be revised. All “warning explanation points” should be removed for those other than the truly outlying cases.

Data

Download

Git

Link to repository.

Code

# ProPublica Surgeon Scorecard 
# https://projects.propublica.org/surgeons

# Laparoscopic cholecystectomy (gallbladder removal) data
# Surgeons with "high adjusted rate of complications"
# CA, NY, TX only

# Libraries needed ----
library(ggplot2)
library(binom)
library(reshape2) # melt(), used for the probability plot below

# Upload dataframe ----
dat = read.csv("http://www.datasurg.net/wp-content/uploads/2015/07/ProPublica_CA_NY_TX.csv")

# Total number reported
nrow(dat) # 59

# Remove duplicate surgeons who operate in more than one hospital
# (logical indexing is safe even if there are no duplicates)
dat_unique = dat[!duplicated(dat$Surgeon), ]
nrow(dat_unique) # 27

# Funnel plot for gallbladder removal adjusted complication rate -------------------------
# Set up blank funnel plot ----
# Set control limits
pop.rate = 0.044 # Mean population ACR, 4.4%
binom_n = seq(5, 100, length.out=40)
ci.90 = binom.confint(pop.rate*binom_n, binom_n, conf.level = 0.90, methods = "wilson")
ci.95 = binom.confint(pop.rate*binom_n, binom_n, conf.level = 0.95, methods = "wilson")
ci.99 = binom.confint(pop.rate*binom_n, binom_n, conf.level = 0.99, methods = "wilson")

theme_set(theme_bw(24))
g1 = ggplot()+
    geom_line(data=ci.95, aes(x=n, y=lower*100), colour = "blue")+
    geom_line(data=ci.95, aes(x=n, y=upper*100), colour = "blue")+
    geom_line(data=ci.99, aes(x=n, y=lower*100), colour = "red")+
    geom_line(data=ci.99, aes(x=n, y=upper*100), colour = "red")+
    geom_line(data=ci.90, aes(x=n, y=pop.rate*100), colour="black", size=1)+
    xlab("Case volume")+
    ylab("Adjusted complication rate (%)")+
    scale_colour_brewer("", type = "qual", palette = 6)+
    theme(legend.justification=c(1,1), legend.position=c(1,1))
g1

g1 + 
    geom_point(data=dat_unique, aes(x=Volume, y=ACR), colour="black", alpha=0.6, size = 6, 
                         show.legend=TRUE)+
    geom_point(data=dat_unique, aes(x=Volume, y=ACR, colour=State), alpha=0.6, size=4) +
    ggtitle("Funnel plot of adjusted complication rate in CA, NY, TX")


# Probability of being shown as having high complication rate ----
# With a true rate of 4.4%, what are the chances of appearing above 5.2% by chance?
n <- seq(15, 150, 1)
average = 1-pbinom(ceiling(n*0.052), n, 0.044)
low = 1-pbinom(ceiling(n*0.052), n, 0.039)

dat_prob = data.frame(n, average, low)

ggplot(melt(dat_prob, id="n"))+
    geom_point(aes(x=n, y=value*100, colour=variable), size=4)+
    scale_x_continuous("Case volume", breaks=seq(10, 150, 10))+
    ylab("Probability of appearing in red-zone (%)")+
    scale_colour_brewer("True complication rate", type="qual", palette = 2, labels=c("Average (4.4%)", "Low (3.9%)"))+
    ggtitle("ProPublica chance of being in high complication rate zone by\nchance when true complication rate \"average\" or \"low\"")+
    theme(legend.position=c(1,0), legend.justification=c(1,0))

The problem with ProPublica’s surgeon scorecards

ProPublica is an organisation performing independent, non-profit investigative journalism in the public interest. Yesterday it published an analysis of surgeon-level complication rates based on Medicare data.

Publication of individual surgeons’ results is well established in the UK. Transparent, easily accessible healthcare data is essential and initiatives like this are welcomed.

It is important that data are presented in a way that can be clearly understood. Communicating risk is notoriously difficult. This is particularly difficult when it is necessary to describe the precision with which a risk has been estimated.

Unfortunately that is where ProPublica have got it all wrong.

There is an inherent difficulty faced when we are dealing with individual surgeon data. In order to be sure that a surgeon has a complication rate higher than average, that surgeon needs to have performed a certain number of that particular procedure. If data are only available on a small number of cases, we can’t be certain whether the surgeon’s complication rate is truly high, or just appears to be high by chance.

If you tossed a coin 10 times and it came up with 7 heads, could you say whether the coin was fair or biased? With only 10 tosses we don’t know.

Similarly, if a surgeon performs 10 operations and has 1 complication, can we be sure that their true complication rate is 10%, rather than 5% or 20%? With only 10 operations we don’t know.
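Both examples can be checked with an exact binomial test in base R. This is a minimal sketch on raw counts; ProPublica’s adjusted rates come from a risk-adjustment model, so the numbers here are illustrative only.

```r
# Coin: 7 heads in 10 tosses, tested against a fair coin (p = 0.5)
coin = binom.test(7, 10, p = 0.5)
coin$p.value    # two-sided exact p-value
coin$conf.int   # exact 95% interval for the probability of heads

# Surgeon: 1 complication in 10 operations
surgeon = binom.test(1, 10)
surgeon$conf.int  # exact 95% interval for the complication rate
```

The p-value for the coin is about 0.34, so 7 heads in 10 is entirely compatible with a fair coin. The interval for the surgeon runs from well under 1% to over 40%, so 5%, 10% and 20% are all plausible true rates.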

The presentation of the ProPublica data is really concerning. Here’s why.

For a given hospital, data are presented for individual surgeons. Bands are provided which define “low”, “medium” and “high” adjusted complication rates. If the adjusted complication rate for an individual surgeon falls within the red-zone, they are described as having a “high adjusted rate of complications”.

How confident can we be that a surgeon in the red-zone truly has a high complication rate? To get a handle on this, we need to turn to an off-putting statistical concept called a “confidence interval”. As its name implies, a confidence interval tells us with what degree of confidence we can treat the estimated complication rate.

If the surgeon has done many procedures, the confidence interval will be narrow. If we only have data on a few procedures, the confidence interval will be wide.
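To see how quickly the interval narrows, here is a small sketch that computes a Wilson score interval by hand for three hypothetical surgeons who all have the same observed complication rate of 5%. The function name `wilson_ci` and the case numbers are made up for illustration and are not part of the ProPublica analysis.

```r
# Wilson score interval for a binomial proportion (base R only)
wilson_ci = function(x, n, conf.level = 0.95) {
  z = qnorm(1 - (1 - conf.level) / 2)
  p = x / n
  denom = 1 + z^2 / n
  centre = (p + z^2 / (2 * n)) / denom
  half = z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Hypothetical surgeons, all with an observed complication rate of 5%
cases = c(20, 100, 1000)
events = c(1, 5, 50)

# One row per surgeon, limits expressed as percentages
ci = t(mapply(wilson_ci, events, cases))
ci_table = data.frame(cases, events,
                      lower = ci[, "lower"] * 100,
                      upper = ci[, "upper"] * 100)
ci_table$width = ci_table$upper - ci_table$lower
ci_table
```

The observed rate is identical in each row, but the width of the interval shrinks as the number of cases grows.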

To be confident that a surgeon has a high complication rate, the 95% confidence interval needs to entirely lie in the red-zone.

A surgeon should be highlighted as having a high complication rate if and only if the confidence interval lies entirely in the red-zone.
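This rule is simple to express in code. The sketch below assumes raw complication counts and an exact binomial interval, with the 5.2% red-zone threshold used elsewhere in this post; `flag_high` is a hypothetical helper, and ProPublica’s own intervals come from their adjustment model rather than from this calculation.

```r
# Flag a surgeon only if the whole 95% interval lies above the red-zone threshold
flag_high = function(events, n, threshold = 0.052, conf.level = 0.95) {
  ci = binom.test(events, n, conf.level = conf.level)$conf.int
  ci[1] > threshold  # lower limit must clear the threshold
}

flag_high(2, 20)    # 10% observed on 20 cases
flag_high(30, 150)  # 20% observed on 150 cases
```

The first surgeon is not flagged despite an observed rate of 10%, because the interval on 20 cases reaches well below the threshold. The second is flagged, because with 150 cases the lower limit is clearly above it.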

Here is an example. This surgeon performs the procedure to remove the gallbladder (cholecystectomy). There are data on 20 procedures for this individual surgeon. The estimated complication rate is 4.7%. But the 95% confidence interval goes from the green-zone all the way to the red-zone. Due to the small number of procedures, all we can conclude is that this surgeon has either a low, medium, or high adjusted complication rate. Not very useful.

Here are some other examples.

Adjusted complication rate: 1.5% on 339 procedures. Surgeon has low or medium complication rate. They are unlikely to have a high complication rate.

Adjusted complication rate: 4.0% on 30 procedures. Surgeon has low or medium or high complication rate. Note that, due to the low number of cases, the analysis correctly suggests an estimated complication rate, despite the fact this surgeon has not had any complications for the 30 procedures.

Adjusted complication rate: 5.4% on 21 procedures. ProPublica conclusion: surgeon has high adjusted complication rate. Actual conclusion: surgeon has low, medium or high complication rate.

Adjusted complication rate: 6.6% on 22 procedures. ProPublica conclusion: surgeon has high adjusted complication rate. Actual conclusion: surgeon has medium or high complication rate, but is unlikely to have a low complication rate.

Adjusted complication rate: 7.6% on 86 procedures. ProPublica conclusion: surgeon has high adjusted complication rate. Actual conclusion: surgeon has high complication rate. This is one of the few examples in the dataset where the analysis suggests this surgeon does have a high likelihood of having a high complication rate.

In the UK, only this last example would be highlighted as concerning. That is because we have no idea whether surgeons who happen to fall into the red-zone are truly different to average.

The analysis above does not deal with issues others have highlighted: that this is Medicare data only, that important data may be missing, that the adjustment for patient case mix may be inadequate, and that the complication rates seem different to what would be expected.

ProPublica have not moderated the language used in reporting these data. My view is that the data are being misrepresented.

ProPublica should highlight cases like the last mentioned above. For all the others, all that can be concluded is that there are too few cases to be able to make a judgement on whether the surgeon’s complication rate is different to average.